How Would You Handle a $1,000,000 Coding Error?
theodp writes "The Chicago Tribune's efforts to upgrade its computer system over the weekend turned into a fiasco when the system crashed, halting all printing operations and leaving about half of the Trib's subscribers without papers. The software contained 'a coding error,' according to a spokesman who estimated the cost to resolve the problem at 'under $1 million.' Any advice for the poor schmuck who's going to get the blame?"
Check out this link. Sorry, dude. Any of us could have done it.
> How Would You Handle a $1,000,000 Coding Error?
I would have to follow Dogbert's Top Secret Management Handbook and take full responsibility for the bungle. That way, when the next job comes up two or three rungs above me, I'll be at the top of the list of people with actual experience with massive projects, and it won't matter that it was a colossal screw-up because I will have jumped two or three pay grades. Corporate fall guys, if they take it right, always end up better off than the quiet behind-the-scenes types.
So my advice is that you should take full responsibility and sharpen that resume, but be sure to make it known that you have learned from your mistakes and worked hard to correct them. Nobody gets anywhere without making big blunders along the way. Be a good sport and you'll jump at least two pay grades for this blunder.
The dangers of knowledge trigger emotional distress in human beings.
(It pays to use Splint)
Sigs cause cancer.
... and blame it on Microsoft.
[SIG] It's like putting a moose in the blender -- a recipe for disaster!
Time for plan B
Just have each of their coders chip in a dollar, problem solved.
*ducks*
Introducing the new Occam Fusion! Now with sqrt(-1) fewer blades!
Blame it on the company not supplying enough caffeine?
Anyone else think it was poor 'theodp' ??!
Well, OK, so that might not fly, but hey, it works when it's true, if you work for a modestly forgiving employer...
;-)
Now if the cause was insufficient testing, well then QA has to answer for it.
And if there's no QA, well that's managements fault...
Now if it all comes down to dumb circumstances, it's poor planning, and the paper's fault for not testing it themselves.
That said, fess up. Worst comes to worst, you now have national infamy, and any fame is good fame, right??
-- (appended to the end of comments you post, 120 chars)
I would go out and get so absofreakinlutely drunk that I wouldn't be able to remember my middle name, let alone that I made a $1M error. And then, when the lawsuits were about to go to court and I started showing signs of severe alcoholism, I would put my head in between my legs and kiss my ass goodbye. 'Cause man, that would really suck.
Well, you asked.
Any advice for the poor schmuck who's going to get the blame?
;)
Well, my first advice is to come clean. Yes, I mean you, theodp. I think we all know who this poor schmuck is.
I stole this Sig
He should blame the requirements.
There's always a mistake in the requirements.
Don't know; Don't care; Don't ask
Where was the pre-install testing?
A good test should have identified some errors, especially if it blew up IMMEDIATELY.
That isn't a bug - it's a feature!
Even heroes have the right to dream
23:44:03 up 48545 days, 6:15, 1 user, load average: 0.00, 0.00, 0.00
Blink.
up 0 days, 1:00, 1 user, load average: 0.00, 0.00, 0.00
I hope they got a screenshot of that massive uptime.
I didn't get my paper this morning and was angry until I read this.
I'm not angry anymore, I'm sympathetic for the poor schmuck as well as all the customer service people who probably got yelled at this morning.
-- Kevin J. Rice
Unitarian Church: Freethinkers Congregate!
People, it's called QA.
Toss his newspaper subscription and egg his car. Other than that, leave the poor geek alone.
How many people here have fucked LILO into the ground the night before a Java assignment, on a laptop with no floppy? Anyone?
yeah. i thought as much.
ZERO
Management frequently makes mistakes which cost much more. The difference is that their mistakes are not as easily identified or attributed to a single person.
The culprit should just admit it. Shit happens, it's unavoidable even if you take all precautions. Don't make the same mistake again, though.
LIMITED LIABILITY
Software provided as-is. Software developer/company is not liable for any physical, financial, or any other loss or damage arising from use of software.
Doesn't all software come with things like this? (Nevertheless, thank goodness I'm not a software developer.)
$cat
" .. I must have missed a zero somewhere ... damn I always do that!"
"Any advice for the poor schmuck who's going to get the blame?"
My advice: Prepare three envelopes
Change your name, and switch to a "skills" based resume rather than an experience based one...
Karma: 0 (But I wield a mean +10 Vorpal Apathy)
And this is why you don't use an Access database for a job like this.
-- I could tell right away that she was impressed with my HUGE Slashdot Karma.
Down, not across. (motto of alt.sysadmin.recovery referring to best method of slashing one's wrists).
Blame the users, of course.
Or the journalists who work at the outfit the link went to. Did you notice it took three of them to write that article? Talk about overstaffed.
With any large rollout, if only one person is at fault for a fiasco like this, then the project was mismanaged. They should have had a plan in place to back out the change.
Well, if I was in management.. I would find the programmer responsible, and have him snipped!
Do daemons dream of electric sleep()?
Simple enough.
Take responsibility and ownership of the problem. Don't make excuses, but give real reasons.
Fix it. Do whatever it takes, even if it means working over a weekend.
Write a good post mortem, explaining how the fix is different from the original problem.
And hope to god that your management is understanding enough to keep you on.
This is coming from a guy who, in 1997, blew a $100,000 test weekend by kicking off the system tests after loading the wrong generation of tapes.
I took the blame, and expected to lose my job. But I knew that the right thing to do was to try to recover from the problem. I stayed in the office from 1:00AM Sunday to 10:00AM Monday morning rerunning every job and report and proving out the results.
Not only did I keep my job, but I got promoted a year later. I made a name for myself that weekend....sure I could f*k up, but I work hard to keep things right for the company.
wbs.
Huh?
Where was the phased or parallel deployment?
You don't just change a system like this in a weekend. There WILL be problems, so you have to have ways of dealing with them. Maybe that means flicking the switch back to the old system if the new one fails, or maybe it means running at degraded capacity for a while, but whatever it is, dead-in-the-water is not your Plan B.
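To make the "flick the switch back" idea concrete, here is a minimal C sketch, assuming the new and old systems can each be driven through the same hypothetical `send_fn` interface. The names `send_with_fallback`, `transmit_ok` and `transmit_broken` are made up for illustration; they are not anything the Tribune or its vendor actually used.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical signature for "send tonight's pages to the printing plant":
   returns true if the plant received the edition. */
typedef bool (*send_fn)(const char *edition);

/* Try the new system first; if it fails, flick the switch back to the old
   one. Returns 0 if the new system worked, 1 if the old system had to take
   over, and -1 if both failed (only then are you dead in the water). */
int send_with_fallback(send_fn new_system, send_fn old_system,
                       const char *edition)
{
    if (new_system(edition))
        return 0;
    fprintf(stderr, "new system failed for %s; falling back\n", edition);
    if (old_system(edition))
        return 1;
    fprintf(stderr, "old system failed for %s too\n", edition);
    return -1;
}

/* Stand-in transmit routines for trying the logic out; a real deployment
   would plug in whatever actually talks to the plant. */
bool transmit_ok(const char *edition) { (void)edition; return true; }
bool transmit_broken(const char *edition) { (void)edition; return false; }
```

A return value of 1 means the paper still went out, and the new system can be debugged afterwards instead of while the presses sit idle.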
Forget thrust, drag, lift and weight. Airplanes fly because of money.
The foreman responsible for the error wasn't fired, to the surprise of almost everyone. The owner was asked why the guy wasn't fired. He answered something along the lines of,
I've had coworkers who made major bugs that crashed servers and workstations and caused a lot of downtime. This is because they wrote sloppy code in a hurry and never bothered to check it. Management usually wants faster turnaround time on projects.
So your choices:
Plan A: Blame managers for forcing you to work under stressful conditions that led to a workplace hazard (stress) that caused you to make the error. Cite the overtime you had to work and how the lack of breaks and sleep caused you to miss a major bug.
Plan B: find someone like me who takes their time coding and have them look over the code and fix the problem for you. Sometimes another pair of eyes helps to find things you've missed.
Plan C:
Go to work in flip-flops, a Hawaiian shirt, and sunglasses, and tell everyone you are on vacation. Make Pac-Man noises and talk to your invisible friends. Claim insanity and see if that works.
Plan D:
Start looking for another job ASAP.
Remember, Slashdot does not have a -1 disagree moderation, and no, troll, flamebait, and overrated are not substitutes.
One of the benefits of working for a big company is a QA/UAT department. You have an entire department of people lined up just to test your shit. And, usually this type of job makes a person very anal. They log defects for just about everything.
The person writing the code can unit test to the best of his or her ability, but it is really someone else's job to put it through the wringer, testing thousands of simulated real-world scenarios. Sure, a coder could do this testing. But a QA guy or gal is doing really well if they make 3/4 the salary of the guy who wrote the code, so a division of labor only makes sense.
Not to mention the person writing the code makes the worst tester in the world. You only test it the way you THOUGHT people would use it. So, while a coder is perhaps the one who created the original problem, the real fault is in whoever let this slip through to production. Assuming, of course, that it wasn't some kind of time-bomb easter egg that would have been impossible to test. Although, good QA testers should alter their system date/time when testing date sensitive routines.
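Resetting the test machine's clock works, but an alternative worth knowing (swapped in here, not something the comment above describes) is to pass the current time into date-sensitive code as a parameter, so tests can use any date without touching the system clock. A minimal C sketch, with `is_sunday_utc` as a hypothetical date-sensitive routine:

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical date-sensitive routine: does this timestamp fall on a Sunday
   (UTC)? It takes "now" as a parameter instead of calling time() itself, so a
   test can feed it any date without resetting the system clock. */
bool is_sunday_utc(time_t now)
{
    const struct tm *parts = gmtime(&now);
    return parts != NULL && parts->tm_wday == 0;  /* tm_wday: 0 = Sunday */
}
```

Production code would call `is_sunday_utc(time(NULL))`; tests pass fixed timestamps such as 1000000000 (2001-09-09, a Sunday) or 0 (1970-01-01, a Thursday).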
Good planning would have had an abort procedure, so the show would go on. Everything changed should be undone if it did not work. They could figure it out after the paper was printed.
Errors are inevitable. Good planning and implementation keep you from falling on your face even when you publish seven days a week. It's not the coder's fault.
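An abort procedure that undoes everything can be as simple as pairing each rollout step with its reverse and unwinding on the first failure. A minimal C sketch under that assumption; the `rollout_step` type, `apply_or_roll_back`, and the demo counter are all hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* One step of a rollout, paired with the action that reverses it. */
typedef struct {
    const char *name;
    bool (*apply)(void);
    void (*undo)(void);
} rollout_step;

/* Apply the steps in order. If one fails, undo every step that already
   succeeded, newest first, so the old setup is back in place before the
   presses need it. Assumes a failed step left nothing behind to undo.
   Returns 0 on success, -1 after rolling back. */
int apply_or_roll_back(const rollout_step *steps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!steps[i].apply()) {
            while (i > 0) {
                i--;
                steps[i].undo();
            }
            return -1;
        }
    }
    return 0;
}

/* Demo steps: a counter stands in for "how much of the upgrade is live". */
int steps_live = 0;
bool demo_apply(void) { steps_live++; return true; }
void demo_undo(void) { steps_live--; }
bool demo_fail(void) { return false; }
```

Undoing in reverse order matters when later steps depend on earlier ones, for example new software installed on top of new servers.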
Friends don't help friends install M$ junk.
Send the coder to the Open Source world because no one is going to pay him to code anymore.
And send his supervisor too for not testing the system properly before trying to roll it out.
George Bush + Linux = "I will not let information get in the way of the fight against Windows"
Mind you, here in Perth we only have one daily newspaper and it sucks, so I can't imagine getting worked up about a failed delivery.
How Would You Handle a $1,000,000 Coding Error?
Frankly, I can't believe anyone would pay $1M for a coding error. Hell, the guys I work with make coding errors all the time, and practically for free!
(That's free, as in beer.)
No one [in their right mind] orders a brand-new newspaper publishing system from a single consultant. The software was probably priced at several million dollars. Somewhere between the components, something broke. For example, the file format that the publishing software produced was rev. 2.1, but the software on the press side was only aware of rev. 1.7 and below... If the coder only tested his code against the latest revision of the "other" piece, he would never see any problem, and it is not his fault that in real life the actual customer uses some obsolete stuff that isn't compatible...
This kind of problem is clearly administrative in nature: a matter of system design and of checking which pieces work with which other pieces. Clearly, blame should be assigned to nonexistent QA procedures, insufficient unit testing, and [obviously] inadequate integration of components. The coder doesn't figure in this at all; it's all system design and QA, the realm of managers.
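A revision mismatch like the one described above is cheap to catch if the consumer checks the version before reading anything. A minimal C sketch, assuming revisions are plain "major.minor" strings and that the press-side reader tops out at rev. 1.7 as in the example; `reader_accepts` is a hypothetical name:

```c
#include <stdbool.h>
#include <stdio.h>

/* Newest file-format revision this (hypothetical) press-side reader
   understands: rev. 1.7, as in the example. */
enum { READER_MAJOR = 1, READER_MINOR = 7 };

/* Parse a "major.minor" revision string and decide whether the reader can
   handle it. Anything unparseable or newer than 1.7 is rejected up front,
   instead of being half-read and silently dropped. */
bool reader_accepts(const char *revision)
{
    int major, minor;
    if (sscanf(revision, "%d.%d", &major, &minor) != 2)
        return false;
    if (major != READER_MAJOR)
        return major < READER_MAJOR;
    return minor <= READER_MINOR;
}
```

Comparing the numbers rather than the strings matters: as text, "1.10" sorts before "1.7", so a string comparison would wrongly accept it.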
I'm a programmer for a large (US) national newspaper chain, and screwing up the publication cycle is somewhat more common than you might think.
Most daily newspapers produce several editions, between two and four, and I've seen a couple of times where only one edition is printed due to "coding errors" (like the 1-billion-seconds-from-the-epoch thing, my personal favorite).
Once I saw a print pre-processor go off line because /dev/null was deleted and the backup system had been down for 6 months. A missed edition can take out $50,000 - $100,000 in advertising.
Of course the vendor had to be called at the $500/hour emergency rate to fix their own error.
They call daily newspapers "the daily miracle," and when you look at some of the computer band-aids they have producing them, you can see why.
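One commonly reported way the one-billion-second rollover (2001-09-09 01:46:40 UTC, when Unix time went from 9 to 10 digits) broke software was timestamps stored as text and compared as strings. What actually bit the systems mentioned above isn't known; this minimal C sketch just shows that failure mode next to the fix, with hypothetical function names:

```c
#include <stdlib.h>
#include <string.h>

/* Buggy: "is timestamp a newer than b?" compared character by character.
   "1000000000" starts with '1', so it sorts before "999999999". */
int newer_as_text(const char *a, const char *b)
{
    return strcmp(a, b) > 0;
}

/* Fixed: convert to numbers before comparing. */
int newer_as_number(const char *a, const char *b)
{
    return strtoll(a, NULL, 10) > strtoll(b, NULL, 10);
}
```

Both versions agree for years on 9-digit timestamps, which is exactly why this kind of bug survives testing until the day the digit count changes.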
True story. I was working an assignment as a tester for Microsoft. I apologize for the use of variables rather than names, but I don't want to get sued for breaking my NDA. There was a deadline on the release, and if we missed it, there was a penalty of $1 per copy shipped. 20 million copies were due to be shipped on date X. On date X, we realized there was a fatal bug that caused Product Y to crash after running any segment that lasted longer than Z minutes. Somehow, I'd completely missed this bug. I have no idea how, don't ask, but I completely missed it. We even checked back through 3 months' worth of revs... the bug was still there in each one. Of course, the product was late, costing Microsoft a whopping $20 million. What did I do?
I was "allowed" to resign gracefully and quietly, and I learned a valuable lesson about software testing: it's not whether you miss something, it's whether or not someone else will find it in time to cost you your job. (nods sagely)
-The Libra
"Please be patient--The future will begin momentarily."
They're always hiring. And if you screw up a burger, it only costs the company about $0.17.
You insufferable ass -- you just slashdotted Illinois.
May we never see th
As long as I keep checking in my code as someone else, I won't have to.
Show me on the doll where his noodly appendage touched you.
Everyone makes mistakes, good teams try to find those before deployment.
Where was the testing?
Who decided that there would be no testing?
Who decided that they would simply deploy the thing with no plan B?
Obviously the PHB, you just need to point that out to the VP and you can have the PHBs seat.
Less look fast, more go fast.
So the paper can deliver every day for 158 yrs using mechanical printing presses ~ except where natural disasters occur ....
> The printing problems at the Chicago Tribune were related to efforts to upgrade computer equipment used to produce the newspaper, Malone said. The Tribune acquired customized software for the upgrade from an outside provider, and it contained a "coding error," he said.
But as soon as computers are involved, their printing press has morphed into a computer system. I wonder what provisions were made to *test* the upgrade before use?
Failing to recognise the newspaper as a computer system?
It would be easy to blame the developers and the company, and there should be some recognition of responsibility for technical accuracy. But what about the newspaper? They made a fundamental mistake in not recognising that printing press + computer = computer, and they left their newspaper at the mercy of a coding mistake.
It seems that while the paper can handle *mechanical* failure (158 yrs, 1 non-delivery), it has yet to grasp *software* failure.
peterrenshaw ~ Another Scrappy Startup
It doesn't make any difference if it's a broken punch or a whole set of dies cracked down the middle ($4000 for a 6 inch section, over 60 inches you do the math)...
"If you say 'oops', it's OK."
Did he say Oops?
Seriously though... shit happens. That's why you don't bill employees directly for the mistakes they make. Suck it up, learn, and move on.
--
BMO
Bah, this is absolutely nothing compared to the coding error that brought down Canada's Royal Bank last month, leaving millions of customers without paychecks, access to their accounts, etc. That too was attributed to human error, but it had far more drastic repercussions than not getting your morning paper, and it cost RBC a heck of a lot more than a million dollars.
It's better to burn out than to fade away
The old system was working! But, they didn't follow the rule, and look what happened!
Next time your PHB dismisses testing as an "unnecessary waste of time and money, just write your code carefully and you won't need to test," resign.
My new
That gives new meaning to /.
Unfortunately, I can't find the original source, so here's my version:
:)
A high level minister of the USSR is on his way out and comes to his replacement to offer advice. He hands him two letters and tells the man "If you ever get in a situation that you cannot figure out how to get out of, open the first letter. If you ever get in another, open the second letter."
Well, time passes and the new minister finds himself in a position from which there is no escape, so he opens the first letter. It says: "Blame everything on me." He does as it says and blames everything on his predecessor, and all is well. Some time later, he is again stuck with no way out, so he opens the second letter. It says: "Get a pen, sit down, and write two letters."
So I guess it just depends on which letter applies to you
In all seriousness I'm not sure what to do in a situation like that. My level of responsibility doesn't afford me the ability to make mistakes of that magnitude.
Works every time:
http://www.sanecomputers.com/articles/humor4.htm
1. Make a dopey "error" that costs a million. Get mentioned on Slashdot. 2. Make heroic effort to get them back up and running. Get recognized for brilliant skills. 3. Write book about the whole affair. Get book mentioned on Slashdot. 4. Profit!
The difference between spam and poop is that you don't have to dig through septic tanks looking for real food. -- Me
In the aerospace industry we deal with very expensive space equipment. As a result there are procedures that must be followed so nothing terrible happens. You can see where this story is going...
Imagine a satellite, nearing completion, bolted down, and ready for final inspection. Joe Blow forgets to write in the change log that he unbolted the satellite from the base. Workers come in the next day, do some work after checking the logs, and... the satellite tips over. OOPS... a billion dollars well spent. That is 1,000,000,000.
"Uhm, boss, the good news is we finished the satellite yesterday and... I don't know how to say this, but our last two years of work... well, I sort of... well, I tipped it over and it's destroyed...."
Or how about the contracts guy who forgets one zero on a contract? Instead of ten million, the contract reads one million. Of course everyone misses the zero... except the people PAYING. Contracts are signed and oops... "We want to start a new contract, we sort of forgot to add a zero." To which they reply, "Fuck off, you signed it..." and promptly save their company 9 million dollars.
It truly is a sight to see. The speed at which they print is fantastic. A minimum run on many of them is 20,000 copies; in the time it takes to spin up and spin down, that many will have come off.
This is necessary, too, if we wish to efficiently print the massive quantity we desire. There are a lot of daily newspapers; even in my small city there are at least 8 I know of. An old mechanical press simply wouldn't be able to keep up. Never mind printing speed or anything else, setup time was a bitch. You had to have plates made to stamp your text on the page. These then had to be loaded and calibrated for each run.
Now it's all electronic. At the minimum, you place the reference prints under a camera, and normally the layout files themselves are loaded in to the press. It then can go to work right away.
I know it's kind of retro-geek cool to bag on how much harder technology makes everything and how much better it was in "the good ol' days," but that's not usually the case. Old mechanical presses simply cannot compete with the speed of computerised presses, which are necessary to operate with the speed and efficiency demanded today.
Bad news: We missed printing half of our papers.
Good news: Rainforest saved.
paintball
Software testing is boring boring boring. You have to try things out again and again after each change. Modules that haven't changed gain confidence in the face of changes and might not be tested, but omitting tests can end up being the Achilles heel. There can be an overwhelming desire when a project nears completion to just get things done and over with. After all, the hard problems may well be solved, and it's all down to seemingly inconsequential details.
These days programmers have a Sword of Damocles hanging over them. Once they finish a major piece of code, they may have a hard time finding new work. The economy has not lived up to forecasts of more jobs. Outsourcing has reduced computer opportunities. Management at many companies does not see new uses for computers. Off-the-shelf programs abound for almost every aspect of computerized work.
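Running everything after every change, instead of skipping modules that "haven't changed", is easiest when the whole suite is one list that a runner walks through. A minimal C sketch; `test_case`, `run_all` and the two placeholder tests are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One named test; run() returns true on pass. */
typedef struct {
    const char *name;
    bool (*run)(void);
} test_case;

/* Run every registered test after every change, including tests for modules
   nobody touched, and report each failure. Returns the number of failures. */
size_t run_all(const test_case *tests, size_t n)
{
    size_t failures = 0;
    for (size_t i = 0; i < n; i++) {
        if (!tests[i].run()) {
            fprintf(stderr, "FAIL: %s\n", tests[i].name);
            failures++;
        }
    }
    return failures;
}

/* Hypothetical placeholder tests for trying the runner out. */
bool unchanged_module_still_works(void) { return true; }
bool edge_case_nobody_rechecked(void) { return false; }
```

Because the runner has no notion of "changed" versus "unchanged", there is nothing to skip when the deadline looms; a build script can simply refuse to ship while `run_all` returns nonzero.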
Stress may distract software engineers enough that someone will make a major mistake.
Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
Most project managers (especially ones with no technical experience... who shouldn't be let near a technical project) plan their project timelines through rose-coloured glasses. They assume no coding issues will be discovered in testing. Or worse, they do, but then let scope creep in and borrow time from testing for the new items it introduces. Bye-bye, testing time.
Mind you, I have also seen QA managers who believe that testers only need to understand the software, and not the business in which the software is to be used. This has sometimes led to problems in end use. In any case, I tend to blame poor management before I blame the little guy. Projects like this are big enough that the process should have been able to catch things like this... unless the process was flawed.
My opinion... ready, set, slag away!
-- I ignore anonymous replies to my comments and postings.
I'm willing to bet there will be an opening for IT manager soon.
Well.. maybe. Or Maybe not. But Definitely not sort of.
Do as the BOFH would do! :-p
With a shotgun.
But there is another kind of evil that we must fear most... and that is the indifference of good men.
As you pointed out, QA should have caught something this basic. There had to be a lot of careless decisions made here, and none of them are necessarily any one coder's fault. Blaming a "coding error" is simple, and makes people forget that a manager didn't do their job correctly. I've seen this particular scenario played out a dozen times before:
Last Monday Suzy Manager shouted at her team, "The schedule says we install on July 18th, so this damned product damned well better be installed on July 18th, you all got that?!"
But the vendor's ship dates slipped, and testing dates got pushed back, even though there was nothing particularly important about July 18th, except for Suzy Manager's promise to the CIO that she'd get WhizBang 2.0 installed by July 18th. And she would, too: she had 25 points on her review riding on that very promise.
By the 14th, when a new patched version arrived that fixed the bug they discovered on the 10th, Suzy was visibly distressed. "They damn well better have that transmit bug fixed, they've been dragging their feet long enough."
Perhaps the testers just kept testing the version from the 10th instead of upgrading to the version of the 14th. It was beautiful on Saturday, so maybe the tester called in with a bad case of 'weekend flu.' Perhaps they got the patch late Friday afternoon, and the vendor swore up and down that it was just one little bug, our guy knows it's fixed, don't worry, it's better now. Whatever -- Suzy was under the gun, so she simply said "ship it."
Regardless, some nameless coder is flapping in the breeze today. Suzy is probably running around the IT department at the Tribune screaming, "we'll never buy code from those bastards again, I swear!" in a vain attempt to deflect criticism from her department.
But the CIO usually knows better, and Suzy knows the CIO knows better, and she's already sent out her interview suit to the cleaners. Even so, she'll feign total surprise to her department as she boxes up the little wooden carving she picked up during a drinking cruise to Mazatlan a couple years ago. A couple of tears later, she's interviewing over at Microsoft Consulting Services.
Or, maybe I'm completely off the mark. Perhaps they've been testing the code for a month and it's worked fine, but they installed the new code with the old libraries, or the new libraries with the old code, or the destinations were SP2 with some new security turned on. Of course, the QA department should be testing the installation packages as well, but we all know that in hindsight, right? As Yogi Berra might once have said (were he an IT manager,) "In theory, there's no difference between the lab and production, but in production there is."
John
If I ever was on the specifying end of software (instead of the coding end, where I do these things reflexively anyway), I would demand the following of the team I hired:
And that is why you're on the coding end instead of the decision making end - you'd have a compact, bug-free, featureless product that hit the market three years too late that nobody could afford to buy anyway.
paintball
The coding error, if it existed, was minor.
The serious error was in switching to a new system with such clearly inadequate testing.
I write software for a company that handles $45,000,000+ of client cash every week.
A mistake I made in May (discovered this very day, by yours truly) had backed up about $400,000 per week.
Did I get stomped?
No.
A bottleneck had been identified, repaired, and eliminated!
Behold the power of positive thinking.
Writers imply. Readers infer.
Or the guy who signed off on rolling the thing out without extremely thorough testing beforehand. The IT / Software end of companies is a lot like animals in this respect- cut the head off and you prevent the body from doing anything grossly stupid.
:P You get what you pay for! :|
Start with the CTO and work your way down. If it's a software problem, why wasn't it discovered sooner? Who was in charge of QA? Who was in charge of making sure QA did their jobs? Who said YES WE CAN DO IT!, lying out their ass?
The fun thing about capitalism is greed and/or the desire for profit leads to systems like this being built by the lowest bidder.
...as Communications Minister in the Australian government any day now
insecurity asks the wrong question irritation gives the wrong answer
It's never an individual's fault. It's a breakdown in the QA/FVT/review structure. Is it the fault of the person who coded it? Of the team that reviewed the code? Of the author of the FVT tests? Of the person in charge of QA?
What's that you say, this is all the same person? No wonder you had the bug to begin with...
int func(int a);
func((b += 3, b));
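The sig leans on C's comma operator: inside the extra parentheses, `b += 3` is evaluated and its side effect completes before the right-hand `b` is evaluated, so `func` receives the updated value. Since the sig only declares `func`, the body below is a made-up stand-in, and `call_like_the_sig` is a hypothetical wrapper so the call can actually run:

```c
/* Made-up body for the sig's func(); the sig only declares it. */
int func(int a)
{
    return a * 2;
}

/* The inner parentheses make (b += 3, b) a single comma expression rather
   than two arguments. The comma operator sequences its operands, so func()
   sees b after the += 3 has happened. */
int call_like_the_sig(int b)
{
    return func((b += 3, b));
}
```

With `b == 1`, `func` receives 4 and the stand-in returns 8.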
Here is the full text of the article in the Tribune:
A story we never thought we'd print
By James Coates
Tribune computer columnist
Published July 19, 2004, 6:40 PM CDT
Nothing built by humans can go wrong in as many ways or with as nasty an outcome as a computer system.
The people who create the Chicago Tribune started relearning that fact about 4 p.m. Sunday when they noticed that nothing was getting through as they attempted to beam the stories, artwork and ads from Tribune Tower to the Freedom Center printing plant.
About 13 hours later, they finally started printing a 24-page version of Monday's Tribune that should have already been landing on their readers' porches.
It was a misfortune that most people in the news business don't ever expect to experience. Newspapers do not miss days -- and Monday was close.
The only time the Tribune failed to print was during the Great Chicago Fire of 1871. That time, the lesson was that nature can be fickle and dangerous.
Now, the paper has learned that the same goes for the computer technology that has graced the industry with unparalleled productivity since the 1990s.
Business computer systems are cobbled together as row upon row of workstations, each running an operating system based on an estimated 50 million lines of instructions. In turn, the worker bee desktop computers connect to the queen machines with their own millions of lines of code in a different language.
An endless nest of wires, cables and even radio signals move instructions at light speed between the central computer and the workstations. The main computer also talks to all the peripheral devices needed to accomplish the mission.
The peripherals can be banks of hard drives, storage bays, printers, scanners, cameras and specialty devices as diverse as a pager or a printing press several stories tall.
The certainty that each and every one of these massively complex systems will crash haunts the people charged with keeping this thoroughly digital world up and running.
Those people are engineers, and so they often reduce it to numbers.
An often quoted study by Carnegie Mellon University computer scientists studied 30,000 software programs and found five to six defects per 1,000 lines of code.
And this is for finished software sent to customers.
When writing new programs, there is typically a defect in every 10 lines of code. About a half dozen defects per 1,000 lines remain after a process of checking, rechecking, cross checking, testing, retesting and finger crossing.
The hubris of computing becomes clear as one realizes that each of these errors in code branch out with instructions to millions of other lines of code. Quite often, they find pathways never before taken by that particular program.
Collisions occur on these pathways and trouble is spotted. Maybe it can be fixed or maybe technicians can only perform a "workaround" that can't be guaranteed.
Dick Malone, the Tribune's senior vice president and general manager, said that around 9:30 a.m. on Sunday technology crews started a planned upgrade to increase the newspaper's Sun Microsystems servers from so-called 10K models to 15K machines.
To do this, experts from the company that makes the newspaper's core Windows-based publishing software, Denmark-based CCI Europe A/S, needed to install upgrades of its Newsdesk brand software that the Tribune and other clients use.
Malone noted that they checked and rechecked, tested and retested all day. Everything seemed to be working without a hitch. Then, they punched the button that was supposed to send all of the content for the newspaper to the printing plant.
Nothing arrived.
Frantic hours went by as deadline after deadline slipped while crews struggled to find a fix. Malone said he went so far as to start setting up the newspaper's pages on the art department's Macintosh desktops, hoping to get at least something printed.
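The article's defect figures imply a quick worked check: about one defect per 10 lines in new code is roughly 100 per 1,000, against five to six per 1,000 in shipped code. Applying the shipped rate to the article's 50-million-line operating system estimate is purely an illustration; the article does not make that calculation itself.

```latex
\text{share removed by review and testing} = \frac{100 - 6}{100} = 94\% \quad\text{to}\quad \frac{100 - 5}{100} = 95\%

\text{residual defects} \approx (5 \text{ to } 6) \times \frac{50{,}000{,}000}{1{,}000} = 250{,}000 \text{ to } 300{,}000
```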
... as many places, you're not allowed to fire someone for alcoholism or mistakes made as a result thereof, without first offering a rehab program...
"Why the hell didn't you see this bug?!"
"You smell funny!" *puke*
start looking in the help-wanted ads right away. Go get a newspaper and... Oh wait.
If they had a clue, they would grow 10000 acres of cannabis, which:
A) grows 10000x faster than trees
B) makes 10x more pulp per acre
C) uses 100x less water.
D) stick it to the govt.
But would they ever do that? NOOOO, coz there are no patents in the process to exploit, and oh, the trouble with govt wackos like Bush and old guys being so anti-cannabis (to protect their buddies' profits).
I guess they wouldn't want 100s of potheads heading up to the 100000s of acres of weed to take a few home, but OTOH, what is so wrong with that?
Liberty freedom are no1, not dicks in suits.
Grab all the emails where someone in management told you to cut a corner, or replied that they didn't want to spend too much time designing, or authorized fewer QA hours than should have been done, print it all out, with headers, and forward it all to another account.
When they come after you, present it as if you were trying to do it right, but somebody wouldn't let you.
If they fire you, sue.
Unless:
a) you work for one of the few companies that actually supports a real team atmosphere, or
b) Everything was done by the book, and you still screwed up.
When someone in an industrial field is forced to work 16 hours a day, 7 days a week, and makes a mistake, the company suffers the ramifications, not the worker (or the worker's family).
The Kruger Dunning explains most post on
Well, although can't say the guy did a great job... if the DB was so important, why was there not a regular backup?
You are pointing out two problems taking place simulataneously.
One is a minor human error, but it is obviously an unintended act.
NOT having a recent-enough backup IS a serious issue. This issue has been pending for, as you say, 6 weeks, and it is a critical issue (if the data is valuable as you seem to imply).
You do not go around deleting all the entries in your DB for fun, but you know some software is going to go bananas on you one day and start messing with your DB, whether in an obvious form such as deleting all the records, or by altering them all in a subtle way that takes a while to notice (changing all prices from euros to dollars?).
A successful project or business is much more than the sum of little individual acts. There is such a thing as planning for things going wrong. And in this day and age, a database backup is no longer a problem.
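As a minimal sketch of the "take a backup before something goes bananas" point, assuming an SQLite database and Python's standard `sqlite3` module (the thread never says which database was involved), the online backup API copies a live database so a wiped table can be put back. The `prices` table and the `snapshot` helper are hypothetical.

```python
import sqlite3


def snapshot(src: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the entire source database into a fresh in-memory database."""
    dst = sqlite3.connect(":memory:")
    src.backup(dst)  # SQLite online backup API, exposed by Python 3.7+
    return dst


# Hypothetical stand-in for valuable production data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE prices (item TEXT, amount REAL)")
db.executemany("INSERT INTO prices VALUES (?, ?)", [("paper", 1.50), ("ink", 9.99)])
db.commit()

backup = snapshot(db)  # taken *before* the risky change

# Some software "goes bananas" and wipes every record.
db.execute("DELETE FROM prices")
db.commit()
lost = db.execute("SELECT COUNT(*) FROM prices").fetchall()[0][0]

# Restore by copying the snapshot back over the damaged database.
backup.backup(db)
restored = db.execute("SELECT COUNT(*) FROM prices").fetchall()[0][0]
print(lost, restored)  # 0 2
```

In a real deployment the snapshot would go to a file or another machine rather than memory, but the idea is the same: the copy exists before the damage does.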
E) Hemp pulp requires hardly any bleaching, and only a tiny fraction of the toxic chemicals that wood pulp requires to process.
E.1) No toxic chemicals to expensively dispose of (less pollution).
F) Hemp pulp requires a fraction of the processing that wood pulp does.
G) Same (non-THC-producing) hemp grown for rope and clothing can be used... existing/established farming methods.
H) Requires _much_ less fertile ground (no fertilizer) for growing... technically it _is_ a weed (not just a nickname).
I) Requires much less expensive processing equipment to farm (ground requires drastically less/no tilling, collection can be done with hay-baling equipment instead of heavy trucks and tree-cutting machinery, etc.).
I'm sure I'm forgetting some.
Note the reference to a non-THC-producing strain... I'm not into pot, but I certainly can see a phenomenal idea when I see one (seen this one many years ago).
- Preferences: Solaris 10 (servers), Ubuntu (desktops), Solaris 11 (personal servers) -
This is pretty much what happened with the first launch of the shuttle. Remember when Columbia was about to lift off for the first time, and right around the final ten-count the mission was scrubbed because of a software error? Many programmers then searched for the problem, and it was finally found by the guy who made the mistake! Of course this guy got a huge bonus for finding it, although no one seemed to care that he was the one who made it. But that's the life of a programmer :-)
Steven Rostedt
-- Nevermind
This shows why truly redundant systems should be built using a mixture of different hardware and software developed by independent teams. This would reduce the risk of all devices being hit by the same problem.
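The idea of independently developed versions is usually paired with a voter that compares their outputs. Here is a minimal Python sketch of majority voting, assuming each version is a plain function with the same signature; `team_a`, `team_b` and `team_c` are hypothetical implementations, and a crash counts as a lost vote.

```python
from collections import Counter
from typing import Callable, Sequence


def vote(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run independently written versions of one function; return the majority answer."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            pass  # a crashing version simply loses its vote
    if not results:
        raise RuntimeError("every version failed")
    answer, count = Counter(results).most_common(1)[0]
    # Require a strict majority of *all* versions, not just the survivors.
    if count * 2 <= len(versions):
        raise RuntimeError("no majority: fall back to a manual procedure")
    return answer


# Hypothetical "pages to send to the plant" calculations from three teams.
def team_a(sections: int) -> int:
    return sections * 8


def team_b(sections: int) -> int:
    return sum(8 for _ in range(sections))


def team_c(sections: int) -> int:
    if sections > 10:  # team C's bug only shows up on big editions
        raise ValueError("buffer overflow")
    return sections * 8


versions = [team_a, team_b, team_c]
print(vote(versions, 4))   # 32 (all three agree)
print(vote(versions, 12))  # 96 (team C crashes, A and B still form a majority)
```

The voter itself is a single point of failure, so it should stay as small and simple as possible.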
"The poor schmuck" will, in my experience, have spent the last 18 months hearing phrases like:
"Time / Quality / Functionality: Choose Two"
"You can't test quality into a system"
"Measure twice, cut once"
"We need to parallel run the UT system"
"Engineers shouldn't be testing their own code!"
"I wouldn't be using NT for that, mate"
and so on.
These are the words technical people use to warn management of impending doom. Managers, on the other hand, have other things to worry about, like delivery dates, sales, penalty ratchets and so on. When the "go" decision was made, it will have been made by senior managers who get paid the big bucks to take the big decisions and the big sh*t when it all goes pear-shaped.
The question is how the management handled mitigation by way of backups to manual processing, rollbacks to the old system or risk analysis during project planning.
Automating an entire printing plant is a big job, and it is probable they planned for failure as a worst-case scenario and will just put the $1M loss down to experience.
I wish it was Friday, but I don't want to wish my life away. So I wish it was last Friday.
He'd have included an EULA with a "I'm not responsible for anything yadda yadda" at the end. Yes, even for inhouse software. It's not like anyone has to read it, you only have to include in it "by installing the software you agree with this".
---- Take the Space Quiz!
seppuku
Take a look at some K code (there are examples in the user manual) and then come back and say that. If K is too exotic, then try looking at some macro-heavy LISP code -- it has the same problem just slightly less so.
Code density can be good when you're trying to see the big picture (fewer screenfuls of code is a good thing in this case), but it can work against you when you're trying to understand the little details.
Regular expressions are nothing more than a hack to make up for the fact that generalized LR (GLR) parsers were quite inefficient until a few years ago. Just compare a reasonably complex regular expression to the BNF form of a grammar for parsing the same input to see how much easier GLR is to use -- you can see some examples of just how easy GLR parsing is to use here. And it can actually handle more general patterns with nesting, etc. I really think regexes are just a case of premature optimization: with GLR you start out with an incredibly readable and simple grammar, and if it proves to be slow (i.e. if there are lots of points of ambiguity along certain parse trees), you can optimize it towards a purely LR(k) grammar.
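The nesting point can be shown in a few lines. This is a hand-written recursive-descent recognizer, not a GLR parser, chosen only because it needs no third-party library; it follows a one-line BNF grammar for balanced parentheses, while a standard-library `re` pattern (which has no recursion) can only cover a fixed depth. The names `balanced` and `TWO_LEVELS` are hypothetical.

```python
import re


def balanced(s: str) -> bool:
    """Recursive-descent recognizer for the BNF grammar

        nested ::= ( "(" nested ")" )*

    i.e. zero or more parenthesised groups, nested to any depth.
    """
    pos = 0

    def nested() -> bool:
        nonlocal pos
        while pos < len(s) and s[pos] == "(":
            pos += 1                       # consume "("
            if not nested():               # recurse for the inner groups
                return False
            if pos >= len(s) or s[pos] != ")":
                return False               # unmatched "("
            pos += 1                       # consume ")"
        return True

    return nested() and pos == len(s)      # all input must be consumed


# A regex for the same language only works up to a fixed depth; this one
# stops at two levels, because Python's re module has no recursion.
TWO_LEVELS = re.compile(r"(?:\((?:\(\))*\))*")

for text in ["(()())", "((()))", "(()", ")("]:
    print(text, balanced(text), TWO_LEVELS.fullmatch(text) is not None)
```

The grammar version reads directly off its one-line BNF, whereas the regex has to be rewritten (and grows) for every extra level of nesting.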
HAND.
The newspaper I work for recently purchased a production system (server, archive, workstations, etc.). The problems we saw came about because management went about looking for a new system the same way parents go about looking for their kid's first car, i.e. "How much is this going to cost us?" versus "Will this system be easy to migrate to, both from an IT standpoint and for end users, and will it cover all our needs?"
The consequence of management's decision is that my newspaper now has a system with parts still being beta tested (at the expense of my work, not that of the company we bought it from); a system that, before hours of hard work by our IT staff, just kind of randomly lost files, and still does on occasion; and a system with an archive whose database is limited by the number of entries rather than by the size of its hard drives. What this means is that our archive will be full after a year and a half, two years if we're careful.
All of this came after our IT department and all of their advisors let management know that the wise decision would be to go for the next least expensive system, one that had shown itself to be a good, reliable system through use at multiple large newspapers. Did management listen? No, they just saw a price tag. What it got us is a system that, after all the extra work our IT staff had to do, is mostly usable and cost us more than management's second-choice system. On top of all that, the new system almost kept us from putting out our paper. The final effect of putting in this system is that jobs had to be cut to cover the extra costs.
So, like theshowmecanuck says, yeah, the code may have been flawed, but that should not have been a problem. Had management gone about implementing the code properly, with proper testing before implementation, this problem might have been avoided. The same goes for the system I'm forced to use: sure, there are bugs in the code, but we got what we paid for when my company purchased a system that still has parts of it in beta.
"Napalm is nature's toothpaste" - Chef Brian
Ah, yes, but now you are a step higher on the corporate ladder, and while in conversation with colleagues the finger of blame always points up, in conversation with the boss the finger always points down the ladder. Management is never to blame for bugs.
Which time? I'm the guy who (unintentionally) wrecked the first Saturn ever wrecked (job #65). Since then I've wrecked one other (job 2 million and something), so my track record isn't that bad :)
Most of the time you don't actually break something (be it product or be it equipment), but fixing the bug and getting everything rolling again takes time.
And since the "value" of the product that is running on the line is about $5000 a minute, time is indeed money.
I've probably had a couple of 1+ hour breakdowns, but that doesn't even compare to the time my buddy's plant went down for three days x 2 shifts per day ($14M).
They were Lear-jetting parts in on a daily basis (they kept blowing up the new stuff and didn't seem to have the sense to order spares). Ron would show up at the service entrance at the airport to pick them up, and it got to the point where the guys would just open the gates when he drove up.
My most recent one was when we changed the line speed of the skillet line and a faulty thumbwheel switch opened up the 8's bit in the tens digit, so that instead of running at 42 jobs an hour it was trying to run at 80 JPH (it would have tried to run at 122, but it's limited in the software to 80 JPH).
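The 122 figure falls out of binary-coded decimal (BCD) arithmetic, assuming the thumbwheel feeds each decimal digit as a 4-bit 8-4-2-1 nibble, that the faulty contact reads as a 1, and that the controller decodes each nibble naively without rejecting digits above 9. Under those assumptions a tens digit of 4 (0100) with the 8's bit forced on becomes 1100, i.e. 12, so 42 turns into 12 * 10 + 2 = 122 before the software cap applies. The helper names are hypothetical.

```python
from typing import Tuple

SOFTWARE_LIMIT_JPH = 80  # speed cap mentioned in the post


def to_bcd(value: int) -> Tuple[int, int]:
    """Split a two-digit setting into (tens, ones) 4-bit BCD nibbles."""
    return value // 10, value % 10


def naive_decode(tens: int, ones: int) -> int:
    """Read each nibble as plain 8-4-2-1 binary, with no check for digits > 9."""
    return tens * 10 + ones


tens, ones = to_bcd(42)        # tens = 0b0100, ones = 0b0010
faulty_tens = tens | 0b1000    # 8's line of the tens digit reads as 1 -> 0b1100 (12)
raw = naive_decode(faulty_tens, ones)
commanded = min(raw, SOFTWARE_LIMIT_JPH)
print(raw, commanded)  # 122 80
```

A decoder that rejected any nibble above 9 would have flagged the switch as faulty instead of quietly clamping to the maximum speed.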
Zoom zoom.
Oh wait, that's the other guys
John
I dream in binary.
I cost the Times $1,000,000 and all I got was this lousy T-Shirt
You never begin a software upgrade without backout plans. The upgrade is broken into multiple checkpoints, each with a fully documented backout plan, and you move on to the next checkpoint only if the upgrade has gone well so far.
Of course, none of that matters if you haven't placed a "production-like" load on the system before attempting to upgrade production. It is about risk management. You can't remove all risks in an upgrade, but you should be able to manage them, assuming the risks are all documented and provided to stakeholders. If you've told the stakeholders all the risks clearly and in writing, and pointed out that there was no test system, no DR system or no good backout plan that wouldn't impact half the customers, **AND** they still decided to go forward, oh well. Their decision.
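The checkpoint-and-backout pattern can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any particular vendor's upgrade tool: each hypothetical `Step` pairs a change with its documented undo, and a step that raises is assumed to have left nothing half-applied.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One upgrade checkpoint with its documented backout action."""
    name: str
    apply: Callable[[], None]
    backout: Callable[[], None]


def run_upgrade(steps: List[Step], log: List[str]) -> bool:
    """Apply checkpoints in order; on failure, back out completed ones in reverse.

    Assumes a step that raises has left nothing half-applied.
    """
    done: List[Step] = []
    for step in steps:
        try:
            step.apply()
        except Exception as exc:
            log.append(f"{step.name} failed: {exc}")
            for finished in reversed(done):
                finished.backout()
                log.append(f"backed out {finished.name}")
            return False
        done.append(step)
        log.append(f"checkpoint {step.name} ok")
    return True


# Hypothetical system state and steps; not the Tribune's actual setup.
state = {"schema": "v1", "feed": "old"}


def switch_feed() -> None:
    raise RuntimeError("nothing arrived at the printing plant")


steps = [
    Step("schema", lambda: state.update(schema="v2"), lambda: state.update(schema="v1")),
    Step("feed", switch_feed, lambda: state.update(feed="old")),
]
log: List[str] = []
ok = run_upgrade(steps, log)
print(ok, state)  # False {'schema': 'v1', 'feed': 'old'}
```

Backing out in reverse order matters: later steps usually depend on earlier ones, so they have to be undone first.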
I have worked as a system administrator for a newspaper for 7 years. Five years ago we were outsourced to another company; my job stayed the same (save for the extra work needed), but the decision paths and cost terms have changed a lot. More management, less money, cutting corners and less contact with customers have actually led to a 25% increase in costs for the newspaper.
For 5 years we have worked on cutting costs instead of doing what we originally did: produce a newspaper. This has led to a lot of cut corners, patchy systems and, above all, stupid decisions. Now we have to spend most of our time with our hands tied behind our backs, because there's no way to prove a _direct_ profit we can put on the price tag we show to a (non-technical) customer when we suggest a change. It's always cost > functionality.
A company that only sells services to customers has no goal, and that does not work. There has to be something you produce, something to live for, instead of just being a money-making machine.
Management cannot exist just for the sake of being management. A good manager is someone involved in work they have a passion for. My boss didn't create this newspaper, nor did the boss of the actual newspaper, and they probably don't have a special interest in media; it's just a career-pushing, money-making machine for them.
Oh, I guess this turned into a rant.
From the OS X man page for "sticky(8)":
NAME
sticky - sticky text and append-only directories
DESCRIPTION
A special file mode, called the sticky bit (mode S_ISVTX), is used to indicate special treatment for shareable executable files and directories. See chmod(2) or the file /usr/include/sys/stat.h for an explanation of file modes.
STICKY DIRECTORIES
A directory whose `sticky bit' is set becomes an append-only directory, or, more accurately, a directory in which the deletion of files is restricted. A file in a sticky directory may only be removed or renamed by a user if the user has write permission for the directory and the user is the owner of the file, the owner of the directory, or the super-user. This feature is usefully applied to directories such as /tmp which must be publicly writable but should deny users the license to arbitrarily delete or rename each others' files.
Any user may create a sticky directory. See chmod(1) for details about modifying file modes.
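To see the man page's /tmp arrangement from code, here is a minimal Python sketch for POSIX systems (the helper name is hypothetical) that creates a directory writable by everyone with the sticky bit set, i.e. mode 1777, the mode /tmp typically carries.

```python
import os
import stat
import tempfile


def make_sticky_dir() -> str:
    """Create a /tmp-style directory: writable by everyone, sticky bit set (mode 1777)."""
    path = tempfile.mkdtemp()
    # chmod is not affected by the umask, so the mode is applied exactly.
    os.chmod(path, 0o777 | stat.S_ISVTX)  # stat.S_ISVTX == 0o1000
    return path


d = make_sticky_dir()
mode = stat.S_IMODE(os.stat(d).st_mode)
print(oct(mode))                  # 0o1777
print(bool(mode & stat.S_ISVTX))  # True
os.rmdir(d)
```

`ls -ld` shows the same bit as a trailing `t` in the permission string (`drwxrwxrwt`).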
Tell them you got the code from SCO..
The problem isn't the coding error. Such errors will inevitably happen. They also happened when we designed mechanical printing presses, which might break down unexpectedly because of built-in design errors. The problem here is poor management.
In the past, creating single points of failure was hard: you had lots of men working on lots of printing presses. You couldn't do something as stupid as replacing them all in a single night--it just wasn't physically possible. Computers have just given greedy management the freedom to make more serious mistakes in a shorter amount of time. In this case, the mistake was upgrading a whole infrastructure at once and believing, naively, that that would necessarily go smoothly.
With impending deadlines and cost overruns due to poor management forecasts, it is no wonder a lot of software these days has bugs in it. Not because coders today are any worse (or better) than those of yesteryear, but because more and more 'fat' is being cut from projects.
Translation: not enough testing.
Testers are often looked upon as the bottom rung of the overall software life cycle. Their duties are perceived by many to be humdrum and easy to take out of the cycle. Unfortunately, cases like this show exactly why testing is one of the most important facets of the software life cycle.
Remove, or severely limit, testing in your product, and you have only yourself to blame when problems arise out in the field. For this particular melee, if testing was removed from the project, I would blame the project manager and whoever made the decision to remove it. If testing was to blame, I would instill better procedures, beef up the test cases, and possibly hire test engineers who ARE test engineers and not some developer who has a few cycles to burn.
That originally pushed pot onto the restricted list in the 1930s. They were trying to promote newly invented nylon rope, and did not want competition from hemp rope, which was dominant at the time. Purchased congressmen got on the floor of the House and spouted nonsense about "pot makes black men violent and makes them desire white women". Then, in the 1950s, when passing further restrictions, the same purchased congressmen argued that "pot makes people into pacifist communists". Never let facts get in the way of your dogma (see Partnership for the Truth-Free America).
Blame Vinay.
take the blame for the mistake. If you are the programmer and it was a programming error, the fault clearly lies with the QA people who didn't catch it. If you are the sysadmin or the QA guy, whatever happened was clearly a problem with management setting unrealistic timelines or expectations. If you are a middle manager, the problem is definitely your inadequate budget.
Now before I get modded down, I beg to remind whoever might read this that what I am saying is FACT. - bogaboga
I was in a 'special educational' environment where one factoid stood out: they tested some volunteers' motor skills, then got them drunk/stoned (where do I sign up for this stuff?). Then they tested the volunteers' motor skills again, and also asked them how they felt. Surprise, surprise, they said they felt drunk/stoned and stunk up the tests.
Then, days later, they repeated the tests/questions. The alcohol recipients said they felt normal and tested normal. But the pot group said they felt normal, yet still tested as impaired.
This is why the teacher said MJ will never be legalized in the US: it's too difficult to set a legal DUI limit the way BAC% works for ethanol consumption.
Unfortunately, the insurance has a deductible of $1,000,000.00.
I do not deploy Linux. Ever.
plenty of jobs where people on the ground are working with kit worth more than that. Easy for a forklift or truck driver to cause a lot of damage when moving stuff around.
It happened where I worked once.
Auto company using a "SEL 32" computer as the central tool for automating distributor testing and calibration. Systems Engineering Labs warned them that they were going to discontinue the line, so they needed to stock any spares while production was still happening, so the company they were leasing it from jacked up the price to pay for stocking a bunch of spares.
The company decided to buy their own to save a few bucks and be sure the plant kept running and wasn't hostage to future price rises on an irreplaceable, mission-critical machine. Cost was a few hundred short of a million. (The 98 cents pricing phenomenon, no doubt.)
The box showed up on the loading dock. One rack, floor to ceiling. The forklift operator picked it up, took it down the aisle, took a corner too fast, and it fell off the forklift. It hit so hard it not only set off the tip/shock detectors but BENT THE RACK.
SEL, of course, wouldn't warranty it. The auto company was self-insured. So they bought ANOTHER one (and kept the "clunker" for spare boards if anything failed in the future).
Forklift driver was NOT fired. (Union, hadn't been notified he was toting a megabuck this time, and lift drivers are allowed a quota of oopsies.)
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
One guy alone does not make a mistake like that. I'm a big believer in this: if it's possible to make a mistake (particularly one of this scale), then the preventive procedures and quality assurance are not up to scratch. This was a team effort, not a one-man screw-up. If one guy loses his job or suffers as a consequence of this event, then I urge you all to take action, on a scale similar to our support of Kevin Mitnick. How would you like to write code, (say) forget a comma, have it pass cleanly through all the peer reviews, unit testing, system testing, etc., only to find that it caused a $1M problem and your career and livelihood were now on the line? One of the biggest reasons you are constantly subjected to security audits, process audits and finance audits is not that you have been a naughty boy and someone on high wants to catch you out; it's to find holes in processes and procedures that might at some point in the future cause a problem (like this one). (I'm tipping that the auditors don't get their arse kicked over this.) If one guy loses his job, you'd all better stand up and make some noise, otherwise no one will come to your rescue when you need it.