Slashdot Mirror


How Would You Handle a $1,000,000 Coding Error?

theodp writes "The Chicago Tribune's efforts to upgrade its computer system over the weekend turned into a fiasco when the system crashed, halting all printing operations and leaving about half of the Trib's subscribers without papers. The software contained 'a coding error,' according to a spokesman who estimated the cost to resolve the problem at 'under $1 million.' Any advice for the poor schmuck who's going to get the blame?"

16 of 878 comments

  1. More common than you think... by John+Whorfin · · Score: 5, Informative

    I'm a programmer for a large (US) national newspaper chain, and screwing up the publication cycle is somewhat more common than you might think.

    Most daily newspapers produce several editions, between two and four, and a couple of times I've seen only one edition get printed due to "coding errors" (like the 1 billion seconds from the epoch thing - my personal favorite).
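
    For anyone who missed that one: Unix time reached 1,000,000,000 seconds on 2001-09-09 at 01:46:40 UTC, going from 9 digits to 10. The sketch below shows only one plausible way such a rollover bites, assuming timestamps are stored and compared as text. The function name is made up for illustration, and nothing here reflects what the vendor's code actually did.

```python
from datetime import datetime, timezone

BILLION = 1_000_000_000  # seconds since 1970-01-01 00:00:00 UTC


def epoch_to_utc(seconds):
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


rollover = epoch_to_utc(BILLION)
print(rollover.isoformat())  # 2001-09-09T01:46:40+00:00

# Assumed failure mode: timestamps kept as strings and compared as text.
before = str(BILLION - 1)  # "999999999"  (9 digits)
after = str(BILLION)       # "1000000000" (10 digits)

# Text comparison goes character by character, so "9..." sorts after "1...".
print(before < after)            # False: the newer record looks older
print(int(before) < int(after))  # True: numeric comparison is correct
```

    Comparing as integers (or zero-padding to a fixed width) avoids the problem.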

    Of course the vendor had to be called at the $500/hour emergency rate to fix their own error.

    Once I saw a print pre-processor go offline because /dev/null was deleted and the backup system had been down for 6 months; it took out $50,000 - $100,000 in advertising.

    They call daily newspapers "the daily miracle," and when you look at some of the computer band-aids they have producing them, you can see why.

  2. Re:Just one by wo1verin3 · · Score: 5, Informative

    Google Cache as per your request.

  3. Re:from Office Space by Mattwolf7 · · Score: 3, Informative
    Nice try...

    #56 Michael Bolton: I must have put a decimal point in the wrong place or something. Shit, I always do that, I always mess up some mundane detail.

  4. Re:My advice by ISPpfy · · Score: 4, Informative

    http://www.tech-sol.net/humor/people61.htm

    A new manager spends a week at his new office with the manager he is replacing. On the last day the departing manager tells him, "I have left three numbered envelopes in the desk drawer. Open an envelope if you encounter a crisis you can't solve."

    Three months down the track there is major drama, everything goes wrong - the usual stuff - and the manager feels very threatened by it all. He remembers the parting words of his predecessor and opens the first envelope. The message inside says "Blame your predecessor!" He does this and gets off the hook.

    About half a year later, the company is experiencing a dip in sales, combined with serious product problems. The manager quickly opens the second envelope. The message read, "Reorganize!" This he does, and the company quickly rebounds.

    Three months later, at his next crisis, he opens the third envelope. The message inside says "Prepare three envelopes".

  5. You've never seen a modren web press, have you? by Sycraft-fu · · Score: 4, Informative

    It truly is a sight to see. The speed at which they print is fantastic. A minimum run on many of them is 20,000 copies; in the time it takes to spin up and spin down, that many will have come off.

    This is necessary, too, if we wish to efficiently print the massive quantities we demand. There are a lot of daily newspapers. Even in my small city there are at least 8 that I know of. An old mechanical press simply wouldn't be able to keep up. Never mind printing speed or anything else, setup time was a bitch. You had to have plates made to stamp your text on the page. These then had to be loaded and calibrated for each run that was to be done.

    Now it's all electronic. At the minimum, you place the reference prints under a camera, and normally the layout files themselves are loaded into the press. It can then go to work right away.

    I know it's kind of retro-geek cool to bag on how much harder technology makes everything and how much better it was in "the good ol' days," but that's not usually the case. Old mechanical presses simply cannot compete with the speed of computerised presses, which are necessary to operate with the speed and efficiency that is demanded today.

  6. Re:My advice by harikiri · · Score: 4, Informative
    If you're referring to the quote from Traffic - the quote in full refers to two letters (not three):

    GENERAL LANDRY
    When Kruschev was forced out, he sat
    down and wrote two letters and handed
    them to his successor. He said "When
    you get into a situation you can't
    get out of, open the first letter
    and you'll be saved. And when you
    get into another situation you can't
    get out of, open the second." Soon
    enough this guy found himself in a
    tight place. So he opened the first
    letter. It said, "Blame everything
    on me." So he blamed the old guy
    and it worked like a charm.
    (beat)
    He got into another situation he
    couldn't get out of, so he opened
    the second letter, which read, "Sit
    down and write two letters."

    They stare at each other a beat. Then Landry smiles.
    --
    Man watching 6 MSCE's around a sun box, looks alot like the opening scene's of 2001:space odyssey...
  7. Re:Bad News, Good News..... by mc6809e · · Score: 4, Informative

    Bad news: We missed printing half of our papers.

    Good news: Rainforest saved.


    Actually, most of the wood pulp comes from trees grown in managed forests where trees are replanted to replace the old ones.

    So it's a bit like growing corn or wheat to eat.

    Strangely, we don't see many people shouting "save the corn!".

  8. Re:Just one by Nefarious+Wheel · · Score: 5, Informative

    The book was "Big Blues", a NYT columnist's account of IBM's travails around the time of the rise of Microsoft. The speaker was T.J. Watson Jr., I think.

    --
    Do not mock my vision of impractical footwear
  9. Tribune's version by Anonymous Coward · · Score: 5, Informative

    Here is the full text of the article in the Tribune:

    A story we never thought we'd print

    By James Coates
    Tribune computer columnist
    Published July 19, 2004, 6:40 PM CDT

    Nothing built by humans can go wrong in as many ways or with as nasty an outcome as a computer system.

    The people who create the Chicago Tribune started relearning that fact about 4 p.m. Sunday when they noticed that nothing was getting through as they attempted to beam the stories, artwork and ads from Tribune Tower to the Freedom Center printing plant.

    About 13 hours later, they finally started printing a 24-page version of Monday's Tribune that should have already been landing on their readers' porches.

    It was a misfortune that most people in the news business don't ever expect to experience. Newspapers do not miss days -- and Monday was close.

    The only time the Tribune failed to print was during the Great Chicago Fire of 1871. That time, the lesson was that nature can be fickle and dangerous.

    Now, the paper has learned that the same goes for the computer technology that has graced the industry with unparalleled productivity since the 1990s.

    Business computer systems are cobbled together as row upon row of workstations, each running an operating system based on an estimated 50 million lines of instructions. In turn, the worker bee desktop computers connect to the queen machines with their own millions of lines of code in a different language.

    An endless nest of wires, cables and even radio signals move instructions at light speed between the central computer and the workstations. The main computer also talks to all the peripheral devices needed to accomplish the mission.

    The peripherals can be banks of hard drives, storage bays, printers, scanners, cameras and specialty devices as diverse as a pager or a printing press several stories tall.

    The certainty that each and every one of these massively complex systems will crash haunts the people charged with keeping this thoroughly digital world up and running.

    Those people are engineers, and so they often reduce it to numbers.

    An often quoted study by Carnegie Mellon University computer scientists studied 30,000 software programs and found five to six defects per 1,000 lines of code.

    And this is for finished software sent to customers.

    When writing new programs, there is typically a defect in every 10 lines of code. About a half dozen defects per 1,000 lines remain after a process of checking, rechecking, cross checking, testing, retesting and finger crossing.
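
    A back-of-envelope check of those figures, as a rough sketch only: it takes the article's numbers at face value (one defect per 10 lines when written, about 6 per 1,000 lines shipped, an operating system of 50 million lines) and assumes defects are spread evenly through the code, which real software does not guarantee. The names are illustrative.

```python
def defects(lines_of_code, defects_per_kloc):
    """Expected defect count for a given size and defects per 1,000 lines."""
    return lines_of_code * defects_per_kloc / 1000


INITIAL_PER_KLOC = 100   # "a defect in every 10 lines" of new code
SHIPPED_PER_KLOC = 6     # "about a half dozen defects per 1,000 lines remain"
OS_LINES = 50_000_000    # the article's estimate for one operating system

# Share of the original defects that checking and testing removed.
removed_fraction = 1 - SHIPPED_PER_KLOC / INITIAL_PER_KLOC

# Defects still present in a shipped 50-million-line system.
shipped = defects(OS_LINES, SHIPPED_PER_KLOC)

print(f"{removed_fraction:.0%} of defects removed before release")  # 94%
print(f"{shipped:,.0f} defects left in the shipped system")         # 300,000
```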

    The hubris of computing becomes clear as one realizes that each of these errors in code branch out with instructions to millions of other lines of code. Quite often, they find pathways never before taken by that particular program.

    Collisions occur on these pathways and trouble is spotted. Maybe it can be fixed or maybe technicians can only perform a "workaround" that can't be guaranteed.

    Dick Malone, the Tribune's senior vice president and general manager, said that around 9:30 a.m. on Sunday technology crews started a planned upgrade to increase the newspaper's Sun Microsystems servers from so-called 10K models to 15K machines.

    To do this, experts from the company that makes the newspaper's core Windows-based publishing software, Denmark-based CCI Europe A/S, needed to install upgrades of its Newsdesk brand software that the Tribune and other clients use.

    Malone noted that they checked and rechecked, tested and retested all day. Everything seemed to be working without a hitch. Then, they punched the button that was supposed to send all of the content for the newspaper to the printing plant.

    Nothing arrived.

    Frantic hours went by as deadline after deadline slipped while crews struggled to find a fix. Malone said he went so far as to start setting up the newspaper's pages on the art department's Macintosh desktops, hoping to get at least something printed.

  10. One-line CODE ERROR $60 million - AT&T phone c by mdrejhon · · Score: 5, Informative
    History: a one-line coding error cost $60 million!

    AT&T Failure of January 15, 1990

    On January 15, 1990, 114 switching nodes of the AT&T long distance system went down. The published cause of the crash was a bug in the failure recovery code of the switches. When a node crashed, it sent an "out of service" message to the neighboring nodes, which were supposed to re-route traffic around it. However, the bug (a misplaced "break" statement in C code) caused the neighboring nodes to crash themselves upon receiving the "out of service" message, and to further propagate the fault by sending an "out of service" message to nodes further out in the network.

    The crash lasted 9 hours, while programmers searched for the cause of the bug. An estimated 60 thousand people were left without telephone service, and 70 million phone calls went uncompleted. AT&T estimates at least $60 million in lost revenue and damage to its reputation; reliability was a central point in AT&T's marketing campaign against other long distance providers at the time. The incidental damage to businesses that were unable to operate due to lack of telephone service is hard to estimate, but is presumably much larger. The public safety and national security implications of such a large telephone system outage are distressing as well.

    This fault happened despite fault-tolerant design principles which were present in the phone system's design. The nodes failed fast, reporting their outage to neighboring nodes, and there was enough redundancy in the system to route around the failures. The crashed nodes recovered quickly, rebooting themselves and coming back up; however, they would immediately crash because of the messages received from neighboring nodes. The failure happened on an error-recovery path, which is poorly tested. The presence of decentralized distributed control, necessary for scaling, allowed this failure to propagate. The outage demonstrates that a bug in the software can cause a widely correlated failure.
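
    The propagation described above can be sketched as a toy model. This is not the switch software and does not reproduce the actual C bug; it only assumes that a buggy node crashes when it receives an "out of service" message and then notifies its own neighbors, while a correct node just re-routes. The 6-node ring and all names are invented for illustration.

```python
from collections import deque


def crashed_nodes(neighbors, first_failure, buggy_recovery):
    """Return the set of nodes that crash after one initial failure.

    neighbors maps each node to the nodes it notifies when it goes down.
    """
    crashed = {first_failure}
    pending = deque([first_failure])  # nodes whose "out of service" is unsent
    while pending:
        sender = pending.popleft()
        for receiver in neighbors[sender]:
            if receiver in crashed:
                continue
            if buggy_recovery:
                # Assumed bug: handling the message crashes the receiver,
                # which then announces its own outage in turn.
                crashed.add(receiver)
                pending.append(receiver)
            # Correct behaviour: the receiver re-routes and stays up.
    return crashed


# Hypothetical network: six nodes in a ring, each linked to two neighbors.
ring = {n: [(n - 1) % 6, (n + 1) % 6] for n in range(6)}

print(len(crashed_nodes(ring, 0, buggy_recovery=False)))  # 1
print(len(crashed_nodes(ring, 0, buggy_recovery=True)))   # 6
```

    With correct recovery the failure stays local; with the bug, every node reachable from the first failure goes down, which is the correlated failure the outage demonstrated.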

    The possibility of a malicious attack on the system was seriously investigated as a cause for the crash. The investigation came up dry, but most sources acknowledge that this accidental fault could have just as easily been activated on purpose by a knowledgeable attacker. The social implications are investigated in detail in Bruce Sterling's The Hacker Crackdown.
  11. Re:Bad News, Good News..... by killjoe · · Score: 5, Informative

    Actually that's not quite true. The big paper companies do have large forests that they try to manage, but they cut trees much faster than they are being replenished. This is why there is relentless pressure to log the national forests. If the harvest from private acreage were sustainable, they would never need to log the national forests.

    These days companies like Champion and Plum Creek are finding that it's more profitable to sell the logged areas than to replant them, for example in Maine and Montana.

    It's more profitable to sell land (especially waterfront land) and then log the federally subsidized national forests.

    Your tax dollars at work!

    --
    evil is as evil does
  12. You forgot... by Kevin+Burtch · · Score: 4, Informative


    E) Pulp requires hardly any bleaching, and only a tiny fraction of the toxic chemicals wood-pulp requires to process.

    E.1) No toxic chemicals to expensively dispose of (less pollution).

    F) Pulp requires a fraction of the processing compared to wood-pulp.

    G) Same (non-THC-producing) hemp grown for rope and clothing can be used... existing/established farming methods.

    H) Requires _much_ less fertile ground (no fertilizer) for growing... technically it _is_ a weed (not just a nickname).

    H.1) ...as a side-effect to H, will grow in much less expensive land. Heck, add water and it'll grow in a desert.

    I) Requires much less expensive processing equipment to farm (ground requires drastically less/no tilling, collection can be done with hay-baling equipment instead of heavy trucks and tree-cutting machinery, etc.).

    I'm sure I'm forgetting some.

    Note the reference to a non-THC-producing strain... I'm not into pot, but I can certainly recognize a phenomenal idea when I see one (I saw this one many years ago).

    --
    - Preferences: Solaris 10 (servers), Ubuntu (desktops), Solaris 11 (personal servers) -
    1. Re:You forgot... by Tony+Hoyle · · Score: 4, Informative

      AFAIK it's mostly down to the paper industry that hemp is illegal anyway... they wanted to produce their more lucrative wood based paper (it's difficult to make a profit when your raw materials are a weed that grows anywhere very quickly.. better to standardise on a limited resource that takes 30 years to grow). Lobbyists were very powerful in the US even 50 years ago.

      The US actually managed to eradicate a weed that grew on the roadside from their shores by aggressive burning, along with a demonisation campaign to try to turn people off the (then popular) drug... a bit like the 'war on terror' but with even fewer facts behind it :)

      There are many strains of non-THC-containing hemp (given that the social effects of introducing wide availability of another drug are undesirable - alcohol is bad enough). In Europe at least there are fields full of the stuff, as hemp rope and linen are still very popular. Even hemp paper is available, given it's cheap/easy to produce...

      Medical hemp (the THC kind) is grown under license and given to selected patients to treat certain conditions, although that's mostly still under trial (and is the motivation for the reclassification of cannabis possession in the UK, so that the drug companies could legally do their trials).

  13. Set the Unix sticky bit on the directory by AnEmbodiedMind · · Score: 4, Informative

    From the OS X man page for "sticky (8)":

    NAME
    sticky - sticky text and append-only directories

    DESCRIPTION
    A special file mode, called the sticky bit (mode S_ISVTX), is used to
    indicate special treatment for shareable executable files and directo-
    ries. See chmod(2) or the file /usr/include/sys/stat.h for an explana-
    tion of file modes.

    STICKY DIRECTORIES
    A directory whose `sticky bit' is set becomes an append-only directory,
    or, more accurately, a directory in which the deletion of files is
    restricted. A file in a sticky directory may only be removed or renamed
    by a user if the user has write permission for the directory and the user
    is the owner of the file, the owner of the directory, or the super-user.
    This feature is usefully applied to directories such as /tmp which must
    be publicly writable but should deny users the license to arbitrarily
    delete or rename each others' files.

    Any user may create a sticky directory. See chmod(1) for details about
    modifying file modes.
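
    A minimal sketch of doing the same thing from a script, assuming a POSIX system. It uses Python's standard os and stat modules on a throwaway temporary directory; the two helper names are made up for illustration.

```python
import os
import stat
import tempfile


def make_sticky(path):
    """Add the sticky bit (S_ISVTX) to a directory, keeping its other bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_ISVTX)


def is_sticky(path):
    """Report whether the sticky bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_ISVTX)


# Demonstrate on a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as shared:
    os.chmod(shared, 0o777)  # publicly writable, like /tmp
    sticky_before = is_sticky(shared)
    make_sticky(shared)
    sticky_after = is_sticky(shared)
    final_mode = stat.S_IMODE(os.stat(shared).st_mode)

print(sticky_before, sticky_after, oct(final_mode))
```

    From a shell, the equivalent is `chmod +t` on the directory.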

  14. Re:grow canabis, stupid morons.... by aastanna · · Score: 3, Informative

    Not to rain on your parade, but you won't get stoned off that.

    "Fibre hemp is an annual herbaceous plant which flourishes in temperate regions. All cultivars tested in Alberta have been low-THC (delta-9 tetrahydrocannabinol) cultivars. Canada has adopted the 0.3% THC standard established by the European Union as the concentration which separates non-psychoactive strains suitable for legal fibre production from those which are illegally grown for their properties of intoxication. The 0.3% THC designation is very conservative. Most narcotic strains range from 3-5% THC, with cleaned, high potency material reaching as high as 15% THC."

  15. Re:Bad News, Good News..... by Himring · · Score: 4, Informative

    Actually, most of the wood pulp comes from trees grown in managed forests where trees are replanted to replace the old ones. So it's a bit like growing corn or wheat to eat.

    You couldn't be more wrong. I live near a large paper mill that produces products for newspaper companies. I've lived here all my life. I've seen firsthand how they rape the forests, the mountains, etc. Sure, they plant yellow pine because yellow pine grows fast and fits their purposes, but where they plant the yellow pine was once a lush hardwood forest of oaks, maples, etc. They take out the large hardwoods that provide acorns for deer and other small animals and replace them with pine, so now the pines grow unabated. The animal populations suffer. Also, any smaller hardwoods they cannot use they slash or poison so they will die. Next, since there are so many pines, we recently had a plague of pine beetles. Huge tracts of pine forest (man-made pine forests) lie in waste in the mountains, hills and along the highways here. This is partly the fault of the paper company. Also, the chemicals they use create an artificial/chemical fog that wreaks havoc. I kid you not. We had one of the largest traffic accidents in US history here some years back, where 100s of cars piled up on I-75. It made national news. I think the paper company paid off the victims' families nicely enough though. Finally, the workers in this mill are exposed to harmful chemicals such as chlorine that take a toll over time. Usually, late in life there are massive respiratory problems.

    It's easy to armchair-quarterback where your newspaper comes from, but I for one don't subscribe to anything but online sources. You should too....

    --
    "All great things are simple & expressed in a single word: freedom, justice, honor, duty, mercy, hope." --Churchill