Windows Upgrade, FAA Error Cause LAX Shutdown

Repent, Sinners! by mfh · 2004-09-21 09:49 · Score: 5, Insightful

The recent shutdown of LAX due to an FAA radio outage was apparently caused by a Windows 2000 integration flaw, possibly related to an old Windows 95 bug.

Okay... a Win95 bug leads to the LAX shutdown because the *same* bug was later found in Win2k? Yup, closed source is the answer, Mr. Gates. I hereby repent my sins of Open Source Freedom and agree that security by obscurity is the answer! /sarcasm

a technician didn't reboot the system monthly as he should have

You have to love a system that requires downtime as part of uptime. How many Linux users have this problem? (Please press the Start button to shut down (stop) the computer.)

--
The dangers of knowledge trigger emotional distress in human beings.

Re:Repent, Sinners! by LostCluster · 2004-09-21 09:54 · Score: 3, Insightful

I've seen AIX-based database systems that require an overnight downtime to do reindexing, since non-SQL formats like DBase have always been a little funky when they start having to deal with million-record tables. It's amazing how ugly legacy databases can be compared to today's tech.
Re:Repent, Sinners! by Da+Twink+Daddy · 2004-09-21 09:57 · Score: 5, Funny

You have to love a system that requires downtime as part of uptime. How many Linux users have this problem? (Please press the Start button to shut down (stop) the computer.)

Sure,
init 6
doesn't sound like it should start (initialize) anything...
Re:Repent, Sinners! by Phillup · 2004-09-21 10:05 · Score: 2, Informative

doesn't sound like it should start (initialize) anything

So... it should not initialize (begin) run level 6?

--

--Phillip

Can you say BIRTH TAX
Re:Repent, Sinners! by SoSueMe · 2004-09-21 10:06 · Score: 2, Funny

... the LAX shutdown...

Would that be 'exLAX'?
Re:Repent, Sinners! by (H)elix1 · 2004-09-21 10:08 · Score: 5, Insightful

You have to love a system that requires downtime as part of uptime. How many Linux users have this problem?

All right, I cannot throw the first stone here. I can raise my hand as an AIX C programmer back in the day...

We inherited a huge ball of spaghetti wire, nasty stuff that had memory leaks. Rather than taking the time to fix it, the powers that be determined it was better to keep working on new features rather than hash out the issues. At first it happened once a quarter, then once a month, and as time ticked by it became a weekly 'fix' to recycle the server. Lord knows I added to the mix as well, as they picked 'cheap' and 'build it fast' (not to be confused with running fast), skipping the 'do it right' part entirely. That is how it happens... stuff gets rushed before its time. OSS is more immune than the typical commercial gig, but any time a deadline comes without enough time to finish, something is going to give. Downtime is just duct tape.

--
+++ UGUCAUCGUAUUUCU
Re:Repent, Sinners! by pchan- · 2004-09-21 10:21 · Score: 5, Insightful

where do you want to go today?

dear microsoft,

the above question was posed in a line of your advertisements. well, after spending an hour and a half on a plane on the runway in oakland, and another hour on the runway in l.a. (sunday night), i think i have the answer. i want to go home. sounds like a simple enough request, or so i thought.

but here is what i really want: i would like you (microsoft, inc.), to stop selling your products to mission critical and infrastructure operations until such a time as they are ready for it. when my desktop computer at work crashes (admittedly a rare occurrence nowadays), i am inconvenienced. when hundreds of thousands of travellers in airports across the world are delayed because one of the busiest airports in the world is shut down due to a 10 year old known bug in your operating systems that has not been fixed, that is simply not acceptable. i realize that buyers of software and IT systems are easily suckered or bribed into using your systems, that is why i am appealing directly to you. please exit this market before we are forced to legislate you out.

thanks,
pc
Re:Repent, Sinners! by claar · 2004-09-21 10:29 · Score: 4, Insightful

Bah, what a cop out. If "we" won't accept criticisms similar to our own, we have no right to criticize in the first place.

Yes, init 6 is counter-intuitive. I remember that it actually did confuse me a bit the first time I heard of it. Does that mean we need to remove or change it? Nah, let 'em use `shutdown -r` or `alias restart="init 6"`. But just don't be an apologist for Linux, it just makes "us" look hypocritical.

--
I'd give my right arm to be ambidextrous...
Re:Repent, Sinners! by 47Ronin · 2004-09-21 10:33 · Score: 4, Insightful

Personally, I use "reboot".

"shutdown -r now" also works (r stands for reboot). To shut down, use -h (for halt).

Personally I use sudo reboot because I would never log in as root for security/safety reasons.

--
Those who laugh at you for you having a Mac.. are the people who constantly call you to fix their PC.
Re:Repent, Sinners! by admdrew · 2004-09-21 11:02 · Score: 2, Funny

Personally, I use an axe.

--
LegendMUD
Re:Repent, Sinners! by jurv!s · 2004-09-21 11:17 · Score: 2, Informative

in my labs- users logged in on the console can reboot without sudo. Anything less would be uncivilized!

(P.S. see man console.apps and pam_console)

--
sigs are for fools and trolls. no signature is *always* appropriate. you should turn them off in your preferences.
Re:Repent, Sinners! by Phillup · 2004-09-21 11:34 · Score: 2, Insightful

But just don't be an apologist for Linux, it just makes "us" look hypocritical.

I wasn't apologizing. It makes perfect sense to me.

Then again, I have a calculator that you turn off by pressing the "ON" key. ;-)

Seriously tho...

Many devices have a single power button. You push it... thing comes on... push it again... thing turns off.

If anyone should apologize, it is the person that decided on "Start" for the button label.

And, in *nix... init 6 does just what it says it does.

It initializes run level six. Run level six can do anything you want it to do. It doesn't have to shut down the system.

So... WTF would I even have to apologize for? The fact that the parent associates it in his mind with shutting down?

It doesn't shut down... it initializes run level six. If you don't want it to shut down when you init 6... change it.

If you don't want to go to the "Start" button in Windows to shut down... well... that one is your problem. Not mine.

--

--Phillip

Can you say BIRTH TAX
Re:Repent, Sinners! by neura · 2004-09-21 11:36 · Score: 2

Ya know, the day the poster of the first comment has actually READ any of the linked articles before posting, I may just drop dead from surprise. People apparently would rather get their post in as soon as possible instead of actually READING WTF they're POSTING about. WORST OF ALL: This initial news item should have been moderated, since every non-factual suggestion made (about 75% of the post) is wrong. WHY CAN'T PEOPLE ACTUALLY POST REAL NEWS ANYMORE?!?!
Re:Repent, Sinners! by Hatta · 2004-09-21 11:39 · Score: 4, Funny

Personally I use sudo reboot because I would never log in as root for security/safety reasons.

Funny, those are the only reasons I ever log in as root.

--
Give me Classic Slashdot or give me death!
Re:Repent, Sinners! by Awptimus+Prime · 2004-09-21 11:42 · Score: 4, Insightful

You have to love a system that requires downtime as part of uptime. How many Linux users have this problem? (Please press the Start button to shut down (stop) the computer.)

Well, in the past 10 years I have had a number of clients who have had Linux, Unix, Windows, and Mac systems that were critical to their day to day routine and they did nightly/weekly/monthly reboots as part of their maintenance.

I guess when you grow up and get out of high school, you will find that your linux box running as a DSL router is not a good example of a production server.
Re:Repent, Sinners! by Phillup · 2004-09-21 11:42 · Score: 3, Informative

Only if that is what you have run level 6 configured to do.

All the init 6 command does is initialize run level 6. You can have run level 6 configured any way you want.

It isn't hard wired to shut down. (On debian run level 6 does a reboot... run level 0 halts the system.)

--

--Phillip

Can you say BIRTH TAX
Re:Repent, Sinners! by Turmio · 2004-09-21 12:21 · Score: 3, Interesting

You have to love a system that requires downtime as part of uptime. How many Linux users have this problem?
Actually I was hit by the 497-day maximum uptime bug of Linux 2.4 (and with a desktop machine, no less). The box at work ran for about 650 days, which put it well past the halfway mark on the way to a second consecutive uptime reset. Then it was time for me to change rooms. I wasn't at the office that day and my co-worker just unplugged the box. Was I pissed or not? Yes I was.
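
For anyone wondering where 497 days comes from, here is a rough sketch of the arithmetic. It assumes the uptime counter is a 32-bit tick count incremented 100 times per second (the usual HZ value on x86 for 2.4 kernels); the function names are made up for illustration and are not kernel APIs.

```python
# Sketch only: assumes a 32-bit tick counter running at 100 ticks per second.
COUNTER_MODULUS = 2 ** 32   # distinct values of an unsigned 32-bit counter
SECONDS_PER_DAY = 86_400


def wrap_period_days(ticks_per_second: int = 100) -> float:
    """Days until the tick counter wraps back to zero."""
    return COUNTER_MODULUS / ticks_per_second / SECONDS_PER_DAY


def displayed_uptime_days(true_uptime_days: float, ticks_per_second: int = 100) -> float:
    """Uptime a wrapped counter would report for a given real uptime."""
    return true_uptime_days % wrap_period_days(ticks_per_second)


if __name__ == "__main__":
    print(f"counter wraps after {wrap_period_days():.1f} days")
```

So a box that has really been up longer than one wrap period reports only the remainder, which is why the counter "resets" without any reboot.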
Re:Repent, Sinners! by n3k5 · 2004-09-21 12:22 · Score: 3, Insightful

If anyone should apologize, it is the person that decided on "Start" for the button label.
Originally the button just showed the Windows flag, so the choice of label was basically the same as in Gnome and KDE today. However, the average Windows user didn't figure out that this logo isn't just there for decorative purposes, but that you actually have to click it in order to accomplish just about anything. So someone had to come up with a short piece of text that clues newbies in, and it worked rather well (in usability tests). 'Start' may not be optimal, but has anyone thought of something better? (Not that it matters anymore.)

--
but what do i know, i'm just a model.
Re:Repent, Sinners! by secolactico · 2004-09-21 12:30 · Score: 4, Funny

Golly gee-whiz, if someone is too stupid to migrate a million-record dBase table to SQL, he only deserves a real good whacking (and a career re-orientation into would you like grits with that ???)...

Most of the time it is not because of the inability of the database tech, but the "hey, it's been working so far" attitude of the decision makers.

Maybe the powers that be are allergic to Open Source solutions and commercial databases can be expensive. Maybe the client applications are tied to the current system and porting them would be too expensive (example, POS systems).

I can imagine the conversation:

- "We are closed at night anyway"
- "Yes, boss, but recovering from a failure (knock on wood) can be too difficult in the current system"
- "Well, that's what we are paying you for"
- "Yes, sir. Thank you, sir. Would you like grits with that?"

--
No sig
Re:Repent, Sinners! by Anonymous Coward · 2004-09-21 12:38 · Score: 2, Informative

But just don't be an apologist for Linux, it just makes "us" look hypocritical.
>>I wasn't apologizing. It makes perfect sense to me....
>>So... WTF would I even have to apologize for? The fact that the parent associates it in his mind with shutting down?
Down boy! Heel!
apologist n. A person who argues in defense or justification of something, such as a doctrine, policy, or institution.
All words that sound vaguely alike don't necessarily mean the same thing.
Re:Repent, Sinners! by valkraider · 2004-09-21 12:53 · Score: 2, Interesting

Our college did batch runs for all sorts of stuff. We only had about 3000 students, but between faculty and staff it worked out to around 5000 people in various systems. Things had to run to calculate and process dorm room phone bills, cafeteria plans, accounts payable and receivable, invoices, transcripts, and DOOM wads...

And we were small. I can only imagine what a big school with 30 to 50 thousand students would need done... Not to mention all the DOOM3 wads nowadays. ;)
Re:Repent, Sinners! by autopr0n · 2004-09-21 12:58 · Score: 4, Insightful

Yes, but maybe that was controlled by a cron-job and not some poor person manually initiating it every night? Just like an automated reboot is also not too scary on any decent Unix, but a manual action in MS-world?

a) This could easily have been done as a scheduled task in windows 2000.

b) This could have been done by their code, in windows 2000 and windows 95.

c) Windows 2000 does not require a reboot after 49.7 days. Maybe their software relied on GetTickCount() or something.

The problem lies with the developers of the software, not microsoft.
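
To illustrate point c), here is a minimal sketch of how application code can survive a 32-bit millisecond tick counter wrapping around. This is only a guess at the class of bug involved, not the FAA system's actual code, and the helper names are hypothetical. Python integers do not overflow, so the 32-bit unsigned arithmetic is simulated with a mask.

```python
# Sketch only: simulates a 32-bit millisecond tick counter that wraps to zero.
MASK_32 = 0xFFFFFFFF


def naive_elapsed_ms(start_tick: int, now_tick: int) -> int:
    """Plain subtraction: goes negative once the counter has wrapped."""
    return now_tick - start_tick


def wrap_safe_elapsed_ms(start_tick: int, now_tick: int) -> int:
    """Subtraction modulo 2**32: correct across a single wrap of the counter."""
    return (now_tick - start_tick) & MASK_32


if __name__ == "__main__":
    start = 0xFFFFFF00   # 256 ms before the counter wraps
    now = 0x00000100     # 256 ms after it wrapped
    print(naive_elapsed_ms(start, now), wrap_safe_elapsed_ms(start, now))
```

The wrap-safe version only holds as long as the interval being measured is itself shorter than one full wrap of the counter.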

--
autopr0n is like, down and stuff.
Re:Repent, Sinners! by multipartmixed · 2004-09-21 13:06 · Score: 3, Informative

> since non-SQL formats like DBase have always been
> a little funky when they start having to deal
> with million-record tables.

Oh, yes, SQL the magic bullet. I have a database problem! No matter what it is, I can solve it by migrating to a database system which uses SQL!

> It's amazing how ugly legacy databases can be
> compared to today's tech.

Yes, today's tech! SQL, the magic bullet! Why, we should use Oracle! It's SQL and thus must be modern! It's only been around since 1979!

Wait!

1979 was a long time ago.

Oh, dear?

Could it be that Oracle is not modern tech? But, how could it not be? It uses SQL, the magic bullet!

Hint: query language and scalability are not related.
Hint II: RDBMS is no magic bullet, either.

--

Do daemons dream of electric sleep()?
Re:Repent, Sinners! by ckaminski · 2004-09-21 13:50 · Score: 4, Insightful

Thankfully, Chicken Little, planes do NOT fall out of the sky during a total air traffic control outage, but control regresses to pencil and paper.

Your plane *WILL* land. It may be at a different airport, and sooner or later than planned, but you will get on the ground in one piece.
Re:Repent, Sinners! by Awptimus+Prime · 2004-09-21 13:54 · Score: 4, Insightful

and these are heavy used mail servers.. no need to reboot on a nightly basis!! good grief (charley brown)

Right, the code used for mail serving is some of the most mature server code out there. This is far more reliable than say a Linux box set up with proprietary, closed src, business applications with their own bugs.

My feelings are the article may have mistakenly blamed Windows for a problem with one of the server applications running on it. It is not typical for even Win2k to hang unexpectedly when running good hardware and well-written code.

I say fuck it. There is no point in ever trying to defend logic when it stands in the way of the Microsoft bash-fests on /..

Just to clarify, I am not saying Windows servers can and will run as reliably as a properly configured BSD, Solaris, or Linux box. I am just trying to state that Windows is reliable, if properly configured, but will probably not win an uptime competition. Big whoop. Reboot your shit during maintenance windows, regardless of OS, and you run a much better chance of finding pending hardware failures. It is much better to powercycle that database server and get an error detecting the SCSI bus during a maintenance window than for it to happen at 5:30AM on a Monday or during your vacation.

Then again, I could be overly anal. I just like to avoid the reputations gained by those before me. :)
Re:Repent, Sinners! by pchan- · 2004-09-21 13:58 · Score: 5, Insightful

see what you've done, now i had to go and rtfa just to respond. here's a choice quote:

The servers are timed to shut down after 49.7 days of use in order to prevent a data overload, a union official told the LA Times. To avoid this automatic shutdown, technicians are required to restart the system manually every 30 days.

now, let's do a little math. the number of milliseconds in 49.7 days = (49.7 * 24 * 60 * 60 * 1000) = 4,294,080,000. recognize that number? that's right, it's 2^32 (actually, 2^32 is 4,294,967,296, but it's pretty damn close). and why is that significant, you ask? because at 2^32, the unsigned int used by some versions of windows to keep the time since boot overflows back to zero, and bad things begin to happen.
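
the same arithmetic as a quick python sketch, for anyone who wants to check it. the names are made up for illustration; the only assumption is a 32-bit counter that ticks once per millisecond.

```python
# Sketch only: compares the milliseconds in 49.7 days with the range of a 32-bit counter.
MS_PER_DAY = 24 * 60 * 60 * 1000
COUNTER_MODULUS = 2 ** 32   # an unsigned 32-bit counter wraps to zero at this value


def ms_in_days(days: float) -> int:
    """Milliseconds in the given number of days, rounded to the nearest ms."""
    return round(days * MS_PER_DAY)


def days_until_wrap() -> float:
    """Days a millisecond counter can run before it reaches 2**32."""
    return COUNTER_MODULUS / MS_PER_DAY


if __name__ == "__main__":
    print(ms_in_days(49.7), COUNTER_MODULUS, f"{days_until_wrap():.2f} days")
```

the gap between the two numbers is under a million milliseconds, i.e. about fifteen minutes, which is why "49.7 days" is the rounded figure everyone quotes.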

is the problem microsoft's fault? goddamn right it is. in software that runs A MAJOR AIRPORT and controls the flight control and radar systems that affect thousands of lives in the air, an error like this is just not an option. the people who put this system into production ought to be fired. i don't know what the right os for this task is. solaris? aix? vms? something with provable uptime and reliability, something that can deliver uptime of longer than a month and a half, that's for sure.

I'm sure Linux doesn't store time in an infinite bit counter either.

i don't recall advocating linux for the job. maybe it can do it, maybe not. and in regards to being free, when my life is on the line, they better spend every god-damn dollar they can to make sure that critical systems do not fail under any circumstances. microsoft was absolutely the wrong choice in this case.
Re:Repent, Sinners! by dgatwood · 2004-09-21 14:06 · Score: 3, Interesting

As another link in this discussion noted, an unpatched Win2k does, in fact, require a reboot every 49.7 days because a certain process goes nuts and eats 60% of the CPU. The fact that MS has a patch for the problem does not mean that the problem does not exist.

--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Re:Repent, Sinners! by Atzanteol · 2004-09-21 14:15 · Score: 4, Insightful

You don't work with other people much do you? It's probably for the best.

These things cost money. Migrating apps that use the old DB to the new one, testing, bugs introduced in the migration, etc. If it works most companies will stick with it and not risk spending large amounts of money for no 'gain' (in their mind).

--
"Ignorance more frequently begets confidence than does knowledge"

- Charles Darwin
Re:Repent, Sinners! by nathanh · 2004-09-21 14:47 · Score: 3, Insightful

So someone had to come up with a short piece of text that clues newbies in, and it worked rather well (in usability tests). 'Start' may not be optimal, but has anyone thought of something better? (Not that it matters anymore.)

Click Me. Menu. Actions. Tasks. Open Here.
Any of those make more sense than "Start".
Re:Repent, Sinners! by Bush+Pig · 2004-09-21 15:09 · Score: 3, Funny

I'm still having a bit of trouble with the notion that moving from UNIX to Windows was regarded as an upgrade.

--
What a long, strange trip it's been.
Re:Repent, Sinners! by Vinson+Massif · 2004-09-21 15:17 · Score: 3, Funny

Personally I:
% su -
# uname -n
and MAKE SURE I'M ON THE RIGHT MACHINE !!
# shutdown -r 120 'go away!!'

Most systems' reboot commands invoke `shutdown -r now`.

--
"Remember, any tool can be the right tool." -- Red Green
Re:Repent, Sinners! by GMFTatsujin · 2004-09-21 15:19 · Score: 2, Funny

I'm sure Emilia Airhart said the same thing before she patched her Windows 3.11!
Re:Repent, Sinners! by tulare · 2004-09-21 15:54 · Score: 2, Insightful

Thankfully, Chicken Little, planes do NOT fall out of the sky during a total air traffic control outage, but control regresses to pencil and paper.
Or, more appropriately, to the hands of the pilots, including the one who had to take evasive action. What's glossed over here is that a stupid application flaw very nearly did result in serious loss of life. Kudos to the pilot who knew what the fuck to do when the time came.

--
political_news.c: warning: comparison is always true due to limited range of data type
Re:Repent, Sinners! by adamfranco · 2004-09-21 16:16 · Score: 2, Interesting

I personally like "Commence".

Commence writing.
Commence listening to music.
Commence shutdown procedure.

It works for everything!

Usage of "Start" instead of "Commence" probably has something to do with the majority of the population wondering who was graduating when they clicked the button...

--
"When ideology and theology couple, their offspring are not always bad but they are always blind." -- Bill Moyers
Re:Repent, Sinners! by ckedge · 2004-09-21 17:02 · Score: 2, Interesting

> a) This could easily have been done as a scheduled task in windows 2000.

Uh, no, no it could not.

Scheduled Tasks in Microsoft Windows have never been reliable. Quite frequently mine have their security credentials "screwed up" somehow and stop working until I notice and "touch" them so I'm forced to re-enter a user/pwd.

I have never EVER heard of Solaris cron failing to run on time.

> and not some poor person manually initiating it every night?

It's windows, you have to have a person present to ensure that the system actually a) goes down b) comes back up as intended.

I've done a half year consulting gig and spent a month walking 5 blocks through the downtown core of San Francisco at 5am every single FUCKING morning to hit the power button on a 4 way 400 MHz $50,000 Compaq windows box at one of the biggest banks in the world. Database held holdings information on around half a trillion dollars in equities.
Re:Repent, Sinners! by Shadowlore · 2004-09-21 23:51 · Score: 2, Interesting

Well, in the past 10 years I have had a number of clients who have had Linux, Unix, Windows, and Mac systems that were critical to their day to day routine and they did nightly/weekly/monthly reboots as part of their maintenance.

I guess when you grow up and get out of high school, you will find that your linux box running as a DSL router is not a good example of a production server.

Yeah they did that to the Linux boxes here, because they didn't know better. Now, with real Linux experts, our Linuxen are not rebooted or taken down for routine maintenance. And no we aren't talking about "DSL Routers". We are talking about systems that process email to the tune of a million messages per server per day.

Critical? You bet it is. Merrill Lynch, HP, APL, and many others. Planned downtime for "regular maintenance"? Nope. The only time we plan downtime is for hardware replacement/upgrade and kernel upgrades, the occasional (rare) server moves, and full data center shutdowns to perform data center failover verification.

I guess when you grow up and get out of community college, you'll find that running a dormitory quake server is not a good example of a business critical production server. /pointed sarcasm.

--
My Suburban burns less gasoline than your Prius.
Re:Repent, Sinners! by jones948 · 2004-09-22 00:49 · Score: 2, Funny

Click Me. Menu. Actions. Tasks. Open Here.
Any of those make more sense than "Start".

Ah, but then you couldn't tie in a catchy Rolling Stones song with your product launch.

Anyone want to clue them in to scheduled jobs? by FyRE666 · 2004-09-21 09:50 · Score: 3, Insightful

It's obviously lunacy for any company to replace a proven system, which has given years of reliable service with some piece of trash that crashes if left running for over a month. That said, I was under the impression that a simple "at" job could be used on a Windows machine to run a script periodically (at is similar to cron, except far less capable, of course). Such a script could, if I'm not mistaken, be used to reboot the machine. One would think this would be an ideal way to hide the problem very nicely.

We use a similar system to reboot all of our NT servers every weekend to help prevent crashes during the week (doesn't work of course, but still).

--
Code, Hardware, stuff like that.

Re:Anyone want to clue them in to scheduled jobs? by TykeClone · 2004-09-21 09:54 · Score: 3, Interesting

at sucks. Very, very much.
I've got an NT server that would hang after 2 weeks. I set up an at job to restart that service nightly and do not have that problem.
I've also got several linux servers that just plain run (and some NT/2000 servers as well).
That being said, rebooting sometimes does clear up many evils. We have a speakerphone (around 10 years old - no OS) that just wouldn't work one day. After looking at it, I unplugged it and plugged it back in (I rebooted it!) and it worked. No good reason, it just helps.

--
A fine is a tax you pay for doing wrong and a tax is a fine you pay for doing all right.
Re:Anyone want to clue them in to scheduled jobs? by dbottaro · 2004-09-21 09:58 · Score: 5, Informative

Agreed. A well written AT script something like this: Each M T W Th R S Su 12:45 AM shutdown /l /r /y /c
Would do the trick... We have used that exact script for YEARS to nightly reboot a troublesome NT4 BDC at a remote location.
While we knew that this was not a great solution, no one needed to access the server at that time of night. Any right minded IT person should be able to see the flaw in the FAA's logic.

--
Coding my way to the next BSOD!
Re:Anyone want to clue them in to scheduled jobs? by mekkab · 2004-09-21 10:10 · Score: 4, Interesting

It's obviously lunacy for any company to replace a proven system, which has given years of reliable service with some piece of trash that crashes if left running for over a month

What if that proven system is decaying out from under you? HD's failing, memory going bad... Tell you what, can you get me new boards for an IBM RT pc? I highly doubt it.

What about "olde" mainframes running assembler code? The pool of expertise is drying up... sometimes you need to pitch the hardware.

--
In the future, I would want to not be isolated from my friends in the Space Station.
Re:Anyone want to clue them in to scheduled jobs? by Ann+Elk · 2004-09-21 10:11 · Score: 4, Insightful

It's obviously lunacy for any company to replace a proven system, which has given years of reliable service...

It's obvious you have never toured an ARTCC (Air Route Traffic Control Center). The system that is being replaced was barely hanging together by voodoo and chicken wire. It was designed back in the 60's to handle maybe 1/10th the current capacity. It is in dire need of replacement.

That said, I'm not convinced Windows (or Linux for that matter) is an appropriate OS for an application that practically defines the phrase "mission critical".
Re:Anyone want to clue them in to scheduled jobs? by FyRE666 · 2004-09-21 10:23 · Score: 2, Interesting

What about "olde" mainframes running assembler code? The pool of expertise is drying up... sometimes you need to pitch the hardware.

Yeah but maybe they should have replaced it with something that, you know, actually works...

I'm all for change, but I wouldn't swap my car for a brand new sparkling wheelchair, my haircut for a mullet, or my soul/self respect for a job writing VBScript. It just doesn't seem right, you know?

--
Code, Hardware, stuff like that.
Re:Anyone want to clue them in to scheduled jobs? by LifesABeach · 2004-09-21 10:27 · Score: 2, Interesting

Well, I guess I've seen a first here. The system was 'upgraded' to Windows 2000? The manager that made that decision has done more than any staff member at Bin-Laden University for the Scrambled of Brains.
Re:Anyone want to clue them in to scheduled jobs? by Billy+the+Mountain · 2004-09-21 10:29 · Score: 5, Funny

Each M T W Th R S Su 12:45 AM shutdown /l /r /y /c

We have used that exact script for YEARS to nightly reboot a troublesome NT4 BDC at a remote location.

Does it work on Friday? You might want to check on that...

BTM

--
That was the turning point of my life--I went from negative zero to positive zero.
Re:Anyone want to clue them in to scheduled jobs? by mekkab · 2004-09-21 10:33 · Score: 2, Insightful

The cost of having a trained monkey reboot the system every month for 10 years is probably less than the cost of maintenance on the old hardware.

It makes sense on paper. It doesn't work out when the human element "screws the pooch" (they rarely show you that slide in the powerpoint, do they?!)

--
In the future, I would want to not be isolated from my friends in the Space Station.
Re:Anyone want to clue them in to scheduled jobs? by Dun+Malg · 2004-09-21 10:37 · Score: 4, Funny

Such a script could, if I'm not mistaken, be used to reboot the machine. One would think this would be an ideal way to hide the problem very nicely.
For a real-time application like air traffic control, you really can't automate reboots like that. You need someone standing there to say "crap! crap! crap!" and take the necessary actions when the system decides it doesn't want to reboot properly.*
*even if they don't know what to do, they can at least shout "crap!", which is more than a system stuck at the BIOS screen with an "elbow parity error" can say.

--
If a job's not worth doing, it's not worth doing right.
Re:Anyone want to clue them in to scheduled jobs? by thrills33ker · 2004-09-21 10:49 · Score: 4, Funny

"This wasn't an ARTCC. Besides, the ARTCC's are all on DSR now, and a bunch have URET on top of that."

Well, I'm glad you cleared that up!
Re:Anyone want to clue them in to scheduled jobs? by drinkypoo · 2004-09-21 10:56 · Score: 2, Insightful

I bet I could get you a replacement board for an IBM RT PC. I gave some Model 135s to a guy I used to work with, and I bet he's still got them or knows who has them. Since there's nothing better than a 135 I can't imagine you'd evince any significant dismay over that idea. There's a lot of that kind of crap running around assorted towns where IBM's got offices, like Austin - which is where I got them. I had AOS 4.3 and BSD-4.3-lite... More or less the same thing really.

Er anyway back to the point, you don't replace an old workhorse with a new POS. You get a newer workhorse than the last workhorse, and maybe not even a new one. I'd rather go dig up some Sparcstation 10s with supersparcs in them to replace (for example) your RT PC. Running SunOS or perhaps netbsd, you should be able to port your software from BSD. If you are running AIX on your RT, maybe you'd be better off with an old RS6k, they're available very cheaply. Hell, I once sold a 603e laptop RS6k (thinkpad power series) to a guy for like nine hundred bucks or so. That little bastard would make a better server than your average wintel box, given it was SCSI, assuming that you were replacing an antique.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Anyone want to clue them in to scheduled jobs? by Anonymous Coward · 2004-09-21 11:26 · Score: 5, Informative

I used to write aviation message handling systems. We migrated from Tru64 (now extinct) to Linux and have had much better performance, maintainability, hardware support, and reliability.

Of course, the code leap from Tru64 to Linux is quite small, which is the biggest reason why Linux was chosen.

Aviation expects 99.9999% uptime with absolutely no message loss, and we would achieve that with hot-standbys and MySQL mirroring. All circuits were split and would simultaneously enter both servers. Only the primary server would route the message.
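
Roughly, the split-circuit idea looks like the sketch below. It is a toy model with made-up class and function names, not our actual code: every message is delivered to both servers, only the one currently marked primary routes it, and the standby can be promoted without having missed anything. The MySQL mirroring side is not modeled here.

```python
# Sketch only: toy model of a dual-fed primary/hot-standby pair (hypothetical names).
from dataclasses import dataclass, field
from typing import List


@dataclass
class MessageServer:
    name: str
    is_primary: bool
    received: List[str] = field(default_factory=list)
    routed: List[str] = field(default_factory=list)

    def on_message(self, msg: str) -> None:
        """Every server records the message; only the primary routes it onward."""
        self.received.append(msg)
        if self.is_primary:
            self.routed.append(msg)


def feed_both(servers: List[MessageServer], msg: str) -> None:
    """Model of a split circuit: the same message enters every server."""
    for server in servers:
        server.on_message(msg)


def fail_over(old_primary: MessageServer, standby: MessageServer) -> None:
    """Promote the standby; it has already received every message."""
    old_primary.is_primary = False
    standby.is_primary = True
```

Because the standby sees the same input the whole time, failing over is just a change of role rather than a catch-up exercise.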

No, we didn't require the customer to reboot. The system could run for years at a time.

Putting mission critical applications on Windows 95 is just plain stupid.
Re:Anyone want to clue them in to scheduled jobs? by 0x0d0a · 2004-09-21 11:39 · Score: 2, Funny

Aviation expects 99.9999% uptime with absolutely no message loss, and we would achieve that with hot-standbys and MySQL mirroring.

Yes, that was a jab at you, Postgres fans. ;-)

--
May we never see th
Re:Anyone want to clue them in to scheduled jobs? by agallagh42 · 2004-09-21 12:02 · Score: 3, Informative

"Since when does Windows 2000 include a "shutdown" command?"

Uh, since about 2000 I believe. :)
C:\>shutdown /?
Usage: shutdown [-i | -l | -s | -r | -a] [-f] [-m \\computername] [-t xx] [-c "comment"] [-d up:xx:yy]

    No args              Display this message (same as -?)
    -i                   Display GUI interface, must be the first option
    -l                   Log off (cannot be used with -m option)
    -s                   Shutdown the computer
    -r                   Shutdown and restart the computer
    -a                   Abort a system shutdown
    -m \\computername    Remote computer to shutdown/restart/abort
    -t xx                Set timeout for shutdown to xx seconds
    -c "comment"         Shutdown comment (maximum of 127 characters)
    -f                   Forces running applications to close without warning
    -d [u][p]:xx:yy      The reason code for the shutdown
                         u is the user code
                         p is a planned shutdown code
                         xx is the major reason code (positive integer less than 256)
                         yy is the minor reason code (positive integer less than 65536)

C:\>

--
Carpe Cerevisi - Seize the Beer
Re:Anyone want to clue them in to scheduled jobs? by agallagh42 · 2004-09-21 12:35 · Score: 2, Informative

"Nope. Windows 2000 server:
C:\>shutdown /?
'shutdown' is not recognized as an internal or external command, operable program or batch file."

Well, you have to install the resource kit tools. You wouldn't want everything installed by default, would you?

--
Carpe Cerevisi - Seize the Beer
Re:Anyone want to clue them in to scheduled jobs? by DarkVader · 2004-09-21 13:53 · Score: 2, Interesting

A nightly reboot seems like a sledgehammer approach to me.

I've got a script that pings my upstream router every 10 minutes. If it misses a ping, it waits 30 seconds and tries again. Two missed pings, and it power cycles my DSL router, using an ActiveHome box and an X10 appliance module.
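
The logic of such a watchdog can be sketched in a few lines of Python. This is only an illustration under assumptions: `ping` and `power_cycle` are hypothetical callables standing in for the real ping command and the X10 switching, neither of which is shown in the post.

```python
import time


def watchdog_step(ping, power_cycle, retry_delay=30, sleep=time.sleep):
    """Run one check; return True if the router was power cycled.

    ping        -- callable returning True when the upstream router answers
    power_cycle -- callable that cuts and restores power to the DSL router
    """
    if ping():
        return False
    sleep(retry_delay)   # first miss: wait, then give the link a second chance
    if ping():
        return False
    power_cycle()        # two misses in a row: bounce the router
    return True
```

Calling `watchdog_step` from a loop or scheduled job every 10 minutes reproduces the behaviour described above, and the box only gets bounced when the link has actually failed.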

And the lesson is... by jcr · 2004-09-21 09:50 · Score: 2, Insightful

Don't use this stuff in mission-critical applications.

-jcr

--
The only title of honor that a tyrant can grant is "Enemy of the State."

Re:And the lesson is... by LostCluster · 2004-09-21 09:56 · Score: 2

"This stuff" being all of IT. HDs will fail within 5-7 years no matter what OS you put on them...

Good IT is so hard to pull off because you have to convince people that events that strike once every few years have to be prepared for; otherwise a disruption in service will occur.
Re:And the lesson is... by Dun+Malg · 2004-09-21 10:54 · Score: 2, Insightful

Good IT is so hard to pull off because you have to convince people that events that strike once every few years have to be prepared for; otherwise a disruption in service will occur.
Like the PHB at the office where my wife works said after announcing that the IT guy was to be laid off and not replaced: "I don't see why we need an IT guy-- we never have any computer problems" (cluebat time!)

--
If a job's not worth doing, it's not worth doing right.

"Upgrade"? by thelenm · 2004-09-21 09:50 · Score: 5, Funny

"Upgrade" from Unix to Windows, eh. You keep using that word. I do not think it means what you think it means.

--
Use Ctrl-C instead of ESC in Vim!

Re:"Upgrade"? by upsidedown_duck · 2004-09-21 12:09 · Score: 3, Insightful

It depends on how bad their previous UNIX system was. Any operating system can be neglected into oblivion. Also, if they got all new hardware to run Windows 2000, when the old hardware might have been ten-year-old 50MHz SMP boxes, then upgrade would be the right term. It's unfortunate that they didn't decide to upgrade to faster UNIX boxes, but that's politics for you.

--
-- "Makes Little Debbie look like a pile of puke!" - Moe Szyslak

Why is the FAA using off the shelf software? by Samir+Gupta · 2004-09-21 09:51 · Score: 4, Informative

This is not an attack on Microsoft.

But most off-the-shelf software has disclaimers expressly stating that it is not to be used in mission-critical situations. E.g.:

"technology is not fault tolerant and is not designed, manufactured, or intended for use or resale as on-line control equipment in hazardous environments requiring fail-safe performance, such as in the operation of nuclear facilities, aircraft navigation or communication systems, air traffic control, direct life support machines, or weapons systems, in which the failure of Java technology could lead directly to death, personal injury, or severe physical or environmental damage."

--
-- Samir Gupta, Ph. D. Head, New Technology Research Group, Nintendo Co. Ltd., Kyoto, Japan.

Re:Why is the FAA using off the shelf software? by pyro101 · 2004-09-21 09:59 · Score: 2, Informative

I don't know about using Windows 95, but here at the nuclear facility where I work we use not only Java but also Windows. We have been using Windows for some time, and we have to use Java because that is the way Oracle is going. We have more problems with hardware issues than with the off-the-shelf software, but no matter what problems we get from any of it, we as software developers are supposed to anticipate them and prove that we can, within reason, catch the user/machine/other devices before they screw stuff up. But most of all we go through huge testing on any small addition or change to the code base; even changing the color of a menu requires 10-20 signatures (you never know what else could have been added by accident).

What?! by ottergoose · 2004-09-21 09:51 · Score: 5, Funny

I thought switching to Windows from *nix saved time, money, and hassle! Haven't you guys seen those banner ads here?

Re:What?! by drew · 2004-09-21 10:41 · Score: 2, Informative

Funniest thing is that was actually the ad i saw when i read one of the linked articles :)

--
If I don't put anything here, will anyone recognize me anymore?
Re:What?! by tool462 · 2004-09-21 10:48 · Score: 2, Funny

Nope. :)

I Hate to Say It by DarkKnightRadick · 2004-09-21 09:51 · Score: 2, Insightful

But I'm going to.

It's M$'s fault. Why do I hate to say it? Because it'll just be seen as more anti-MS crap from another /.er.

All I have to say is if the shoe fits, wear it.

In this individual case a PHB made a decision to scrap the old, stable OS for a new, known-to-be-unstable OS. That screams PHB.

--
"There is a way that seems right to a man, but its end is the way of death." Proverbs 16:25 (NKJV)

Re:I Hate to Say It by multimed · 2004-09-21 09:58 · Score: 4, Funny

No way is it Microsoft's fault. It even says so in their EULA...
I'm still amused & surprised the poster left off the quotes, as in "upgrade" from Unix to Windows.

--
Vote Quimby.
Re:I Hate to Say It by AstroDrabb · 2004-09-21 12:15 · Score: 4, Informative

Funny, nowhere in the doc for GetTickCount() does it say it is deprecated and not to use it. The only thing it does say is "If you need a higher resolution timer, use a multimedia timer or a high-resolution timer." I don't know what the program needs since I did not write it nor have I seen the code. Maybe they didn't need a high-res timer and wanted a tick count for how long the system has been up? I don't think that is too much to ask from an OS.
The GetSystemTimeAsFileTime() function retrieves the current system date and time. The information is in Coordinated Universal Time (UTC) format. It doesn't tell you how long the system has been up.
Oh, and if MS did not think this is a problem why did they fix it in a WinNT service pack? Also, right in that link MS says
Microsoft has confirmed that this is a problem in Windows NT 4.0 and Windows NT Server 4.0, Terminal Server Edition. This problem was first corrected in Windows NT 4.0 Service Pack 4.0 and Windows NT Server 4.0, Terminal Server Edition Service Pack 4.

MS also didn't seem to fix it in Win2000 Server and their own engineers got hurt by it, specifically with Rpcss.exe which according to MS
SYMPTOMS
The Rpcss.exe process consumes 60 percent or more of CPU time, and system performance and network performance are affected. This symptom typically occurs 49.7 days after the server is started.
CAUSE
This problem occurs because a call to the GetTickCount timer function causes the function to overflow 49.7 days after the server is started.
If GetTickCount is "deprecated" as you state, why in the world are MS's own programmers using it in rpcss.exe? According to this site
rpcss.exe is an executable of Microsoft Windows Operating System. It is responsible for Remote Procedure Call services on the local machine. These are public services available to the local network. This program is important for the stable and secure running of your computer and should not be terminated.

Still not convinced and want to apologize for MS? Well, here is some more of MS's software that is affected by it on Windows 2000 servers (what this FAA project is using).
Print Spooler Stops Scheduling Print Jobs
The Print Spooler service may stop scheduling print jobs to specific Simple Port Monitor (SPM) ports. Although incoming jobs are queuing into the spooler, print jobs may not start. Note that this symptom occurs 49.7 days after you start the Print Spooler service.

There are a bunch of MS apps affected by this logic flaw that has been passed from version to version of MS OSes. If this flaw affected all these MS developers who have far more access to proprietary docs, I don't see how other developers would not stumble over it as well since they do not have access to the proprietary OS.

--
If Tyranny and Oppression come to this land,
it will be in the guise of fighting a foreign enemy. -James Madison
Re:I Hate to Say It by Christopher_G_Lewis · 2004-09-22 03:59 · Score: 2

From the SDK (bold face by me):
GetTickCount

The GetTickCount function retrieves the number of milliseconds that have elapsed since the system was started. It is limited to the resolution of the system timer. To obtain the system timer resolution, use the GetSystemTimeAdjustment function.

DWORD GetTickCount(void);

Parameters
This function has no parameters.
Return Values
The return value is the number of milliseconds that have elapsed since the system was started.

Remarks
The elapsed time is stored as a DWORD value. Therefore, the time will wrap around to zero if the system is run continuously for 49.7 days.

If you need a higher resolution timer, use a multimedia timer or a high-resolution timer.

To obtain the time elapsed since the computer was started, retrieve the System Up Time counter in the performance data in the registry key HKEY_PERFORMANCE_DATA. The value returned is an 8-byte value. For more information, see Performance Monitoring.
...

So the official SDK tells you *not* to use GetTickCount for uptime, but to use HKEY_PERFORMANCE_DATA.

Just a case of RTFM, for all parties involved, including the Microsofties...
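
For code that is stuck with a wrapping 32-bit counter, one common workaround is to widen it in software by counting wraps. The sketch below is a Python illustration of that idea only; `TickExtender` is a hypothetical helper, it is not how the HKEY_PERFORMANCE_DATA counter works, and it is only correct if `update` is called at least once per 49.7-day wrap period.

```python
WRAP = 1 << 32  # a DWORD tick count has 2**32 distinct values


class TickExtender:
    """Widen a wrapping 32-bit millisecond counter in software."""

    def __init__(self):
        self.last = 0    # most recent raw 32-bit reading
        self.wraps = 0   # number of wraps observed so far

    def update(self, tick32):
        """Feed a raw 32-bit reading; return the extended tick count."""
        if tick32 < self.last:   # counter went backwards, so it wrapped
            self.wraps += 1
        self.last = tick32
        return self.wraps * WRAP + tick32
```

The extended value keeps increasing across the rollover, so comparisons against it behave the way uptime code usually expects.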

--
www.christopherlewis.com

A hit for the other team... by LostCluster · 2004-09-21 09:52 · Score: 3, Interesting

When a ball drops on a baseball field at the midpoint between two positions, it's scored a "hit" for the opposition rather than an "error" against either player. Still, a hit for the other side is a bad thing for the entire team.

This mess was big enough that there's a large enough supply of blame to give some to everybody involved.

- No system should require a manual reboot on a regular basis... there should at least be a script capable of accomplishing that. But somehow, one got implemented. Blame whoever bought it.
- Windows shouldn't have had a flaw that required monthly reboots. Blame Microsoft.
- Somebody should have done the reboots like they were told to. Blame that poor schmuck.

Bottom line is that everybody's at fault because had any one piece in the chain done their job properly the failure wouldn't have happened, but a cascade of mistakes led to the ball hitting the grass instead of a glove.

Re:A hit for the other team... by PPGMD · 2004-09-21 09:59 · Score: 5, Insightful

The Patriot missile system had a similar problem. Its timing broke down after a period of time without a reboot (it was a much shorter cycle, either one day or one week).
Microsoft isn't the only one to have issues like that. But it has been patched and there should have been more than enough time for the FAA to test and deploy the patch on the few legacy machines running Windows 95.
I simply blame the FAA for wasting money away every year, billions are sunk into the system, but rarely does anything come out of it, Lockheed can deploy a complete new system to every airport for the amount of money that is being dumped into the old TRACONs and towers for MX.
Re:A hit for the other team... by oGMo · 2004-09-21 10:00 · Score: 2, Insightful

Bottom line is that everybody's at fault because had any one piece in the chain done their job properly the failure wouldn't have happened, but a cascade of mistakes led to the ball hitting the grass instead of a glove.

An error is scored against a player if the player is determined to have been negligent in their position according to the rules. If someone hits a line drive right past the first baseman, it's still a hit. If the first baseman catches it, then drops it instead of making a tag, it's an error.

If multiple players are negligent, then multiple errors are scored. We've all seen "blooper" videos where there are cascading errors; one guy drops a catch, throws it to the next guy who drops it in turn, etc.

This is what happened here; it's not a hit, it's a cascade of errors. Everyone is to blame, because they all did something stupid. That doesn't make it "OK," it doesn't make any particular party less at fault.

I don't think this contradicts what you're saying here, I just wanted to emphasize the point. ;-)

--
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
Re:A hit for the other team... by LostCluster · 2004-09-21 10:05 · Score: 2, Insightful

If multiple players are negligent, then multiple errors are scored. We've all seen "blooper" videos where there are cascading errors; one guy drops a catch, throws it to the next guy who drops it in turn, etc.

Only one error can be scored per base advanced by the runner, and if the runner took first by a "hit" before the errant throw, then there is only one "error" for his advancement to second. If two players crash into each other and the ball drops, it's usually a hit because it's hard to say either would have been able to make the catch "with normal effort" which is the real standard for an error.

Heh by GypC · 2004-09-21 09:52 · Score: 3, Insightful

upgrade from Unix to Windows

AKA, "The PHB Special"

Of course, the guy who was supposed to reboot the box will get all the blame. Shit rolls downhill.

Re:Heh by Nuclear+Elephant · 2004-09-21 09:55 · Score: 4, Funny

It's an upgrade because it helps to create thousands of jobs for full-time system power cycling engineers.
Re:Heh by Michael+Woodhams · 2004-09-21 11:42 · Score: 5, Informative

There is a rather more extreme case of this with the FAA - when first deployed, the cargo doors of the DC-10 were unsafe, with a failure mode that was likely to make the plane uncontrollable in flight.

This occurred in flight, and through luck (which allowed some degree of control) and extraordinary airmanship, the plane was landed safely. (This is known as "The Windsor Incident.")

McDonnell-Douglas didn't want to do a proper redesign of the door mechanism, and the FAA head was a 'companies know best' political appointee, so the result was McD added little windows to the door so that the guy closing the door could look to see it had all engaged properly. (This was over vigorous opposition by the NTSB, who recognized the inadequacy of the fix.)

The situation: A single failure (not looking, or looking but not noticing an unsafe condition) by a non-safety trained close to minimum wage employee could cause the deaths of hundreds of people.

Result: over 300 dead when a Turkish Airlines DC-10 crashed near Paris. The guy who closed the door hadn't even been told he was supposed to check the little windows.

Safety critical systems must be tolerant of human error. If a single omission by a human leads to a hazardous situation, this is primarily the fault of the system, not the human.

--
Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.
Re:Heh by InfiniteWisdom · 2004-09-21 11:49 · Score: 4, Funny

Surely you mean Microsoft Server Cycling Engineers (MSCE)
Re:Heh by Michael+Woodhams · 2004-09-21 15:38 · Score: 2, Informative

Once (Chicago O'Hare, c1980.) Due to faulty maintenance procedures (now discontinued), lack of locking on slats (now fixed) and engine-out-on-takeoff procedures that sacrificed air speed for altitude.

There are three DC-10 crashes (that I can think of offhand) that could reasonably be blamed at least partially on the design of the plane: we've mentioned two (Paris, Chicago). The third is Sioux City, where an uncontained engine failure in cruise disabled all three hydraulic systems. The plane crash landed with (from memory) about 110 deaths and 180 survivors.

Other planes of similar size and age (Lockheed L-1011 TriStar, 747) had four hydraulic systems. Had the DC-10 had four *and* (that is a big 'and') the fourth had not been disabled, it is unlikely there would have been any deaths. (A 747 once had 3 out of 4 hydraulic systems disabled on takeoff, and landed safely.)

In terms of safety, I'd be more worried about any model of airplane less than a few years old than I'd be about a well maintained DC-10. Let other people find the surprises first.

--
Quattuor res in hoc mundo sanctae sunt: libri, liberi, libertas et liberalitas.

Heather Locklear by Billy+Donahue · 2004-09-21 09:53 · Score: 4, Funny

To the rescue!
http://www.nbc.com/LAX/

--
-- The Funk, The Whole Funk, And Nothing But The Funk

Upgrade from UNIX to Windows.. by Anonymous Coward · 2004-09-21 09:53 · Score: 4, Funny

"This happened after an upgrade from Unix to Windows."

That's the funniest thing I heard all day. Windows is an upgrade from Unix. I almost choked on my coffee.

Re:Upgrade from UNIX to Windows.. by Mateito · 2004-09-21 10:11 · Score: 4, Funny

I almost choked on my coffee.
Try preparing the coffee with some sort of liquid. I recommend water.
You don't get the instant caffeine high like you do with chewing the beans*, but it does go down easier**
*Yes, I do this. Chocolate coated coffee beans rock
**Unless it's Starbucks, which needs a shot of snotberry flavoring to make it tolerable.

--
Norman Cook's Ode to Sl

humans rule by Doc+Ruby · 2004-09-21 09:53 · Score: 3, Insightful

It is human error: those bugs didn't write themselves. Nor did the operations protocol that required "rebooting LAX" every 49.69(!) days. Nor did the upgrade procedure that ignored that bottleneck. Nor did the upgrade decision that moved from Unix to Windows. Those were all human errors, as was the decision to keep a job at LAX that would face blame for shutting down the airport (or risking lives) if the reboot was missed, or unsuccessful.

"Not I," says the referee,
"Don't point your finger at me.
I could've stopped it in the eighth
An' maybe kept him from his fate,
But the crowd would've booed, I'm sure,
At not gettin' their money's worth.
It's too bad he had to go,
But there was a pressure on me too, you know.
It wasn't me that made him fall.
No, you can't blame me at all."
- Bob Dylan, "Who Killed Davey Moore?"

--
make install -not war

integration flaw exposed: by overbom · 2004-09-21 09:54 · Score: 3, Funny

sleep 4294080
shutdown /s

Re:Why not automate it? by Embedded2004 · 2004-09-21 09:54 · Score: 2, Informative

Well, if it is running Windows, and somehow someone made a mistake and decided to run it on some mission-critical system, they should reghost it as often as they can.

Windows has an odd tendency to corrupt itself.

Why 49.7 days? by FirstTimeCaller · 2004-09-21 09:56 · Score: 4, Informative

Because 49.7 days is 4,294,080,000 milliseconds, just short of the 2^32 (4,294,967,296) ticks at which a 32-bit millisecond counter rolls over (and yes, 49.7 is an approximate value).

Very few Win95 systems ever made it that long without a reboot... but you would've thought that it would've been fixed by Windows 2000.

--
Wanted: witty unique signature. Must be willing to relocate.

Re:Why 49.7 days? by Holi · 2004-09-21 10:02 · Score: 4, Informative

It was. This issue has nothing to do with the Win95 bug; that was just the submitter's opinion (which happens to be very wrong).

--
Sorry, teleporters just kill you and then make a copy. A perfect, soul-less copy.
Re:Why 49.7 days? by PhrostyMcByte · 2004-09-21 10:15 · Score: 5, Insightful

It sounds to me like an application they were running was badly designed to use GetTickCount() as a long-term counter. If so, it's not Win2k's fault.
Re:Why 49.7 days? by caluml · 2004-09-21 10:26 · Score: 2, Insightful

I think they solved it by Windows 98 - however, maybe there is an old app running on said Windows 2000 server that uses 32 bit milliseconds. Come on guys - we're going to get nowhere by harping on about issues that were fixed years ago. If we stand still, and laugh, Windows is going to sneak up, and run past.

--
Get your own free personal location tracker
Re:Why 49.7 days? by AK+Marc · 2004-09-21 10:33 · Score: 5, Informative

and yes, 49.7 is an approximate value

The exact value is 49 and 59,929/84,375 days, or 49 days, 17 hours, 2 minutes, and 47.296 seconds (exact).
Hey, news for nerds, what did you expect...
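
For anyone who wants to check the arithmetic, here is a short Python sketch that derives the same figures from 2^32 milliseconds. The function name `rollover_period` is just an illustrative choice.

```python
from fractions import Fraction

MS_PER_DAY = 24 * 60 * 60 * 1000


def rollover_period(bits=32):
    """Break 2**bits milliseconds into days, hours, minutes and seconds.

    Returns (days, hours, minutes, seconds, fractional_days), where seconds
    and fractional_days are exact Fractions.
    """
    total_ms = 2 ** bits
    days, rem = divmod(total_ms, MS_PER_DAY)
    hours, rem = divmod(rem, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds = Fraction(rem, 1000)
    fractional_days = Fraction(total_ms, MS_PER_DAY) - days
    return days, hours, minutes, seconds, fractional_days
```

With the default of 32 bits the fractional part reduces to 59929/84375 of a day and the seconds come out to 47.296, matching the numbers above.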

--
Learn to love Alaska

Before the torrent of "windows sucks" posts... by rasafras · 2004-09-21 09:56 · Score: 3, Insightful

...keep in mind that we have established numerous times that windows is not suitable for systems that need reliability and stability. It is not the operating system's fault that this happened, it is the FAA's for choosing to use it instead of considering the better alternatives. If you get run over on a bicycle while riding on the highway, don't blame the bike.
Quick addition: it seems that the fault does not belong entirely to windows, but rather a combination of the software running on it and the system architecture.

With that said, Windows could stand to improve a lot. It has too many bugs, too many flaws, and so on. And it definitely does not have a stable, secure, reliable base. So don't expect it to.

--
webpage

They said Windows 98 or Better by www.sorehands.com · 2004-09-21 09:56 · Score: 4, Funny

So I installed Linux.

--
Fight Spammers!

Now even the submitters aren't reading the article by Holi · 2004-09-21 09:57 · Score: 2, Insightful

From the submission
possibility related to an old Windows 95 bug

From the Article.
The shutdown is intended to keep the system from becoming overloaded with data and potentially giving controllers wrong information about flights, according to a software analyst cited by the LA Times.

The shutdown is not a crash but a scheduled event to bring the servers down to flush data.
So it does not seem to be a problem with Windows (Ok now I get marked as troll) but with the FAA's own software.

--
Sorry, teleporters just kill you and then make a copy. A perfect, soul-less copy.

32 bit timer by charnov · 2004-09-21 09:57 · Score: 5, Interesting

This old error was from the use of a 32 bit 1 ms increment timer (comes out to 49.7 days until rollover). AFAIK, this was fixed in Win2k and above when the timer got bumped to 64 bit. Maybe whoever set up LAX was using some ancient legacy middleware that used the old timer. This is just bizarre. In both locations that I have worked the last three years, none of the Win2k or Win2k3 servers went down ever. Sounds like bad consultants.

--
[RIAA] says its concern is artists. That's true, in just the sense that a cattle rancher is concerned about its cattle.

Re:32 bit timer by Draknor · 2004-09-21 10:06 · Score: 4, Informative

Parent is right - it's not a bug in Windows itself, but rather a piece of software running on Windows - from (one of the) FAs:

Richard Riggs, an advisor to the technicians union, said the FAA - the American aviation regulator - had been planning to fix the program for some time. "They should have done it before they fielded the system," he said.

(emphasis added)
Re:32 bit timer by djwolf · 2004-09-21 10:34 · Score: 3, Informative

The timer has not been widened to 64 bits; it hasn't been changed, for API compatibility. Microsoft does give you some warning though:

GetTickCount

The GetTickCount function retrieves the number of milliseconds that have elapsed since the system was started. It is limited to the resolution of the system timer. To obtain the system timer resolution, use the GetSystemTimeAdjustment function.

DWORD GetTickCount(void);

Parameters
This function has no parameters.
Return Values
The return value is the number of milliseconds that have elapsed since the system was started.

Remarks
The elapsed time is stored as a DWORD value. Therefore, the time will wrap around to zero if the system is run continuously for 49.7 days.

If you need a higher resolution timer, use a multimedia timer or a high-resolution timer.

To obtain the time elapsed since the computer was started, retrieve the System Up Time counter in the performance data in the registry key HKEY_PERFORMANCE_DATA. The value returned is an 8-byte value. For more information, see Performance Monitoring.

Example Code
The following example demonstrates how to use this function to wait for a time interval to pass. Due to the nature of unsigned arithmetic, this code works correctly if the return value wraps one time. If the difference between the two calls to GetTickCount is more than 49.7 days, the return value could wrap more than one time and this code will not work; use the system time instead.

DWORD dwStart = GetTickCount();

// Stop if this has taken too long
if( GetTickCount() - dwStart >= TIMELIMIT )
    Cancel();

Note that TIMELIMIT is defined as the time interval of interest to the application, in milliseconds.
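
Why the unsigned subtraction in the sample survives one wrap can be shown with a small Python model of 32-bit arithmetic. This is a sketch, not Win32 code: `elapsed_ms`, `expired` and `naive_expired` are hypothetical names, and the naive variant only illustrates the general class of bug, not anything known about the FAA application.

```python
MASK32 = 0xFFFFFFFF


def elapsed_ms(start, now):
    """Elapsed ticks between two 32-bit readings, taken modulo 2**32."""
    return (now - start) & MASK32


def expired(start, now, limit):
    """Wrap-safe check, equivalent to the subtraction in the sample."""
    return elapsed_ms(start, now) >= limit


def naive_expired(start, now, limit):
    """Buggy variant: compares against an absolute deadline that itself wraps."""
    deadline = (start + limit) & MASK32
    return now >= deadline
```

If `start` is 0xFFFFFF00 and the counter has since wrapped to 0x00000100, `elapsed_ms` still reports 0x200 ticks. The naive version, given the same `start` and a 1000 ms limit, fires after a single millisecond because its deadline has wrapped to a small number.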

Requirements
Client: Requires Windows XP, Windows 2000 Professional, Windows NT Workstation, Windows Me, Windows 98, or Windows 95.
Server: Requires Windows Server 2003, Windows 2000 Server, or Windows NT Server.
Header: Declared in Winbase.h; include Windows.h.
Library: Use Kernel32.lib.

--
---- I like compilers

Check out this little pile of bullshit by Trailer+Trash · 2004-09-21 09:58 · Score: 5, Interesting

The system offers unprecedented voice quality, touch-screen technology, dynamic reconfiguration capabilities to meet changing needs, and an operational availability of 0.9999999

Okay, bullshit. If I have to reboot a server every month, .0000001 of a month is- oh, let's be generous and only count months with 31 days- about .26 seconds. That's a damned fast boot time for Win2K.

Maybe they left off a percent sign?
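
The back-of-the-envelope number is easy to reproduce. A minimal Python sketch, with `downtime_budget_seconds` as an illustrative name:

```python
def downtime_budget_seconds(availability, period_days):
    """Seconds of downtime allowed per period at the given availability."""
    return (1 - availability) * period_days * 86_400
```

For an availability of 0.9999999 over a 31-day month this gives roughly 0.27 seconds (the .26 above, truncated). Read as 99.9999% instead, the budget is about 2.7 seconds, which is still far less than any reboot.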

--
Do you have ESP?

Re:Check out this little pile of bullshit by k4_pacific · 2004-09-21 10:02 · Score: 2, Insightful

"Maybe they left off a percent sign?"

Or maybe there's some kind of failover to a backup system (Which they also forgot to reboot)?

--
Unknown host pong.
Re:Check out this little pile of bullshit by larien · 2004-09-21 10:23 · Score: 4, Informative

Welcome to planned vs unplanned downtime; in many cases, a 10 hour outage can still give you a 100% availability if you planned that outage. What they're probably quoting is 0.0000001 unplanned downtime.
Lies, damned lies and availability stats...
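
To make the distinction concrete, here is a small Python sketch of the two ways of counting. The function and its parameters are hypothetical; actual contracts define "availability" in their own terms.

```python
def availability(total_hours, planned_down, unplanned_down, count_planned=False):
    """Availability over a period, with or without planned outages counted.

    count_planned=False treats planned outages as outside the service window,
    which is how a planned 10-hour outage can still yield 100%.
    """
    if count_planned:
        return (total_hours - planned_down - unplanned_down) / total_hours
    scheduled = total_hours - planned_down
    return (scheduled - unplanned_down) / scheduled
```

For a 744-hour month with one planned 10-hour outage and no unplanned ones, the first convention reports 1.0 while counting the planned outage gives about 0.9866.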

We used to joke by multiplexo · 2004-09-21 09:59 · Score: 3, Interesting

that no one would ever run into the 49.7 day bug on a Windows system because the chances of having that much uptime were slim to none. Having a system where you know that things are broken and you have to reboot it every 30 days to keep it from breaking down is a bad thing, deploying such a system into a production environment is even worse (but it's been done, I don't know how many times I wrote cron jobs to kill bad pieces of software and restart them) but deploying such a system in an environment where lives are at stake is completely inexcusable, regardless of whether or not it is closed or open source. This is similar to having a circuit in your house that overheats because occasionally too much load is placed on it. The idiot solution is to reset the breaker when it trips, the correct solution is to put in a bigger circuit that can handle the peak load. This vendor provided the idiot solution to this problem and should be punished for it, this never should have been deployed, I can only hope that they won't blame the technician for failing to do something that he wouldn't have had to do if the system had been designed properly.

I also love the statement that the system was upgraded from UNIX to Windows. Isn't this kind of like upgrading from being in very good health but not being good looking to being somewhat good looking but suffering from cancer, AIDS and heart disease?

--
cheap labor conservatives - they want to keep you hungry enough to be thankful for minimum wage.

49.7 days by k4_pacific · 2004-09-21 09:59 · Score: 5, Funny

I remember back when that bug was announced. Seems it was at least a couple of years after Windows 95 had been out. I guess they had to work through a lot of other bugs to get Windows 95 to make it long enough for this bug to occur.

--
Unknown host pong.

Flaw left unfixed for too long? by Astro-pilot · 2004-09-21 10:01 · Score: 2, Interesting

Was the flaw left unfixed for too long because they did not have access to the source code? Or was it because it was too expensive? If this is such a critical system that it can cause loss of life (on a massive scale, no less), the root cause should have been fixed, rather than the workaround. I remember reading somewhere that this flaw has now been fixed. Smells like a managerial issue within the FAA, not just a technician problem. Remember NASA and the space shuttles?

Re:Migration by legirons · 2004-09-21 10:01 · Score: 5, Funny

"Why did they move from Unix to Windows in the first place?"

Maybe they didn't want to have to reboot on January 19, 2038
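
For anyone who missed the reference: a signed 32-bit time_t counts seconds since 1970-01-01 UTC and runs out on that date. A quick Python sketch of the arithmetic, where `time_t_limit` is just an illustrative helper:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_t_limit(bits=32):
    """Last instant representable by a signed time_t of the given width."""
    return EPOCH + timedelta(seconds=2 ** (bits - 1) - 1)
```

With the default 32 bits this is 03:14:07 UTC on January 19, 2038. Wider widths such as 64 bits exceed what Python's `timedelta` can represent, so the helper is only meant for the 32-bit case.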

Re:In a related story by databank · 2004-09-21 10:01 · Score: 2, Interesting

Actually there's a lot of truth to that..I once flew in an airliner overseas which had the tv screens built into the back of the seat in front of me.

In the middle of the movie, the screen did the classic "blue screen of death" and rebooted with the Windows logo. There were quite a few chuckles in the aircraft when the movie was restarted and then the jokes started flying about the plane running on Microsoft Windows....(uh..oh..we're going to crash!..no wait, that's just Microsoft Windows)

Re:Ahh yes... by Qeyser · 2004-09-21 10:02 · Score: 2, Insightful

Moreover: why do you have a critical system that hasn't been patched in over five years?

Check the date on that news.com article linked in the main story -- it's from March of 1999. The bug is that old, and as I recall the fix didn't take that long to get out.

If LAX was trying to upgrade to/integrate win2k with ancient, unpatched Win95 systems, it's no wonder that they're having problems . . .

-Q

Don't be so hasty to blame the OS... by Ann+Elk · 2004-09-21 10:03 · Score: 5, Insightful

OK, I know it's violation of /. policy to actually read a referenced article. My bad. But, according to the software.silicon.com article:

Richard Riggs, an advisor to the technicians union, said the FAA - the American aviation regulator - had been planning to fix the program for some time. "They should have done it before they fielded the system," he said.

This sounds to me like more of a problem with the application, not the OS. The "system" crashed after 49.7 days, which is about 4 million seconds, which is about 4 billion milliseconds, which is (obviously) MAX_ULONG. I suspect the application is using a ULONG to store a timeout value and got pissed-off when it rolled over.

49.7 Days - A New Record for Windows 95! by akiy · 2004-09-21 10:05 · Score: 3, Funny

I believe the 49.7 days of uptime for a Windows 95 box is a new record, shattering the previous record in Norway of 27.9 days back on January through February of 2001. Congratulations!

--
http://www.aikiweb.com - AikiWeb Aikido Information

windows update anyone? by roadrunnerro · 2004-09-21 10:06 · Score: 3, Insightful

and office update while you're at it too...

Wouldn't want to spoil a nice MS bashing session, but I think the bug was in the ported application, not in the OS - probably someone used the wrong data type to hold timestamps somewhere within the program (win95 had the same bug) - I've seen win2k last more than 47 days without reboots...

Lessons from other Aviation Authorities by MosesJones · 2004-09-21 10:06 · Score: 5, Interesting

I worked for around 5 years in Air Traffic Control projects, both in delivery of radar processing and displays and in R&D for next generation systems.

Let me give you an overview of the failure approach of just one of those systems.

1) Everything on Unix, ruggedised releases of UNIX

2) Every box must be able to FAIL ON ITS OWN

3) Every box must have a direct replacement, or replacements, which carry the SAME LOAD.

4) ZERO total system downtime allowed, partial systems failures are allowed, but core systems must keep running.

5) 5 stages of power supply failure, double mains, double generation and lastly a great big warehouse of car batteries if all else fails.

6) 4 Years of testing of FULL system before live.

This is what is normal when safety is the primary concern. What the FAA decision sounds like is a cost driven process which chose the cheapest solution that "could" meet the requirements.

The idea of a safety critical (if it fails people could die) system that requires a reboot is fine in only one case... if it can be non-operational on a regular basis, in which case it should be done EVERY non-operational window (say every week). This is therefore okay for some hospital scanners that are certified for 12 hour runs. It's not okay for a 24/7 system that controls objects flying around at 500 miles an hour.

Welcome to the US... we will be landing slightly quicker than expected.

--
An Eye for an Eye will make the whole world blind - Gandhi

depends by Tsiangkun · 2004-09-21 10:06 · Score: 2, Insightful

I think it depends on what the company rep said when they convinced them to replace Unix with Windows.

If they advertised a consumer OS as an OS suitable for mission critical applications . . . then this flaw should not be in the software. It could be the software company's fault for aggressively marketing their product where it should not be.

Maybe we should throw some blame to the PHB who ordered the switch. Perhaps there was no hard sell from MS, and a PHB saw a product brochure and got a hard on to switch.

I see your point though, the tech knew about the problem and failed to do his job.

I guess my question is, should the problem have been addressed before now, or is it common practice to wait for a catastrophic success like this to occur before addressing the problem ?

Re:Uhm, THE TECHNICIAN by Kwil · 2004-09-21 10:07 · Score: 2, Funny

Are you kidding?

Think about it.. the Tech managed to keep Windows up and running for almost 50 days. The guy's a hero!

--

That Jesus Christ guy is getting some terrible lag... it took him 3 days to respawn! -NJ CoolBreeze

Seen this week at various airports by whoever57 · 2004-09-21 10:07 · Score: 4, Interesting

This week, while flying, I saw:
1. Windows-based terminal used by the public to print tickets (I think) with a "you have chosen to download a file, what do you want to do with it: save, open" or similar (I don't recall the exact wording).

2. A Windows-based machine that was part of the baggage scanning setup at Chicago-O'Hare going through a scandisk process. OK, this may have been due to operators turning the machine off using the power switch, but should not such a machine use a read-only boot drive/partition?

Do you feel more secure?

--
The real "Libtards" are the Libertarians!

No proof the old system was stable. by rdunnell · 2004-09-21 10:07 · Score: 2, Insightful

A system running UNIX doesn't necessarily mean it was stable. It could have all sorts of flaws in the code, hardware failures, etc.

Sure, Windows 95 in particular and Windows in general is often less stable than modern counterparts. But an upgrade from an old, obsolete UNIX to a new Windows system could have had significant benefits and made a lot of sense at the time. Without the full information behind the decision, how can you judge whether the decision was bad or not?

no such thing as a Windows 2000 49.7 day bug by art123 · 2004-09-21 10:08 · Score: 4, Informative

There is no such thing as a Windows 2000 49.7 day bug that causes an OS problem.

The problem here is the software made by Harris does not handle a rollover of the GetTickCount() function turning back to 0. This function counts the number of milliseconds since the OS was last booted so it should be obvious to anybody that the returned unsigned 4 byte integer cannot go on forever.
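To make the rollover concrete, here's a rough Python sketch (mine, not Harris's code; the function names are made up for illustration) of the difference between naively subtracting two tick samples and subtracting them modulo 2^32:

```python
TICK_MODULUS = 2**32  # a 32-bit millisecond counter wraps to 0 after this many ticks


def naive_elapsed(start: int, now: int) -> int:
    """Buggy: plain subtraction goes negative once the counter has wrapped."""
    return now - start


def wrap_safe_elapsed(start: int, now: int) -> int:
    """Modular subtraction: correct as long as the counter wrapped at most once."""
    return (now - start) % TICK_MODULUS


start = TICK_MODULUS - 500  # sampled 500 ms before the rollover
now = 1_500                 # sampled 1,500 ms after the rollover

print(naive_elapsed(start, now))      # a large negative number
print(wrap_safe_elapsed(start, now))  # 2000
```

In C, subtracting two unsigned 32-bit values gives you the modulo for free, so the safe version is just `now - start` kept in an unsigned type. The trouble starts when code compares the raw tick values directly or stores them in a signed or wider type.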

So the badly written Harris software has this bug and their solution (which was really not that bad of a work around) was to manually reboot the system every 30 days, but as a fail-safe, they had a scheduled task to do a reboot on the 49th day just in case. The 49th day came because of procedural error.

There is nothing Microsoft could do to prevent this.

Re:no such thing as a Windows 2000 49.7 day bug by Ahnteis · 2004-09-21 13:05 · Score: 2, Insightful

"There is nothing Microsoft could do to prevent this."

But this is slashdot so we won't let little things like facts get in the way of a good MS bashing session.
Re:no such thing as a Windows 2000 49.7 day bug by SuiteSisterMary · 2004-09-21 13:29 · Score: 2, Insightful

Nonsense. That would be like saying 'warning: you're taking a step, and might trip.'

Typing naught but 'GetTickCount()' into Google lands me right onto the MSDN page and clearly says:

The elapsed time is stored as a DWORD value. Therefore, the time will wrap around to zero if the system is run continuously for 49.7 days.

and goes on to suggest alternative timing capabilities.

This was a major fuckup by the application programmers, incorrectly using a clearly defined API call.

--
Vintage computer games and RPG books available. Email me if you're interested.

An urban legend... by eddy · 2004-09-21 10:08 · Score: 2, Insightful

.. is what I'm going to consider this for the time being. I've seen it reported everywhere, but it's just too absurd to take at face value.

--
Belief is the currency of delusion.

I don't feel redeemed, I feel cheated... by jbwolfe · 2004-09-21 10:08 · Score: 2, Informative

Hey, I submitted this two days ago. What makes it slashdot worthy now?

--
Have you ever noticed that anybody driving slower than you is an idiot, and anyone going faster than you is a maniac?

You insensitive clod by rutledjw · 2004-09-21 10:09 · Score: 4, Interesting

As a PHB, I resemble that remark! Clearly you do not appreciate the fine art which is combining management and technical decision-making. Neither does my parent corp.

I have the distinct, but sadly not unusual, pleasure of watching my company execute a brilliant strategy of:

1) Outsourcing Data Center Operations (systems that used to be down for seconds a year are now down for days and in some cases weeks per year)
2) Outsource development to India (which has been a mess I won't use the foul language to describe) _AND_
3) Squeeze remaining people to make up for items 1 and 2!

Since becoming a PHB (although I still do architecture work - thankfully), I've found that mindless, boneheaded, sweeping decisions are usually driven by some empty-suit, bean-counting, incompetent, barely literate, sh!t-for-brains sycophant who found themselves in an executive position purely by accident. We're "encouraged" to support their "strategies". Indeed...

It's a much higher order PHB. Kinda like a 4th degree black-belt, but not.

--

Computer Science is Applied Philosophy

"Who's really at fault?" by switcha · 2004-09-21 10:11 · Score: 3, Funny

You guessed it.

Frank Stallone.

--
You know what? ... A little club soda *did* get that out!

Yea, but... by HaeMaker · 2004-09-21 10:13 · Score: 2, Interesting

That information had been filtered at least three times, can't count on that either...

Software analyst -> LA Times reporter -> TechWorld reporter.

gettickcount maybe? by plopez · 2004-09-21 10:13 · Score: 4, Funny

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/gettickcount.asp

Sounds like whoever wrote the software/OS module they were relying on used this gem. I hereby dub whosoever was so silly as to do this a 'code monkey, first class'.

--
putting the 'B' in LGBTQ+

poor guy.. by joeldg · 2004-09-21 10:14 · Score: 2, Interesting

First, having to shut down a system to maintain its uptime is a ridiculous idea.

Second, it took several years to find that bug because most Windows machines never made it to that 49.7 days, and if they did the users just assumed it was normal because it is considered normal for Windows to "lock up", freeze or whatever.

Third, replacing Unix, known for its stability, with any variant of Windows (known for instability) in a system where people's lives are at stake, and then having this happen: the guys at LAX who decided to do this should be fired because they just risked a lot of lives and caused massive delays for travellers. In a political situation they would have to resign.

I remember a similar story about an Aegis-class cruiser stuck out in the ocean for three days because they decided to use Windows. "Yea, that will work great during a war.."

*sigh* Microsoft has good lobby power and hires a fleet of sales people to keep selling their shod-ware that really should just be kept to mom and pop living rooms.

But then, this is the opinion of a guy who works only with Linux and is sitting on an uptime on an openMosix cluster-leader (that also is my dev box) that looks like this:
19:03:06 up 319 days, 5:20, 3 users, load average: 1.28, 0.73, 0.37

eat your heart out LAX.. you got punk'd

--
anime+manga together at last.. in real time.

Ouch, poor ad placement by Eric+Seppanen · 2004-09-21 10:15 · Score: 5, Funny

Headline:

Microsoft server crash nearly causes 800-plane pile-up
failure to restart system caused data overload

giant advertisement:

Make a name for yourself with Windows Server System

I'm thinking that maybe "the guy that almost crashed a bunch of planes" is not the name they were looking for.

(I'm not making this up- that's really the ad I'm seeing.)

--
314-15-9265

Space Shuttle accidents and software bugs by BlueUnderwear · 2004-09-21 10:17 · Score: 4, Interesting

Was at JAOO today, and on the closing panel discussion for the Test-Driven Development track, Mr Kevlin Henney was praising NASA's rigorous software testing procedures. He was so proud of them that he let out a "and in both space shuttle crashes, software was not to blame". Well, this may be correct if he was thinking only about the flight software... but there is other software than what rides in the shuttle itself...

--
Say no to software patents.

Re:Space Shuttle accidents and software bugs by GlassHeart · 2004-09-21 17:13 · Score: 4, Insightful

The only regret you'll have from paying for too much quality is the money. You'll have everything to regret from spending on too little quality.
That's a nice thing for a professor to advocate, but real world projects like the space shuttle do not have an infinite budget to accomplish the assigned task. Therefore, spending too much money on one aspect can mean that another is sacrificed and becomes the point of failure. Therefore, while being responsible for the part that never failed is an understandable source of pride, it may actually reveal a misallocation of resources.
Engineering is about spending the least amount of time and money to achieve the required quality. Nobody said anything about spending too little.

Re:Why not automate it? by bstone · 2004-09-21 10:18 · Score: 2, Interesting

I don't see the logic in a system being so critical to be working 24/7 that they force it to crash if the maintenance is missed. Does anyone else see a problem with this logic?

Re:2K is based on NT kernel by gl4ss · 2004-09-21 10:18 · Score: 4, Insightful

so what if it is "completely different os"? that's the whole point, if it were continuation of the win95 line it would have been fixed!

now the bug was present in both codebases, but fixed just in one.

that's at least how the article and the writeup make it sound like.

--
world was created 5 seconds before this post as it is.

Microsoft's new slogan... by TWX · 2004-09-21 10:19 · Score: 3, Funny

... should be:

"Microsoft: Writing the software to prevent SkyNet since 1981."

--
Do not look into laser with remaining eye.

Windows Bug by nwbvt · 2004-09-21 10:19 · Score: 2

Is there any evidence that this was caused by a Windows bug or is this just more /. anti-Windows FUD? None of the articles support such a hypothesis; they seem to put the blame on the integration and maintenance of the system, not on the design of the operating system.

And I hardly see how the Windows 95 bug is relevant to this issue as that clearly isn't what caused the shutdown.

Editors please learn how to do your fucking jobs and reject crap like this. Just because it bashes MS doesn't mean it's newsworthy.

--
Mathematics is made of 50 percent formulas, 50 percent proofs, and 50 percent imagination.

Try telinit by TheScienceKid · 2004-09-21 10:21 · Score: 2, Informative

What you may not have taken the time to observe is that when you run init with a name of telinit or with a process ID other than 1 it runs in 'telinit' mode. In this mode it passes a message via /dev/initctl (a FIFO) to tell the running copy of 'init' (the process responsible for initialising services and managing them thereafter) to perform a specific action (eg shutdown, reboot... etc)

Re:If it's in the job description... by serviscope_minor · 2004-09-21 10:22 · Score: 5, Insightful

How can you intimate blaming the software company here?

You are joking, right? The majority of accidents happen due to human error. This is supposed to be mission critical software (and there's more than just money at stake). Yet, it relies on needless human intervention once a month! This is simply unacceptable for a piece of software in such a position. The main blame lies in the hands of the company that provided it, the person who decided to switch to it and the person who decided to bring the new system online and remove the old one despite this flaw. The technician is almost irrelevant, since this happening was an inevitability. It would have happened sooner or later because the system left room in there for human error to happen.

And yet, you still don't blame a company which ships mission critical software which leaves such a huge hole open for human errors. I hope our nuclear power plants are running on better designed stuff.

--
SJW n. One who posts facts.

Who's really at fault? by mcguyver · 2004-09-21 10:24 · Score: 3, Insightful

Whoever approved this process of manually rebooting a machine should be at fault. The fact that it was a windows operating system, or a unix OS or a purple OS is irrelevant. The problem here is someone thought a valid solution was to reboot a machine once a month.

A few remarks by bmajik · 2004-09-21 10:24 · Score: 2, Informative

1) this is not a windows OS bug

GetTickCount() will rollover. An _application_ which assumes it is a strictly increasing value will misbehave after the 40 some odd days expire. That appears to be what is happening here.

Note that nowhere in the article is there a distinction between the "system" and the "OS" or the "application".

2) Regardless of where the fault is (hint: it's not in Windows), it is not unreasonable for a machine to need servicing. Aircraft engines are serviced at hour based intervals, whether they need it or not. It's better to just tear the thing down and rebuild it than to have it tear itself apart. Software doesn't _have_ to be this way, but it sometimes is.

Making a complete hardware -> app layer stack 100% failsafe is.. tricky. For some applications, designing the system with a known restart point.. i.e. a reboot of the app or the entire machine, can be more cost effective.. (see earlier the paper on crash-only software design)..a periodic shutdown/restart in complicated systems can be a valid operational practice.

The fault here is two fold - one, the application/system had a known issue that is probably avoidable, but for whatever reasons, it still has the issue.

Two, knowing that the issue existed, the proper maintenance was not observed, with the expected result - a failure.

Only in America do you get away with blaming Audi for oil sludge problems when you don't change your oil every maintenance interval.

If the system called for a 48th day restart, that's what it requires, and deviation from that has consequences. Luckily no one was hurt.

--
My opinions are my own, and do not necessarily represent those of my employer.

Re:A few remarks by evilviper · 2004-09-21 11:19 · Score: 2, Insightful

You just can't talk about computers like you talk about machines. The analogy does not work.

If the fault was going to happen every 48 days, they should have scheduled a reboot for every 22 days at most. Just like everything else, it's insane to have a single point of failure like this.

If you know a machine needs to be rebooted regularly, there is no reason not to automate the process. Windows task scheduler should do the job quite well.

There's no reason the computer could not have reported an error, by whatever means, to an administrator when it detects it is operating in excess of its design parameters. Send a barrage of e-mails, IMs, faxes, SMS messages, etc. I can guarantee this life-or-death system would get somebody's attention, and it would be restarted as it should be.
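Something like the following would be enough to drive those alerts. It's only a sketch in Python: the status names are invented, the 30-day interval is the manual reboot schedule others in this thread have quoted, and how you read the uptime or send the message is left out entirely.

```python
from datetime import timedelta

# Assumed thresholds, for illustration only.
REBOOT_INTERVAL = timedelta(days=30)            # scheduled maintenance reboot
HARD_LIMIT = timedelta(milliseconds=2**32)      # where a 32-bit ms counter wraps


def uptime_status(uptime: timedelta) -> str:
    """Classify an uptime reading so a monitor knows how loudly to complain."""
    if uptime >= HARD_LIMIT:
        return "critical"   # counter has already wrapped
    if uptime >= REBOOT_INTERVAL:
        return "overdue"    # maintenance reboot was missed, start paging people
    return "ok"


for days in (10, 35, 50):
    print(days, uptime_status(timedelta(days=days)))
```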

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant

Re:Liability. by Sloppy · 2004-09-21 10:27 · Score: 3, Insightful

You know, if strict product liability were applied to Microsoft, they'd be paying big time.

If you duct tape a wing to an airplane and then the wing falls off and the plane crashes, you don't sue the duct tape maker. You sue the idiot who decided to use the duct tape.

The grossly negligent party in this situation, is the contractor who built a real-life system on top of Windows. And the FAA idiots who didn't spot this glaring flaw in the proposal. Microsoft shouldn't have to pay a cent.

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.

Re:2K is based on NT kernel by LostCluster · 2004-09-21 10:28 · Score: 4, Informative

As many others have pointed out here, it's the same bug that brought down Windows 9x reappearing.

Just like the "Y2K glitch" was a platform-independent problem based upon the 2-digit-year shorthand causing logical flaws, if you store time in a 32-bit variable by the millisecond... you'll hit the hard limit after about 49.7 days, which is why that number can show up in kernels other than Win9x. If there's no proper handling of that rollover, things go haywire.

Uptime: From one of the article links by Mateito · 2004-09-21 10:28 · Score: 5, Interesting

The system offers unprecedented voice quality, touch-screen technology, dynamic reconfiguration capabilities to meet changing needs, and an operational availability of 0.9999999.

Whoah! 7 nines uptime!

About 3 seconds of downtime per year.

Somebody is on drugs if they sold that. Somebody is on even stronger drugs if they bought that story.

"5 nines", for all intents and purposes, is as good as it gets, with "6 nines" seen as the holy grail. The top HA system I've ever dealt with (running a Telco's billing operation spanning 4 countries!) quoted a figure of 0.999996. To nobody's surprise, it did not run Windows.
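For scale, here is what those availability figures allow in downtime per year. This is a quick sketch in Python, assuming a 365-day year and treating availability as a simple fraction of wall-clock time, which is my simplification and not necessarily how the vendor measures it:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000, ignoring leap years


def downtime_seconds_per_year(availability: float) -> float:
    """Seconds per year a system may be down at the given availability."""
    return (1.0 - availability) * SECONDS_PER_YEAR


figures = [
    ("5 nines", 0.99999),
    ("telco billing system", 0.999996),
    ("6 nines", 0.999999),
    ("7 nines", 0.9999999),
]

for label, availability in figures:
    seconds = downtime_seconds_per_year(availability)
    print(f"{label}: {seconds:.1f} s/year")
```

Each extra nine divides the allowance by ten, so seven nines leaves only a few seconds a year.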

Wonder how much their failure clause is going to set them back?

--
Norman Cook's Ode to Sl

Not necessarily Windows' fault by DunbarTheInept · 2004-09-21 10:29 · Score: 4, Interesting

While I hate MS as much as the next guy, this might not really be directly their fault. Unix systems are often installed with the instruction that they get reboots regularly. Often there is a problem that is caused by application code, not the OS. If you have a memory leak in an application that runs and stays up all the time, it's going to cause the system to get horribly unusable in the long run regardless of whether it's UNIX or Windows. While a reboot might be overkill when it was just one application misbehaving, a reboot is a guaranteed way to kill and reset the responsible program no matter which one it is. At a previous place of employment we told the customer to do monthly reboots mainly because we didn't trust *our own* code to be that perfect.

--

Don't label something "offtopic" unless you know the topic well enough to tell what's on topic.

Re:Not necessarily Windows' fault by meme_police · 2004-09-21 12:14 · Score: 2, Informative

Spoken by someone who obviously hasn't adminned any enterprise UNIX servers.

--
The meme police, They live inside of my head

Re:Retard by Keith+Russell · 2004-09-21 10:32 · Score: 5, Informative

Search Microsoft's Knowledge Base for "49.7 days", and you'll find a few bugs, all of them related to storing uptime in milliseconds in an unsigned 32-bit integer. Two were reported in Windows 2000:

That rpcss.exe issue looks like a prime suspect. The OS doesn't crash, but, given the time-sensitive nature of air traffic control data, it's quite possible that the applications running on that server would degrade to the point of failure.

Both look like they were found, or at least entered into the KB, after the release of Windows 2000 Service Pack 4 (Nov. 2003), and hotfixes are available for both.

Note to Microsoft (or anyone else storing milliseconds, for that matter): unsigned 64-bit int! Instead of having to reboot every 49.7 days, you'll have to reboot every 213,503,982,334 days, give or take a leap-second.
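If you want to check those figures, here is a quick sketch in Python (my own arithmetic, assuming a plain 86,400,000 ms day and a 365-day year):

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000


def whole_days_until_wrap(bits: int) -> int:
    """Whole days before an unsigned millisecond counter of this width wraps."""
    return 2**bits // MS_PER_DAY


days_32 = whole_days_until_wrap(32)
days_64 = whole_days_until_wrap(64)

print(days_32)                 # 49
print(f"{days_64:,}")          # 213,503,982,334
print(f"{days_64 // 365:,}")   # 584,942,417 (years of 365 days)
```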

--
This sig intentionally left blank.

Re:If it's in the job description... by pfleming · 2004-09-21 10:32 · Score: 2, Interesting

How many guns have you seen that fire on a monthly basis unless you 'prevented' it?

Re:Seen this week at various airports by Anonymous Coward · 2004-09-21 10:33 · Score: 3, Interesting

It probably should. My company uses XP Embedded for a few systems, and doesn't have any software-related problems on them. Ever. The only problems we have are when people snap off antennae that we use for the wireless connections, or something similar. There's no reason that they shouldn't be using something like this to scan baggage. It sounds like someone at O'Hare didn't do their homework.

The only drawback to XP Embedded, for my company at least, is that the Windows license costs us more than the solid-state drive that we run it from. We're looking into Linux as an alternative for new installations, but it doesn't make much sense to replace strong, stable XP systems that never fail.

Re:Wait, I know this one.... by Codebender · 2004-09-21 10:38 · Score: 3, Insightful

No, the FAA is responsible for maintaining the safety of that system. They failed bigtime by allowing Windows to be used for a mission-critical system. Technically, a contractor was the one who made the decision, but the final responsibility for oversight rests on the FAA.

64 bit int by Alien54 · 2004-09-21 10:42 · Score: 4, Funny

Note to Microsoft (or anyone else storing milliseconds, for that matter): unsigned 64-bit int! Instead of having to reboot every 49.7 days, you'll have to reboot every 213,503,982,334 days, give or take a leap-second.

That's every 584,942,417 years. Which is simply not going to be good enough in my book.

--
"It is a greater offense to steal men's labor, than their clothes"

Re:64 bit int by Dmala · 2004-09-21 13:29 · Score: 2, Funny

That's every 584,942,417 years. Which is simply not going to be good enough in my book.

What are you? A geologist?

The article is light on details... by Ayanami+Rei · 2004-09-21 10:48 · Score: 4, Informative

It's probably not a Microsoft problem if the system is running on NT, since it uses a 64-bit time.

It _could_ be that an important part of the system is running Windows 95 interfaced to a 2k domain that implements the rest of the system.
That really isn't Microsoft's fault that they didn't patch that critical machine to fix the flaw... or that they felt they needed to run Windows 95 (gag) in such a critical portion of the system.

It _could_ be that a user-land air traffic control related application itself calls a deprecated API to return the time in milliseconds, which overflows/wraps around, causing the software to crash.
OR
It _could_ be that the user-land air traffic control software just mis-casts the time from the modern API into a 32-bit data structure, which wraps around, causing the software to crash.
In the latter two cases the article writer or LAX's press staff may have incorrectly drawn the connection to the famous Windows 95 problem... even when it wasn't Microsoft's fault in that case.
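The mis-cast scenario is easy to demonstrate. A minimal sketch in Python, where cast_to_uint32 is a made-up helper that mimics stuffing a wider value into a 32-bit unsigned field; nothing here comes from the actual ATC software:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000


def cast_to_uint32(value: int) -> int:
    """Mimic storing a wider integer in a 32-bit unsigned field (keeps the low 32 bits)."""
    return value & 0xFFFFFFFF


uptime_ms = 50 * MS_PER_DAY            # 50 days of uptime, fine in a 64-bit value
truncated = cast_to_uint32(uptime_ms)  # what a 32-bit field would end up holding

print(uptime_ms)   # 4320000000
print(truncated)   # 25032704, i.e. the clock appears to be under 7 hours old
```

The 64-bit source is correct the whole time; the wraparound only appears once the value is squeezed into the narrower type.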

I really don't see how Microsoft could be the blame here at all...

--
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON

Pray You Never Hear This by craXORjack · 2004-09-21 10:48 · Score: 5, Funny

Ladies and Gentlemen, at this time the Captain would like to ask you to remain seated with your seatbelt firmly fastened, however if there are any computer technicians flying with us today, especially if they know what to do when a 'Fatal Exception has occurred at 0029:C02FDEC6', would that person please come forward to the cabin immediately?

--
Liberals call everyone Nazis yet they are the closest thing to it.

Another mouse wiggler bites the dust.... by Proudrooster · 2004-09-21 10:50 · Score: 2, Insightful

Let this be a lesson out there to all the mouse wiggling MCSEs who scorn the uptime of UNIX and shun the power of the command line. If you are running a critical Windows Server, REBOOT EARLY and REBOOT OFTEN. Remember, REBOOT-ing is part of the job description and it has to be done. Please protect our key infrastructure and reboot your servers WEEKLY! Just because the UNIX guys get 2 years of uptime doesn't mean you can too. It just doesn't work that way.

Might I suggest this wonderful little tool: Poweroff. It's the only tool I know of which seems to be able to reliably reboot Windows boxes, even when they are crippled due to worms and/or memory leaks. It can even close running apps. Also, you can get it to work over the network with a magic packet, in case Terminal Server crashes or is too slow to use.

The main article should get flagged as troll/flamebait due to the phrase "upgrade from Unix to Windows". That wasn't an upgrade; that (as we now know) was a disaster waiting to happen. Wait until the worm of the month comes through and shuts it down. When will people learn to use the RIGHT TOOL FOR THE JOB! If it has to run 24x7 forever, don't put it on Windows. Geez...

What failed? by AK+Marc · 2004-09-21 10:50 · Score: 5, Insightful

A system in which the application (not the OS) failed after a finite time was deployed by people who knew it was faulty. An under-trained technician failed to reboot the server as scheduled. There was a backup which we don't have details on. It failed to work as well.

I don't see what the OS has to do with this. It could have been written for *NIX, OS/2, or any other OS. The lessons are two:
Don't deploy flawed software.
Make sure redundant systems work.

As an aside, since we don't know what the backup was, we could hypothetically say that it was the UNIX system that previously was primary that was relegated to backup duty. In that case, it would be a failure of Windows and UNIX at the same time. So, is it that UNIX sucks and is worthless for any important systems, or is it that the people that screwed this up would have screwed up something, no matter what OS they were working with?

--
Learn to love Alaska

Re:Please mod parent up by drsmithy · 2004-09-21 10:54 · Score: 2, Insightful

This is pure speculation of the editor. Nowhere in the article the blame is put on the OS. Linking the failure to an error in a previous version of the OS just doesn't make sense.

Particularly when it's not a "previous version" at all but a completely different Operating System.

Windows 95 and Windows NT (2000/XP/2003) are not the same OS. They're completely different. They share a common API and that's about it. Blaming this on "Windows 95" makes about as much sense as blaming an application bug under FreeBSD 5.x on Slackware 1.0.

It was the app, not the OS by Teahouse · 2004-09-21 11:01 · Score: 5, Informative

Pilot here, and this has been a well known peccadillo of the tracking system for SoCal Approach for a few years. It's an application problem that came into being after an upgrade of the application, not the OS. It's a memory allocation error that retains some of the old tracking on the system, thus, the whole box needs to be rebooted every 45 days or the memory overloads and crashes the OS. Look guys, I'm a Linux user and all, but let's not run around blaming M$ for problems with buggy software apps.

--
"Curiosity killed the cat, but for a while I was a suspect."- Steven Wright

Re:It was the app, not the OS by tbogart · 2004-09-21 15:53 · Score: 2, Interesting

Just curious - but how does being a pilot give you more insight into the system? I would particularly like to see the "memory allocation error that retains some of the old tracking on the system". That would be quite amazing in itself.
Re:It was the app, not the OS by tbogart · 2004-09-22 19:27 · Score: 2, Informative

Don't get me wrong - I am not questioning that you seem familiar with the effect the problems have on operations. And of course it just shows good sense that as a pilot, you network (!) with the folks you depend on as you describe. But do you network with the programmers or the administrators? It still sounds like you are getting information at least two levels removed from the level where any real dirt is available. Perhaps an analogy would be talking to someone who works in the next office to the folks who supervise Air Traffic Controllers rather than the controllers themselves. Sure, if those folks are interested in aviation and ask the right questions they can gain reliable information, but it is not like going to the, er, appropriate end of the horse.

FWIW, my father was a machinist/aircraft mechanic and finally technical writer who worked with oil company research labs on improving lubrication, publishing articles in their company publications including doing his own photomicroscopy to analyse corrosion effects.

My first job out of school with an EE degree was at the Johnson Space Center training astronauts and sitting console. About 70% of the folks I worked with were either military pilots still flying in the reserves or private pilots (and I was fool enough to go do light aerobatics with some of them), plus of course the flight crews. While there, I started dealing with computers as they first started appearing in offices, and eventually went into full time system administration/systems engineering, primarily for development groups and test labs.

Now, the reason I blabbed on like that was to try to establish

1) I am somewhat familiar with the aviation community from both the 'user' and 'support' aspects.

2) I am somewhat familiar with the computer community, starting as a user, and moving into the support realm.

3) I would claim that both the classes I wrote and taught, as well as the time spent on console, directly give me a somewhat intimate knowledge of translating information from one community into another. You generally don't explain an onboard system to a pilot the same way you would to a PhD in EE, or a medical experiment to a pilot as you would to an MD.

One particular conclusion based on my experience in those worlds (and I know this is a bit of a generalization) is that when a pilot or any member of an air crew tells me something about their aircraft or its surrounding operations, I can probably bet on the information being pretty good.

If a programmer or administrator tells me something about their program or system, before I put any stock in what they say (beyond my own experience in similar veins), I probe their background and quiz them as much as possible.

If I wanted to be glib, if programmers/administrators had to go through the same kind of training programs as pilots or even support personnel, about 85% would not cut it. Or if these folks made it into the sky, they would be weeded out by the flaming holes in the ground they made.

If, as I expect, your information is based on what an ATC heard from a guy down the hall, or maybe even was touching a computer, or even from a distilled briefing from the contractor - I would first have to ask how much that ATC knew about systems and programming and see how critically s/he processed (!) that data.

If you even got the information directly from an admin/programmer, (as you might guess by now), the same set of questions would apply.

In either case, the point is to wonder aloud if you take that information as if it were coming from folks who are the caliber of the people you are used to relying on.

Consider your description of the memory issue:
"It's a memory allocation error that retains some of the old tracking on the system, thus, the whole box needs to be rebooted every 45 days or the memory overloads and crashes the OS."

The typical memory allocation error doesn't have anything to do with old data still being in the system, but simply that m

Re:But DON'T get into the habit of using reboot. by drinkypoo · 2004-09-21 11:02 · Score: 2, Interesting

The funny thing is that halt used to halt the system RIGHT GODDAMN NOW on most Unixes, and famously on Xenix. They called it haltsys and you typed sync twice before running it. The second one was just to give the system time to sync while your fingers were moving. Most Xenix systems didn't have much of a buffer (I had Xenix on a 286 with 1MB RAM, but the 386 product was of course much more popular) but they don't have much of a filesystem either. Anyway other elderly Unixes and Unix derivatives are simple like that too. Halt just halts, it doesn't stroke you first.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

...Blame the API instead by tyler_larson · 2004-09-21 11:05 · Score: 5, Insightful

This sounds to me like more of a problem with the application, not the OS.

Three words:

GetTickCount()

Returns the number of milliseconds since the machine was last booted.

From reading the article, one would surmise that this function is used to assign a timestamp to a particular flight plan or other record. After the machine has been running for 49.7 days, the GetTickCount() function rolls over to zero, which could cause a whole plethora of problems. Almost certainly those problems would include things like corruption of data, lost records, old records showing up as new, application crashes, and, of course, swarms of locusts. The only fix is to reboot.
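A minimal sketch of the failure mode described above, assuming (as the poster surmises, not as the article confirms) that raw tick values were used as timestamps. The counter is simulated in Python rather than read from GetTickCount(), and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # a 32-bit counter holds values 0 .. 2**32 - 1


def tick_after(uptime_ms: int) -> int:
    """Simulate a 32-bit millisecond tick counter at a given uptime."""
    return uptime_ms & MASK32


def naive_is_newer(a: int, b: int) -> bool:
    """Treat raw ticks as timestamps: breaks once the counter wraps."""
    return a > b


def elapsed_ms(start: int, now: int) -> int:
    """Wrap-safe elapsed time, valid while the true gap is under 2**32 ms."""
    return (now - start) & MASK32


# One record stamped 5 s before the rollover, one stamped 5 s after it.
before = tick_after(2**32 - 5_000)
after = tick_after(2**32 + 5_000)

print(naive_is_newer(after, before))  # the later record looks older
print(elapsed_ms(before, after))      # modular subtraction recovers the gap
```

Comparing raw values makes the newer record look older, which is the "old records showing up as new" symptom; subtracting modulo 2^32 still gives the right interval as long as the two readings are less than one rollover period apart.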

The developers cleverly noticed the potential disaster before it crashed any planes, and as a workaround, instituted a policy requiring the servers to be rebooted at monthly intervals. Failure to do so would result in the calamities described above.

So while the problem wasn't the old Win95 bug, it was the same crappy windows API that caused both. The POSIX-compliant gettimeofday() function uses a 64-bit structure and does not suffer from the same flaw, and can be relied upon for at least the next 30 years or so (which isn't amazing, but it's a lot better than 50 days).
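The "next 30 years or so" figure presumably refers to the seconds field being a signed 32-bit count from 1970 on typical systems of the time; that reading is an assumption, not something the post states. Under that assumption, a quick check of where the limit falls:

```python
from datetime import datetime, timezone

# Assumption: seconds are held in a signed 32-bit field counting from 1970.
last_second = 2**31 - 1
limit = datetime.fromtimestamp(last_second, tz=timezone.utc)
posted = datetime(2004, 9, 21, tzinfo=timezone.utc)  # date of this thread

print(limit.isoformat())
print(round((limit - posted).days / 365.25, 1), "years after the post")
```

The limit lands in January 2038, a little over 33 years after this thread.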

Note that the FAA insists that they're currently implementing a better solution than "reboot every month". Better hurry, guys, you've only got 47.3 days left.

--
"With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea...."
RFC 1925

Since We're Being Technical About the Answer by techsoldaten · 2004-09-21 11:09 · Score: 3, Insightful

Since we are being technical about the answer, does this mean Microsoft or the software vendor qualifies as a terrorist organization?

Consider the fact that an entire airport was shut down, lives were disrupted, and major economic harm was caused to our airlines as a result of flights not getting out on time. LAX is a major hub that connects travelers throughout the country; it is conceivable traffic patterns throughout the U.S. were put out by this problem.

Think of it like a car bomb that went off without anyone dying, and you see my point.

M

Pilot or no... by juuri · 2004-09-21 11:19 · Score: 4, Insightful

... how does a single app bring down the entire OS? You mean the app can't be restarted and brought back up with the same state at a moment's notice in a mere minute or two?

Crappy design, regardless of who is at fault.

--
--- I do not moderate.

Re:If it's in the job description... by Dun+Malg · 2004-09-21 11:31 · Score: 2, Insightful

You design this sort of system _expecting_ that a reboot or two will be missed. Okay.. blame the tech if he didn't follow procedure.. but what if the reboot didn't happen because the tech's wife was in labor or if his kid got hit by a truck? You design systems thinking of the _worst_ case scenario.

You don't run a fucking air traffic control system with a "one truck" vulnerability.

Exactly. If you find a bug that requires a restart before a 49.7 day timer runs out, you are indeed an idiot if you decide a restart once a month is good enough. At the very least I'd have a tech down there on the 1st and 15th of the month, so they'd have to miss three scheduled restarts to cause this problem. Better yet, have two guys there every damn Wednesday at noon. If they both miss seven Wednesdays in a row, well, you got bigger problems than bad software. Whoever decided once a month was adequate needs to have his head handed to him.
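The arithmetic behind those schedules can be checked with a short sketch. It assumes the failure hits exactly when a 32-bit millisecond counter rolls over and that restarts are evenly spaced; the function name is made up for illustration.

```python
import math

ROLLOVER_DAYS = 2**32 / 86_400_000  # 32-bit millisecond counter, in days


def misses_until_rollover(interval_days: float) -> int:
    """Consecutive missed restarts after which uptime passes the rollover.

    After k misses in a row the box has been up for (k + 1) * interval_days
    by the time the next restart comes around.
    """
    return math.floor(ROLLOVER_DAYS / interval_days)


for label, interval in [("monthly", 30), ("1st and 15th", 15), ("weekly", 7)]:
    print(label, misses_until_rollover(interval))
```

With a 30-day interval a single missed restart is enough, while the twice-monthly and weekly schedules need three and seven consecutive misses respectively, matching the numbers in the comment.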

--
If a job's not worth doing, it's not worth doing right.

I don't blame the OS per se... by WebCowboy · 2004-09-21 11:37 · Score: 3, Interesting

...but I blame a lot of people for carelessness and incompetence (except for the actual techie that forgot to reboot last month--that is an honest mistake).

* Bill Gates and developers of Win2000 for the convoluted, kludgy API they designed for their OS

* Product managers at Harris--the crap-for-brains who actually thought changing out robust UNIX servers that weren't really THAT old with consumer-grade PCs running an unproven OS was an UPGRADE to a critical, safety related system. WHAT THE HELL WERE THEY THINKING? In one of the article links (the Harris press release), Harris touted SEVEN NINES reliability! If that was a criterion they should've NEVER considered Windows...Not even BillG himself would say Win2k could provide that sort of uptime!

* Retarded developers at Harris who used an API call that tracks milliseconds in a 32 bit integer despite the fact that bugs related to the use of said function call were WELL KNOWN by that time.

* Dough-heads at LAX and the FAA who, upon finding the error early in development, decided it was OK to rely on MANUAL MONTHLY REBOOTS as a workaround to a potentially fatal problem. They should've run the "upgraded" windows machines in parallel with the UNIX servers for much longer, and failing that they should've IMMEDIATELY restored the old UNIX servers to service as soon as the problem was discovered, and to refuse the upgrade (and revoke payment to Harris) until the problem was properly resolved (and NOT just worked around with a kludge like an email reminder to reboot, or a reboot script or a shutdown warning either).

I'm surprised that this sort of error got into such a critical system, and at the way it was handled. I would've certainly tested the new system in parallel for long enough to catch this sort of error and kept the old system around for longer as a standby (in my experience, replacements of critical systems were often tested in parallel for 3 months to a year). I also would've acted much more decisively in resolving the problem if it did slip through the cracks, given a system crash could put lives in danger.

Maybe my girlfriend's fear of flying is more justified than I thought if these are the kind of clowns we trust our safety to...

IBM product support kicks all ass. by SvnLyrBrto · 2004-09-21 11:49 · Score: 2, Interesting

> Tell you what, can you get me new boards for an IBM RT pc? I
> highly doubt it.

I've actually dealt with IBM in the "we need support and replacement parts for legacy hardware" capacity before.

And yes, if you've bought IBM in a professional/enterprise capacity, you've also bought the support contract. And if you've bought the support contract (And if you didn't, you deserve to be fired. Why the hell would you pay the IBM premium except for their support?), you can get parts and expert support for damn near everything IBM's ever made; all the way back to card punches/readers, and farther I'd bet. Remember, when you buy IBM, you're buying a MTBF of thirty YEARS.

cya,
john

--
Imagine all the people...

Downtime vs Failure by burnin1965 · 2004-09-21 11:55 · Score: 5, Interesting

I'm not sure exactly what downtime for routine maintenance on an AIX system running DBase has to do with a Windows bug that causes a system failure. However, in response, there is a difference between planned downtime where a service is made unavailable while planned routine maintenance is performed and planned downtime or an unplanned failure due to a flaw in the system.

It appears that in this case Windows has a flaw which they try to work around with routine maintenance during planned downtime.

In your case I would say you have planned downtime for routine maintenance to work around the need for an appropriate system to handle the work load.

I suppose what is the same between these two cases is that you both need to change your system to something that is more appropriate for the task at hand. And to be more specific in the FAA case, Windows should not be allowed for use in any application where life, limb, or property is at risk. Hmm, I suppose that may rule out just about every use. :P

burnin

Fire the Department of the Interior's IT staff... by Dr.Dubious+DDQ · 2004-09-21 11:55 · Score: 4, Insightful

The FAA is under the auspices of the US Department of the Interior, aren't they? You know, the same department that was ordered by a court to take ALL of their systems off line because they were apparently unable to secure them? TWICE? (No, wait, the latter link says THREE times, most recently March 2004...!)

Is there some secret plot to make them look bad, or is the Department of the Interior riddled with incompetence? I certainly don't feel real secure about the safety of our airlines right now - and it's got nothing to do with "terrorists"...

(Not to say that terrorism isn't a real concern, but I'm somewhat less worried that their intentional plots will slip through observation by the authorities than "accidental" screwed up software being deployed by the FAA...)

--
Hacker Public Radio is our Friend

Lol, only on Slashdot by jayhawk88 · 2004-09-21 12:12 · Score: 3, Insightful

I don't think blame should be assigned to the technician who missed the task...

Boss: OK Tech, it's your job to see to it this computer is rebooted monthly.
Tech: Will do Boss!
*Time Passes, System Crashes*
Boss: The system crashed, why is that?
Tech: Well, it's because I didn't reboot the system like I should have.
Boss: Oh well, I guess it's not your fault, obviously I failed to realize maximum security synergy in my systems.

Wherever the submitter works, I wanna get a job there!

Re:Lol, only on Slashdot by reverius · 2004-09-21 18:05 · Score: 2, Insightful

it's the boss' fault for making a task like that necessary in the first place.

if i design a system in which someone has to press a button every 12 hours or the world blows up, would anyone want to use that system? no, you think? what if you could -order someone who works below you- to do it!?

that's just plain stupid management. the rebooting job is a waste of the tech's time (anyone competent could make it reboot automatically) and a completely unnecessary job (any competent operating system doesn't need to be rebooted every 30 days, or even every 3 years).

If the boss had scheduled maintenance (Windows Update, to get service pack 4) or had used an operating system that doesn't require that much maintenance to function correctly, the job wouldn't have needed to be performed.

the boss should be fired for general incompetence/negligence (since he had the responsibility to make the system stable), and the tech should be put to work carrying boxes or something (or just fired as well), since he isn't competent enough to put an automatic timer on the rebooting.

Patriot bug details by Animats · 2004-09-21 12:21 · Score: 2, Informative

That was a bad bug. It didn't cause system crashes. It caused missile misses. This bug was responsible for an interception failure which allowed an incoming Scud missile to hit a barracks in Saudi Arabia, killing 28 people.

The radar and the guidance system had separate clocks, and they'd drift out of sync.

Here's a detailed analysis by the General Accounting Office.

Re:Now even the submitters aren't reading the arti by AstroDrabb · 2004-09-21 12:25 · Score: 3, Interesting

The shutdown is not a crash but a scheduled event to bring the servers down to flush data.

That is MS PHB speak to "assure" other PHBs that it was not MS's fault. What _modern_ server OS needs to reboot to flush freakin data! Why do you think technical details are never released in these types of press releases?

The reboot was to reset the logic flaw in the MS system timer. Read my post here on it. It has affected other MS made apps on MS Windows 2000 servers. So if MS's programmers get affected by it, you can expect non-MS employed programmers to get affected too since they do not have the same level of access to the proprietary OS.

--
If Tyranny and Oppression come to this land,
it will be in the guise of fighting a foreign enemy. -James Madison

Exactly by autopr0n · 2004-09-21 13:03 · Score: 2, Informative

Windows 2000 can stay up for more than 2^32 milliseconds, but software that depends on GetTickCount() being correct can't. That's probably what happened. They could have rewritten the software to use a 64 bit time variable, or they could have worked around the bug.
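One way such a rewrite could look, sketched in Python with a simulated counter: widen the wrapping 32-bit reading into a count that never goes backwards. The class is hypothetical and says nothing about how the actual LAX software was structured.

```python
class TickExtender:
    """Widen a wrapping 32-bit tick counter into an ever-increasing count.

    Hypothetical helper; it must be fed a reading at least once per
    rollover period (about 49.7 days) or it will miss a wrap.
    """

    WRAP = 2**32

    def __init__(self) -> None:
        self._last = 0
        self._wraps = 0

    def update(self, tick32: int) -> int:
        # A reading smaller than the previous one means the counter wrapped.
        if tick32 < self._last:
            self._wraps += 1
        self._last = tick32
        return self._wraps * self.WRAP + tick32


ext = TickExtender()
readings = [4_294_000_000, 4_294_967_295, 10, 5_000]  # crosses one rollover
extended = [ext.update(t) for t in readings]
print(extended)
```

Python integers are unbounded, so the widened value stands in for a 64-bit variable here; in a fixed-width language the same logic would store the result in a 64-bit integer.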

They didn't, and that caused the crash. Not "buggy windows".

The fact that they couldn't even figure out how to run a scheduled task in Windows to reboot the machine is just pathetic, and shows how incompetent they really are.

--
autopr0n is like, down and stuff.

Why replaced by jamesl · 2004-09-21 13:19 · Score: 2, Funny

The decision to replace the legacy system was made the same week RadioShack quit selling vacuum tubes. Coincidence? I think not.

Maintenance. by Zebra_X · 2004-09-21 13:41 · Score: 3, Insightful

it seems a gross oversight for the FAA to guarantee that such a critical system will crash after only one missed maintenance task.

Would you feel this way if the airplane that you were flying in missed its engine overhaul time, the engine failed catastrophically and your plane crashed?

Critical System + Maintenance = Must Be Done.

The system was designed and setup in a particular manner. In fact, the reboot rule was added to the design of the system, so that this very thing would not happen.

Whoever's job it was to reboot the machine is at fault for not maintaining the system properly.

The discussion of whether the procedure of rebooting a machine every month is inane, is something different.

What if OSS gave them software? by Mustang+Matt · 2004-09-21 13:43 · Score: 2, Interesting

What would happen if a group of people out of the goodness of their hearts wrote them a new system that truly did everything they needed. Would they adopt it?

Or are the corporate powers that be so out of touch with reality that they wouldn't touch anything having to do with "open sores!"

--
The man who trades freedom for security does not deserve nor will he ever receive either. - Benjamin Franklin

Why is it that.... by jwcorder · 2004-09-21 13:47 · Score: 2, Insightful

no one has put the blame where it belongs....on the system admin. We can have a shit throwing contest all day about whether or not this is MS's fault. But the fact remains that the problem was addressed and fixed in SP 4 for Win 2000.

If the system had been updated the problem would not have occurred. How is this a microsoft problem? They cannot force system maintenance.

--
http://jayceecorder.blogspot.com

MTBF - what boggles my mind by apikoros · 2004-09-21 14:08 · Score: 2, Insightful

Forgetting all the talk about Microsoft and Win95/98 and the defect in the OS that has been well known for years and for which a patch has also been available for years....

If you have a system that has a known failure point at 49 days, when do you perform the mandatory reset?

For the failure that is described the scheduled reset must have been "every 30 days" which is, frankly, INSANE!

If they had scheduled a mandatory reset every 14 or 15 days, they would have had to have had three failures before disaster struck. As it seems, one failure was all it took.

Re:Seen this week at various airports by Anonymous Coward · 2004-09-21 14:17 · Score: 2, Interesting

Perfect timing for this comment. I was in the airport yesterday (Detroit). The screens over the metal detectors/ carryon xray machines do nothing except tell you whether the lane is open (a large arrow) or closed (a large X). 4 of the lanes had some sort of Windows error message. Apparently they couldn't handle the workload.

I wanted to say stupid - I say ??? by tuomoks · 2004-09-21 14:21 · Score: 2, Insightful

I started to write a long comment, no point, unfortunately this is the way today. Trust me - the more computer system decisions are made on manager level instead of using people who know how to build systems - the worse it gets. Used to be that way - compare the financial / manufacturing systems running years to what we do today - any questions ? Some of my old systems are still running from 70's - none of my new systems can stay up more than 10-12 months AND I was told to build them that way. And no - CAD systems, CRM, protocols, world wide networks for finance / air lines / etc.. have been there since early 70's, so complexity is not any excuse. Just don't give up - maybe some day ( after my time.. ) And let's forget the Windows / *nix, Windows is more difficult to build reliable systems but it can be done - Windows is just more primitive, you have to design / code on lower level, it is harder than *nix but so what ?

They _do_ use Duct tape and baling wire by billstewart · 2004-09-21 14:34 · Score: 2, Interesting

Back when I was working on ARTCC replacement in the late 80s, during the daytime they were running the "modern" 1960s IBM System 360/90 system, which was an ugly undocumented unmaintainable hack job written mostly in JOVIAL. For about four hours a night, they'd run the backup system EDARC, which was a 1970s "Enhanced" version of the 1950s "DARC" radar controller. There were all sorts of parts you couldn't get back in the 1980s - IBM had stopped making the "Serpentine" cable connector, for instance.

I was on the lucky team that *lost* the bidding for the replacement system; IBM's team were the poor bastards who won, and were stuck investing seven years into building an unbuildable replacement, pouring billions of dollars down the drain while being micromanaged by the FAA, who didn't know much about software design or reliability in spite of having a methodology that required producing 175 design documents over the optimistically 3-year design period.

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

49.7-day bug not exclusive to Windows. by Temporal · 2004-09-21 15:23 · Score: 3, Interesting

It may seem suspicious that the max uptime of the LAX system is the same as the max uptime of a Windows 95 box... until you realize that 49.7 days is 2^32 milliseconds. If you have a piece of software that counts milliseconds using a 32-bit integer, it will inevitably roll over after 49.7 days and -- unless designed to compensate for it -- will probably crash. Windows 95 is certainly not the only piece of software that counts milliseconds in a 32-bit integer.
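The 49.7-day figure is easy to verify. The sketch below also shows, for comparison, how quickly the same 32-bit counter would wrap if it counted microseconds instead; the function name is just for illustration.

```python
def rollover_days(bits: int, ticks_per_second: int) -> float:
    """Days until an unsigned counter of the given width wraps to zero."""
    return 2**bits / ticks_per_second / 86_400


ms_days = rollover_days(32, 1_000)                    # 32-bit milliseconds
us_minutes = rollover_days(32, 1_000_000) * 24 * 60   # 32-bit microseconds

print(round(ms_days, 2), "days")
print(round(us_minutes, 1), "minutes")
```

A millisecond counter lasts about 49.71 days; a microsecond counter of the same width would wrap in a little over an hour.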

That said, the Windows GetTickCount() system call returns a timer value as a 32-bit count of milliseconds since the system was booted. Now, any good programmer knows better than to use GetTickCount() -- there are other, better, more robust ways to tell time in Windows -- but it would not surprise me if a newbie had made the mistake of using this system call in the LAX software, thus leading to the problems.

In other words, the Windows timer is not at fault, but it is possible that one of the programmers was confused by the convoluted Win32 API and made a programming error as a result.

Re:Fire the Department of the Interior's IT staff. by tbogart · 2004-09-21 15:41 · Score: 3, Interesting

Looking at the www.faa.gov home page, it says "Department of Transportation". However, having been a systems engineer and administrator in a couple of stints at one of the DOI Bureaus ... you don't want to know.

Which UNIX is that? by mangu · 2004-09-21 15:49 · Score: 2, Insightful

Unix systems are often installed with the instruction that they get reboots regularly.

In 25 years working with Unix systems, I've never seen that instruction. That must be because I've never worked with any Microsoft Unix system...

Can you imagine? by dickens · 2004-09-21 16:00 · Score: 2, Interesting

Can you imagine knowing about this problem, putting it into production and not riding your MS rep like a pony until it was verified fixed ? ...with any other vendor.. sheesh.. but I guess it doesn't work that way with MS - even for the FAA.

Re:2K is based on NT kernel by omicronish · 2004-09-21 16:27 · Score: 2, Interesting

Just like the "Y2K glitch" was a platform independent problem based upon the 2-digit-year shorthand causing logical flaws, if you store time in a 32-bit variable by the millisecond... you'll hit the hard limit after about 49.7 days which is why that number can show up in kernels other than Win9x. If there's no proper handling of that rollover, things go haywire.

One interesting bit is that Quake 1 servers had problems running for more than 49.7 days for what I assume is precisely the same reason.

Unix to Windows an Upgrade? by CodeBuster · 2004-09-21 17:25 · Score: 2, Funny

Unix to Windows95? more like downgrade...big time

Incorrect. by jwigum · 2004-09-21 18:39 · Score: 5, Insightful

Part of being on the ball in any tech department means having the system up to date. If you don't have it up to date, and an error FOR WHICH A PATCH EXISTS gives you trouble, everyone else in the company should rip your head off. That's inexcusable.

If you install an unpatched version of an OS, and leave it as such, it's your own dumb fault. If a patch is out that fixes the problem, then the problem doesn't exist as far as anyone with half a brain is concerned.

My apologies for the abrasive manner of the response, but patches are around for a reason: to fix known problems.

Patches, do ya have 'em?

--

Look behind you...

Re:Incorrect. by fallen1 · 2004-09-22 00:59 · Score: 2, Insightful

Quote: My apologies for the abrasive manner of the response, but patches are around for a reason: to fix known problems.
Well, yes, this may be true BUT Microsoft patches are _notorious_ for breaking as many, if not more, things than they fix. How long can a critical system such as this one stay down for "routine" maintenance? WHEN would the breaks introduced by the patches show up? In the middle of routing 20 or more airplanes in the airspace around LAX?
Although the specific bug had a patch, perhaps this was a case of "do we patch and pray OR do we reboot monthly?"
*shrug* Maybe the heads of the department overrode the IT personnel and instead of paying the money to patch and test they told them to just reboot the system? No, I didn't RTFA but who knows exactly what went down? The department heads are all in a CYA mode right now and the "truth" may never be known.

--
Dream as if you'll live forever.
Live as if you'll die tomorrow.
~Anonymous~

Hypocrites? by coronaride · 2004-09-21 18:58 · Score: 3, Interesting

This is not addressed to the parent, but is for everyone who responded to the parent -

I'm throwing stones, now - especially after reading this incredibly long and geeky thread about shutting down your OS variants. God bless you for having multiple ways of shutting down/halting/suspending/restarting your computer in user/superuser/megauser/whosyourdaddyuser modes, but shame on you for being a stickler on MS's decision to place a Shutdown option on the "Start" menu when you can't even agree on how to shut your own damned computers down!

It's hypocritical, pharisaical, and parasitical (I like alliterations, even when they're not in context...makes me feel like Don King) to bring up such an argument as "Please press the Start button to shut down (stop) the computer". I'm not saying that "Start" is the most incredible choice for a button, but it makes sense. If you are shutting down your computer, you START THE SHUTDOWN PROCESS.

--
Those who can, do. Those who can't, go into business for themselves.

Re:maintenance task (yyeahhh, rrriiiiighht) by timerider · 2004-09-21 19:14 · Score: 2, Insightful

it WAS a human error... i mean, it must have been some form of human life form who decided to use windows for those systems...

Every coin has two sides by babybird · 2004-09-21 19:54 · Score: 2, Insightful

By that same logic, doesn't a Windows user "Start" the shutdown procedure?

And if you don't want to go to the "Start" button in Windows to shut it down, you could always hit ctrl-alt-del and click shutdown. Or press the power button if you have power management enabled in the bios. I don't really see a fundamental difference between the two, it's just semantics really.

When I first started using Linux, one of the things that baffled me for hours until I could ask someone who knew Linux was how the heck do you rename a file?? I searched and searched for anything resembling a rename command and found nothing. It never occurred to me that you might use the move command to rename a file by essentially just "moving" the file to a new filename. That's at least as illogical (to me and every newbie I've ever known) as clicking Start to Shutdown for someone who isn't familiar with the idiosyncrasies of a particular operating system.

--
Keith D.

Well done Harris! by BouffeMoiLaChatte · 2004-09-21 20:02 · Score: 2, Funny

Now you've become a thrustworthly company!

you ass by RMH101 · 2004-09-21 20:54 · Score: 2, Insightful

big projects don't work like this. if you find a bug mid testing, then you don't throw the whole thing back at the vendor and chuck the baby out with the bathwater; you simply cannot organise big projects like this. you do risk analysis and if it's decided you can accept it with a constraint that you, say, boot it occasionally then you may be able to accept the system. if you have accepted it on this basis and don't do what you said you would when you signed the constraint off, it's your problem. yes, the vendor shouldn't sell buggy software, but *all* software has *some* bugs in it.

Old OS/2 Bug, Not Windows 95 by JohnThreePound · 2004-09-21 23:13 · Score: 2, Interesting

As I recall, since Windows 2000/NT was once the same product as IBM OS/2 (remember Microsoft OS/2, anybody?), this bug originated from the OS/2 side of the codebase.

IBM ran into the problem quicker, as OS/2 was adopted for various critical things like Automated Teller Machines (ATMs), while Windows NT was mostly used for simple file servers. As a result, the problem was fixed in OS/2 about 2 years before Microsoft got around to fixing the problem in Windows.

Considering that I remember this patch existing for Windows NT and 2000 back in 1999, it is disheartening that the FAA did not feel it necessary to upgrade to something as simple and critical as Service Pack 2 or 3.

JPL had a working system for the FAA around 1985 by pdxChris · 2004-09-22 05:11 · Score: 2, Informative

In the mid 1980's, I knew a software engineer at Caltech's Jet Propulsion Laboratory who worked on a multi-year JPL project for the FAA. The project was to replace the obsolete voice communication system for air traffic controllers. The new system had touch screens with onscreen menus and buttons were dynamically reconfigured depending on the controller's workload. It worked correctly, and the engineer enjoyed describing to me how it worked. This was all before there was any version of Windows. If I recall correctly, they developed on MODCOMP minicomputers running VMS but deployed on an embedded system with an in-house design for task switching, not a complete OS. I might be fuzzy about the technical details at this time, but a FOIA request should be able to retrieve them for the intensely curious.

I do clearly remember that the working system was presented to the FAA in Monterey, and the FAA then terminated the contract and hired IBM to start over from scratch on a new system. Rumor was that this was a political payback. I should emphasize that's just a rumor I heard. Looks like Harris eventually got the contract. I wonder if any of the original code from JPL was ever deployed.
