Passport Database Outage Leaves Thousands Stranded
linuxwrangler (582055) writes Job interviews missed, work and wedding plans disrupted, children unable to fly home with their adoptive parents. All this disruption is due to an outage involving the passport and visa processing database at the U.S. State Department. The problems have been ongoing since July 19 and the best estimate for repair is "soon."
The system "crashed shortly after maintenance."
Rollback plan? What is that?
They are hard drive experts!
"I say we take off, nuke the site from orbit. It's the only way to be sure."
Still, bet Sysadmin's the highest ranking head that'll roll.
Happiness in intelligent people is the rarest thing I know.
Ernest Hemingway
Sic the healthcare.gov guys on it. I'm sure it'll be right as rain in no time.
From their Q&A:
Q: Why wasn’t there a back-up server?
Back-up capability and redundancy are built into the system. The upgrade affected our current processing capability, in part because it interfered with the smooth interoperability of redundant nodes.
We don't need backups, the data is replicated, we're cool.
We call this being over improved. So much for testing.
I hope this caused some synapses to fire.
One Database to bind them.
One Database to keep them out.
And into the darkness send them.
I'm sure they have full copies of all the data already.
These breakdowns are lame excuses. If computers fail, have people forgotten how to do the same process manually? Is it better to halt all the flights than to let people through and risk "terrorists" flying? Are we that terrified?
The whole US customs and immigration system is massively dysfunctional. Last year I flew into Minneapolis from Asia. I'd been traveling for twenty hours straight and then I got to stand in line for a full hour waiting for an immigration agent to spend ten seconds looking at my passport photo to make sure it matched my face. Even the third world airports I've been through aren't that bad. There were even empty stations without agents. How much would it have cost to add a few more agents - $100? At the time they were doing this ridiculous upgrade to the airport that must have cost millions - they were setting up all these silly little tables with iPads in the waiting areas. But somehow they couldn't manage to have enough immigration agents. It made me wonder if people in the state of Minnesota are as silly as their airport - they did elect Michele Bachmann to Congress - so there may be quite a few of them who were dropped on their heads as babies or something.
I think I found the problem, from the Department of State's own website:
"The Department of State is working with Oracle and Microsoft to implement system changes aimed at optimizing performance and addressing ongoing performance issues."
They're running Oracle on Windows.
I have arrived at the point where any crashes experienced by whatever State Department of whatever so-called and self-proclaimed Democratic Country (traitor mark here) are welcomed by me with the utmost glee. The more disruption, the more chances for a turnaround.
And how many families will be disconnected because of this?
How many jobs will be lost when people can't get back to work?
It's all very nice to conveniently gloss over the fact that people can't get home, but they have lives to live, schedules to meet and contracts they're obligated to perform under. You can't just say "Oh, sorry. You can't come back. Try later." The real world doesn't accept "Try later" as an excuse.
The article tries to wow us with the hugeness of the database, like this is a reason for the issues.
Yet the numbers quoted are not that big. Any modern PC isn't going to get too upset handling 75 million things. A real data center is going to sit there wondering what to do with the remaining 500TB of storage.
I don't doubt that there is some horrible flaw in the way the system was conceived that rendered it fragile, but whatever it is, it's nothing to do with the enormity of the problem, because it isn't very enormous.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
I understand that the database is very large by any measure: 100 million records, 75 million pictures. But social security databases in most countries (or income tax databases) are at least that large (ok, likely much larger). It's a fail to have a large database really tank like this. If you need to shut the whole thing down for a day to avoid corrupt data, then shut it down. Fixing a corrupt database is much more difficult than correctly shutting a (slow) one down and then bringing it back up again.
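For what it's worth, here is a quick back-of-envelope on those numbers. The 100 million records and 75 million pictures are the figures quoted above; the per-record and per-picture sizes are pure guesses on my part (nothing official says what is actually stored), so treat this as a rough sketch rather than a statement about the real system.

```python
# Back-of-envelope storage estimate for the figures quoted in this thread.
# The counts come from the discussion; the per-item sizes are ASSUMPTIONS
# made up for illustration, not anything published about the real system.

RECORDS = 100_000_000            # quoted in the thread
PICTURES = 75_000_000            # quoted in the thread
BYTES_PER_RECORD = 4 * 1024      # assumed: about 4 KiB of text per record
BYTES_PER_PICTURE = 100 * 1024   # assumed: about 100 KiB per compressed photo


def estimate_total_bytes(records, pictures, bytes_per_record, bytes_per_picture):
    """Return the raw payload size in bytes, ignoring indexes and replicas."""
    return records * bytes_per_record + pictures * bytes_per_picture


def to_terabytes(n_bytes):
    """Convert bytes to decimal terabytes (10**12 bytes)."""
    return n_bytes / 1e12


total = estimate_total_bytes(RECORDS, PICTURES, BYTES_PER_RECORD, BYTES_PER_PICTURE)
print(f"Estimated raw payload: {to_terabytes(total):.1f} TB")
```

With those guessed sizes the raw payload comes out at roughly 8 TB. Indexes, audit logs and replicas would multiply that, but you would need much bigger per-item sizes before the sheer volume became the interesting part of the problem.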
While it doesn't always go this way, often simple things like the User Experience of a business give an indication of the ethos behind a whole lot of the processes and systems they are using. To wit, compare the US Arrivals card that all "aliens" need to use upon arrival into the US, with the one from Australia. A clear 1970s look-and-feel versus something from this millennium.
http://www.immihelp.com/visas/sample-i94-form.pdf
http://www.immi.gov.au/managing-australias-borders/border-security/travel/passenger-cards/_pdf/english-ipc-sample.pdf
Ultimately my example was about "level of incompetence and lack of planning is strong in several levels", as you suggested, but it was driven that way by the new vendor having far too much control over the situation and no risk to bear in the event of failure.
The government took them to court twice (outgoing and incoming - Queensland, Australia) and could not scratch that vendor (IBM) for any of the $500 million+ in estimated extra costs.
For another 840 million dollars they can probably get it to the point where only another 150 million is needed to get it running.