What Developers Can Learn From Healthcare.gov
An anonymous reader writes "Soured by his attempt to acquire a quote from healthcare.gov, James Turner compiled a short list of things developers can learn from the experience: 'The first highly visible component of the Affordable Health Care Act launched this week, in the form of the healthcare.gov site. Theoretically, it allows citizens, who live in any of the states that have chosen not to implement their own portal, to get quotes and sign up for coverage. I say theoretically because I've been trying to get a quote out of it since it launched on Tuesday, and I'm still trying. Every time I think I've gotten past the last glitch, a new one shows up further down the line. While it's easy to write it off as yet another example of how the government (under any administration) seems to be incapable of delivering large software projects, there are some specific lessons that developers can take away. 1) Load testing is your friend.'"
Nothing shows up the sheer arbitrariness of a government shutdown more than some sites like Healthcare.gov being up while others are forced to shut down at extra expense when they could have just been left running (and the servers that are there just to tell you the site is shut down are still consuming power and bandwidth).
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Let's have our great media investigate whether this is poor planning... or good planning, if it turns out that once the initial load gets through they didn't overspend on equipment they don't need.
Or if there is a secret effort by the people who want this to fail to hire botnets and hackers to DDOS it... I wouldn't put it past them.
Would be something to see a considerable amount of traffic going out from Newscorp ip addresses into the healthcare.gov servers.
Nothing unusual, aside from a few million malformed packets...
A feeling of having made the same mistake before: Deja Foobar
Canadian firm hired to build troubled Obamacare exchanges
A Canadian tech firm that has provided service to that country's single-payer health care system is behind the glitch-ridden United States national health care exchange site healthcare.gov.
CGI Federal is a subsidiary of Montreal-based CGI Group. With offices in Fairfax, Va., the subsidiary has been a darling of the Obama administration, which since 2009 has bestowed it with $1.4 billion in federal contracts, according to USAspending.gov.
The "CGI" in the parent company's name stands for "Conseillers en Gestion et Informatique" in French, which roughly translates to "Information Systems and Management Consultants." However, the firm offers another translation: "Consultants to Government and Industry."
The company is deeply embedded in Canada’s single-payer system. CGI has provided IT services to the Canadian Ministries of Health in Alberta, British Columbia, New Brunswick, Quebec and Saskatchewan, as well as to the national health provider, Health Canada, according to CGI's Canadian website.
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
I went through the site and found it responsive. Possibly the time of day and my western timezone had something to do with it, but I had no issues.
Even CNN looks bad when something major happens and everyone hits them at once, despite humming along for months without any issues.
"Launch" suggests that it actually, you know, worked.
When a quarter million people hit a game company's servers and only half of them get to play, it's a disaster of unrivaled proportions.
When millions of people hit billions of dollars in government investment and a few thousand of them actually get the site to work at all, it's a "learning experience."
Never attribute to malice that which is adequately explained by stupidity.
I'd have a hard time believing that the servers have been this consistently overwhelmed with traffic. A more likely explanation is that a poorly designed system was patched together from components hastily built from a thousand different vendors. The web-app equivalent of a diesel engine held together with duct-tape and baling wire was then rolled out without any real testing.
The only time, "Good enough for government work," has ever escaped my lips was when I was confronted with a marginally functional mess of spaghetti code.
An internal system operation returned the error "The operation completed successfully.".
I thought the consensus from the last story about the shutdown was that the web sites were closed because a server that's turned off is less likely to get 0wn3d without anyone there to fix it.
GTA V? Sim City? Final Fantasy? Battlefield?
Turns out millions of users who start using something on the same day often don't follow the expected and tested for behavior.
Anyone who launches a service like this should expect to spend the first week in triage mode, and the first month making adjustments. I'd like to say proper planning would mean that never occurs, but the only way to ensure that would be to spend 10x what is really needed. People would hate the government even worse if they did that.
This is not news, yet. It will be news in a month if it is still fubared.
I've got a personal gripe about folks who think that 'developer' is code for 'guy who's expected to do everything in the project'. Outside of small projects, that's not how it should work in a healthy software development lifecycle.
Developers architect and write code, and some of the topics covered in that short editorial are relevant; use of AJAX necessitates good error handling on the front end, and synchronization of client and server side validations. Sure, they may have a broad skillset besides and understand databases, and graphical design, and so on, but there's no guarantee they're the ones meant to provide those skills.
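To make the point about keeping client and server side validations in sync a bit more concrete, here is a minimal sketch of one way to do it: keep a single rule table on the server, validate against it there, and hand the same table to the front end as JSON. The field names, patterns, and messages are all made up for illustration; nothing here reflects how healthcare.gov actually does it.

```python
import json
import re

# Hypothetical rule table: one source of truth for both server and browser.
# The patterns stick to syntax that Python and JavaScript regexes share.
RULES = {
    "email": {
        "pattern": r"^[^@\s]+@[^@\s]+\.[^@\s]+$",
        "message": "Enter a valid email address.",
    },
    "zip_code": {
        "pattern": r"^\d{5}$",
        "message": "ZIP code must be exactly 5 digits.",
    },
}


def validate(form):
    """Server-side check. Returns {field: message} for every failing field."""
    errors = {}
    for field, rule in RULES.items():
        value = form.get(field) or ""
        if not re.fullmatch(rule["pattern"], value):
            errors[field] = rule["message"]
    return errors


def rules_for_client():
    """Serialize the same rules so the front end can run identical checks."""
    return json.dumps(RULES)
```

The server stays authoritative, and the descriptive per-field messages give an AJAX error handler something useful to show instead of a generic failure.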
For example, QA encompasses an incredibly large set of skills, familiarity with a wide range of products, and to be fair, seems to attract folks with a different life philosophy than those who identify themselves as developers. To talk about load testing - which itself is not a simple unit test to be added to a build - as a developer's responsibility, and ignore the vast, separate set of specialized knowledge and experience required to pull it off, is ignorance. To include UX and UI design, and say these too are in the developer's purview, is equally misguided. (In fact, most developers are really, really bad at UI/UX, for some reason.)
Not that a developer couldn't do those things, or will automatically lack the knowledge or skills, but those are separate roles and separate disciplines.
So, tell a project manager that they should make sure the QA team does load testing, and tell the project manager that the UI/UX team needs to provide descriptive error messages when validation fails, and so on. Very little of this is important to someone who's currently wearing the 'developer' hat.
The devs are in a pretty interesting situation that you don't see too often. They're tasked with developing an application that generally can anticipate a low load level, except for one (and only one) extreme peak load. Do you develop for the general case, or the (very important) exception? Remember that the choice between these two options would change the basic structure of the app. Do you use a traditional RDBMS (perfect for the low load case), or some sort of NoSQL system (possibly necessary for the peak load case)? Remember that you can't leverage any commercial cloud resources either -- these are government records, and there are laws saying they'll have to be housed on government computers.
I didn't make it very deep into the web site. I was mainly interested in reviewing the rates for my county. What a surprise that there was a list with all the state's counties together! I was expecting to fill in my zip code, or possibly enter the state and county, to get a list of available policies. The resulting table was large enough to generate bandwidth problems. One stupid error in design could saturate their network! A good design would be easier on the users, the network and the servers. Now sometimes you have to trade server time and convenience for user time and convenience, but this was apparently not thought through. Surely someone in the government must realize that good design works better than bad design. If a web site is to be used by millions, it obviously needs a good design.
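As a rough sketch of the alternative described above, the server could filter the rate table by state and county and send back only the matching rows. The table layout, field names, and sample rows below are placeholders invented for the example, not real plan data.

```python
# Placeholder rows standing in for a statewide rate table (not real data).
RATES = [
    {"state": "XX", "county": "County One", "plan": "Plan A", "monthly_premium": 100.0},
    {"state": "XX", "county": "County One", "plan": "Plan B", "monthly_premium": 150.0},
    {"state": "XX", "county": "County Two", "plan": "Plan A", "monthly_premium": 120.0},
    {"state": "YY", "county": "County One", "plan": "Plan C", "monthly_premium": 90.0},
]


def plans_for_county(rates, state, county):
    """Return only the rows for one county, ignoring case and stray spaces."""
    state = state.strip().upper()
    county = county.strip().casefold()
    return [
        row for row in rates
        if row["state"] == state and row["county"].casefold() == county
    ]
```

With a real database this would be an indexed query rather than a list scan, but the effect is the same: the response carries a handful of rows instead of every county in the state.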
Ray Seyfarth, ray.seyfarth@gmail.com, http://rayseyfarth.blogspot.com
Did a little sleuthing and discovered they're using an F5 load balancer in front of it (at least my state exchange is). I'm rather shocked that they chose a classical client/server architecture and not, say, a cloud architecture for this. This could have been written on Google's cloud or Amazon's or even OpenStack, and would probably have done a much better job of handling this load.
I would surmise that HIPAA requirements may have made cloud architecture problematic.
If a web site is rushed into place on October 1st but there's no reason to sign up until January 1st, wait several weeks before you try to use it.
It's not slashdot. There's no advantage to getting FIRST POST!!!
"Why would we believe they could accomplish something on this scale?"
Because they are the only ones who actually have successfully created healthcare systems on that scale, specifically Medicare, Medicaid, and the VA system.
Never attribute to malice that which is adequately explained by stupidity.
I'd have a hard time believing that the servers have been this consistently overwhelmed with traffic. A more likely explanation is that a poorly designed system was patched together from components hastily built from a thousand different vendors. The web-app equivalent of a diesel engine held together with duct-tape and baling wire was then rolled out without any real testing.
The only time, "Good enough for government work," has ever escaped my lips was when I was confronted with a marginally functional mess of spaghetti code.
You needn't source from multiple vendors to get a system that falls apart under load - single vendor solutions are also susceptible to such problems. Even if you specify load testing in the contract, that doesn't mean that their load test had any relation to actual real-world load. Of course, the hard part is predicting what load to expect, especially with a system that has a potential audience of 100+ million people.
Having worked in government offices, I can tell you this is the real problem.
Because there are so many laws about making the government use contractors instead of hiring employees (because private sector is allegedly so much more efficient), damn near everything has to be contracted out. Then the contractors fail to deliver, they go over budget and come in way behind schedule. The government has no choice but to pay them and accept their useless work, again, due to more laws about "helping the private sector".
There's no way to fire a contractor or even to hold them to their original contract. They agreed to do something for a certain price? Too bad, they're going to sue the government and use those biased laws in order to deliver less than half of what they promised at more than 3 times the price they quoted and agreed to.
-1 disagree is not a modifier for a reason. -1 troll, flamebait, redundant, overrated are NOT acceptable substitutes.
Everyone goes on the assumption that scale is "just make it bigger". I'd like to add some of my own notes on why this launch was doomed from the start.
I used to work for an adult internet company that had massive traffic. We were serving millions of people daily before 2000. We would exceed 10M daily viewers about once a week. That fluctuated with rather consistent calendar influences, like the day of the week, part of the month, and part of the year. Sept 11, 2001 dropped 3/4 of our traffic for almost exactly 2 hours. So we knew how long a huge news event would impact us.
To handle 10M customers without a hiccup, we had to consider a lot of things. We didn't do much dynamic content. That's a killer. There were some elements that had to be dynamic, such as the voting/polling systems, message forums, etc. Otherwise, we had to try to keep the pages (html and images) as light as possible.
The most heavily abused system we had was user authentication and authorization. We only had a few million users that hit it, but there were thousands of hackers (and script kiddies) that wanted to try to get something for nothing. Come on, it was cheap porn, just pay for it. We could easily see over 10M auth requests per hour. In time, we fine-tuned the system, and outright blocked abusive users at the firewall.
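For anyone wondering what spotting the abusive clients might look like in code, here is a minimal sketch of a sliding-window counter for failed logins per IP. It is an illustration under my own assumptions, not the system described above: the class name, thresholds, and the idea of passing timestamps in by hand are all invented for the example, and the actual blocking would still happen at the firewall.

```python
from collections import deque


class AuthThrottle:
    """Flag an IP once it has too many failed logins inside a time window."""

    def __init__(self, max_failures, window_seconds):
        self.max_failures = max_failures
        self.window = window_seconds
        self._failures = {}  # ip -> deque of failure timestamps, oldest first

    def _prune(self, queue, now):
        # Drop failures that have aged out of the window.
        while queue and now - queue[0] >= self.window:
            queue.popleft()

    def record_failure(self, ip, now):
        queue = self._failures.setdefault(ip, deque())
        self._prune(queue, now)
        queue.append(now)

    def is_blocked(self, ip, now):
        queue = self._failures.get(ip)
        if queue is None:
            return False
        self._prune(queue, now)
        return len(queue) >= self.max_failures
```

In use, the login handler would call `record_failure(ip, time.time())` on each bad password and check `is_blocked` before doing any expensive lookup. A flagged IP can then be pushed to the firewall so it never reaches the auth system at all.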
The advantage we had was that when I first took control of the IT work, we were only seeing about 1M/day, so we had the luxury of growing it out. We'd watch for the problematic parts, and fix them. Just because something works on your test bed when 10,000 users try it, even if they try hard, doesn't mean you can put it on 100 servers and expect it to work for 1M users.
healthcare.gov has some other severe disadvantages. From what I understand, they are hitting the SSA database. I don't know if that's an online query to the SSA, or if they're provided a static file to import periodically. I'd assume all kinds of government organizations have put their 2 cents in too. What are they checking identity against? Driver's licenses, SS cards, voter ID, green cards? That means they could be hitting 151+ more databases run by other organizations. Does DHS get the information? Is it fed back to them when a user accesses the site? Are they checked against law enforcement databases? Only those directly involved in the development will know. You can disregard anything in the privacy statements. You're not going to see a friendly note in the FAQ: "If you're a wanted felon, information will be transmitted to the law enforcement organization looking for you." That kind of defeats the purpose.
Load testing never replicates what real users will do. Real users do weird things, just because they can. No amount of planning and testing will give you everything. There is always a lot of reactive work to be done. Shit, everyone reads the FAQ 14 times before logging in? Then 20% of the people go through the login screens, back out to the 2nd page, and try again?
I'm stuck on the same non-functional healthcare.gov site as everyone else is. I signed up. I never got an email confirmation or email address verification.
My girlfriend got the verification and signed up again. I was able to present my user:pass and it did seem to say it was valid, but stayed there until I was thrown the overloaded message. Later, it said my user:pass was invalid. Is it really invalid?
I tried to do the username and password recovery. Neither sent me anything, so I assumed my account wasn't made. When signing up again, it said my combination of email, username, and real name was not unique. Ok, so I'm at least partly there.
I signed up again with a different username. This time I received the email verification, and clicking it did say I was confirmed to be a user. I still can't get in. It says my user:pass is wrong. Is there som
Serious? Seriousness is well above my pay grade.
Why would you want to do this? If you had an income that fluctuated each year, would you not save in the good years so you could maintain a reasonable quality of lifestyle in the barren years? Or would you downsize your house and sell your car every other year as your income fluctuated.
Balancing the budget is not the challenge. The real challenge is finding a government that can save when the going is good, and convincing the US electorate of the need for a rainy-day fund, rather than giving it all back and more in tax breaks.
I'm not sure load testing alone would be the solution. For a site like this, I see little point in making the expenditure to handle all the day 0 traffic.
Rather they should have load tested to find out how many users they could safely serve. Then they should have simply restricted the number of active connections. Other users should have seen a static holding page. That way, everyone that gets through gets a good experience.
By adopting this approach, you can save money. And, given the publicity available pre-launch, they could easily have explained how this would work so as to manage expectations. After the first week or so, they would likely be able to manage the traffic comfortably.
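The approach above can be sketched in a few lines: cap the number of active sessions at whatever load testing showed to be safe, and send everyone else a static holding page. This is only an illustration; the class and function names, the capacity figure, and the 503 response are my own assumptions, and a real deployment would do this at the load balancer and expire idle sessions.

```python
import threading

HOLDING_PAGE = "The site is busy right now. Please try again in a few minutes."


class AdmissionGate:
    """Admit at most `capacity` concurrent sessions; turn the rest away."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._active = set()
        self._lock = threading.Lock()

    def try_enter(self, session_id):
        with self._lock:
            if session_id in self._active:
                return True  # already admitted, let them keep going
            if len(self._active) >= self.capacity:
                return False
            self._active.add(session_id)
            return True

    def leave(self, session_id):
        with self._lock:
            self._active.discard(session_id)

    def active_count(self):
        with self._lock:
            return len(self._active)


def handle_request(gate, session_id):
    """Return (status, body): the real page if admitted, else the holding page."""
    if gate.try_enter(session_id):
        return 200, "application page"
    return 503, HOLDING_PAGE
```

The holding page is static, so serving it costs almost nothing, and the users who do get in are never competing with more sessions than the backend was tested to handle.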
It's not a challenge at all. Texas does it. We're required by our state constitution to have a balanced budget, and we only let our legislature meet for 150 days every other year. The result: once they are in session, they're working to hammer out the new budget and fix the real problems, instead of constantly being in session feeling the need to legislate something, messing things up, and wrecking the economy.
It works so well that our economy in Texas attracts a constant stream of refugees fleeing the charred ruins of California's economy and its legislature that occasionally takes a two week break between sessions of wrecking the state.
How about this one: hire an Indian firm to run a government-level Oracle database without actually testing it or including load-balancing and you're gonna have a bad time.
Blame your horrendous failure on user volume and then call it glitches and you're gonna have a bad time.
List of known issues in order of appearance:
01. security questions not loading.
02. security answers failing validation.
03. email validation tokens timing out instantly.
04. correct passwords failing
05. password reset emails not providing clickable link for reset
06. password reset link loads page which doesn't find the profile it just emailed to.
07. EIDM server crashing and throwing system down errors.
08. oracle server errors.
09. network gateway timeout errors.
10. oracle account manager loading towards public
All of this excluding the actual waiting pages for a website.
This is either gross incompetence or sabotage.
They're using their grammar skills there.
It's not a challenge at all. Texas does it. We're required by our state constitution to have a balanced budget, and we only let our legislature meet for 150 days every other year. The result: once they are in session, they're working to hammer out the new budget and fix the real problems, instead of constantly being in session feeling the need to legislate something, messing things up, and wrecking the economy.
Yeah. They never feel the need to legislate something, right? Only work to fix the real problems? They'd never decide that they needed a bit of extra time to legislate something just because they felt the need, right?
I'll just leave this here for people who maybe aren't absolute morons:
http://en.wikipedia.org/wiki/Wendy_Davis_(politician)#2013_filibuster
I just successfully logged in. to a blank page.
This is exactly what I have seen over the last couple of decades. Your comments seem to be directed at contracted projects, but I see ongoing federal contracts that hire minimum wage employees to replace skilled federal employees. The costs are more than the costs to hire federal employees and the corporation pockets a nice profit, but the services are substandard. Contractors are supposedly an overall cost savings because if the need for the work moves or disappears, there are no federal employees to move or RIF. The problem is that some of these contracts have been ongoing for decades, and are coming close to the length of a federal employee's entire career!
Federal contracts do NOT save money, but they do profit the corporations that donate to politicians' political campaigns.
...if I hadn't once lived in California and now live in a state with a functional state government. If you think Cali has anything but a horribly dysfunctional government with bottom of the barrel public schools, badly maintained roads, ridiculously high taxes (income, sales...) and an unfair and arbitrary justice system, well, I think your standards are low.
I predict the way you're using two digits to count the errors is going to turn into a scalability limit.