What Developers Can Learn From Healthcare.gov
An anonymous reader writes "Soured by his attempt to acquire a quote from healthcare.gov, James Turner compiled a short list of things developers can learn from the experience: 'The first highly visible component of the Affordable Health Care Act launched this week, in the form of the healthcare.gov site. Theoretically, it allows citizens, who live in any of the states that have chosen not to implement their own portal, to get quotes and sign up for coverage. I say theoretically because I've been trying to get a quote out of it since it launched on Tuesday, and I'm still trying. Every time I think I've gotten past the last glitch, a new one shows up further down the line. While it's easy to write it off as yet another example of how the government (under any administration) seems to be incapable of delivering large software projects, there are some specific lessons that developers can take away. 1) Load testing is your friend.'"
No accountability of the contractors, no accountability of those who were to oversee the contractors and no accountability of the people who were to oversee those overseeing the contractors.
and I was ønce bitten by a møøse nø realli!
A feeling of having made the same mistake before: Deja Foobar
Nothing shows up the sheer arbitrariness of a government shutdown more than some sites like Healthcare.gov being up while others are forced to shut down at extra expense when they could have just been left running (and the servers that are there just to tell you the site is shut down are still consuming power and bandwidth).
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Let's have our great media investigate if this is poor planning...or good planning if once the initial load gets through then they didn't overspend on equipment they don't need.
Or if there is a secret effort by the people who want this to fail to hire botnets and hackers to DDOS it... I wouldn't put it past them.
Would be something to see a considerable amount of traffic going out from Newscorp ip addresses into the healthcare.gov servers.
nothing unusual, aside from a few million malformed packets...
A feeling of having made the same mistake before: Deja Foobar
Canadian firm hired to build troubled Obamacare exchanges
A Canadian tech firm that has provided service to that country's single-payer health care system is behind the glitch-ridden United States national health care exchange site healthcare.gov.
CGI Federal is a subsidiary of Montreal-based CGI Group. With offices in Fairfax, Va., the subsidiary has been a darling of the Obama administration, which since 2009 has bestowed it with $1.4 billion in federal contracts, according to USAspending.gov.
The "CGI" in the parent company's name stands for "Conseillers en Gestion et Informatique" in French, which roughly translates to "Information Systems and Management Consultants." However, the firm offers another translation: "Consultants to Government and Industry."
The company is deeply embedded in Canada’s single-payer system. CGI has provided IT services to the Canadian Ministries of Health in Alberta, British Columbia, New Brunswick, Quebec and Saskatchewan, as well as to the national health provider, Health Canada, according to CGI's Canadian website.
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
The biggest takeaway though, is that the way that the federal government bids out software is fundamentally broken. There are clearly companies in the industry who understand exactly the kind of problems that healthcare.gov needed to address. Intuit’s online TurboTax is much more complicated than the sign-up process for healthcare, and it works under heavy load. Amazon and Google both handle crushing loads gracefully as well. Why can’t the government draw on this kind of expertise when designing a site as critical to the public as healthcare.gov, rather than farming it out to the lowest bidder?
Although it's not entirely right.....government contracts are more complicated than 'going to the lowest bidder.'
"First they came for the slanderers and i said nothing."
From the list, one of the items casually mentions that usernames require numbers. What? I've never heard of a requirement like that from any other consumer system, ever. They may suggest it (like YourName024 when a prior user has already used YourName) but do not require it.
If they worry about uniqueness, just use email addresses as logins.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
That would be an even more stupid idea than Newscorp buying MySpace.
Project Managers can learn that giving only minimal time for QA, at the very end of the project, with no time allotted for corrections, is bad practice.
"Launch" suggests that it actually, you know, worked.
When a quarter million people hit a game company's servers and only half of them get to play, it's a disaster of unrivaled proportions.
When millions of people hit billions of dollars in government investment and a few thousand of them actually get the site to work at all, it's a "learning experience."
Never attribute to malice that which is adequately explained by stupidity.
I'd have a hard time believing that the servers have been this consistently overwhelmed with traffic. A more likely explanation is that a poorly designed system was patched together from components hastily built from a thousand different vendors. The web-app equivalent of a diesel engine held together with duct-tape and baling wire was then rolled out without any real testing.
The only time, "Good enough for government work," has ever escaped my lips was when I was confronted with a marginally functional mess of spaghetti code.
An internal system operation returned the error "The operation completed successfully.".
I thought the consensus from the last story about the shutdown was that the web sites were closed because a server that's turned off is less likely to get 0wn3d without anyone there to fix it.
> If they worry about uniqueness, just use email addresses as logins.
That's exploitable when you leave your ISP, someone else claims your username at that ISP, and your old ISP-provided e-mail address now points to another person.
GTA V? Sim City? Final Fantasy? Battlefield?
Turns out millions of users who start using something on the same day often don't follow the expected and tested for behavior.
Anyone who launches a service like this should expect to spend the first week in triage mode, and the first month making adjustments. I'd like to say proper planning would mean that never occurs, but the only way to ensure that would be to spend 10x what is really needed. People would hate the government even worse if they did that.
This is not news, yet. It will be news in a month if it is still fubared.
That would be an even more stupid idea than Newscorp buying MySpace.
Project Managers can learn that giving only minimal time for QA, at the very end of the project, with no time allotted for corrections, is bad practice.
"Are we meeting with some network engineers, tech writers and systems analysts?"
"No, we are meeting with a bunch of appointees who know next to nothing about the guts of the project."
"Great... we may as well watch cartoons."
A feeling of having made the same mistake before: Deja Foobar
I've got a personal gripe about folks who think that 'developer' is code for 'guy who's expected to do everything in the project'. Outside of small projects, that's not how it should work in a healthy software development lifecycle.
Developers architect and write code, and some of the topics covered in that short editorial are relevant; use of AJAX necessitates good error handling on the front end, and synchronization of client and server side validations. Sure, they may have a broad skillset besides and understand databases, and graphical design, and so on, but there's no guarantee they're the ones meant to provide those skills.
For example, QA encompasses an incredibly large set of skills, familiarity with a wide range of products, and to be fair, seems to attract folks with a different life philosophy than those who identify themselves as developers. To talk about load testing - which itself is not a simple unit test to be added to a build - as a developer's responsibility, and ignore the vast, separate set of specialized knowledge and experience required to pull it off is ignorance. To include UX and UI design, and say these too are in the developer's purview is equally misguided. (In fact, most developers are really, really bad at UI/UX, for some reason.)
Not that a developer couldn't do those things, or will automatically lack the knowledge or skills, but those are separate roles and separate disciplines.
So, tell a project manager that they should make sure the QA team does load testing, and tell the project manager that the UI/UX team needs to provide descriptive error messages when validation fails, and so on. Very little of this is important to someone who's currently wearing the 'developer' hat.
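To make the "synchronization of client and server side validations" point concrete, here is a minimal Python sketch of one way to do it: keep the rules in a single table, validate against it on the server, and ship the same table to the front end as JSON. The field names, patterns, and messages are made up for illustration and are not healthcare.gov's actual rules.

```python
import json
import re

# Hypothetical rules, invented for this sketch. The patterns stick to syntax
# that both Python's re module and JavaScript's RegExp understand.
RULES = {
    "username": {
        "pattern": r"^[A-Za-z0-9_]{6,30}$",
        "message": "Username must be 6-30 letters, digits, or underscores.",
    },
    "email": {
        "pattern": r"^[^@\s]+@[^@\s]+\.[^@\s]+$",
        "message": "Enter an email address like name@example.com.",
    },
}


def validate(form):
    """Server-side check: return {field: message} for every failing field."""
    errors = {}
    for field, rule in RULES.items():
        value = form.get(field, "")
        if not re.fullmatch(rule["pattern"], value):
            errors[field] = rule["message"]
    return errors


def rules_for_client():
    """Serialize the same rule table so front-end code can reuse it."""
    return json.dumps(RULES, sort_keys=True)


print(validate({"username": "jturner01", "email": "j@example.com"}))  # {}
print(validate({"username": "ab", "email": "nope"}))
```

The point of the design is that a failed AJAX submit and a failed inline check show the same descriptive message, because both read from one source.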
The devs are in a pretty interesting situation that you don't see too often. They're tasked with developing an application that generally can anticipate a low load level, except for one (and only one) extreme peak load. Do you develop for the general case, or the (very important) exception? Remember that the difference between these two options would make a difference in the basic structure of the app. Do you use a traditional RDBMS (perfect for the low load case), or some sort of NoSQL system (possibly necessary for the peak load case)? Remember that you can't leverage any commercial cloud resources either -- these are government records, and there are laws saying they'll have to be housed on government computers.
Odd, in my state it worked fine...no, wait a minute, it's only Oct. 4th, who in their right mind with technical savvy or experience would access such a new product in the first week of its availability?
I live in one of the most population dense states. My current health insurance is paid up through the end of the month. I won't be accessing the exchange for three weeks yet because everything in the article is obvious, but even if implemented within the time constraints to the best of their ability, will still probably have issues in the first few days.
Duh.
I didn't make it very deep into the web site. I was mainly interested in reviewing the rates for my county. What a surprise that there was a list with all the states' counties together! I was expecting to fill in my zip code possibly or enter the state and county to get a list of available policies. The resulting table was large enough to generate bandwidth problems. One stupid error in design could saturate their network! A good design would be easier on the users, the network and the servers. Now sometimes you have to trade server time and convenience for user time and convenience, but this was apparently not thought through. Surely someone in the government must realize that good design works better than bad design. If a web site is to be used by millions, it obviously needs a good design.
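A rough Python sketch of the alternative described above: index the rate rows by state and county once, then return only the matching rows instead of the whole table. The rows, plan names, and premiums are invented for illustration; nothing here reflects the site's real data or code.

```python
from collections import defaultdict

# Made-up sample rows: (state, county, plan name, monthly premium in USD).
RATES = [
    ("VA", "Fairfax", "Bronze A", 210.0),
    ("VA", "Fairfax", "Silver B", 295.5),
    ("VA", "Arlington", "Bronze A", 225.0),
    ("TX", "Travis", "Silver C", 270.0),
]


def build_index(rows):
    """Group plans by a normalized (state, county) key, built once at startup."""
    index = defaultdict(list)
    for state, county, plan, premium in rows:
        index[(state.upper(), county.lower())].append((plan, premium))
    return index


def plans_for(index, state, county):
    """Return only the requested county's plans, cheapest first."""
    key = (state.strip().upper(), county.strip().lower())
    return sorted(index.get(key, []), key=lambda plan: plan[1])


index = build_index(RATES)
print(plans_for(index, "va", " Fairfax "))
# [('Bronze A', 210.0), ('Silver B', 295.5)]
```

With a lookup like this, each request sends a handful of rows rather than every county in the table, which is easier on the user, the network, and the servers.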
Ray Seyfarth, ray.seyfarth@gmail.com, http://rayseyfarth.blogspot.com
Did a little sleuthing and discovered they're using an F5 load balancer in front of it (at least my state exchange is). I'm rather shocked that they chose a classical client/server architecture and not say, a cloud architecture for this. This could have been written on Google's cloud or Amazon's or OpenStack even and probably done a much better job of handling this load.
I would surmise that HIPAA requirements may have made cloud architecture problematic.
If a web site is rushed into place on October 1st but there's no reason to sign up until January 1st, wait several weeks before you try to use it.
It's not slashdot. There's no advantage to getting FIRST POST!!!
"Why would we believe they could accomplish something on this scale?"
Because they are the only ones who actually have successfully created healthcare systems on that scale, specifically Medicare, Medicaid, and the VA system.
> Never attribute to malice that which is adequately explained by stupidity.
> I'd have a hard time believing that the servers have been this consistently overwhelmed with traffic. A more likely explanation is that a poorly designed system was patched together from components hastily built from a thousand different vendors. The web-app equivalent of a diesel engine held together with duct-tape and baling wire was then rolled out without any real testing.
> The only time, "Good enough for government work," has ever escaped my lips was when I was confronted with a marginally functional mess of spaghetti code.
You needn't source from multiple vendors to get a system that falls apart under load - single vendor solutions are also susceptible to such problems. Even if you specify load testing in the contract, that doesn't mean that their load test had any relation to actual real-world load. Of course, the hard part is predicting what load to expect, especially with a system that has a potential audience of 100+ million people.
Everyone goes on the assumption that scale is "just make it bigger". I'd like to add some of my own notes on why this launch was doomed from the start.
I used to work for an adult internet company who had massive traffic. We were serving millions of people daily before 2000. We would exceed 10M daily viewers about once a week. That fluctuated by rather consistent calendar influences, like the day of the week, part of the month, and part of the year. Sept 11, 2001 dropped 3/4 of our traffic for almost exactly 2 hours. So we knew how long a huge news event would impact us.
To handle 10M customers without a hiccup, we had to consider a lot of things. We didn't do much dynamic content. That's a killer. There were some elements that had to be dynamic, such as the voting/polling systems, message forums, etc. Otherwise, we had to try to keep the pages (html and images) as light as possible.
The hardest abused system we had was user authentication and authorization. We only had a few million users that hit it, but there were thousands of hackers (and script kiddies) that wanted to try to get something for nothing. Come on, it was cheap porn, just pay for it. We could easily see over 10M auth requests per hour. In time, we fine tuned the system, and outright blocked abusive users at the firewall.
The advantage we had was, when I was first in control over the IT work, we'd only see about 1M/day, so we had the luxury of growing it out. We'd watch for the problematic parts, and fix them. Something that works on a test bed where 10,000 users hammer it, even if they try hard, won't necessarily work for 1M users just because you put it on 100 servers.
healthcare.gov has some other severe disadvantages. From what I understand, they are hitting the SSA database. I don't know if that's an online query to the SSA, or if they're provided a static file to import periodically. I'd assume all kinds of government organizations have put their 2 cents in too. What are they checking identity against? Driver's licenses, SS cards, voter ID, green cards? That means they could be hitting 151+ more databases run by other organizations. Does DHS get the information? Is it fed back to them when a user accesses it? Are they checked against law enforcement databases? Only those directly involved in the development will know. You can disregard anything in the privacy statements. You're not going to see a friendly note in the FAQ "If you're a wanted felon, information will be transmitted to the law enforcement organization looking for you." That kind of defeats the purpose.
Load testing never replicates what real users will do. Real users do weird things, just because they can. No amount of planning and testing will give you everything. There is always a lot of reactive work to be done. Shit, everyone reads the FAQ 14 times before logging in? Then 20% of the people go through the login screens, back out to the 2nd page, and try again?
I'm stuck on the same non-functional healthcare.gov site as everyone else is. I signed up. I never got an email confirmation or email address verification.
My girlfriend got the verification and signed up again. I was able to present my user:pass and it did seem to say it was valid, but stayed there until I was thrown the overloaded message. Later, it said my user:pass was invalid. Is it really invalid?
I tried to do the username and password recovery. Neither sent me anything, so I assumed my account wasn't made. When signing up again, it said my combination of email, username, and real name was not unique. Ok, so I'm at least partly there.
I signed up again with a different username. This time I received the email verification, and clicking it did say I was confirmed to be a user. I still can't get in. It says my user:pass is wrong. Is there som
Serious? Seriousness is well above my pay grade.
Why would you want to do this? If you had an income that fluctuated each year, would you not save in the good years so you could maintain a reasonable quality of lifestyle in the barren years? Or would you downsize your house and sell your car every other year as your income fluctuated.
Balancing the budget is not the challenge. The real challenge is finding a government that can save when the going is good, and convincing the US electorate of the need for a rainy-day fund, rather than giving it all back and more in tax breaks.
Since it's the same government that paves our roads, funds our schools, cleans our water, forecasts the weather, explores space, prosecutes our criminals, and extinguishes our fires, yes. We may as well add "heals the injured" and "cures the sick" to that as well.
Ours is the worst form of government except all the others that have been tried. Sure, we've got problems - big ones - but we are not doomed. The Great Experiment continues.
You do not have a moral or legal right to do absolutely anything you want.
It's not a challenge at all. Texas does it. We're required by our state constitution to have a balanced budget, and we only let our legislature meet for 150 days every other year. The result: once they are in session, they're working to hammer out the new budget and fix the real problems, instead of constantly being in session feeling the need to legislate something, messing things up, and wrecking the economy.
It works so great that our economy in Texas attracts a constant stream of refugees fleeing the charred ruins of California's economy and its legislature that occasionally takes a two-week break between sessions of wrecking the state.
Note that most of "paves our roads", "cleans our water", "prosecutes our criminals", and "extinguishes our fires" is done by our State governments, NOT the Federal government.
"I do not agree with what you say, but I will defend to the death your right to say it"
How about this one: hire an Indian firm to run a government-level Oracle database without actually testing it or including load-balancing and you're gonna have a bad time.
Blame your horrendous failure on user volume and then call it glitches and you're gonna have a bad time.
List of known issues in order of appearance:
01. security questions not loading.
02. security answers failing validation.
03. email validation tokens timing out instantly.
04. correct passwords failing.
05. password reset emails not providing clickable link for reset.
06. password reset link loads page which doesn't find the profile it just emailed to.
07. EIDM server crashing and throwing system down errors.
08. Oracle server errors.
09. network gateway timeout errors.
10. Oracle account manager loading towards public.
All of this excluding the actual waiting pages for a website.
This is either gross incompetence or sabotage.
They're using their grammar skills there.
> It's not a challenge at all. Texas does it. We're required by our state constitution to have a balanced budget, and we only let our legislature meet for 150 days every other year. The result: once they are in session, they're working to hammer out the new budget and fix the real problems, instead of constantly being in session feeling the need to legislate something, messing things up, and wrecking the economy.
Yeah. They never feel the need to legislate something, right? Only work to fix the real problems? They'd never decide that they needed a bit of extra time to legislate something just because they felt the need, right?
I'll just leave this here for people who maybe aren't absolute morons:
http://en.wikipedia.org/wiki/Wendy_Davis_(politician)#2013_filibuster
Tuesday I did the signup process, filled in all the information 3 times. Then I figured out that I could just hit the "back" button to go back to the security questions page and hit submit again. Finally got registered about 9PM, then got the validation email and clicked on that several times until it was finally accepted at 10:30PM.
And I've been trying and failing to login ever since.
So why should I have to go through all that just to get prices and find out which doctors are in their plan? On Ebay, Amazon, or just about any ecommerce site I can get the product description and price straight from a Google search. I only have to go through the registration/login hassle if I actually want to buy something. If they would just provide the plan information with a simple static html page I could get the information I want, stop hammering on their servers, decide what to do, and come back next month if I decide I want to buy.
* Off-topic: If the program is even moderately successful, I suspect certain politicians will regret working so hard to ensure that Obama's name is forever attached to it.
I just successfully logged in. to a blank page.
They're using their grammar skills there.
I'm going to guess that the lion's share of that money went to requirements gathering. A site like this which has to pull in data from dozens of different companies is going to have a lot of stakeholders. The consulting time for analysts and PM's to compile all of the user stories must have been immense. The actual development on the website itself doesn't look like it could have consumed more than a couple of million. That being said, my team developed about a dozen sites per year of comparable complexity (though not approaching that scale) on a budget of about 5 million, including all of the project management and requirements documentation on top of development, testing, administration and support.
So yeah, I would have liked to have had a shot at building the thing for $54 million. A little voice is whispering in my ear that I might have been taking home about half that amount for myself. According to the article you link, they are only getting $137 per hour for the lead technical architect. That seems pretty cheap for a consultant in that role on a project of this size. Heck, they bill out their account manager at $202 per hour. Oh, and they point out that they'll be getting all of their insurance plan info from eHealth.com. So never mind about all that consulting time to gather requirements from all of the insurance companies on the exchange.
Oh, and another point on the scale - with a population in Washington of just under 7 million and only 5% on individual plans and another 14% uninsured, the target user base is for under 1.4 million people, presumably many of whom are in family groups - so call it less than a million users total. That's big, but it isn't that big. They probably assumed peak usage at under 1% of the target audience and got it wrong by an order of magnitude because of the general curiosity.
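For anyone who wants to check that back-of-envelope sizing, here it is as a few lines of Python. The inputs are the figures quoted above, with "just under 7 million" rounded up to 7 million; the 1% peak and the 10x miss are the comment's guesses, not measured numbers, and the function name is just for this sketch.

```python
def size_estimate(population, pct_individual, pct_uninsured, pct_peak):
    """Return (target user base, assumed peak users), using integer percentages."""
    target = population * (pct_individual + pct_uninsured) // 100
    assumed_peak = target * pct_peak // 100
    return target, assumed_peak


# 7 million people, 5% on individual plans, 14% uninsured, 1% assumed peak.
target, assumed_peak = size_estimate(7_000_000, 5, 14, 1)

print(target)             # 1330000 -> "under 1.4 million"
print(assumed_peak)       # 13300   -> the guessed planning figure
print(assumed_peak * 10)  # 133000  -> if that guess was off by 10x
```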
I'm suggesting that the funding of healthcare.gov is through a separate bill and is thus not affected by the lack of a continuing resolution for fiscal 2014.
...if I hadn't once lived in California and now live in a state with a functional state government. If you think Cali has anything but a horribly dysfunctional government with bottom of the barrel public schools, badly maintained roads, ridiculously high taxes (income, sales...) and unfair and arbitrary justice system, well, I think your standards are low.
Texas has the federal government to fall back on in case of, for example, natural disaster. The federal government doesn't have such a safety net; it must self-insure. On top of that, the federal government has to be prepared for contingencies such as war that do not really apply at the state level.
The period of time, one year, is arbitrary. Requiring a balanced yearly federal budget would be like requiring a balanced personal budget every two-week pay period, even though my biggest expenses occur monthly.
What we really need is some way to balance the federal budget over a much longer period of time, a decade or two perhaps, spanning a full boom/bust cycle. This is, of course, much easier said than done.
Uh, yeah.
Wendy was a voice of reason during that debacle; everybody else just glared at Texas and shook their heads.
Texas: what a country.
I predict the way you're using two digits to count the errors is going to turn into a scalability limit.
> I predict the way you're using two digits to count the errors is going to turn into a scalability limit.
Not if the error sequence number follows the convention used in IBM RPG/400 1.1.4.4. "Sequence Numbering of the Listing after a Compile" ... "The high order 2 digits of the sequence number are made up of the characters A through Z and 0 through 9 in the following order: A, B, C, ..., Z, 1, 2, ..., 9, A0, AA, AB, ..., AZ, A1, A2, ..., A9, B0, BA, ..., ZZ, ..., Z9, 10, ..., 99. This structure allows for up to 1295 different increments of the high order sequence number. " ... it is worth noting that this counting sequence does not sort properly in ASCII or even native EBCDIC [A9,B0,BA] which leads Real Programmers away from the messy realms of real-world problems into the comfortable zone of devising elaborate workarounds for problems they had created.
Sometimes delving into the structure of ancient computer architectures and programming languages yields new and clever insights into old problems. This is not one of those times.
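For the curious, the ordering quoted above is easy to reproduce. This Python sketch builds the sequence exactly as the quoted passage describes it (it is a reading of that quote, not IBM's code, and the names are invented here), confirms the 1295 count, and shows that a plain ASCII sort does scramble it.

```python
import string

# First character runs A..Z then 1..9; second runs 0, then A..Z, then 1..9.
FIRST = string.ascii_uppercase + "123456789"         # 35 symbols
SECOND = "0" + string.ascii_uppercase + "123456789"  # 36 symbols


def rpg_sequence():
    """A..Z, 1..9, then every two-character value from A0 through 99."""
    singles = list(FIRST)
    doubles = [a + b for a in FIRST for b in SECOND]
    return singles + doubles


seq = rpg_sequence()
print(len(seq))            # 1295 = 35 + 35 * 36
print(seq[35:38])          # ['A0', 'AA', 'AB']
print(sorted(seq) == seq)  # False: ASCII puts digits before letters
```

The ASCII mismatch comes from pairs like AA and A9: the sequence counts AA first, while byte order sorts A9 first.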
<blink>down the rabbit hole</blink>