UK's National Health Service Moves To NoSQL Running On an Open-Source Stack
An anonymous reader sends this news from El Reg:
The U.K.'s National Health Service has ripped the Oracle backbone from a national patient database system and inserted NoSQL running on an open-source stack. Spine2 has gone live following successful redevelopment including redeployment on new, x86 hardware. The project to replace Spine1 had been running for three years with Spine2 now undergoing a 45-day monitoring period. Spine is the NHS’s main secure patient database and messaging platform, spanning a vast estate of blades and SANs. It logs the non-clinical information on 80 million people in Britain – holding data on everything from prescriptions and payments to allergies. Spine is also a messaging hub, serving electronic communications between 20,000 applications that include the Electronic Prescription Service and Summary Care Record. It processes more than 500 complex messages a second.
Is this a big IT project that actually worked? Where's my fainting couch???
I can't help but get the feeling that within a few months they'll be running back to Oracle or some other real database system.
At this point, anyone who works with databases in industry knows that "NoSQL" has come to mean inconsistent data, corrupted data, and silently lost data.
One just can't throw away atomicity, consistency, isolation and durability without running into some serious problems.
And that's totally ignoring how it becomes damn near impossible to effectively query NoSQL databases. Sorry, writing complex queries in some imperative subset of JavaScript is totally the wrong way of doing things. Intentionally not learning SQL takes more effort than learning how to use it!
As a service user, I've always had the impression that the NHS database was a large Excel workbook and a load of VB macros written by interns.
Obviously you have never worked with HL7. One message will have hundreds, if not thousands of pieces of data.
..., they actually rolled out something. Didn't a huge replacement project run for years and years, soak up bazillions and then get cancelled? But maybe that's the 'clinical' side of things. Yes, here it is: http://www.theguardian.com/soc...
"The greatest lesson in life is to know that even fools are right sometimes" - Winston Churchill
At least these ghosts are just being kept healthy, if they lived in Chicago, they'd probably be voting too.
This issue is a bit more complicated than you think.
Obviously you have never worked with HL7. One message will have hundreds, if not thousands of pieces of data.
Yeah - at least in the US and Canada, even parsing HL7 transactions can be a pain. Different rules and practices in different hospitals, inconsistent rules and practices within the same hospital, apparently contradictory transactions, out of order transactions... I predict a royal mess with NoSQL. With Relational they had at least some assurance that what was read out of the DB was an accurate representation of what was put into it.
'The Economy' is a giant Ponzi scheme whose most pitiable suckers are the youngest among us and the yet-unborn.
HL7? I thought Valve couldn't even count to HL3.
Oh, you meant that HL7.
The "NoSQL means Not Only SQL" crap you're shitting out is nothing more than the NoSQL community frantically backtracking after their "NoSQL means No SQL" ideas were shown to be disastrous bunk.
Instead of owning up to the fact that they were horribly, horribly wrong, and made some really fucking stupid suggestions, the NoSQLers have just decided to change history. They pretend that they weren't saying what they very clearly said in the past. And they obviously need to admit that SQL and relational databases are the only viable option, but can't do this without looking like the fools that they are, so they admit that it's okay to use "sometimes". And this "sometimes" ends up being "all the time", but again, they can't openly say that without looking like the incompetents that they are.
Face it, "NoSQL" does mean "No SQL". It always has, and it always will. No amount of backtracking will change the fact that the NoSQL crowd was full of shit, and still is.
Summary says: "It logs the non-clinical information on 80 million people in Britain"
Well, yes it does hold clinical information. That is a big deal.
From the UK's HSCIC web site there's more (and authoritative) information on SPINE
http://systems.hscic.gov.uk/sp...
"The Summary Care Record:
SCRs provide emergency and out-of-hours healthcare professionals with faster access to key clinical information, including details of allergies, current prescriptions and bad reactions to medicines. The Summary Care Record helps to ensure continuity of care across a variety of care settings, and is provided by the Spine."
Having corrupt information in a clinical record, or losing information from it, is a good way to kill some random person. However, it is a summary, so if a physician suspects a problem in the summary, they can go to the patient's main record. Getting prescriptions crossed can also be problematic for the patient.
Ignoring the NOSQL issue, I wish we had something like SPINE here in the USA.
"It logs the non-clinical information on 80 million people in Britain " when the population of Britain is about 64 million.
I just interviewed with one of the largest healthcare-focused tech companies in the US, Epic Systems. One of the more interesting things I learned while I was there was that they use InterSystems Caché, a non-relational system that's built on B-trees instead of tables. The main draw of this system is the speed at which they are able to operate, which is one of the big things they've built their reputation on. They claimed while I was there that roughly 47-49% of Americans are covered by Epic's software at some point. Now, obviously that's not just records stored in databases they designed, implemented, and support, but, especially considering that Epic targets medium to large healthcare companies, with very little involvement with smaller outfits, and the fact that they do their best not to parcel out their software, but to sell integrated top-to-bottom systems... well, they seem to not only be doing fine without a relational system, but thriving. I don't work for them and have no hands-on experience, so I can't say more than that, but I just thought it might be of interest in relation to the relational/non-relational debate in this thread.
That dropping ACID is not hazardous to your health.
> Sorry, writing complex queries in some imperative subset of JavaScript is totally the wrong way of doing things. Intentionally not learning SQL takes more effort than learning how to use it!
With 80 million records and heavy load, the number one priority is not "make it easy for any teenager to write queries".
A system that requires the programmer to think things through, and therefore write an efficient query, is better in some cases.
Just as manually chosen mutexes are sometimes better than automatic full-column locks across 80 million rows.
Easy isn't always best, my friend.
Sorry, I'm not the AC using the expanded ACID acronym names. Regardless of how they're referred to, they aren't buzzwords. They are the essential properties of any modern and safe database system. Anyone who insists on them all being present is totally justified, and totally correct.
I know I shouldn't feed the trolls, but I'm bored and can't help myself.
ACID isn't so important when the unit is a patient's records. There is also no need for a rigid data model.
This is unbelievable. Holy fuck, I sure hope that you don't work with databases professionally. I hope you don't work with them as a hobby! Nobody with an ounce of intelligence and even a minute of working with data would ever consider saying something as utterly stupid as what you just said.
As someone who has actually worked with patient data in hospitals, I'd say he is pretty much on the money regarding the unstructured nature of some patient records. Full ACID compliance is not that important in many cases; often a proper audit trail will suffice. It is similar to banking transactions, which are almost never ACID (despite being used in so many textbook examples of ACID compliance).
One difference between an amateur and a professional is knowing how to balance a system's requirements and create a design that actually fits the system's needs. Strict adherence to some guidelines is just plain stupid.
-- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
I use the Electronic Prescription Service (EPS) component of the Spine and take issue with the claim that it was successful. The upgrade has been appalling.
It was rolled out over the UK's August bank holiday, with no advance notification. After the holiday, prescriptions pulled down from the Spine (they haven't implemented push messaging ... ) had invalid digital signatures, rendering them illegal. Prescriptions that had been completed and payment claimed for in Jan 14 were redownloaded from the Spine. Post-dated prescriptions for October also began appearing. These are only supposed to be downloadable on or after the valid date, for obvious reasons.
Not only was this a logistical nightmare, some issues are still broken after two weeks.
I am amazed that so many issues got through testing.
Utter shambles.
Do you even know what ACID means?
Atomicity - either everything is committed or nothing is. I find it crucial to avoid inconsistent states.
Consistency - data is valid in all states.
Isolation - ensure that concurrent transactions work correctly.
Durability - no data is lost due to software crashes or power failures
How these could not be important for banking is beyond me.
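To make the atomicity point concrete, here is a minimal sketch using Python's built-in sqlite3 module (picked purely for illustration; nobody in the thread says which engine they use). The two-account schema and the names are hypothetical. A transfer either commits both updates or, when the debit would overdraw the account, rolls both back.

```python
import sqlite3

# Hypothetical two-account example; the thread does not name a schema.
def make_db() -> sqlite3.Connection:
    """Create an in-memory database where balances may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money atomically: both UPDATEs commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit really has something to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # The CHECK constraint raises IntegrityError on an overdraft.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Crediting the destination first is deliberate: without the transaction, a failed debit would leave Bob holding money that never left Alice's account.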
What is a complex message? One with a real part and an imaginary part.
If you're storing data, you need to use a system that provides atomicity, consistency, isolation and durability. Using anything less is pure idiocy. [etc, etc]
They are using Riak which is currently being used by 25% of the Fortune 50 (fifty, not five-hundred).
The CAP theorem states that a distributed system cannot guarantee all three of Consistency, Availability, and Partition tolerance at once; when the network partitions, it has to give up either consistency or availability. Riak sacrifices consistency (although it does provide eventual consistency) in favor of availability and partition tolerance. The people who wrote Riak (in Erlang) actually seem to be very smart. They say they are firmly in the "right-technology-for-the-right-job" camp. They are not crusading to replace all RDBMS with NoSQL.
The availability and partition tolerance of Riak are amazing. For certain applications these strengths greatly outweigh sacrifices in atomicity and consistency. Due to the CAP theorem, there is no one single database architecture that will be optimal for all applications. Granted, a completely different mindset is needed to use Riak if your previous database experience is all RDBMS.
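For what it's worth, the knob Riak and other Dynamo-style stores expose is the quorum: each key lives on N replicas, a write waits for W acknowledgements and a read for R replies. The sketch below models replicas abstractly (it is not Riak's client API) and checks by brute force that every read quorum overlaps every write quorum exactly when R + W > N. That overlap is what lets a read see the latest acknowledged write, and raising R and W to get it costs availability when nodes are unreachable.

```python
from itertools import combinations


def quorums_always_overlap(n: int, r: int, w: int) -> bool:
    """True if every read set of size r shares at least one replica
    with every write set of size w, out of n replicas."""
    replicas = range(n)
    for write_set in combinations(replicas, w):
        for read_set in combinations(replicas, r):
            if not set(write_set) & set(read_set):
                return False  # this read could miss the latest write entirely
    return True


def overlap_rule(n: int, r: int, w: int) -> bool:
    """Closed-form condition used by Dynamo-style stores."""
    return r + w > n
```

With N=3, a common choice is R=W=2 (overlap guaranteed); R=W=1 is faster and more available but a read can land entirely on stale replicas.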
From a cursory look, Riak seems to have some excellent documentation. I suggest you look at their page that explains the trade offs between using Riak and a traditional RDBMS. It also contains links to similar documentation.
We don't see the world as it is, we see it as we are.
-- Anais Nin
I also suggest you read CAP Twelve Years Later: How the "Rules" Have Changed by Eric Brewer. He concludes with:
In general, because of communication delays, the banking system depends not on consistency for correctness, but rather on auditing and compensation. Another example of this is "check kiting," in which a customer withdraws money from multiple branches before they can communicate and then flees. The overdraft will be caught later, perhaps leading to compensation in the form of legal action.
You can claim Eric Brewer is a fucking idiot as much as you want. Eventually all you will do is destroy your own credibility.
I have worked for a health insurer in the UK that treated ACID compliance as a bonus, not a requirement. At the time I left them, they had a whole "data correction team": 12 people working full-time running live SQL queries to fix database inconsistencies. I wish I had made this up, but it's real. If this is considered acceptable practice, I don't want to work in this industry ever again.
What we would indeed need is multi-datacenter capability, which you get for free with Cassandra... We also sorely needed performance a few years ago (15k SAS drives were slow after an internet hiccup, for example), but SSDs helped with that. Again, we could get infinite scalability with Cassandra for free.
In such a situation you must choose: either ACID, which is only theoretically needed, or NoSQL, which actually performs and is highly available but brings its additional operations and coding burden.
Given the application, I imagine most of the data stored follows the schema:
Patient NHS ID number
Patient data
where the ID is the standard ID we all have, and the "data" is a huge lump of XML. This is probably why it was easy to dump Oracle for a NoSQL DB: if you only store 2 columns in each table, a migration is trivial.
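The two-column guess above is speculation, but if it holds, the migration really is little more than copying rows. A minimal sketch, assuming a hypothetical table patients(nhs_number, record_xml) and using a plain Python dict as a stand-in for the key-value store (no real Riak client calls are shown):

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: patients(nhs_number TEXT, record_xml TEXT).
# A plain dict stands in for the key-value store; no Riak API is used.


def migrate(conn: sqlite3.Connection) -> dict:
    """Copy every (ID, blob) row into a key-value mapping unchanged."""
    store = {}
    for nhs_number, record_xml in conn.execute(
        "SELECT nhs_number, record_xml FROM patients"
    ):
        store[nhs_number] = record_xml  # the store never looks inside the blob
    return store


def allergies(store: dict, nhs_number: str) -> list:
    """Any structure lives inside the blob, so the application must parse it."""
    root = ET.fromstring(store[nhs_number])
    return [node.text for node in root.iter("allergy")]
```

The flip side is the one raised earlier in the thread: a question like "which patients are allergic to penicillin?" now means parsing every blob, unless a secondary index is maintained alongside.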
In some ways ACID compliance is important and in others it is not; in fact, sometimes it's a hindrance. In particular:
One patient's records need to be consistent only with themselves; you don't need the whole patient table to be consistent. The problem is that you do need cross-table consistency (patients, episodes, diagnoses, treatments, medications and so on), which can lead to locking issues even though these are really millions of records living in parallel. Really, I'd like to treat them as millions of microtables that happen to share the same structure but never cross-lock.
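The "millions of microtables that never cross-lock" idea amounts to partitioning locks by patient. A minimal sketch, assuming an in-process store and Python threads (a real system would spread this across machines); the class and field names are hypothetical. Updates to one patient take only that patient's lock, so two patients never contend beyond a brief lookup.

```python
import threading


# Hypothetical in-process store; a real system would partition across machines.
class PerPatientStore:
    """One lock per patient instead of one lock per table."""

    def __init__(self) -> None:
        self._records = {}
        self._locks = {}
        # Held only briefly, to look up or create a patient's own lock.
        self._guard = threading.Lock()

    def _lock_for(self, patient_id: str) -> threading.Lock:
        with self._guard:
            if patient_id not in self._locks:
                self._locks[patient_id] = threading.Lock()
                self._records[patient_id] = {"episodes": [], "medications": []}
            return self._locks[patient_id]

    def add_episode(self, patient_id: str, episode: str, medication: str) -> None:
        # Both lists change together, so this patient's record stays
        # consistent; other patients' updates never wait on this lock.
        with self._lock_for(patient_id):
            record = self._records[patient_id]
            record["episodes"].append(episode)
            record["medications"].append(medication)

    def snapshot(self, patient_id: str) -> dict:
        with self._lock_for(patient_id):
            record = self._records[patient_id]
            return {key: list(items) for key, items in record.items()}
```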
Perhaps in a hospital you can do synchronization at the database level, but for an exchange or common journal you have to assume records come in asynchronously: your general physician might finish some paperwork while you're in emergency care, at the same time as a lab result you've waited a week for comes in. The actual order in which they're applied doesn't matter; there must be rules so that (A,B,C), (C,A,B) and (B,C,A) all end up with the same result. This means you can relax the hard synchronization required for, say, a bank account, where it is essential that transactions are applied in order and rejected if you're overdrawing your account; but that kind of relaxation is hard to express in SQL.
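Requiring that (A,B,C), (C,A,B) and (B,C,A) all produce the same record means the merge rule has to be commutative, associative and idempotent. A minimal sketch, assuming each update is a hypothetical (timestamp, source, field, value) tuple and that the latest update wins per field, with ties broken by comparing the whole tuple so every replica picks the same winner:

```python
from itertools import permutations
from typing import Dict, Iterable, Tuple

# Hypothetical update shape: (timestamp, source, field, value).
Update = Tuple[int, str, str, str]
Entry = Tuple[int, str, str]  # (timestamp, source, value) kept per field


def apply(record: Dict[str, Entry], update: Update) -> Dict[str, Entry]:
    """Last-writer-wins per field; whole-tuple comparison breaks ties identically everywhere."""
    ts, source, field, value = update
    candidate = (ts, source, value)
    new = dict(record)
    if field not in new or candidate > new[field]:
        new[field] = candidate
    return new


def replay(updates: Iterable[Update]) -> Dict[str, str]:
    record: Dict[str, Entry] = {}
    for update in updates:
        record = apply(record, update)
    return {field: value for field, (_, _, value) in record.items()}


def order_independent(updates: Iterable[Update]) -> bool:
    """True if every ordering of the updates converges to the same record."""
    outcomes = {tuple(sorted(replay(p).items())) for p in permutations(list(updates))}
    return len(outcomes) == 1
```

Last-writer-wins silently drops the older value. For clinical fields where both entries matter (two allergies, say), a set union would be the safer merge; which rule applies to which field is a policy decision the thread doesn't settle.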
That doesn't just apply to the ordering of writes but also to querying. If two people at different hospitals try to pull up your medical records, it is important that the records aren't corrupt, but it's not essential that an update being distributed is presented to both or to neither. In fact, for robustness they should be able to continue working independently if the connection is broken, and when the connection is restored the records are reconciled. That kind of shard-and-merge is generally a problem relational databases don't handle, while distributed synchronization is essential and implicit in NoSQL solutions.
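Telling "one side is simply newer" apart from "both hospitals edited this while disconnected" is usually done with version vectors (Riak has historically used vector clocks for this). A minimal sketch of the comparison, assuming each replica keeps a dict of per-node update counters; the node names are hypothetical, and what to do with a genuine conflict is left to the application:

```python
from typing import Dict

VersionVector = Dict[str, int]


def bump(vv: VersionVector, node: str) -> VersionVector:
    """Record one local update made at `node`."""
    out = dict(vv)
    out[node] = out.get(node, 0) + 1
    return out


def compare(a: VersionVector, b: VersionVector) -> str:
    """Return 'equal', 'before', 'after' or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # both sides changed during the partition


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Pointwise max: the version that has seen both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}
```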
Live today, because you never know what tomorrow brings
If a bank doesn't care about ACID, it doesn't care about losing completed transactions, which means losing track of *OUR* money so they can get more profit.
Perhaps this is where you have gone astray. The opposite of ACID is BASE where the "E" stands for eventual consistency. The beauty of this is that it DOES NOT lose completed transactions and at the same time it allows for high availability.
Strict consistency (the "C" in ACID) is a much more stringent requirement than eventual consistency. In particular it conflicts with high availability. This is the essence of the CAP theorem. In many industries, including banking, eventual consistency plus high availability (NoSQL) is preferable to strict consistency plus lower availability (RDBMS). Of course there are many other factors involved in selecting a database architecture.
One way to see this is to note that the three typical things you can do at an ATM (deposit, withdrawal, and show balance) commute, in a sense, when you are only worried about eventual consistency, but they don't commute when you require strict consistency. This is why relaxing the requirement to eventual consistency gives you higher availability (when the database is partitioned). Transactions can be logged and later merged when the partition has healed. It is true that "show balance" does not strictly commute with deposits and withdrawals, but: a) this does not cause the system to lose track of your money, and b) no one expects it to strictly commute. There is usually a warning that it may take X hours or days before a transaction shows up on your balance. IOW the balance will eventually be correct after you stop making transactions.
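The commuting argument is easy to check directly. A minimal sketch, assuming a hypothetical account whose deposits and withdrawals are just a log of signed amounts, with no overdraft check applied: every ordering of the log ends at the same balance, while a "show balance" taken part-way depends on which operations had arrived.

```python
from itertools import permutations


# Hypothetical amounts: deposits positive, withdrawals negative.
def final_balance(opening: int, ops) -> int:
    """Eventual state: addition commutes, so arrival order is irrelevant."""
    return opening + sum(ops)


def balances_seen(opening: int, ops) -> list:
    """What 'show balance' would report after each operation arrives."""
    seen, running = [], opening
    for op in ops:
        running += op
        seen.append(running)
    return seen


def all_orders_agree(opening: int, ops) -> bool:
    return len({final_balance(opening, p) for p in permutations(ops)}) == 1
```

Starting from 100 with [200, -50, -80], every order ends at 170, but the order [-80, -50, 200] passes through -30 on the way. That is exactly where a strictly consistent system with an overdraft check would reject the second withdrawal, and the operations stop commuting.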
The strict consistency alternative you think is better will mean that all ATMs have to stop working whenever the database is partitioned. For most customers this is totally unacceptable especially since the only value it adds is ensuring that the "show balance" function always includes all of the latest transactions. Even the average person on the street would tell you this approach is really "stupid". No one wants the ATMs to be broken most of the time just to be sure "show balance" is always perfectly up to date.
There clearly seems to be a failure of communication here. Since you did not like my dumbed down explanation, perhaps you would prefer to hear what Eric Brewer has to say. He seems to have gotten a whole lot of awards for someone who is a "NoSQL nutter".
Eric Brewer on Why Banks are BASE Not ACID - Availability Is Revenue:
Myth: Money is important, so banks must use transactions to keep money safe and consistent, right?
Reality: Banking transactions are inconsistent, particularly for ATMs. ATMs are designed to have a normal case behaviour and a partition mode behaviour. In partition mode Availability is chosen over Consistency.
There are more details here and in many other places.
Associating a traditional RDBMS with a phrase like 'lower availability' just highlights the kind of twilight zone you start getting into when talking to any of the NoSQL crowd.
Are you saying you think the CAP theorem is false? I'm assuming large distributed data sets so partitioning is inevitable. According to CAP this means there is a trade off between consistency and availability. RDBMS provide strong consistency so they cannot also provide high availability when there is partitioning.
You didn't work on Mt Gox's systems at any point did you?
Sarcastic ad hominem attacks are an extremely poor substitute for reasoned debate.
You're misunderstanding what's been written in that article. This is exactly the scenario that banks *have* to prevent before and as it happens.
These excerpts from one of Brewer's talks seem to substantiate my "misunderstanding": Eric Brewer on Why Banks are BASE Not ACID - Availability Is Revenue
segedunum:
Chasing around for compensation later cannot be an option in many cases because it is going to be abused.
When the system is functioning normally, the difference between strong consistency and eventual consistency is on the order of a few milliseconds. I don't think that leaves much of a window for abuse. The fundamental question is what do you do when there is partitioning? Or as you call it, system degradation. If you take an ACID approach then you shut down everything until the partitioning has been repaired. If you take a BASE approach then you still provide at least some functionality by sacrificing strong consistency. The CAP theorem says you cannot have both strong consistency and availability when there is partitioning.
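Brewer's "partition mode" can be sketched as an ATM that keeps paying out small amounts while it can't reach the bank, logs what it did, and leaves any overdraft to be caught at reconciliation (the auditing-and-compensation step from his article). Everything here, including the offline limit and the class names, is a hypothetical illustration rather than how any real bank does it:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

OFFLINE_LIMIT = 100  # hypothetical cap on withdrawals while partitioned


@dataclass
class Bank:
    balances: dict


@dataclass
class Atm:
    bank: Bank
    connected: bool = True
    pending: List[Tuple[str, int]] = field(default_factory=list)

    def withdraw(self, account: str, amount: int) -> bool:
        if self.connected:
            # Normal mode: check the authoritative balance before paying out.
            if self.bank.balances[account] < amount:
                return False
            self.bank.balances[account] -= amount
            return True
        # Partition mode: choose availability, but bound the risk.
        if amount > OFFLINE_LIMIT:
            return False
        self.pending.append((account, amount))
        return True

    def reconnect(self) -> List[str]:
        """Replay the offline log; return accounts needing compensation."""
        self.connected = True
        overdrawn = []
        for account, amount in self.pending:
            self.bank.balances[account] -= amount
            if self.bank.balances[account] < 0 and account not in overdrawn:
                overdrawn.append(account)
        self.pending.clear()
        return overdrawn
```

Two disconnected ATMs each paying out 50 against a balance of 60 is the check-kiting case: both withdrawals succeed, and the overdraft only surfaces when the second ATM replays its log.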
Whatever system you use locally will be checked live, usually with a mainframe based system that is ACID compliant. If that isn't possible then you have a gradual system degradation where only certain types of transactions are processed.
The fact that you have any functionality at all when there is non-trivial degradation is due to using an overall BASE strategy instead of an ACID strategy. I have no doubt that one or more ACID databases are used as parts of the system but an overall BASE strategy is used by banks when there is partitioning (system degradation).
Remember, this thread started with an AC claiming that you would have to be an idiot to use anything other than ACID for storing data. People responded by saying there is also a place for BASE systems and that the banking industry uses an overall BASE strategy. Perhaps I misunderstand what you are saying, but it seems like you are saying that as long as an ACID database is part of the system (or a central part of it), the overall system must be ACID, which makes little sense to me.
I don't think anyone here is suggesting:
the article is [...] a carte blanche to justify NoSQL systems or to do away with any core systems that compromise ACID at their heart.
The point I've been trying to make is that just like there is a place for ACID systems there is also a place for BASE systems. In addition, as the data sets become larger and more complex and more spread out, the ACID approach becomes more and more untenable due to the CAP theorem. For most (but not all) cases, high-availability and eventual consistency will trump strong consistency.