Remus Project Brings Transparent High Availability To Xen

Posted by timothy on Wednesday November 11, 2009 @10:48AM from the when-servers-go-south-a-song dept.

An anonymous reader writes "The Remus project has just been incorporated into the Xen hypervisor. Developed at the University of British Columbia, Remus provides a thin layer that continuously replicates a running virtual machine onto a second physical host. Remus requires no modifications to the OS or applications within the protected VM: on failure, Remus activates the replica on the second host, and the VM simply picks up where the original system died. Open TCP connections remain intact, and applications continue to run unaware of the failure. It's pretty fun to yank the plug out on your web server and see everything continue to tick along. This sort of HA has traditionally required either really expensive hardware, or very complex and invasive modifications to applications and OSes."

Already done by VMware by Lurching · 2009-11-11 10:50 · Score: 5, Interesting

They may have a patent too!!
1. Re:Already done by VMware by TheRaven64 · 2009-11-11 12:00 · Score: 3, Interesting
  
  I know that a company called Marathon Technologies owns a few patents in this area. A few of their developers were at the XenSummit in 2007 where the project was originally presented.
  
  --
  I am TheRaven on Soylent News
Himalaya by mwvdlee · 2009-11-11 10:57 · Score: 2, Interesting

How does this compare to a "big iron" solution like Tandem/Himalaya/NonStop/whatever-it's-called-nowadays?

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
1. Re:Himalaya by teknopurge · 2009-11-11 11:22 · Score: 5, Interesting
  
  VM replication like this still has an I/O bottleneck. This isn't magic: unless you move to InfiniBand you're not going to touch something like a Stratus or NonStop machine. By the time you add in the cost of the high-perf interconnects, you're on par with the real-time boxes. All this convergence going on, with people redesigning the mainframe but ass-backward with client/server gear, makes little sense to me other than as a gimmick.
  
  By the time you get all the components that provide the processing and I/O throughput of those high-end boxes, the x86/64 commodity hardware cost advantage has evaporated.
  
  --
  Website Hosting
2. Re:Himalaya by anon+mouse-cow-aard · 2009-11-11 23:56 · Score: 2, Interesting
  
  We had a 700 kline app written in some Tandem-specific application language. The smallest server we could get from HP was 400 K$. We rewrote the app in Python to use pairs of servers replicating via DRBD over Ethernet, with a load balancer in front. DRBD is slow, but with the new app I could just add pairs of nodes. We already had such a configuration for another application, and we combined the two, so the hardware cost was just adding two nodes to this cluster, at about 4 K$ per server node. 400 K$ -> 8 K$. I think it would take a heck of a lot of hardware to compensate for the pricing of that gear.
How does it deal with replication latency? by melted · 2009-11-11 11:33 · Score: 2, Interesting

I'm pretty sure that if I just yank the cable, not everything will be replicated. :-)
1. Re:How does it deal with replication latency? by BitZtream · 2009-11-11 13:12 · Score: 2, Interesting
  
  No it won't.
  VMware claims the same crap and it's simply not true.
  You have a 50ms window between checkpoints that can be lost, in your example. The only way to ensure no loss is to ensure that every change, every instruction, every piece of microcode executed in the CPU on machine A is duplicated on B before A continues to the next one. You simply can't do that without specialized hardware, since you don't even have access to the microcode as it's executed on standard hardware.
  50ms on my hardware/software can mean thousands of transactions lost. That can wreak havoc on certain network protocols and cause database operations to fail completely as you replay portions of transactions that the database has already seen.
  I can come up with situations all day long as to how this isn't as seamless as you make it out to be. Sure, xclock transitions to the other machine in what appears to be a perfect no-loss transition, or solitaire on a Windows machine, but that's not exactly useful.
  Remus has plenty of uses, but it also has plenty of pitfalls and, regardless of the claims, it does require consideration when developing systems, unless you're introducing latency that, to me, would be completely unacceptable and would itself require applications to be aware of it. Hell, that's 6.25MB of data that can be transmitted over a gigabit pipe between checkpoints. That can kill performance.
  I know what you're saying, I know what you mean, and I just don't think you realize how much that latency can affect certain classes of applications.
  
  --
  Persistent Volume manager for Kubernetes - https://github.com/dwimsey/openshift-pvmanager
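
The 6.25MB figure in BitZtream's comment above can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 Gbit/s = 10^9 bits/s, 1 MB = 10^6 bytes) and ignores protocol overhead, and the function name bytes_per_window is made up for illustration.

```python
def bytes_per_window(link_bits_per_s: float, window_s: float) -> float:
    """Upper bound on the bytes a link can carry during one checkpoint window."""
    return link_bits_per_s / 8 * window_s


GIGABIT = 1_000_000_000  # 1 Gbit/s in decimal units, no protocol overhead (assumption)
WINDOW = 0.050           # the 50 ms checkpoint interval quoted in the comment

megabytes = bytes_per_window(GIGABIT, WINDOW) / 1_000_000
print(f"{megabytes:.2f} MB can cross the link per checkpoint window")
```

A gigabit link moves 125 MB per second, so one twentieth of a second carries one twentieth of that, which matches the number quoted in the comment.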
Re:state transfer by Vancorps · 2009-11-11 11:38 · Score: 3, Interesting

If your primary and secondary systems are physically located next to each other then they aren't in the category of highly available. Furthermore with storage replication and regular snapshotting you can have your virtual infrastructure at your DR site on the cheap while gaining enterprise availability and most importantly, business continuity.
I'll agree with being skeptical about transparency, although how many people already have this? I went with XenServer and Citrix Essentials for it; I already have this fail-over and I can tell you that it works. I physically pulled a blade out of the chassis and sure enough, by the time I got back to my desk the servers were functioning, having dropped a whole packet. Further tweaking of the underlying network infrastructure resulted in keeping the packet, with just a momentary rise in latency.
Enterprise availability is fast coming to the little guys.
Re:Wrong place to put a failsafe? by dido · 2009-11-11 13:51 · Score: 4, Interesting

This is something that the much simpler Linux-HA environment deals with by using something they call STONITH, which basically means Shoot The Other Node In The Head. STONITH peripherals are devices that can completely shut down a server physically, e.g. a power strip that can be controlled via a serial port. If you wind up with a partitioned cluster, which they more colorfully call a 'split brain' condition, where each node thinks the other one is dead, each of them uses the STONITH device to make sure, if it is able. One of them will activate the STONITH device before the other, and the one which wins keeps on running, while the one that loses really kicks the bucket if it isn't fully dead. I imagine that Remus must have similar mechanisms to guard against split brain conditions as well. I've had several Linux-HA clusters go split brain on me, and I tell you it's never pretty. The best case is that they only both try to grab the same IP address and get an IP address conflict; in the worst case, they both try to mount and write to the same Fibre Channel disk at the same time and bollix the file system. If a Remus-based cluster split brains, I can imagine that you'll get mayhem just as awful unless you have a STONITH-like system to prevent it from happening.

--
Give me six lines written by the hand of the most honest man, and I will find something in them to have him hanged.
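
The "first one to fire wins" race dido describes can be sketched as a toy simulation. This is not Linux-HA or Remus code: FenceDevice and resolve_split_brain are hypothetical names, threads stand in for the two nodes, and a lock stands in for the single power switch that can only act on one request first.

```python
import threading


class FenceDevice:
    """Hypothetical stand-in for a STONITH power switch: the first request wins."""

    def __init__(self):
        self._lock = threading.Lock()
        self.powered_off = None  # name of the node that was shut down, if any

    def shoot(self, target: str) -> bool:
        """Try to power off `target`; return True only for the first caller."""
        with self._lock:
            if self.powered_off is not None:
                return False  # the other node already fired, so this request loses
            self.powered_off = target
            return True


def resolve_split_brain(device: FenceDevice, nodes=("A", "B")) -> str:
    """Both nodes believe the peer is dead and race to fence it; return the survivor."""
    results = {}

    def run(me: str, peer: str) -> None:
        results[me] = device.shoot(peer)

    a, b = nodes
    threads = [
        threading.Thread(target=run, args=(a, b)),
        threading.Thread(target=run, args=(b, a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Exactly one shoot() call returned True; that caller keeps running.
    return next(name for name, won in results.items() if won)


device = FenceDevice()
survivor = resolve_split_brain(device)
print(f"survivor: {survivor}, powered off: {device.powered_off}")
```

Which node survives depends on thread scheduling, just as it depends on timing in a real cluster. The point of the sketch is only that exactly one node is left running, so the two never write to the same disk at once.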
Re:state transfer by shmlco · 2009-11-11 17:32 · Score: 2, Interesting

"If your primary and secondary systems are physically located next to each other then they aren't in the category of highly available."
High availability covers more than just distributed data centers. Load-balancing, fail-over, clustering, mirroring, redundant switches, routers, and other hardware: all are zero-point-of-failure, high availability solutions.

--
Any sect, cult, or religion will legislate its creed into law if it acquires the political power to do so.