Domain: drbd.org
Stories and comments across the archive that link to drbd.org.
Comments · 35
-
Re:Not all RAIDs are equal
You can also use DRBD (Distributed Replicated Block Device) https://www.drbd.org/en/ to replicate block devices between servers.
-
Re:DRBD
Specifically how DRBD handles recovery after an outage of the replication network. The situations where the disk isn't plugged in will look just like the network outage scenario DRBD handles. I'm not sure whether this will be more or less efficient than the mdadm bitmap approach outlined above, but those are the two main ways people do this specific operation.
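To make the idea concrete, here is a minimal sketch, in Python, of bitmap-style resync: while the peer is unreachable, only the numbers of the changed blocks are recorded, and on reconnect just those blocks are copied. This is a toy model and not DRBD or mdadm code; the class and method names are hypothetical.

```python
# Illustrative only: a toy model of dirty-block tracking, not DRBD or mdadm code.
# All class and method names are hypothetical.

class MirroredDevice:
    def __init__(self, num_blocks):
        self.local = [b""] * num_blocks   # blocks on this node
        self.peer = [b""] * num_blocks    # blocks on the remote node
        self.connected = True
        self.dirty = set()                # block numbers changed during an outage

    def write(self, block_no, data):
        self.local[block_no] = data
        if self.connected:
            self.peer[block_no] = data    # normal case: replicate immediately
        else:
            self.dirty.add(block_no)      # outage: only remember what changed

    def disconnect(self):
        self.connected = False

    def reconnect(self):
        """Copy only the dirty blocks to the peer; return how many were copied."""
        copied = 0
        for block_no in sorted(self.dirty):
            self.peer[block_no] = self.local[block_no]
            copied += 1
        self.dirty.clear()
        self.connected = True
        return copied


mirror = MirroredDevice(num_blocks=1000)
mirror.write(1, b"a")
mirror.disconnect()
mirror.write(2, b"b")
mirror.write(3, b"c")
mirror.write(2, b"d")        # rewriting a block does not add a second entry
resynced = mirror.reconnect()
```

In this model the cost of the resync depends on how many distinct blocks changed during the outage, not on the size of the device.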
-
Linux + DRBD + Virtual machines
If the server is going to be running inside a virtual box anyway, why not consider Linux for the host OS?
In particular, it has a nice solution called DRBD (Distributed Replicated Block Device), which can replicate block devices (like partitions) over the network and keep them in sync (think of it as a sort of RAID-1 over the network, but one that also nicely handles all the dirty behind-the-scenes work of tracking changes and keeping the block devices in sync). Thus at any point in time, the main server and the mirror server contain exactly the same data. It's integrated into the mainline kernel and has wide "in the wild" adoption (it's not a small project used by two labs in one university). And thus it is also nicely integrated into lots of management software (for example: pacemaker/heartbeat).
Most virtualisation layers (VMware, Xen, etc.) can use block devices directly as a store for their images. (Traditionally, a block device provided by a SAN fabric, but a block device replicated to the other server over DRBD works exactly the same.) You can thus have the host server (VMware server, Xen hypervisor, etc.) running on both physical servers, with the main server running a VMware image with the Microsoft SQL service inside, and the other just sitting idle, with its virtual host on standby.
- If hardware maintenance is necessary (say a SMART monitoring tool signals that one HDD is about to fail and needs to be replaced soon), most virtual hosts (VMware server and Xen hypervisor) support "live migration": the virtual guest jumps from one machine to the other with almost no interruption, as long as the block device is available on both machines (traditionally done by having an expensive SAN fabric to which both servers are connected, but DRBD in dual-primary mode works nicely too). Users see no interruption of service, and the now-idle machine can be taken down for the necessary maintenance.
- If the current server crashes due to some big hardware failure (say the HDD dies without any forewarning), the mirror server contains the latest up-to-date copy of the same data, thanks to DRBD. The VMware image can be started from there. It will look exactly as if the Windows virtual machine simply had its power interrupted and was rebooted from a NOT-cleanly-shut-down state. Service resumes rather fast, without the emergency maintenance needing to be done first on the downed server.
DRBD will handle all the necessary resyncing once both servers are back online.
Last but not least, both DRBD and the various popular virtualisation solutions are nicely integrated with common administration tools (like pacemaker/heartbeat). It is thus possible to automate some scenarios and make the others rather trivial to carry out with minimally trained supervision.
All in all, it's possible to build a good redundant solution with off-the-shelf or cheap server parts, without shelling out multiple thousands of dollars for a more expensive solution. (In particular, DRBD lets you do away with the expensive high-availability SAN fabric: you only need the "two cluster nodes" (which can be either beige boxes or the cheapest server from your favourite brand), without the "some shared storage, or some such".)
Also, an interesting part: there's only 1 instance of the service running at any time (the virtual guest), so you don't need 2x the licenses for every piece of software.
Don't forget to do snapshotting inside the virtual server (so you can roll back to the latest working snapshot in case the virtual Windows system gets corrupted).
Now throw in some software RAID-1 or RAID-5 on at least one of the nodes to better survive an HDD crash, and the system starts to be rather robust.
Think about backup, namely:
- history. If any mistake happens, how can you go back to a few days ago? (For example, does the Windows software feature a way to do snapshots? Or will you simply use the snapshot feature of the virtual host?)
- remoting. If the restaurant burns down, how can you get your data (spe
-
Re:My own backups
I do have a hardware raid question. If you have a raid array running on brand ABC controller and the controller dies, can you replace the controller with brand XYZ? Do they use the same on-disk format? I honestly don't know.
I know with software raid I can re-boot with an additional drive and add it to the array while everything (httpd, ftpd, etc.) is running. If a drive drops offline, I can bring it back online with one command and no re-boot. Also, there's DRBD.
-
Re:No one mentioned DRBD...
Maybe because DRBD is not a Clustered Filesystem? http://www.drbd.org/docs/about/ states: "The Distributed Replicated Block Device (DRBD) is a software-based, shared-nothing, replicated storage solution mirroring the content of block devices (hard disks, partitions, logical volumes etc.) between hosts."
-
No one mentioned DRBD...
-
Re:to be honest, i dont really like drbd
I'm not sure. It might just be that some pages on their web site are out of date. For example, their roadmap page says that 8.3 is a future release and features "Introducing mechanisms to better deal with temporary network failures for devices in primary-primary mode". But 8.3 is already out, and a yum search shows 8.3.2 is available for F12 if I want it.
-
Re:to be honest, i dont really like drbd
I'd love active-active for some of the systems I'm working on. However http://www.drbd.org/home/mirroring/ seems to imply that it is currently complex, limited, and flaky. Did you find a better way, or are they just being cynical?
-
MOD PARENT UP
Not being able to bridge from the external network to the internal network makes this thing little more than a toy.. can't do things like DR:BD or n-tier makes this just another toy for the art department.
-
Re:Some information about HA
Please look at http://www.drbd.org/home/mirroring/ and the next chapter "Recovery".
I hope this helps you a little already.
-
Re:1000+ a day isn't very much
You'll need something that detects the primary server is offline and switches to the backup automatically. You might also want to have a separate database server that mirrors the primary DB if you're storing a lot of user content, plus a backup for it (though the backup DB server could always be the same physical machine as one of the backup webservers).
On this note, if you're comfortable (and your application is compatible) with Linux+Apache, then heartbeat and DRBD will do this and are relatively simple to get up and running. Just avoid trying to use the heartbeat v2-style config (for simplicity), make sure both the database and apache are controlled by heartbeat, and don't forget to put your DB on the DRBD-replicated disk (vastly simpler than trying to deal with DB-level replication, and more than adequate for such a low load).
Oh, and don't forget to keep regular backups of your DB somewhere else other than those two machines.
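As a rough illustration of the "detects the primary server is offline and switches to the backup" step, here is a minimal Python sketch of timeout-based failure detection. It is not the Linux-HA heartbeat API; the class, its methods, and the 10-second threshold are hypothetical choices made for the example.

```python
# Hypothetical sketch of heartbeat-style failure detection.
# Not the Linux-HA heartbeat API; names and timeout are assumptions.

class FailoverMonitor:
    def __init__(self, dead_after_seconds):
        self.dead_after = dead_after_seconds
        self.last_seen = None          # time of the last heartbeat from the primary
        self.active = "primary"        # which server currently runs the services

    def heartbeat(self, now):
        """Record that the primary was heard from at time `now` (in seconds)."""
        self.last_seen = now

    def check(self, now):
        """Switch to the backup if the primary has been silent for too long."""
        if self.active == "primary" and self.last_seen is not None:
            if now - self.last_seen > self.dead_after:
                self.active = "backup"
        return self.active


monitor = FailoverMonitor(dead_after_seconds=10)
monitor.heartbeat(now=0)
state_at_5 = monitor.check(now=5)     # still within the timeout
state_at_11 = monitor.check(now=11)   # primary silent for more than 10 seconds
```

The sketch deliberately never fails back on its own: once the backup is active it stays active, which leaves the decision to return to the primary with a human.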
-
Some information about HA
I want to give you some more information. Based on your visitor estimates I think you do not have a lot of knowledge about it. Because for this number of visitors you do not really need a cluster.
But now to the other stuff. Yes, Windows clustering is (up to Win Server 2003 [1]) a lot easier. But this is because it is not really a cluster. The only thing you can do is have the software running on one server, then stop it and start it on the new server. This is what Windows Cluster does for you. But you cannot have the software running on both servers at the same time.
If you really want to have a cluster then you probably need some sort of shared storage (FibreChannel, iSCSI, etc.). Or you are going to use something like DRBD [2]. You will need something like this too if you want to have a real cluster on Windows.
I recommend you read some more on the Linux HA website [3]. Then you will get a better idea of what components (shared storage, load balancer, etc.) you will need within your cluster.
If you only want high availability and not load balancing then I recommend you not use Windows Cluster. Better to set up two VMware servers with one virtual machine and then copy a snapshot of your virtual machine over to the second machine every few hours.
[1] I don't know about Win Server 2008
[2] http://www.drbd.org/
[3] http://www.linux-ha.org/
-
Re:Geez, it took you that long to figure it out?
I realized it was not an elegant solution, but I will definitely have to look at that write up. I have bounced around at companies a bit lately and will likely stay where I am because I finally kinda like it here. However they are RH and not even RH and SLES.
Regardless, The last place I worked was a big shop that did hosting and client services (large UK company's presence in the US). They had a client with a RH cluster that they kept on complaining about and having it fence off nodes, etc. Of course they "hated" SUSE for some inexplicable reason. I tried to point them to Heartbeat and even http://www.drbd.org/ but they thought they were "Not big enough to use here" and "Red Hat Cluster is a real product" meanwhile GFS has poor locking, in my opinion.
-
Old DRBD Link
First of all, I couldn't agree more that DRBD is not at all suited for the task.
But I thought I'd point out a slightly newer source of information about it. drbd.org is the home of DRBD. You might be able to pick out the age of the linked howto by the mention of support only being for the 2.2 kernel. :)
-
Depends...
Do think of all your options. Since I don't know of any thumb drives that'd be useful, here's what I'd recommend:
I suggest you set up a dedicated backup server at each site. It doesn't have to be much of a box -- it may even cost less than the thumbdrive. We used BackupPC to manage the backups -- it's entirely automated, and it can be configured to send out an email if a backup didn't complete successfully. It'll be doing mostly incremental backups. Keep the backups on a separate partition, so you can use something like DRBD over OpenVPN to back them up to a more central location, which has some sort of IT staff and can handle things like putting the whole thing on a RAID, maybe even swapping out removable hard disks to take home, and of course taking snapshots just in case the filesystem itself decides to die.
Others have talked about keeping everything at multiple datacenters, so that your backup is simply that any one can be hit by a tornado and none of your branch offices even notices. That's a lot more complex than what I've described, and if your DRBD/OpenVPN should lose its connection, local operation will likely still happen -- thus backups will still happen, if only to another local hard drive.
As far as "easy to use", that's not good enough. You want "Automatic". The datacenter is really your best option, with some sort of custom software or a web-based interface. Short of that, the packages I've described will hopefully be reasonably easy to implement, and the restores can happen from a web interface. It's a bit "do-it-yourself", but in a sysad way, not a full-time-programmer way.
Physical security, I leave to you. But if you must, it's certainly easy enough to encrypt the entire hard disk. However, if someone's able to carry off your backup computer, you're probably already hosed, and in any case, they only get information related to the local branch, I hope. Your datacenter/backupcenter would obviously be much more secure, but if the whole thing goes boom, your branch offices still work, and when you bring up a new datacenter, at worst, the branch offices have to reboot the backup server. And even that can be avoided with a few cron jobs.
The thumb drives are doubtless easier to implement -- buy one, plug it in, it works -- but if you get a knowledgeable IT staff to put together a system like the one I've described, it will pretty much run itself, and be mostly free of the whole "human error" problem -- the problem of, say, the guy who forgot to backup the data that day, or the idiotic tech who, rather than backing up, decided to use the thumb drive as primary storage, or the thumb drive that went through the wash, or the building that burnt down with the thumb drive and what it was backing up inside, or that one virus that manages to get into your data, hiding for awhile before it starts destroying things, so you restore from backup, only to find the same virus in every backup.
Oh, and one more thing -- whatever you choose, test it. And by "test it", I mean take all the hardware out of the branch office, bring in brand new hardware, and try to restore from your backup. There's no meaningful test of a backup other than actually attempting to restore it, if for no reason other than to prove to your superiors, customers, and the world in general that your backup is absolutely bulletproof.
-
Re:DRBD
How is this any different from DRBD (http://www.drbd.org/)?
Just to save anyone else having to reply - this is for BSD and therefore automatically better.
-
Re:NBD?
Not the same as NBD, but it is very similar to DRBD (http://www.drbd.org/). I've used DRBD before, and it works quite nicely.
-
DRBD
How is this any different from DRBD (http://www.drbd.org/)?
From the website:
DRBD is a block device which is designed to build high availability clusters. This is done by mirroring a whole block device via (a dedicated) network. You could see it as a network raid-1.
Each device (DRBD provides more than one of these devices) has a state, which can be 'primary' or 'secondary'. On the node with the primary device the application is supposed to run and to access the device (/dev/drbdX; used to be /dev/nbX). Every write is sent to the local 'lower level block device' and to the node with the device in 'secondary' state. The secondary device simply writes the data to its lower level block device. Reads are always carried out locally.
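The write and read paths described in the quoted text can be sketched in a few lines of Python. This is only a toy model of the behaviour as stated above (writes go to both nodes, reads are served locally), not DRBD's implementation; every name in it is hypothetical.

```python
# Toy model of the quoted description; not DRBD code. All names are hypothetical.

class Node:
    def __init__(self, name, num_blocks):
        self.name = name
        self.disk = [None] * num_blocks   # stands in for the lower level block device
        self.reads = 0                    # how many reads this node has served


class ReplicatedDevice:
    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, block_no, data):
        # Every write goes to the local disk and to the node in secondary state.
        self.primary.disk[block_no] = data
        self.secondary.disk[block_no] = data

    def read(self, block_no):
        # Reads are always carried out locally, on the primary.
        self.primary.reads += 1
        return self.primary.disk[block_no]


alpha = Node("alpha", num_blocks=8)
beta = Node("beta", num_blocks=8)
device = ReplicatedDevice(primary=alpha, secondary=beta)
device.write(3, b"payload")
value = device.read(3)
```

After the write both disks hold the same block, while the read counter only moves on the primary.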
If the primary node fails, heartbeat is switching the secondary device into primary state and starts the application there. (If you are using it with a non-journaling FS this involves running fsck)
If the failed node comes up again, it is a new secondary node and has to synchronise its content to the primary. This, of course, will happen without interruption of service in the background.
And, of course, we only will resynchronize those parts of the device that actually have been changed. DRBD has always done intelligent resynchronization when possible. Starting with the DRBD-0.7 series, you can define an "active set" of a certain size. This makes it possible to have a total resync time of 1--3 min, regardless of device size (currently up to 4TB), even after a hard crash of an active node.
-
Re:NBD?
How does this compare with Linux Network Block Device? Sounds very similar
It doesn't compare at all.
From my (quick) scan of the article - think of NBD as a replacement for NFS (well, sorta) & this as a sort of network RAID (kinda, not realtime).
They're not really alike - for linux drbd is probably closer.
-
Re:As a MySQL shop...
You're in a bind but it's one of your own creation. You're supposed to think about backups and redundancy and failover when creating the application and setting up your system. Yes, you have a problem today but MySQL didn't create that problem. You did - or perhaps not you yourself, but whoever set up the system without doing that. What was the backup plan when this was first set up?
Options you could have used if you'd thought about this when setting up your systems include:
- DRBD to do block level mirroring. Used by LiveJournal for a critical server pair.
- Hardware RAID 10 and taking one set of drives offline to back them up. Commonly used by banks.
- Replication with at least two slaves. Used by Wikipedia (400GB) and LiveJournal (may be out of date information - may all be DRBD now).
- InnoDB Hot Backup from a second system (slave or DRBD)
- Application level copying. Used by Wikipedia for some backup dumps.
Since you have two days a year when you can set things up, I suggest learning about DRBD and considering using it to get you out of this situation.
A single duplicate record error does not break replication. It stops it until a human has corrected the source of the problem and told it to resume. The problem is typically an application server which writes to a slave which isn't set to read only mode.
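The stop-and-resume behaviour can be pictured with a small Python model: a replica applies inserts from a log, halts at the first duplicate key, and continues from the same position once the stray row has been removed. This is a simplified illustration, not MySQL's replication code; the class, its fields, and the log format are hypothetical.

```python
# Simplified model of replication halting on a duplicate key.
# Not MySQL code; the class, fields and log format are hypothetical.

class Replica:
    def __init__(self, read_only=True):
        self.rows = {}
        self.read_only = read_only
        self.position = 0        # index of the next log event to apply
        self.stopped_on = None   # key that halted replication, if any

    def local_write(self, key, value):
        """A write made directly against the replica by an application server."""
        if self.read_only:
            raise PermissionError("replica is read-only")
        self.rows[key] = value

    def apply(self, log):
        """Apply insert events from `position`; halt at the first duplicate key."""
        self.stopped_on = None
        while self.position < len(log):
            key, value = log[self.position]
            if key in self.rows:
                self.stopped_on = key   # halt here and wait for a human
                return False
            self.rows[key] = value
            self.position += 1
        return True


log = [(1, "a"), (2, "b"), (3, "c")]
replica = Replica(read_only=False)
replica.local_write(2, "stray")      # the mistaken write to the replica
ok_first = replica.apply(log)        # halts when it reaches key 2
halted_on = replica.stopped_on
del replica.rows[2]                  # a human removes the conflicting row
ok_second = replica.apply(log)       # resumes from where it stopped
```

With `read_only=True` (the default in this sketch) the stray write would have been rejected before it could cause the conflict.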
-
Re:Grass root? Mainstream?
right now, i agree with you 100%, 10g is hardly needed anywhere. but to address your original post, its that future thing that has me wondering. doubling up or tripling up on gigabit is currently standard practice for data intensive systems-- not just big iron, but rogue hackers cobbling together powerful clusters and grids for fun or profit. and it makes sense, its so damned cheap why not? besides, who can afford 10g? but with the throughput wars, how long can this really last? this is grassroots. this is mainstream, within the domain of computing hardware systems. for the actual computer industry, 1g is woefully insufficient.
grass-roots is people using drbd network replication with Xen to support live virtual host migration. if a filesystem fails, just migrate the hosts on that filesystem over to the host on the network backup and run them there. This sort of advanced system shuffling used to be the domain of blade systems and IBM, but now that bandwidth is becoming commodified and abundant we can start doing these things grassroots. Even now, some casual idiot can throw twenty one hard drives into a case for a couple terabytes of online storage. Soon with SAS (good overview), custom-built storage will become only more of a reality.
Actually, SAS expanders use 4g infiniband interconnect. Maybe we just need cheaper infiniband. 4G is "nearly" enough.
So, currently storage solutions and blade systems are proprietary and expensive. With 10g and rapidly accelerating high availability and distributed systems, the linux kiddies are building it themselves.
this being said, i do wish to emphasize once more that I do really agree with you. there wasnt a single thing i didnt say yes yes and nod my head to in the parent.
-
DRBD
Have you looked into DRBD? It works kinda like RAID1 over a network. It uses 2 computers to store the database. Another computer acts as a heartbeat server. You'll need 3 NICs in the database servers; one for the connection to the network, one (gig-e preferably) for the connection between servers, and one for the connection to the heartbeat server.
http://www.drbd.org/
If you are smart, you'll play around with this on a test network or VMWare first. Get it all tweaked out and actually test it by killing a server while in mid-transaction to see if it works for you.
-
Re:SCSI Question
I ask because it's either this or using drbd to replicate the entire file system over multi GigE lines while having to use twice the number of hard drives. I'd much prefer to avoid these interconnections altogether and simply have SCSI itself be the common communication bus, at least such that either controller can access the raid array should the other fail. I'm not /totally/ OT. I was looking at 4 gige or a 10gb solutions which would have pounded cpu usage to death... I'd much rather just make sure I can always access the drives.
Myren
-
OpenBSD clusters make my heartbeat faster...
Here's the plan:
1. Set up High Availability router with pfsync. (using computers rescued from the trash)
2. Set up a HA Network RAID system using DRBD or something similar. (using more computers rescued from the trash)
3. Build a Kerrighed or OpenSSI Single System Image cluster. (using the latest and greatest computers one can rescue from the trash)
4. ???
5. Profit! (and thus, have enough money to actually buy equipment)
I've already set aside Tuesday evening to upgrade my bandwidth throttling OpenBSD router. I set it up the day before 3.6 came out, so I didn't feel like upgrading until now. I'm tired of the typical hardware failures you tend to get out of computers people throw out (maybe that's why they threw them out in the first place) but mostly I'm looking forward to getting a learning experience hundreds of times more valuable (personally) than getting my MCSE 2003.
\/\/\/
-
No, it's not.
rsync doesn't scale to huge numbers of files. It also doesn't work so well when all of those are changing at once. Finally, the protocol and algorithms may work for imaging an entire disk as if it was a file, but the program doesn't -- it can ONLY copy device nodes as device nodes, and will NEVER read a block device as a normal file. There have been patches to fix this, which have been rejected.
We use a scheme which actually seems better for systems which are always on: DRBD for Linux. Basically, every block written to a device on the master is automagically duplicated to all the slaves. If the master goes down, you promote one of the slaves to master, mount the partition, and start services. If you have the heartbeat package, this can be done automatically, complete with an ip takeover.
We aren't using it for high availability, actually. We just use it to duplicate a BackupPC partition out to someone's house, over openvpn. It's much nicer than rsync -- rsync was filling up a couple of gigs of RAM before it sent a single file, and in every instance, it was still eating up more swap when we killed it out of frustration.
The high availability design does help, though. If the entire office gets nuked, we can physically carry the backup box in, turn it on, make it master, and use BackupPC's native restore feature. Sometime soon we're going to make our PHB cream his jeans by demonstrating a full, bare-metal restore.
-
Networked RAID, anybody?
-
Please learn how to make links.
Please learn how to make links.
<a href="http://www.drbd.org/">replicating block device</a>
yields: replicating block device
-
Well...
I'll be building DRBD clusters in a blink of an eye.
Actually I already do on 1gbps :)
Redundancy is good.
-
Re:yes, that's actually the basic idea
What you've said is true only for the case of web sites with static or almost static content, where you could have the content in the local drives of each webservers, and use rsync to distribute new content (web site changes) to all the servers.
But it's a very different situation when your webservers handle very dynamic content, especially when the content is uploaded by the users. In this case, you have three alternatives:
1) Content in the database. It's up to you to use a clustered database to provide High Availability and Load Balancing
2) Content in a NAS (NFS, etc.). You have the same content for all the webservers, and with drbd you achieve High Availability... but you don't have Load Balancing.
3) You use GFS or other distributed File System (don't know the issues on this option).
btw, for load balancing at the IP level I would recommend Linux Virtual Server, and Heartbeat to achieve High Availability in the balancers.
-
If you're using Linux
you may benefit from a combination of heartbeat and DRBD, which respectively provide IP address/service failover and a network (no special hardware required) data replication solution.
If you have appropriate hardware you might also appreciate Stonith, which provides forced-shutdown of a failed node (in the case that the failed node won't release the IP address, and hence you would otherwise have problems switching service).
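The ordering that matters here (force the failed node off first, move the address second) can be shown with a short Python sketch. The `fence` function is only a stand-in for a real STONITH device such as a network power switch; nothing below is the heartbeat or Stonith API, and all names are hypothetical.

```python
# Sketch of fence-before-takeover ordering. Not the heartbeat or Stonith API;
# all names are hypothetical.

class ClusterNode:
    def __init__(self, name):
        self.name = name
        self.powered = True
        self.holds_ip = False


def fence(node):
    """Stand-in for a STONITH device: force the node off."""
    node.powered = False
    node.holds_ip = False    # a powered-off node can no longer answer on the IP


def take_over(failed, survivor):
    """Fence first, then move the service IP, so two nodes never hold it at once."""
    fence(failed)
    if failed.holds_ip:
        raise RuntimeError("fencing did not release the IP; refusing takeover")
    survivor.holds_ip = True


node_a = ClusterNode("a")
node_b = ClusterNode("b")
node_a.holds_ip = True       # node a is serving, then stops responding
take_over(failed=node_a, survivor=node_b)
```

Doing the two steps in the opposite order would leave a window in which both nodes claim the same address.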
If you're in the UK then give me a shout and I'll set it up for you (for a reasonable fee)! My contact details are available on my web site.
-
how does this compare to openssi?
How does this compare to OpenSSI? OpenSSI is nice because of the single system image approach, which makes administration very simple. AFAIK, an OpenSSI cluster also supports PVM and MPI in addition to exec and run-time load balancing (a'la mosix).
OpenSSI has a lot of "HA-" support, including support for various clustered filesystems, failover of network interfaces across nodes, and failover of the first node (hopefully soon without needing shared SCSI storage but using something like drbd).
-
Re:DRDB and or Linux Virtual Services
I'm going to guess that you're not referring to the Digital Radio Development Bureau (Google's top hit for DRDB) and, in fact, are referring to DRBD, which is a distributed block device for Linux HA clustering.
-
DRBD does it as well...
-
Distributed Network Block Device
A perfect solution would be a form of network block device that mounts distributed NBD shares. The Linux DRBD Project has this capability. From their website, "You could see it as a network raid-1".