Ask Slashdot: Low Cost Way To Maximize SQL Server Uptime?
jdray writes "My wife and I own a mid-sized restaurant with a couple of Point of Sale (POS) terminals. The software, which runs on Windows and .NET, uses SQL Server on the back end. With an upgrade to the next major release of the software imminent, I'm considering upgrading the infrastructure it runs on to better ensure uptime (we're open seven days a week). We can't afford several thousand dollars' worth of server infrastructure (two cluster nodes and some shared storage, or some such), so I thought I'd ask Slashdot for some suggestions on enabling maximum uptime. I considered a single server node running VMWare with a limp-mode failover to a VMWare instance on a desktop, but I'm not sure how to set up a monitoring infrastructure to automate that, and manual failover isn't much of an option with non-tech staff. What suggestions do you have?"
Why don't you have good uptime to begin with? I have SQL Server 2005 on a single unimpressive physical server with months of uptime. Is your restaurant open 24 hours? Is your current server flaking out? Concerns about uptime itself might be misplaced. What isn't made clear in the OP is why you think you need better uptime.
You may want to look at what they are doing with availability groups. You can avoid the shared storage with availability groups and could cut your hardware costs a bit, assuming your software supports SQL 2012. Link: http://msdn.microsoft.com/en-us/library/ff877884.aspx
You are missing some critical details about this. How many transactions with this database are going per minute/hour/day? If this is a fairly basic SQL instance, I don't see the point of your failover scenario. Simply create some jobs to run backups every few hours or every half-hour (storing transaction logs and such) and roll that over to the "desktop" on a share or something. Obviously money is the issue for you, so don't make it so complicated you can't afford someone else to come in and fix things if you get stuck during production.
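A rough sketch of the backup-job idea above, for anyone who wants something concrete. The Python below only builds the T-SQL text that a scheduled job could run: a full backup now and then, and a log backup every half hour. The database name, the share path and the helper name backup_statement are all made up for illustration, and BACKUP LOG assumes the database uses the full (or bulk-logged) recovery model.

```python
from datetime import datetime


def backup_statement(database, share, now, full=False):
    """Build the T-SQL for one scheduled run (hypothetical helper).

    full=True  -> BACKUP DATABASE (a complete copy)
    full=False -> BACKUP LOG (only the transaction log since the last log backup)
    """
    stamp = now.strftime("%Y%m%d_%H%M")  # timestamp keeps every file name unique
    if full:
        target = f"{share}\\{database}_full_{stamp}.bak"
        return f"BACKUP DATABASE [{database}] TO DISK = N'{target}' WITH INIT, CHECKSUM;"
    target = f"{share}\\{database}_log_{stamp}.trn"
    return f"BACKUP LOG [{database}] TO DISK = N'{target}' WITH INIT, CHECKSUM;"


if __name__ == "__main__":
    # Assumed names: a database called POS and a share on the spare desktop.
    print(backup_statement("POS", r"\\desktop\sqlbackup", datetime.now(), full=True))
    print(backup_statement("POS", r"\\desktop\sqlbackup", datetime.now()))
```

Whatever runs these on a schedule (SQL Server Agent, Task Scheduler plus sqlcmd) is a separate choice; the point is only that the files land on a second machine.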
While I always agree with this, you can't expect a business that is open 7 days a week to just wait for it to fail.
Azure had multiple 24 hour outages.
It's cloud so that seems like a great idea !
New things are always on the horizon
Sounds like an awesome idea - that way, you are TOTALLY FSCKED when your internet connection goes down. It's one thing if your online ordering for a business goes down - just take orders by phone. When it completely hoses your order taking system (offline and online), then you're screwed.
Get a decent server, maybe an HP. Dual CPUs, Dual HDDs, Dual Power supplies. Get a UPS.
Install Windows, SQL, and UPS controlling software. Install AV, but be certain to exclude AV scanning the SQL directory and SQL DBs and logs. You don't want AV killing your SQL server by accident. You might want to consider putting a firewall on the box and blocking all non-SQL traffic.
Patch as needed.
Install nothing else. No mine-sweeper, no restaurant food ordering software, no adobe. Nothing will kill a server faster than turning it into a desktop. Don't try to do anything on it. Just let it be a server running SQL and you'll be fine. Don't plug USB drives into it.
You should be able to back up the SQL db every so often by stopping SQL, taking the backup, and then starting it again. Try to do this around the monthly patch cycle. Don't patch immediately when a patch becomes available, but rather wait a week. This will give Microsoft time to correct any patch issues they have. You'll be much more vulnerable to patch issues than to viruses if you follow the "don't turn it into a desktop" suggestion.
Hoist Number One and Number Six.
Man, why is every askSlashdot these days full of people who don't think anyone can do their own backend infrastructure (or anything else, for that matter) unless that's their only job? Look, this guy has obviously been running his POS system for some time, already. Just because he isn't a whiz at SQL Server failover doesn't mean he should just throw up his hands and hire a contractor.
Have you completely given up on learning new things and exploring your options, or do you just advocate that other people do so?
Not enough information. You'll have failure possibilities in the physical server, the VMs, the SQL Server instance, the hard disks, the hypervisor, the POS application, the queries, the backup process, the restore process, etc. All of them have tried and trusted solutions, but you need to establish what you're tackling first. If the answer is 'all of them' and you don't want to break it down and think about each item, then you're better off pushing the problem out to an expert to manage it for you: you can take a range of hosted solutions or get someone to manage your infrastructure for you, remotely or locally.
MySQL is not web scale. He should use MondoDB. That is web scale.
Restaurant food ordering software or even time clock software may need a backend server for their own databases, or some stuff like the food system may tie into the main database.
You can run MySQL Cluster on two machines. It's somewhat complex to set up. And your POS terminals have to be able to connect to either server. But it's available.
If you're getting more than one crash a year, you have hardware problems. Commodity hardware may be unsuitable for a restaurant environment. You may need an industrial-grade PC, with a broad operating temperature range and resistance to dirt, dust, grease, and water. There are PCs and enclosures for restaurants, and the fast-food industry uses them extensively. Every McDonalds, Burger King, and KFC outlet uses industrial-quality POS systems.
You wouldn't use a home-quality stove or a home-quality coffee maker in a restaurant. It wouldn't hold up. The same goes for a computer.
Agreed.
I'm highly familiar with the POS system in US McDonald's restaurants. I'm sure they have more volume than you do.
If the server goes down, the registers can limp along on their own but it isn't pretty, and there's no high availability anything going on. Depending on what fails, the KVS might or might not work.
(For the uninitiated, KVS = Kitchen Video System. Those are the screens that tell the kitchen what food to make. Most busy restaurants - not just fast food - have them these days.)
What you need is for the staff to have some pieces of paper with the menu items and prices printed on them, so they can check stuff off and add it up. You might round prices down to the dollar to make the math easier. If you round up the customers will complain.
PS - If it ain't broke.... Getting the latest and greatest doesn't necessarily get you anything except the effort of getting stuff to work again.
Roger Moore's law.
What does James Bond have to do with this?
Stop. Whoever makes the POS software is the expert that you want to consult. Call up their support and ask them what they recommend.
If you can't measure it, you can't manage it. You haven't taken the first and most essential step in analyzing your problem: measuring it. Is your problem caused by network failure? By power? By software failure? Hardware? If hardware, by server hardware, disks, or something else?
If software, by OS, database, or application software? All of these have different solutions. Going to the cloud won't solve a network failure; it will make things worse. Going to the cloud may improve persistent hardware failures, but the MTBF of most decent hardware is pretty good, so are you sure you have clean power and a good (cool, clean) environment?
If your software or system is crashing, then that's its own problem.
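To make "measure it" concrete, here is a minimal sketch that totals downtime by cause from a hand-kept outage log. The log entries, the cause labels and the function name are invented for illustration; the only point is that a few months of notes like this show which failure is worth spending money on.

```python
from collections import defaultdict


def downtime_by_cause(outages):
    """Sum downtime minutes per cause, worst offender first.

    `outages` is a list of (cause, minutes) pairs, e.g. copied from a notebook.
    """
    totals = defaultdict(int)
    for cause, minutes in outages:
        totals[cause] += minutes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical log, not real data.
    log = [("power", 30), ("network", 120), ("power", 45), ("disk", 10)]
    for cause, minutes in downtime_by_cause(log):
        print(f"{cause}: {minutes} min")
```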
Have you really ever had problems with SQL Server crashing before? What version? What kinds of workloads? Did you tell Microsoft?
Don't bother with clustering, use Mirroring or AlwaysOn Groups instead.
so +1
Agreed but. How old is the hardware?
If it's over 5 years then you should consider monitoring the following components that are exposed to physical degradation:
- Hard drive
- Cooling fans
- Power Supply
- Network equipment if your electrical current is flaky.
Your uptime might be more impacted by the level of maintenance you do on your equipment. I'm afraid to ask but if in any way your equipment is close to kitchen smoke then you could experience nasty greasy deposits in fans and air vents.
Your next step is to have a disaster recovery plan at the ready with your most recent VM image backup.
sounds like a fucking stupid idea, the entire operation is local and can run easily on computers which can be bought 200 dollars a pop. two of them, set up mirroring and go - even azures pricing sounds stupid compared to that.
the cloud is a stupid idea because a) azure can go down and b) their link to azure can go down. in both cases they're screwed.
what they could need/want would be a totally separate backup, if they're american then maybe square - if they're european wtf does a restaurant need a complex POS system for when you can get self contained payment terminals that take chip cards and connect wirelessly (or cache those which don't need instant verification) on the pennies? (ok, those don't take care of your actual orders and such, if the restaurant system is really fancy)
world was created 5 seconds before this post as it is.
I've seen pretty nifty Windows-based POS systems that had the SQL server start on each POS, look for a master and, if there was one already, try to be the slave. If that failed, it would just fail and try to become slave every few minutes. Apart from that, the POS software would just connect to the master-of-that-moment. Once the master went down, the slave would promote itself and the first POS that tried then became the slave. All POS terminals that were up would constantly replicate database files/dumps/backups, whatever, so they would never be too far behind the master. I don't know how this mechanism worked exactly, but it was pretty resilient against little restaurant accidents and power glitches.
I wouldn't want to directly advise going cloud. If your uplink dies, so does your POS system. You could put backups in the cloud to prevent theft or arson destroying your accounting and books, but I'd not trust a single-uplink, high-latency service with your primary business myself. Even if you get business quality lines with a proper SLA, you still can be down for half a day easily and pay hundreds of dollars per month for just the single uplink. Getting two independent uplinks with this kind of SLA will be so prohibitively expensive that you could easily afford to do your own cluster for that kind of money.
I was promised a flying car. Where is my flying car?
When running a business, you really have to draw lines between what you can/can't do, and what you want to/don't want to do. You also have to factor in what the cost will be if you mess it up while trying to learn.
If I were currently running a business, I would hire an accountant to handle my taxes. It's not that I couldn't figure it out, but it's something I don't care to figure out. I'd rather focus on what my business is. But then again, if I did it myself and screwed it up, how badly could it hurt my business?
Here's a guy who knows some about running SQL Server and POS. But he has a business that's running 7 days a week that probably won't run if the SQL/POS is not operating. So what's the cost to the business if the POS doesn't work for a day or two because he hosed it up?
In this case, the risk is high of a failure (because he's in learning mode). He needs to balance the cost of failure with what he gains by learning to do it himself. If the cost x risk is too great, then yes, I advocate he hire someone to do it. If he can tolerate the cost x risk, then by all means, I encourage him to go for it.
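The cost x risk comparison can be written down as simple arithmetic. This is only a sketch: every number below (chance of botching the DIY setup, days closed, daily revenue, consultant fee) is a made-up placeholder, and it ignores the chance that a hired expert also gets it wrong.

```python
def expected_loss(failure_probability, days_down, revenue_per_day):
    """Cost x risk: the average loss you sign up for by doing it yourself."""
    return failure_probability * days_down * revenue_per_day


def cheaper_to_hire(failure_probability, days_down, revenue_per_day, consultant_fee):
    """True when the expected DIY loss exceeds what the expert would charge."""
    return expected_loss(failure_probability, days_down, revenue_per_day) > consultant_fee


if __name__ == "__main__":
    # Placeholder figures: 20% chance of a 2-day outage at $3000/day vs. a $1000 fee.
    print(expected_loss(0.2, 2, 3000))
    print(cheaper_to_hire(0.2, 2, 3000, 1000))
```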
If it were your business and you had a guy handling the mission-critical SQL/POS, and you needed to add fail-over but your guy didn't know how, would you advocate him doing it on your production system? That's the situation this guy is in, except he's also the guy doing the SQL/POS work.
A key part of being successful in business is knowing what you can do well and what you're better off hiring an expert to do for you. Good things to consider are whether the work is critical to operations and if it's part of your core business.
Firstly, you need to make sure there is a paper process which people can run by if the kit fails. Business continuity doesn't always require a massive DR strategy especially in your market area. If the kit does go pop, your staff need to be able to work instantly - paper is the best and did for years before computers came along.
However from a technical POV, speaking from 15 years experience running SQL Server instances, there's no cheap solution that works reliably. Hot standby (HA mirror) is the best approach for your scenario. It's probably a good idea to host it in another DC to isolate larger failures.
Info here: http://msdn.microsoft.com/en-us/library/bb934127(SQL.105).aspx
WTF? Software isn't subject to physical wear like an engine. Do you think friction will eventually turn a 1 into a 0 somewhere in the code?
At the bottom of the
I don't understand why people complain so much about that service. 9.99999% of uptime should be good.
I was the Program Manager in the SQL Server team owning all of our availability products in Redmond (created the AlwaysOn program). My recommendation is to 1) keep it simple and 2) implement a layered approach. But first I have a basic question: are you trying to protect SQL Server only, or do you also need to protect the application and hardware? Because before we dive into the details, it might make sense to take an entirely different approach than the lower level availability technologies. And you also want to consider what it is you are protecting against.
If you want to protect against hackers as well as power outages, disk failures, etc (and I suppose you probably will want to do that!), then I recommend that the first thing you do is perform regular backups to a cloud provider. That gives you the ability to restore to a point in time prior to a malicious attack. And it gives you defense in depth.
Then, if you want to protect the app overall, maybe you should consider making it something that can also be hosted in the cloud for the next layer of redundancy. That way if you completely lose the site you can direct people to the cloud enabled app. But this also retains the ability to run it locally as the POS solution.
Next, I'd consider a way to keep the data synchronized between the POS local installation and the cloud solution running in the VM. The cheapest solution is to use log shipping, which performs backups and restores into the secondary (here in the cloud). This is also nice since you need the backups anyway for the first reason stated and this automates it. You should consider using database mirroring (now called AlwaysOn in the latest incarnation in SQL Server 2012) for the data synchronization. It's integrated into the SQL Server engine and provides better performance and the ability to configure it for no data loss and auto failover using the synchronous option. It comes in Standard Edition (sync only) and Enterprise Edition (async and sync).
Also cover yourself for the common failures locally. Use a battery backed UPS and consider RAID for your disks on the computer. RAID 5 is probably fine for POS. If you have any other questions, feel free to email me johnmatthewhollingsworth@gmail.com. Best. Matt
I recently inherited an application built on .NET, we're a Linux organisation. The devs had typically built it on SQL Server Express with not a care in the world, but it was a core business app.
We bought a single license of SQL Server standard, and put it in Master Slave replication mode. Not having touched Windows for years (as a server) it was a bit of faff to get an Active Directory setup going, but it actually works okay. You don't need to license the failover server for SQL Server.
If there's a failure, it's about 10 minutes from notification to flip the servers over, with a bit of manual intervention. You can cut this down by buying a third box to use as an observer, but that seems to be another single point of failure.
WTF? Software isn't subject to physical wear like an engine. Do you think friction will eventually turn a 1 into a 0 somewhere in the code?
No, but if there's a race condition that occurs once in a blue moon, the cumulative probability of trouble can increase monotonically with time.
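To put a number on that: if a bug has some small chance p of biting on any given day, the chance of having been bitten at least once after n days is 1 - (1 - p)^n, which only goes up. The sketch below assumes each day is an independent trial with the same p, which is a simplification, and the 0.1% figure is invented.

```python
def probability_of_at_least_one_failure(p_per_day, days):
    """Chance the rare bug has fired at least once after `days` independent days."""
    return 1.0 - (1.0 - p_per_day) ** days


if __name__ == "__main__":
    # Hypothetical once-in-a-blue-moon bug: 0.1% chance per day.
    for days in (30, 365, 5 * 365):
        print(days, round(probability_of_at_least_one_failure(0.001, days), 3))
```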
Every end has half a stick.
Restaurants POS systems aren't exactly cutting-edge technology. There's gotta be a kazillion commercial systems out there and lots of pros to install and manage them who have set up and maintained hundreds of these systems. There's probably even some sort of restaurant owner's organization that can recommend systems and consultants. Why are you spending time on a tech site wondering about how to roll your own POS infrastructure when it provides no competitive advantage to your business and any screwups could cost big money? Spend your time worrying and working on stuff that will actually help your business compete with all of the other restaurants out there. Not that I know much about the restaurant biz, but I'm going to guess that getting and keeping good cooking staff, waitstaff, etc, getting quality supplies at a good price, and marketing the place and other restaurant-y things belong much higher on the owner's worry list than what hardware and software the POS systems are using.
I don't reply to ACs
By this time you have realized that 98% of Slashdot posters don't know a damn thing about the issue that you need to resolve (... the cloud? .... Really ??? ) and are just flinging buzzwords (monkeys... poo ... ) or asking questions that you won't know the answer to in hopes that this will get them off the hook for actually answering your question.
Short and sweet - you want database mirroring with automatic failover. You can set up a second SQL server on a separate machine (cost less than $500 for the machine) to be the mirror, and if your primary machine fails then you are still golden. Here is an article that gives you an idea of how to do this in MS SQL '08:
http://www.databasejournal.com/features/mssql/article.php/3828341/Database-Mirroring-in-SQL-Server-2008.htm
Yes, you should hire a **competent** DB consultant to do this for you. Yes, it will cost you another $800 - $1000 to have a **competent** consultant do this for you (figure 8-12 hours' work at 80 bucks an hour) - you will lose far more than that the very first time your database fails and/or you attempt to do it yourself and blow away your database because you made a mistake (you do have backups, of course.... right ??? ).
You can try to do it yourself but I do not recommend it as it's risky.
I've been doing DB work for 25 years - feel free to send me a Slashdot message should you desire to use my services.
----- In Your Cubicle No One Can Hear You Scream...
Why not? It's not friction that'll turn a 1 in to a 0, but simple entropy or cosmic rays (seriously). ECC RAM is designed specifically to guard against that kind of problem.
Friction can kill anything that moves -- fans and hard-disks. If a fan dies, either in the PSU, the case or on the CPU, you could end up with a server that crashes and has all kinds of problems. That's why real servers beep incessantly to let you know if a fan is dying.
And software can crash over time. I write software for a living, and a few months ago I had to change a bit of code I wrote in 2005 -- it was broken by a security update Microsoft released in November last year.
Anyway, OP is asking about cheap-skate ways to emulate Microsoft's expensive fail-over clustering options. The only real answer (I can think of) is replication/log shipping and something in the software that detects the fault and enables the fail-over without any on-site technical expertise. But he's going to need to talk to the ISV about that, not us.
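For the "something that detects the fault" part, the simplest possible probe is a TCP connect to the SQL Server port. This is a sketch under assumptions: 1433 is SQL Server's default TCP port, but the host name, the three-strikes threshold and both function names are made up here, and an open port only proves the listener is up, not that queries succeed.

```python
import socket


def sql_server_reachable(host, port=1433, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, name lookup failed
        return False


def needs_failover(recent_checks, threshold=3):
    """True once the last `threshold` checks have all failed.

    Requiring several misses in a row avoids flipping over on a single blip.
    """
    tail = recent_checks[-threshold:]
    return len(tail) == threshold and not any(tail)


if __name__ == "__main__":
    # "pos-server" is a placeholder host name.
    print(sql_server_reachable("pos-server"))
```

What happens once needs_failover says yes (repointing the terminals, promoting a standby) is exactly the part the ISV would have to answer.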
There is no cheap magic bullet; if there was, everyone would be doing it. You will either pay for licensing, pay for hardware, or both. Clustering is usually a nonstarter due to the expense of a SAN, and if you get a cheap SAN you still end up with a lousy single point of failure. SQL replication may work, but the POS software may or may not work under that configuration and the fail-over may or may not be automatic, so it's a real crap shoot. Your best bet is a single quality server: minimize the crap you install on it (preferably just SQL), get a solid, properly rated UPS, and make sure it is all set up properly. You will get great uptime. A mismanaged cluster is much more liable to cause downtime than a properly cared for single instance server.
Nuclear war would really set back cable. - Ted Turner
Yup. Ten percent uptime is where it's at!
No, no, you're not thinking; you're just being logical. --Niels Bohr
not that generic.
i did a google search for 'SQL Server' and found 1 link on the first 6 pages that didn't use 'SQL Server' to refer to the Microsoft product. and that 1 link was mysql.com which didn't mention 'SQL Server' on the page at all (only 'MySql Server').
the only other major DBMS to have 'SQL Server' in its name was Sybase SQL Server the ancestor of Microsoft's product.
Everybody thinks they know what restaurants do.
This is why so many otherwise smart people get into the restaurant business and then fail.
Forget the word restaurant.
Instead, think highly competitive, low volume, high mix, low margin, short lead-time manufacturing.
Think highly perishable inventory.
Accurate inventory, accurate predictions of future demand, and data driven product design make all the difference between success and failure.
Data collection and analysis is what really successful restaurants do. Or did you really think it was like Top Chef?
Uh....if you can't afford to buy decent hardware then why are you playing with MS SQL Server? Oh, that's right, you spent the money on licenses. Anyway, if you want fail-over you need to set up the server properly using the required hardware. You can't have something for nothing. Seriously, you used a fighter jet when all you needed was a Cessna. Your platform choice reeks of Microsoft hammer syndrome. Whoever developed your system knew Visual Studio .NET and saddled you with this costly burden.
If it were me I would use MySQL, automated backups, and an image or VM standing by with extra commodity hardware. Hell, just use this scenario with your current setup.
Another thing, I hope you're using quality hardware even if it isn't redundant. Or is this a Wal-Mart setup?
I object to power without constructive purpose. --Spock
Enough cloud paranoia. Small businesses often have shitty internet service, even business class, that can go out for a day at a time. The weakest link is the last mile.
I recommended RAID 5 because it can tolerate two drive failures if you give it all five, and I have seen two drives fail at once. It also performs better for SQL, not that it really matters in this case.
Uh, no, no it can't, not no way no how. And it doesn't necessarily perform better for SQL either. A 2 disk RAID 1 can handle one of the two disks dying. A RAID 5 of ANY size can handle one of the n disks dying. If you're thinking of RAID 6 (HP called it RAID ADG for ages) then yes that can handle 2 disk failures. So can RAID 10 for a subset of cases.
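The same rules as a small lookup, in case it helps anyone keep the levels straight. This is a sketch, not a sizing tool: the function name is made up, RAID 10 is assumed to be built from two-disk mirrored pairs, and the result is (guaranteed, best case) drive failures the array survives.

```python
def drive_failures_tolerated(level, disks):
    """Return (guaranteed, best_case) drive failures a RAID array can survive."""
    minimum_disks = {"1": 2, "5": 3, "6": 4, "10": 4}
    if level not in minimum_disks:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")
    if level == "1":
        # Every disk holds a full copy, so all but one can die.
        return (disks - 1, disks - 1)
    if level == "5":
        # Single parity: one failure, no matter how many disks.
        return (1, 1)
    if level == "6":
        # Double parity: any two failures.
        return (2, 2)
    # RAID 10: a second failure is fatal if it hits the partner of the first,
    # but one disk from every mirrored pair can be lost in the lucky case.
    if disks % 2:
        raise ValueError("RAID 10 needs an even number of disks")
    return (1, disks // 2)


if __name__ == "__main__":
    for level, disks in (("1", 2), ("5", 5), ("6", 5), ("10", 4)):
        print(f"RAID {level} with {disks} disks:", drive_failures_tolerated(level, disks))
```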
And in either case I doubt POS for a restaurant is taxing the server - I recall Dominos stores in Australia running a simple SATA mirror set on their in-store servers for hundreds of orders each night. The biggest load it ever had was reporting (end of day etc).
As for my recommendation - two desktops (relatively new) with SQL Server mirroring and backup to disk (replicate the backups). Having the data and logs backed up gives you protection against "delete database" and Bobby Tables, among other things, which you will not get with a straight replica. Failover should be as simple as an icon on the desktop (that runs the appropriate script). Not automatic, but cheaper than having a third PC with SQL Express (or Workgroup, or whatever they're calling it nowadays) for a witness server. Less to go wrong too.
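For what the script behind that desktop icon might run, here is a hedged sketch that just produces the T-SQL. The database name POS and the helper name are hypothetical. ALTER DATABASE ... SET PARTNER FAILOVER is the normal manual failover, run on the principal while both partners are up and synchronized; SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS is run on the mirror when the principal is dead and, as the name says, can lose transactions.

```python
def failover_statement(database, principal_is_up):
    """Build the mirroring failover command for a witness-less pair.

    principal_is_up=True  -> planned failover, to be run on the principal
    principal_is_up=False -> forced service, to be run on the mirror
    """
    quoted = "[" + database.replace("]", "]]") + "]"  # bracket-quote the name
    if principal_is_up:
        return f"ALTER DATABASE {quoted} SET PARTNER FAILOVER;"
    return f"ALTER DATABASE {quoted} SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;"


if __name__ == "__main__":
    print(failover_statement("POS", principal_is_up=True))
    print(failover_statement("POS", principal_is_up=False))
```

The POS terminals still need to be pointed at the new principal afterwards, which depends on how the POS software stores its connection settings.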