In Search of the "Perfect" Pager Rotation?
jSpectre asks: "At my new job the Unix SA team has increased from 5 to 7. We're trying to work out a new, rotating on-call schedule and everyone has 'perfect' but conflicting ideas. Twelve weeks on and 6 off, 25 weeks on and 10 off. I thought someone out there must have come up with the perfect formula so that, given N people, you could rotate through the weekdays and weekend most efficiently. My Google and web searches have come up with nothing. Does anyone know of a good formula/solution? The requirements are these: we have 7 people (but the formula should ideally apply to N people) who should rotate through the weekdays (a 24 hour period) and the weekend (a 48 hour period). There is a designated primary and a secondary person. They should be on for a few weeks and off entirely for a few. Sound like a good thesis/research problem for someone? By the way, Google comes up with a lot of people's schedules if you search for pager rotation. Tisk tisk."
Hmmm - maybe we're onto something . . . ;>
/. editors keep posting questions from real people with real jobs asking about help with their real jobs. Penis envy?
...don't worry about pager rotations because our datacenters never have failures, you insensitive clod!
I want to win the Powerball® jackpot which is estimated at $250 million.
Does anyone know of a good formula/solution? The requirements are this, I want to win this Powerball® jackpot (but the formula should ideally apply such that out of the N times I play, I should win at least N-1 times). Sound like a good thesis/research problem for someone? By the way, Google comes up with a lot of pages if you search for lucky Powerball® numbers. Tisk tisk.
Give everyone points per week, either same for everybody or based on seniority. Then set up a schedule in advance, whoever has the most points gets the duty. When duty is taken, points are removed. People can of course volunteer for duty, and if multiple ones do, low points get first choice. Allow points to go negative.
Or something like that. I'm sure it could be an interesting exercise designing the points system and implementing a web page to handle it.
One more thing, you need some kind of deadline, no changing your mind within a week of duty. But if you get someone to swap, allow that.
Now if you are going to pay for the duty, you want the weekly points awarded based on how much different shifts cost. Maybe factor in seniority also.
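For what it's worth, the points idea above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the comment: everyone gets the same weekly award, one shift is handed out per week, ties go to whoever is listed first, and the function name `assign_by_points` is made up. Volunteering, swaps and the one-week deadline are left out.

```python
def assign_by_points(people, shift_costs, weekly_award=1):
    """Hand out one shift per week to whoever holds the most points.

    people       -- list of names (list order breaks ties)
    shift_costs  -- points removed for each week's shift, in calendar order
    weekly_award -- points everyone earns each week (assumed equal here)
    """
    points = {name: 0 for name in people}
    schedule = []
    for cost in shift_costs:
        # Everyone earns their weekly points first.
        for name in people:
            points[name] += weekly_award
        # The person with the most points gets the duty...
        on_duty = max(people, key=lambda name: points[name])
        # ...and pays for it; balances are allowed to go negative.
        points[on_duty] -= cost
        schedule.append(on_duty)
    return schedule, points


if __name__ == "__main__":
    weeks, balance = assign_by_points(["ann", "bob", "cy"], [3] * 6)
    print(weeks)
    print(balance)
```

With equal costs this collapses into a plain round robin. It only gets interesting when some shifts (a holiday week, say) cost more points, because whoever takes one then sits out longer.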
Infuriate left and right
Shit guys, if you can't work this one out on your own you should not be sysadmins. What's the next "Ask Slashdot"? "My manager has given me all these tapes to copy stuff from the server to every night. What is the best rotation strategy?"
Maybe you could work out a schedule based on the calendar year. The current one could seriously frustrate some people if they are on through the entire holiday season. Thanksgiving, Christmas, and New Years could easily come within 6 weeks of each other. It might be entirely possible some people are on through this whole time and others are off that entire time. If it's not balanced off the next year and some of the same people have to stay on call a 2nd year through that time, I would think it could lead to hard feelings.
While the holidays may not affect your business, they do have a strong emotional effect on most staff, and it might help to set up the schedule to treat people fairly not just in regular time on/time off, but also in holiday time. For example, someone who works Christmas or a 3 day weekend might get an extra week or weekend off some other time.
I've never had to deal with this in the tech field, but when I was in property management, I know anyone on call over holidays always felt at least a little frustrated, but at least they knew they all had to deal with it more or less equally.
Let's start with the assumption that you don't care if you're on-call, so long as you never get paged to do something during non-business hours.
...)
A simple system that would work for N people might be the following:
1. Number the people 1..N (or 0..N-1 if you're feeling geeky).
2. The pager starts with person 1. If you need a secondary or tertiary (sp?), then assign to persons 2, 3,
3. If person j takes the call, j passes the pager on to the next unallocated person in the list, who takes on j's priority (i.e., primary, secondary, etc.); if the primary takes the call and you have secondaries, etc., the secondary becomes the primary and the next unallocated person on the list becomes secondary.
4. Goto 3 (couldn't resist)
Assuming that calls are evenly distributed, then you only have to take a call every N*(call inter-arrival time) units of time.
You could change around the "who gets primary next" rule in various ways.
Assuming that you don't get more than one call on a weekday or over a weekend, this system should be reasonably fair.
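A rough Python sketch of the list scheme above, in case it helps to see it spelled out. The class name `PagerList` and the `roles` parameter are invented for the example, and it assumes there are more people than on-call roles, so there is always an unallocated person to hand the pager to.

```python
from collections import deque


class PagerList:
    """Pager hand-off list: whoever takes a call goes to the back of the queue."""

    def __init__(self, people, roles=2):
        if len(people) <= roles:
            raise ValueError("need more people than on-call roles")
        # The first `roles` people hold primary, secondary, ...
        self.on_call = list(people[:roles])
        # Everyone else waits, in list order, for a pager.
        self.waiting = deque(people[roles:])

    def take_call(self, role=0):
        """The holder of `role` (0 = primary) takes a call and hands off."""
        taker = self.on_call[role]
        newcomer = self.waiting.popleft()
        if role == 0:
            # Primary took it: everyone moves up, the newcomer gets the lowest slot.
            self.on_call.pop(0)
            self.on_call.append(newcomer)
        else:
            # Someone else took it: the newcomer inherits that priority.
            self.on_call[role] = newcomer
        self.waiting.append(taker)
        return taker


if __name__ == "__main__":
    pagers = PagerList(["A", "B", "C", "D", "E"])
    print([pagers.take_call() for _ in range(5)])
    print(pagers.on_call)
```

If the primary takes every call, each person answers exactly once per pass through the list, which is the "one call every N inter-arrival times" claim.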
Make it even easier: pay, say, $x a day for pager coverage (more for weekends) plus pay for call-ins. That makes it easier... I don't have a life, need more money, I volunteer more often/take extra shifts. I have a real life, don't want as much extra work... I don't volunteer as much. Shifts that aren't covered are simply rotated through (still getting the extra bucks).
I have mod points and I am not afraid to use them
Supposing the density of the calls is not very high, use a two-person/week scheme.
Person 1: Primary from midnight to noon, secondary from noon to midnight;
Person 2: Primary from noon to midnight, secondary from midnight to noon;
Rotate each week (week 2: persons 3-4, ..., week 4: 7-1, ..., week 7: 6-7). After 7 weeks, each person will have worked 2 weeks. Not so bad: each person is on call only 2 weeks out of 7, or about 29% of the time.
And for an odd N (in your case 7), you automatically shift morning/night in each iteration.
Seems easier that way too, you don't have to remember "are we thursday and it is really between 6am and 11h30am?".
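Here is a small sketch of that two-person/week scheme, reading it as consecutive pairs (1-2, 3-4, 5-6, 7-1, ...) with people numbered from 1. The function name `pair_rotation` is made up for illustration. It makes it easy to check both the load per person and the morning/night alternation for odd N.

```python
from collections import Counter


def pair_rotation(n_people, n_weeks):
    """Return (week, first, second) tuples for a two-person/week rotation.

    `first` is primary from midnight to noon, `second` from noon to midnight;
    each is the other's secondary for the remaining half of the day.
    """
    rows = []
    for week in range(n_weeks):
        first = (2 * week) % n_people + 1
        second = (2 * week + 1) % n_people + 1
        rows.append((week + 1, first, second))
    return rows


if __name__ == "__main__":
    rows = pair_rotation(7, 7)
    for week, first, second in rows:
        print(f"week {week}: {first}-{second}")
    # How many weeks does each person work over one full cycle?
    print(Counter(p for _, first, second in rows for p in (first, second)))
```

For 7 people the cycle repeats after 7 weeks, and each person shows up once in the morning slot and once in the night slot.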
...we are just about to start the "on call" stuff, and we're working it like so:
Starting off ranked by seniority: Newest employee is #1 on the list, most senior (me, whoo hoo!) is at the bottom.
When we get an off-hours support call, it goes to #1's phone first. If there's no answer, it bounces to #2, and so on. Whoever actually takes the call goes to the bottom of the list, everyone else moves up a notch. Lather. Rinse. Repeat.
This is really easy! The length of the period of rotation is exactly x (number of persons) weeks long. At week 1, person 1 (primary) and person 2 (secondary) are on for Monday through Sunday. At week n, person n (primary) and person n+1 (secondary) are on. At week x, person x (primary) and person 1 (secondary) are on.
The length of a week is used so everyone gets an equal amount of time. Everyone is on call 2/x of the weeks. Shifts can be swapped easily; just be careful that primary and secondary aren't the same person! Primary and secondary can be swapped -- the system can work with people being primary then secondary just as easily. Also, when deciding the order of rotation, you may wish to take skill/competence into account. Have it so the junior guys are spaced out, so if the junior person (primary that week) can't figure it out, he can call the secondary (a more experienced person).
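A quick sketch of the week-n rule plus the "space out the juniors" advice. The names `spaced_order` and `on_call` are invented here, and the interleaving is just one simple way to guarantee that a junior primary always has a senior secondary. It assumes there are at least as many seniors as juniors.

```python
def spaced_order(juniors, seniors):
    """Build a rotation order in which every junior is followed by a senior."""
    if len(juniors) > len(seniors):
        raise ValueError("need at least as many seniors as juniors")
    order = []
    for i, senior in enumerate(seniors):
        if i < len(juniors):
            order.append(juniors[i])
        order.append(senior)
    return order


def on_call(order, week):
    """Week n (1-based): person n is primary, person n+1 secondary, wrapping at x."""
    x = len(order)
    return order[(week - 1) % x], order[week % x]


if __name__ == "__main__":
    order = spaced_order(["j1", "j2"], ["s1", "s2", "s3", "s4", "s5"])
    for week in range(1, len(order) + 1):
        print(week, on_call(order, week))
```

Because the order ends on a senior and wraps around to the first junior, the last week has a senior as primary with a junior backing up, which is the harmless direction.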
He who laughs last is stuck in a time dilation bubble.
just tell them when the hell they're working and when they're on call, and be done with that. Make something up, anything. If they don't like it, tell 'em they're out of there, and hire another unemployed monkey for $2/hr less. It's a manager's economy, charlie! Think with your head.
1 week on, $numberofemployees - 1 off
manager intervenes in cases where a person is about to work the second holiday in a row, in which case the person being screwed trades with the person before him in the rotation.
Apply appropriate further perturbations in cases where the holidays worked differential exceeds 2.
Incidentally, in case somebody's looking for a really good unix admin to cover a holiday or something...?
I've found that the best rotation is the everyone-gets-paged-and-if-you-don't-see-it-fixed-within-a-few-minutes-find-a-terminal rotation.
I like to use Buzz word bingo to select the next victim. Today "Beowulf cluster" is an instant critical hit.
Or try These
Sorry about the writing. Robot fingers, you know? Cliff Steele in DOOM PATROL #23
We all carry Nextel phones or pagers and get text messages from our alerting system when things go down. When I see something go down and I'm somewhere I can VPN in, I'll usually do so, check whether any of my coworkers are actively VPN-ed in, and give them a beep to see if they are already working on the problem and whether they need any help. If there isn't anyone looking at it, I'll usually jump in and get to work; the next person who takes a peek will see me, we'll talk, and they'll get a feel for what's going on, the extent of the problem, etc. This works pretty well for us (we are a small, 10 person group and pretty much know where each other are in the evenings/weekends). We also have an answering service that has a formal 'on call' list, so if a customer calls with a problem they'll just call down the list until someone answers. We rotate the list around every quarter, or thereabouts, so the 'primary' person changes. I often volunteer for the top spot on the list because I'm pretty flexible, don't have a wife/kids, and don't mind being called. This works out well because if I'm not able to fix it we have a list they can call down, and everybody pulls their weight, so no one really feels bad if they have to hand it off down the list now and then. I'm guessing that since you are looking for a more formal schedule this sort of 'loose' system may not be the right answer for you, but it may be the right answer for others. We have separate call-out lists for some more specialized systems (Oracle, router folks, mainframe, telecomm) so that when there is a problem with a specialized system you get the more experienced people first. Our managers are also at the bottom of the list, so if all else fails the call will go to a person who has a complete phone list and a good idea of where people are, so they can start calling people's alternate numbers and whatnot. This system also allows people quite a bit of flexibility.
Our 'tradespeople' (aka union) side of the house, where there are MOUs or something relating to off-hours support, has a more formal policy, but they also have financial incentive (time and a half type stuff) for being on call, so there isn't usually a lack of people willing to be on call. One thing that we have considered, that may work for you, is dumping our answering service for a PBX-based application. That way customers could be routed through a menu tree (yeah, people just love those...) that would identify the area of their problem and then page out the people on call for that area. You can tie something like that in with your calendaring/scheduling system so that only people who have time marked as free are paged. You would have to incorporate some sort of checking to ensure that you had at least a primary and alternate for any given time, but it should be doable. I am coming from a shop where people are generally willing to be on call when possible, so we don't often run into a situation where someone has to give up something just to be on call, so YMMV with this type of system.
This is a perpetual scrum in medical residency, too. We can't do back-to-back calls, which makes it harder than, "You cover this weekend."
0 - Get a big ass calendar with holidays and some pencils. Decide how many days/year each person will have to work. Break them down into 4 or so categories: weekday, friday, weekend, holiday are ours. Friday is annoying because you can't go out, but it's not as bad as a 24h (weekend/holiday) day. If weekends are light, you could just have "weekday" and "friday + weekend" categories. Anyway, share around evenly.
1 - Holiday parity is a good place to start. No one wants to get screwed on both xmas and new year's. Ask for preferences and nail down someone for coverage for these and Labor day, the 4th, T-day, etc. They can trade later.
2 - Map out the conferences and people's known vacation blocks, anniversaries, exams, etc.
3 - Some people haven't adapted to a totally random fill pattern of coverage, so give people a choice of contiguous blocks/easy to remember patterns (M/W for the month of ___) or irregular blips.
4 - Schedule the parts where many are out of the office with whatever it takes. Subtract these and the holiday days from the totals each person has to work. Schedule the pattern-desiring people and people with evening classes/outside commitments/inability to show up if on a random schedule. Again revise the totals.
5 - Start marching through at the beginning, rotating through the N people available. Keep running track of the fridays/weekends, do a little stagger to keep the weekends from being the same person on the same day, and it will start filling out.
6 - Think outside of the month to fit those last days in. You don't have to fill months contiguously or in date order. If there is a new employee, it may be best to slack off a bit on them (no weekends) at first until they fill out their KB; this gives you some flex.
N - Nothing you can do will make the perfect schedule. You have to have one master list that is the last word, and on which everyone must record their trades. Leftover days are best distributed to the people who took the least holiday days or the dues-paying new hires.
N+1 - Write some open source software to do this. Acrimony might be less, and the legibility would be better for sure.
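In the spirit of step N+1, here is a very small Python sketch of the "march through and rotate" part (step 5), with the day categories from step 0 and the no-back-to-back-calls rule. It is a greedy toy, not a real scheduler. The names `category` and `fill_schedule`, the `unavailable` mapping, and the tie-breaking rule are all assumptions made for the example. Preferences, pattern requests and trades are left out.

```python
from collections import Counter
from datetime import date, timedelta


def category(day, holidays):
    """Classify a day as holiday, weekend, friday or weekday."""
    if day in holidays:
        return "holiday"
    if day.weekday() >= 5:  # Saturday = 5, Sunday = 6
        return "weekend"
    if day.weekday() == 4:
        return "friday"
    return "weekday"


def fill_schedule(people, start, n_days, holidays=frozenset(), unavailable=None):
    """Greedily assign one person per day, balancing load within each category.

    unavailable -- optional dict: name -> set of dates that person cannot work
    Nobody is given two days in a row (no back-to-back calls).
    """
    unavailable = unavailable or {}
    load = {name: Counter() for name in people}
    schedule = {}
    previous = None
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        cat = category(day, holidays)
        candidates = [
            name for name in people
            if name != previous and day not in unavailable.get(name, ())
        ]
        if not candidates:
            raise ValueError(f"nobody available on {day}")
        # Fewest days in this category first, then fewest days overall.
        chosen = min(
            candidates,
            key=lambda name: (load[name][cat], sum(load[name].values())),
        )
        load[chosen][cat] += 1
        schedule[day] = chosen
        previous = chosen
    return schedule, load


if __name__ == "__main__":
    schedule, load = fill_schedule(
        ["ann", "bob", "cy", "dee"],
        date(2024, 1, 1),
        28,
        holidays={date(2024, 1, 1)},
    )
    for name, counts in load.items():
        print(name, dict(counts))
```

The per-category counts it prints are the running totals step 5 asks you to keep; the master list in step N is still where trades have to be recorded.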
Have everyone involved in the rotation pick a few important holidays. Maybe two or three each. Then factor in all major holidays that may have been missed. Let's say 7 people and 30 holidays: schedule your weeks/days on and weeks/days off so people don't work too many holidays in a row (e.g. Thanksgiving, Christmas, New Year's). Print up a couple of years as a sample to see if it's fair. Adjust as needed. Allow people to trade weeks/days so they can work around vacations, weddings, giving birth, etc.
Being called a dork on Slashdot must be like being called the retard in special ed.
Uh... You know... why not just have each person on duty for one week, in rotation.
This week, it's Bill, next week it's Terry. The week after that, it's Bob. Everyone gets six or seven weeks in between their turn and nobody gets burned out by doing more than one week at a time.
I work for a top-level engineering support group within a very large company. We're the link between the devs and tech-support and there are about 80 of us for just this portion of the software division (and there are about 400 tech support guys in frontline and the company has about 40,000 employees overall).
We don't deal with seniority or any other bullshit. It's real simple. Each product has to have 24x7 coverage. So if there are six engineers on one product, then every sixth week, it's your turn to roll again. If there are two people on your product, you are on duty every other week. If there are ten people on your product, it's your turn every tenth week.
I don't see why IT should be any more difficult. If you have five guys, you work through the rotation every five weeks. Nobody has to do more than a week at a time and it's easy to map out "gee, so in four weeks it's my turn again.."
What's the next question here? "How do I fit 200 cd's into a 200 cd disc carrier"?
For example, let's say you have N people working (and all are interchangeable, to start with). That means that each of them should be on call for K = 1000/N milliseconds out of every second (on average). Provided there are fewer than 500 people to be scheduled, you can accomplish this by rounding K to an integer (for the case where there are more than 500 people to be scheduled, either schedule them for one millisecond each, or go to a finer grained time-base). One important point to remember is that you must resource lock the call to the person in << K ms to avoid race conditions (which can garble text messages and result in an annoying high-pitched noise if two or more people try to return the call simultaneously and get multiplexed--
Hot damn, my run just finished.
G'night all...
-- MarkusQ
Low bidder gets stuck with the job, and second lowest bidder gets secondary. All the rest have to pay the third lowest bid to the person stuck with the job.
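The bidding idea is easy to sketch, though the comment leaves the payment rule a bit open. The version below assumes each of the remaining people pays the full third-lowest bid to the primary (the other reading would be that they split it), that ties are broken by name, and that there are at least three bidders. The name `run_auction` and the returned dictionary keys are made up for the example.

```python
def run_auction(bids):
    """Pick primary and secondary from sealed bids.

    bids -- dict: name -> amount that person wants for taking the shift
    """
    if len(bids) < 3:
        raise ValueError("need at least three bidders")
    # Lowest bid first; names break ties so the result is deterministic.
    ranked = sorted(bids, key=lambda name: (bids[name], name))
    primary, secondary = ranked[0], ranked[1]
    price = bids[ranked[2]]  # the third-lowest bid sets the price
    payers = ranked[2:]      # everyone who escaped duty pays up
    return {
        "primary": primary,
        "secondary": secondary,
        "payers": payers,
        "each_pays": price,
        "primary_receives": price * len(payers),
    }


if __name__ == "__main__":
    print(run_auction({"a": 50, "b": 30, "c": 80, "d": 40, "e": 60}))
```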
Whatever. Do what my team lead used to do. Every time we paged him when he was on rotation, we'd realize the pager was in his desk drawer. Well, every time except the one time when he said "I'm too drunk to come in."
The Army had to solve this problem eons ago in order to have rotating guard and staff duty schedules. Their solution is the DA Form 6. Look it up. It has all the features you specified.
In my team which has 5 people, we do one week of on-call, rotated around the team each week.
It means you go on call 1 week in 5.
Simple to understand, simple to implement and no hassles. People swap individual days with other SA if they have some pressing engagement which they need to attend.
This scheme works out good. We all end up with an even amount of day pager duty in a month, and we each get 13 weeks of night pager duty per year. If a holiday falls on a weekday, the night pager person does the day shift.
A friend of mine on another team does an entire week of pager at a time. It sucks for him. He hates it. I much prefer our system.
Hi-Technical Excellent Taste and Flavor!
Forgot one...
5) Profit!!!
back to lurking...
BIGstan!
Errr, this might be too obvious, but have you asked if anyone wants to work the graveyard shift?
That would remove 90% of the problem.
I don't know about your situation, but most groups of geeks contain one or two that are night owls. Why not pay them a small premium to work nights? A lot cheaper than overtime.
.02
cLive ;-)
-- Trinity in high heels carrying a whip: The donimatrix - there is no spoonerism
put a bunch of holidays in a hat (twice, once for primary, and once for secondary)
give each holiday a value (prim, secd)
Christmas - 5,3
New Years - 5,3
4th July - 5,3
Easter - 3,2
Thanksgiving - 3,2
Labor Day - 2,1
Memorial Day - 2,1
Presidents Day - 2,1
everyone draws one holiday
Everyone must end up with at least 6 pts of holiday er something... This wouldn't solve the whole thing, but it would fix the holiday problems...
DA Form 6 does not answer his question on HOW to allocate the schedule. It is just a pre-printed sheet to write down a schedule once it has been established.
Leave it to the government to turn what is essentially a sheet of lined paper into a formal, named and itemized military-spec piece of equipment.
How much you think a pad of those costs?
DA Form 6 does not answer his question on HOW to allocate the schedule.
Correct, but not very helpful.....
How much you think a pad of those costs?
Nothing. ALL Army forms are available in PDF format. The advantage of a piece of paper is that you don't need a computer to operate it. This comes in handy in places like foxholes, which typically lack electricity.
FWIW, AR 220-45 tells you HOW to use the form. This took me, oh, 10 seconds to locate via google. FWIW, here's a PDF copy of the reg. Of course, I'm not a cynic who condemns all things military because they are military. Oh, and I guess I should say that I'm a Major in the Army and have used the DA 6 for most of my adult life to do things like this.
A little more searching will probably turn up either a standalone program implementing the duty roster, or a spreadsheet. The paper forms become tedious to maintain for large groups of people, or when maintaining a separate rotation for weekdays and holidays/weekends (which is common), but are VERY fair in allocating duties. More importantly, they are AUDITABLE, so anyone can look at the roster form and determine that the duties are being assigned fairly.
You don't mention how many on-call at once? Is there a primary contact and a secondary contact?
At my last job, there was a primary contact, who would receive the initial call. If, within a given time period, there was no response, the secondary person would be called.
How about giving the pagers to two people each week - if you've got seven people, you could have time off between the time spent as primary and secondary, or just make everyone do two weeks of on-call (one as the primary contact, one as the secondary), with a large block of free-time before their next on-call bit?
*shrugs*
Riiiight! It's almost unattainable to get 3 people to agree on where to have dinner.
:-P
You wanna get 7 techs to agree on a schedule?
Do it by "Military Junta". Get a manager and 2 elected members of the team to decide on one of the proposals submitted by the team.
Then, blame failure on the Junta and praise those who abstained from voting.
In my group we have about a dozen people to rotate the pager through. Right now, each takes it for a week at a time - so you're only on-call about 4 times a year. We hand over the pager every Monday morning.
We just picked someone at random to be the "first" and then went through the list of people in the group alphabetically, copied/pasted 3 times (to get about a year's worth) and then overlaid it on a calendar. If someone has a week they know will be bad, they can swap with someone right away. Holiday conflicts (both for people pulling duty on holidays regularly and people who will be away for the holiday) will come out pretty easily too - generally the younger/single people tend to go out of town to see family while those with spouses and kids will have family coming to them, and so on.
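If anyone wants to generate that kind of calendar rather than copy/paste it, something like the following would do. It is only a sketch: the function name `monday_handover_calendar` is invented, the "random" first person is passed in explicitly, and swaps are left to be done by hand afterwards.

```python
from datetime import date, timedelta
from itertools import cycle, islice


def monday_handover_calendar(people, first_person, start, weeks=52):
    """Overlay an alphabetical rotation on a calendar, one week per person.

    Returns a list of (handover_monday, name) pairs.
    """
    names = sorted(people)
    # Start the alphabetical cycle at whoever was picked to go first.
    offset = names.index(first_person)
    rotated = names[offset:] + names[:offset]
    # The pager changes hands on Mondays: move forward to the next Monday.
    monday = start + timedelta(days=(7 - start.weekday()) % 7)
    return [
        (monday + timedelta(weeks=i), name)
        for i, name in enumerate(islice(cycle(rotated), weeks))
    ]


if __name__ == "__main__":
    for handover, name in monday_handover_calendar(
        ["dee", "al", "cy", "bo"], "cy", date(2024, 1, 3), weeks=6
    ):
        print(handover.isoformat(), name)
```

With a dozen people and `weeks=52` each name comes up four or five times, which matches the "about 4 times a year" figure.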
The only trick is, when you're only on-call 4 times a year, a lot can change between each time you're carrying the pager, and you have to keep on top of it.
I am sure that your VP could agree with 52 weeks on and 0 weeks off....
(it can happen!)
1) How responsive do you need to be? Generally the shortest time period you can rely on is 15-30 minutes - if you need faster responses, you better keep people on-site 24-7. But there's a big difference between being "on-call" with a response time of 2 hours versus being "on-call" with a response time of 15 minutes. With a 2 hour window, people can see a movie, go on a short trip, and generally have a normal life. With 15 minutes, the "on-call" period is really a "please stay at home" order.
Generally, we've found that the more responsiveness you want, the shorter the on-call periods have to be.
2) Manageability of the rotation: On-call duty is a pain. People end up doing all sorts of things that cause management problems. Do you have a person who always seems unavailable - so you always end up calling the back-up? Did someone go off on a trip because they forgot they were on-call?
Ultimately, someone will be responsible for the rotation and will need to call non-performers on the carpet. It's MUCH easier if you have a defined system, you know who is behaving well and who isn't, etc.
3) Predictability of the rotation: Do you ever page the on-call person and they don't respond? Or they aren't sure they are on-call? In our shop, that was a sure sign that the on-call rotation wasn't working. Having a predictable schedule makes life much easier.
4) Fairness - everyone needs to share the burden equally. Sometimes you get certain people who are just better at dealing with crises, and some people who are just awful at it. (For example, we often had the case where person A could never solve the problem and we always had to call the backup.) You need to make sure that the schedule is fair - don't simply give the effective people more work and avoid the ineffective people. That rewards the wrong behavior.
5) Flexibility - flexibility comes at the end. Yes, people need to be able to trade shifts - but not if it breaks any of the above rules. For example, it's not efficient to allow people to trade "part of a shift" - you need to trade the whole shift or none.
So with those comments in mind, there are some helpful tips: