The High Tech Sweatshop
It's 4:30 am on a Friday and I just finished the last Mountain Dew. We ran out of coffee hours ago, the remains of it now black sludge at the bottom of the pot. The building's air conditioning went off sometime the previous night and it's up to almost 90 degrees in the server room. The two volunteer hackers on the staff went home after 12 hours, leaving me and the sysadmin...
This is a normal day for me.
I'm a systems engineer in the client services division of a network security software company. Basically what that means is that when networks break, I fix them.
I am 22 years old, I make a large multiple of the national average salary, and if I cashed in my stock options I could buy a very nice house. I'm also sixty pounds overweight, I sleep an average of four hours a night, and I have several ulcers. I usually spend about 60 hours a week at the office, but I'm on call 24 hours a day, seven days a week. If I were honest with myself I'd probably say I worked about one hundred hours last week. This is a normal life for someone working in this industry.
We live in a world today that runs on information. And people want all of it now. When was the last time you actually wrote out a personal letter to someone, on paper, in pen? Why bother when e-mail is so much faster and easier? But what goes on behind the scenes when you hit the "send" button? There are thousands of people out there just like me who have titles like "Network engineer" and "Systems administrator". We keep that information flowing, and we get paid what seems like a lot of money to do it. If you've been in the market for a good network admin lately you know what I mean. The market is pushing the salary into the 100k+ range for someone with the necessary experience to handle even a relatively small network, never mind what the really large companies like State Farm insurance or Wells Fargo bank have.
I started work on this problem with the sysadmin on Thursday before the close of business, getting things set up, preparing for the changes, etc. The company was switching internet service providers that night because the previous one hadn't provided the level of service they needed. This entailed changing the IP addresses and DNS configurations of every machine in the building (three different operating systems, probably two hundred machines all told), then setting up the servers, routers, and switches necessary to get it all running. It's a big job, but with six people working on it we figured we could get it done before start of business the next day. Normally you would do this kind of thing over a weekend, but the ISP could either do the changeover tonight or wait till next week, and we needed to be online before Monday.
Getting back to what happens when you press the send button. You expect the computer to send the message, and that the person it was sent to will receive it. What happens to the message then is an incredibly complex series of storage, sending, routing, switching, redirecting, forwarding and retrieving that is all over in a fraction of a second, or at most a few minutes. But you don't care how or why it gets there, only that it does, and this is all you should care about. After all, you don't have to know how your car's engine works in order to drive it, right? But someone has to know in case it breaks. And when your email breaks you expect someone to fix it. It doesn't matter what time it is, or where the message is being sent, you want it to get there now.
It's now 8 am and the network is still down. We've managed to isolate a routing problem and are in the process of fixing it. The ISP gave us the wrong IP addresses and now we have to go back and redo all two hundred machines in the building. The router was crashing and we couldn't figure out why. Two hours on the phone with the vendor's support, and three levels of support engineer later, we fix it. People are starting to come in to work and ask why they can't get their email. The changeover process takes us about three hours and finally everyone has the right IP, but things still aren't working right. A bunch of people use DHCP for their laptops and the DHCP people can't get out to the net. The CEO of the company is one of those people...
So what do we do? Well, we hire people to take care of the network. And we give them benefits and pay like any normal employee. We also give them pagers, cell phones, and direct phone lines to their houses so that any time, anywhere, we can get them, because the network could go down, and we DEPEND on that network, and those people. This is where things go askew from the normal business model.
All compensation is basically in exchange for time. The only thing humans have to give is their time. When I pay you a salary it is in exchange for me being able to use your abilities for a certain period of time every year. The assumption is that the more experienced or knowledgeable you are the more your time is worth. This works fine when you are being paid a wage, but salaried employees aren't. They exist under the polite fiction that all their work can be done in a forty-hour period every week, no matter how much work there is. We all know this isn't the case, of course. And when it comes to systems administrators and network engineers that polite fiction isn't so polite. In exchange for high salaries and large stock options the company owns you all day and all night, every day and every night. You are "Mission critical". High salaries become an illusion because when it gets down to it your hourly rate isn't much better than that of the assistant manager of the local Pep Boys.
I finally went home at 1 that afternoon. I couldn't stay awake any more and if I didn't leave right then I wouldn't have been able to drive home. The funny thing is I felt guilty for leaving. Things still weren't working quite right, and I felt like I should have stayed until they were. Even funnier is that I volunteered for this. The only part of the job that I actually had to do was to change a few IP addresses and configure the firewall, but I thought I'd lend a hand, and I couldn't do the firewall till everything else was working anyway. My wife hadn't seen me in two and a half days, and I could barely give her a kiss when I walked through the door and collapsed on my bed. The SysAdmin was fired a few hours after I left. Back to work Monday morning.
As y'all know, I've done a lot of work on the NT platform, and in my experience about 80% of NT problems can be traced to poor systems administration (about 15% more are caused by deploying it into inappropriate roles, and about 5% because of flaws in NT). Why is such a large proportion due to this cause? It's because NT looks like Win95 on the surface, a simple, domestic OS, and it's very easy for people to bluff their way into sysadmin roles on the NT platform - there are people calling themselves Domain Administrators who I wouldn't trust to look after a digital watch, much less an enterprise computing resource! And there's no way to find out until a recovery situation for most companies, as they lack the skills for a truly rigorous hiring process. This isn't a criticism - after all, that's why people get hired, to bring a skill into the company in the first place!
I've never worked with Netware, but I gather the Novell folk found themselves in a similar situation in the early '90s. A bunch of people who could manage the basics were placed in positions of responsibility, and when the situation arose that required deadly skills, they just weren't capable. And everyone suffered for this: the corporates didn't have the network support they needed, the operators were humiliated and fired, and the industry as a whole was blamed. However, the CNA/CNE programme went a long way to weeding out the incompetent, and the MCP programme is starting to have an impact on the quality of NT staff.
Any kid can download Linux and teach themselves, which is a good thing when viewed abstractly, but it will definitely result in a lot more people on the market who, whether intentionally or not, grossly exaggerate and misrepresent their own skills. This can only be a bad thing; it will bring ill repute on the sysadmin profession.
I'm sorry. I don't mean to be rude, but this is the same as every other labor-related story that's cropped up in the last few weeks on here. I bet we see the same B.S. about unions and the same arguments for and against that.
;) If you've got high profile clients, you could always use a NAT solution to handle the switchover period. I think Linux could probably even do it for you.
In the end though, it boils down to one thing. If you don't like it, quit. As you said, you're making multiples of the national average income for someone your age. You could always go sell clothes at The Gap or something. Or take one of those several hundred thousand other open IT jobs at companies that have sufficient technical resources and skills in house not to end up in that sort of a situation. (And a properly designed network architecture shouldn't have nearly the issues in that sort of a switch over... but I'll get to that)
There is a tendency for people in the industry -- particularly people who are in positions significantly beyond their realistic abilities (I'm not saying this is your case, but A case) -- not to stick up for themselves. If you don't like working late hours, don't. Half the time people think they have to, their management really isn't saying that, they're just assuming it. If management IS saying it, then say no. If they fire you, they fire you. If you really have any skills, you'll get another job without any problems, and if you don't, maybe that's what you should concern yourself with.
On the subject of mass IP migration, I hope this story serves as a warning to anyone else working in those situations. It's not difficult to engineer your network systems to handle this cleanly. Generate your DNS entries out of a database. Generate a DHCPD configuration file that assigns internal-only IPs for each server, also out of the database. Do the same thing with your server configuration and IP configuration. Simple scripts can do all of that. (And you're not using NT for real work, are you? You probably could do it with NT anyway, it just takes a bit more hacking.)
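A rough sketch of what "generate it out of a database" might look like, not the poster's actual scripts. The `hosts` table, its columns, the function names, and the sample rows are all made up for illustration. The output imitates BIND-style A records and ISC dhcpd `host` blocks, and real files would need their usual headers (SOA record, subnet declarations) around it.

```python
import sqlite3


def load_hosts(conn):
    """Return (name, mac, ip) rows from the hypothetical hosts table, sorted by name."""
    return conn.execute("SELECT name, mac, ip FROM hosts ORDER BY name").fetchall()


def render_zone_records(hosts):
    """Render one BIND-style A record per host."""
    return "\n".join(f"{name}\tIN\tA\t{ip}" for name, _mac, ip in hosts)


def render_dhcpd_hosts(hosts):
    """Render one ISC dhcpd host block per host, pinning an address to each MAC."""
    blocks = []
    for name, mac, ip in hosts:
        blocks.append(
            f"host {name} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {ip};\n"
            f"}}"
        )
    return "\n".join(blocks)


def demo_connection():
    """Build an in-memory database with two illustrative hosts (assumed data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (name TEXT PRIMARY KEY, mac TEXT, ip TEXT)")
    conn.executemany(
        "INSERT INTO hosts VALUES (?, ?, ?)",
        [
            ("mail", "00:a0:c9:00:00:01", "10.0.0.10"),
            ("www", "00:a0:c9:00:00:02", "10.0.0.11"),
        ],
    )
    return conn


if __name__ == "__main__":
    hosts = load_hosts(demo_connection())
    print(render_zone_records(hosts))
    print(render_dhcpd_hosts(hosts))
```

The point of the design is that the database is the single source of truth: nobody edits the zone file or the DHCP configuration by hand, so both files change together when an address changes.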
A few days before the switchover, lower the timers in your SOAs (refresh and TTL values) so the change takes effect almost immediately. Run a query against the database to regenerate your various configuration files, and bring down and back up the networks on the servers. On most systems you won't even need a reboot, and you'll have a few seconds of downtime.
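The renumbering step itself can be mostly mechanical. A minimal sketch, assuming each machine keeps its host offset when moving from the old provider's block to the new one. That assumption, the function names, and the example networks are illustrative and not from the post.

```python
import ipaddress


def remap_address(addr, old_net, new_net):
    """Move addr from old_net into new_net, keeping its offset within the network."""
    old = ipaddress.ip_network(old_net)
    new = ipaddress.ip_network(new_net)
    ip = ipaddress.ip_address(addr)
    if ip not in old:
        raise ValueError(f"{addr} is not inside {old_net}")
    if new.num_addresses < old.num_addresses:
        raise ValueError(f"{new_net} is too small to hold {old_net}")
    offset = int(ip) - int(old.network_address)
    return str(new.network_address + offset)


def renumber(hosts, old_net, new_net):
    """Return a new name -> address mapping with every address remapped."""
    return {name: remap_address(ip, old_net, new_net) for name, ip in hosts.items()}


if __name__ == "__main__":
    # Example data only: two hosts moving between two private ranges.
    current = {"mail": "10.0.0.10", "www": "10.0.0.11"}
    print(renumber(current, "10.0.0.0/24", "192.168.5.0/24"))
```

Write the remapped addresses back to the database, regenerate the configuration files from it, and the only manual work left is restarting the network on each server.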
I've done provider switchovers at companies with dozens of servers and hundreds of clients, no sweat, with less than an hour of downtime. If you don't have any other downtimes, you're still doing better than eBay.
As another in the countless horde of the "been there, done that, probably going to do it again" types, I feel that the only wisdom I can add to this is a little reality check:
You are not curing cancer.
You are not saving the world from mass destruction.
Go home.
Too often, I find that we sysadmins shoot ourselves in the foot by trying too hard to meet a user's requests. We get a project request, bust our ass to complete the project in record time, and please the user community immensely. This is all fine and good until we get another project request, and now we're expected to complete it in record time as well because "you did it once before, why can't you do it again?" Usually, in the first instance, it was not necessary for us to complete our project so quickly. Our users would have probably been happy if it was finished a week or two later, but we delivered if only to demonstrate that we could. But then we've doomed ourselves, because now the user expects miracles to happen; s/he actually makes plans based on the fact that miracles occur on a regular basis. And we chastise them for their naivete, even if we set them up for it in the first place by working hard when we really shouldn't.
Why do we work so hard? Part of it is to keep the high-paying job, but it's mostly because we take some sort of masochistic pride in burning the midnight oil longer than anyone else; working on some component that has been deemed mission critical by someone who has grown too lazy to know how to conduct business with an abacus. And we call this martyr syndrome professionalism.
But in the end, for most of us who work for corporate or academic institutions, what have we accomplished when we finally go home? Some people can receive an e-mail about "How to make $$$$ FAST" in ten seconds instead of ten minutes. Some people can make more money in less time. Some people never notice that anything changed. Their lives go on.
I'm not saying that we should be fat and lazy, but we shouldn't be burning ourselves out when we don't have to. Yes, there will always be projects and network outages and an ever-increasing pile of work that we need to tunnel out of, but no, it doesn't all have to be done today. Any project that requires any sort of planning should be done without anticipating anything like overtime. If overtime is required, it has to be for a good reason. Too often, we bitch about having unreasonable project deliverable dates, but that's usually because we just don't know well enough to push back.