The Cult of DevOps
packetrat writes "I was at OmniTI's Surge conference today, which turned out to be, among other things, a meeting of the cult of DevOps. Ars Technica covered the keynote and some of the presentations, but some of the best stuff is in the comments. Google CIO Ben Fried told the tale of a really poorly engineered trading application at Morgan Stanley that he was associated with, and how the way IT was structured there contributed to that engineering and to its spectacular failure, costing the bank untold millions in stock trade processing fees from its institutional customers. He said what he learned from cleaning up the mess has informed how Google runs its IT operations, and a culture that promotes generalist skills. A lot of how he describes Google's approach sounds like the DevOps kool-aid a lot of the other speakers were serving, but it also sounds like common sense — are most IT organizations really that poorly run that developers are totally unaware their software is sending messages that are generating network storms, or network engineers are clueless enough about QoS to route leased lines into their data center through their public-facing Internet?"
I'm not sure I can take anybody who calls an attempt to make IT and Development more aware of each other a cult, seriously.
The traditional way of doing things didn't work for 30 years. Why is it that when people are trying to make (and apparently making) a difference to how companies work, they're regularly denigrated by a large subset of the very people whose working lives they're trying to improve?
Haters gonna hate, I guess.
are most IT organizations really that poorly run that developers are totally unaware their software is sending messages that are generating network storms
How is it the IT organization's fault that the developers are unaware of the effect of their software? Is it the IT department's responsibility to debug software?
Having to work for a living is the root of all evil.
A while ago I thought most OSS application and framework projects, including GNOME and KDE, were in trouble due to heavy use of the fumble-around development approach, also known as the code-first-then-think approach. All the great model-based, model-driven and agile development methods seem to be far away from the way many OSS projects are developed.
However, lately I came out of my ivory tower and stood eye to eye with experts from the industry who largely believe that they are professionals and really do great stuff. They also use the fumbling approach. The main difference is that they call it agile, even though it is far from such an approach. I always thought that one of the problems of our new-economy company back in 2000 originated from being too deep in code and too light on design, planning and documentation. But it looks like tinkering is a more widespread way of software development. So I guess that leaves me with bad management (which was not my responsibility).
Most of these tinkering approaches originate in the absence of developer discipline and the "Add this quick" management method. But I am telling you nothing new. We have all known for decades what is wrong in software development. A lot of people wrote books on patterns (design and otherwise), but in the end, if no one follows these patterns, the problems remain.
The "Add this quick" management method originates largely from a misconception of IT and its importance for businesses and other organizations.
The answer to that question is "yes". In my experience, what is uncommon is to find a truly competent IT organization. And normally the bigger it is, the worse the problems are. But what can you expect in an age where your IT group is made up of the cheapest guys you could hire, with no experienced (and more costly) ones? Or what about the clueless IT managers with little or no experience in IT? They can't plan against what they don't know. Yeah, as a colleague told me not long ago, IT life nowadays is like war: long periods of boredom punctuated by moments of sheer terror.
I've been around enough to know that this 'new' 'DevOps' philosophy is just the way it always has been done at many successful companies making extensive use of technology.
I have come to associate the phrase 'enterprise IT' with those who *don't* work that way, and make their lives needlessly complicated. Of course, every last party to the mess will generally recognize it and know what could hypothetically be done about it, but only bitch in private about it and rarely ever push for meaningful change. The reason is simple, so long as it is a complicated mess, it requires a great deal of human care and feeding, meaning job security. Management can't force things to change without huge risk as everyone has sufficiently entrenched themselves.
XML is like violence. If it doesn't solve the problem, use more.
That last was a rhetorical question, right? You could have stopped at "Are most IT organizations really that poorly run". Yes. Upwards of 95% of CIOs are completely unqualified to hold their position. They are grossly overpaid talking heads with NO real understanding of the technical underpinnings.
Seriously. The single largest problem that most IT organizations have is that the people running them are completely unqualified to execute the duties required. It is not that IT workers are overpaid. Good IT workers are worth their pay several times over. The problem is that far too many are overpaid for their actual skill and knowledge level.
I like to compare current IT workers to the medical personnel of the 1700s. Many of them had some extremely rudimentary knowledge of how a body worked, but much of what many of them believed was rubbish and they had no idea about the depths of their ignorance. They have good intentions, but they are going to milk you for every penny they can and many honestly believe they deserve the pay.
Finally, those at the "top" of the food chain, highest in management, are also most likely to be the most unqualified. They rose to their position through personal connections and snake-oil patter.
Until IT becomes a disciplined hard science with the same kind of internal reviews and requirements as the medical field, you are going to have an awful lot of hedge-witches, snake-oil salesmen, and field medics running around causing more harm than good.
Yeah, that's exactly what we need. More rules, more books describing how to do *doing* something. Meta-meta-meta-everything...
And more companies that take a methodology which has quite sensible premises and transform it to a paper-pushing-based freak-child.
Whenever you see someone dismissing what you think is a genuine concern, it's not because they don't get it. It's because they don't care. Everyone has their own priorities. Dev's priority is functionality. Security is just a necessary burden. But no one prioritizes burdens. Managing priorities is a management job. So if you see a system in which important priorities are not given weight, that's just poor management. Management's job is identifying priorities and then creating an incentive structure (I don't mean salary, I mean day-to-day incentives) which emphasizes the important and de-emphasizes the unimportant. Btw, that doesn't mean that management gets to ask everyone "what are your priorities?" and pretend they have done their part. Asking that question is tantamount to shifting the burden of identifying the priorities to those who don't have that responsibility.
Any guest worker system is indistinguishable from indentured servitude.
This cancerous attitude happens in any project, across any set of roles, once things get big enough. If the organizational culture is rigidly defined roles and CYA, this crap is inevitable.
There's no silver bullet for this obviously, this is a pure management problem.
They see things as a triangle: developers, QA and IT, and in the middle DevOps.
That's not at all how it works in my experience.
While developers and QA are very close, developers are not particularly close to IT at all, while QA is.
Testing is what requires large infrastructure set-ups. Development doesn't really: the QA guy directly gives you access to a working configuration that demonstrates the problem. As a software developer I never have to interact with IT.
Devs need to drink more of the DevQA cool-aid too.
In an attempt to answer the barely articulated question buried amidst the run-on editorializing, I will answer "yes". Many otherwise extremely talented developers carry little understanding of how their code performs once exposed to chaotic and nondeterministic production environments. Likewise many ops engineers hold little appreciation for the motivations of and the pressures on those developers.
Much of this isolation stems from culture, but at least as much is from experience. Being paged at 3am because an entire pool of servers simultaneously ran out of file handles, or because some unknown engineer failed to properly sanitize their input, tends to temper one's soul. Having your deployments constantly stalled because one failed to meet some ill-defined, barely documented, seemingly nebulous operational requirement likewise tends to galvanize one's opinion about "those people". Both teams scream, "they just don't get it," a lot. And they are both right.
The result if these tensions are left unaddressed is clear: many live system outages and stymied projects.
Devops, cult or not, is a conscious effort to bridge these frequently opposing forces inside technical organizations. The teetering platform of fast, cheap, and good need not be out of balance. Rather it is the express job of the devops engineer, by having equal interests both in execution and in sustainability, to create a world where deployment happens freely and fast, where quality is not sacrificed, and where we are not mindlessly throwing money at the inefficiencies created by conflict.
The kool-aid is actually a pretty tasty beverage. You might take a sip.
The wiki page on "devops" sums up part of the idea as involving "deep cross-departmental integration".
While there are a lot of agile practices around Google, including "stand-ups", iterations, scrum, planning poker, and whatever, there is hardly any "cross-departmental integration". Communication is as poor as at any other huge behemoth. Just as we've read about multiple internal teams fighting each other at Microsoft and Nokia, so do many teams at Google create similar products, both externally and internally.
In fact, there are multiple projects and initiatives to cut down on duplicated projects. Think about that one for a second.
the software industry in silicon valley by and large does not have a role 'architect of product'. we have architects of technology. the product comes together from various teams: tech pubs, qa, tech support, professional services, it, ops, manufacturing, marketing. all contribute their efforts to take the technology delivered by the typical development team and transform it into a product. DevOps is one attempt at mitigating or filling this gap, to varying degrees depending on the folks involved. Now what is interesting is: where does DevOps report into? the CIO? if so, where does Development report into? and now we are back to ownership...
Wish I had mod points handy.
The separate department structure for computer systems encourages everyone to take the approach of 'Not My Problem' even when they could fix it. From a broader perspective, the barriers among Dev/QA/Ops are artificial: it is all computer people doing computer-ish stuff. External constraints are vastly simpler than the construction analogy -- no opcode shipments stuck in Customs, no delays from rain or freezing temperatures. If the computer industry can not overcome these lesser barriers there is little hope of handling bigger barriers such as Marketing|Development.
And they really need to be overcome. Industry has spent a truly astonishing amount of money on 'computer systems' with frequently negative results. Recall the number of anecdotal reports of large computer system efforts that are abandoned after YEARS of work for many millions of dollars. Although the computer system company has a 'loss' of anticipated future revenue from the project, the host company loses real money and irreplaceable time. Further, they are often left with a confused and discouraged work force that will affect them for years to come.
Bent, folded, spindled, and mutilated.
Historically a sysadmin has been able to and does write, fix software, and administering large scale systems has always meant templating and then deploying from updated templates... which means having a build system which can build to a template usually automatically (it's boring) which apparently is now called continuous integration. You also need an infrastructure designed to easily deploy changes. VM or physical is almost totally irrelevant, VMs make life a little easier.
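For anyone who hasn't seen the "deploy from updated templates" idea in practice, here is a minimal sketch in Python. The template text, the host names and the `render_configs` helper are all made up for illustration; a real setup would push the rendered files out with whatever deployment tooling the shop already uses.

```python
from string import Template

# Hypothetical template: one shared setting plus one per-host value.
CONFIG_TEMPLATE = Template("server $ntp_server\nhostname $hostname\n")

def render_configs(template, hosts, shared):
    """Render one config per host from a single template.

    `shared` holds values common to every host; the hostname is filled
    in per host. Changing the template or `shared` and re-rendering is
    the "deploy from updated templates" step.
    """
    return {host: template.substitute(shared, hostname=host) for host in hosts}

if __name__ == "__main__":
    configs = render_configs(CONFIG_TEMPLATE, ["web1", "web2"],
                             {"ntp_server": "10.0.0.1"})
    for host, text in configs.items():
        print(f"--- {host} ---\n{text}")
```

The point is only that the per-host files are generated output, never hand-edited, so a change is made once in the template and rebuilt everywhere.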
Let's see... Infrastructures.org, for example, was definitely there in the mid-nineties (and still is, cool). That's almost a whole generation ago. How old are you?
I have no idea what makes people think this devops stuff is new. Is it just the fact that there are now thousands of newbies trying to make names rediscovering good practices? Is it just that we now have more large scale systems, or is it that we now tend to have highly distributed vs centralised systems?
And it's not development. It's engineering. The difference is maths.
Go on, get off my lawn!
I think it's more of a case of developers having to now do a lot of sys admin work, not wanting to do it, not having the expertise to do it, but trying to make out that they can 'develop' things away and it will all be easy. Small companies have this problem, and small IT companies like to believe they can just hire in sys admins and farm crap out to places like Heroku and Engine Yard.
In my last job I pointed out to our developer when the development work dried up, as it does, that our clients paid for their applications to be kept running and live and it was the sys admin they were paying for. He was most upset.
I'm guessing in the 90s, it was pretty rare to actually use those practices. A new shop would have one server called "server". Then it would get a few, named after the Seven Dwarfs. Eventually, you would get to the point of needing to get a new sysadmin who could handle large-scale deployments (or have the old sysadmin change the way they worked), but that would be around the post-IPO stage (if ever), not the "three guys, one of whom is figuring out how to script EC2" stage.
By scripts.
Add to that that devOps is much, much harder than system engineering. A DevOps engineer needs to know enough about multiple programming languages *and* algorithmics *and* data and telecom infrastructure to tweak the programs that are really controlling the network. There aren't many people like this at the moment (so if you want to move as a programmer to a $200k/year job, understanding the packet datapath and its control plane is where it's at, provided you're good at optimizing algorithms. The first company to make a real control plane that merges control of the application inside the host and control of the network above the host, controllable by an API, is going to make a bundle).
Most sysadmins and netadmins these days know a bit of scripting, and have lots of knowledge about specific details of the OS/switches/routers/firewalls, like the access subsystems or networks and firewalls. But almost none of them could tell you exactly what happens, in code, when ssh refuses or accepts a connection, and what access that implies for all parties involved. I've yet to meet the first sysadmin who can tell me how to optimize a non-trivial problem. An example devOps issue that is requested is to increase that access (e.g. we control the firewall, and we would like it to log ssh connections. Can you tweak the ssh daemon with as little interference on the client side as possible, and then use that to log all access to production systems? In other words: execute a man-in-the-middle attack on the firewall to get full accountability of the production network).
But this is child's play compared to what the devOps guys really want. They want an API they can use for problems like "we have a network with nodes N, links L(n1, n2, capacity, cost function), peering links P, and data transmission requests from the applications R(app1, app2, dataSize, profit, constraints). Configure the applications and network so that cost is minimized, profit is maximized and all specified constraints are followed (e.g. "maximum latency for video: 2s, max latency for voip: 50msec, ..."). Oh, and do tell us which improvements to the network would bring the most benefit in terms of $ per increase in maximum bandwidth transmitted."
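To make the shape of that problem concrete, here is a rough Python sketch. It is emphatically not an optimal solver: it is a greedy heuristic I am assuming for illustration, which takes requests in order of profit and routes each over the cheapest path that still has enough capacity. The tuple layouts `(n1, n2, capacity, cost)` and `(app1, app2, dataSize, profit)` mirror L and R above with a constant cost per unit of data; peering links and latency constraints are left out, and the function names are hypothetical.

```python
import heapq
from collections import defaultdict

def cheapest_path(links, src, dst, demand):
    """Dijkstra on per-unit cost, using only links with >= demand capacity left."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, link in links[node].items():
            if link["capacity"] < demand:
                continue
            nd = d + link["cost"]
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]

def place_requests(link_list, requests):
    """Greedy placement: highest profit first, skip requests that lose money."""
    links = defaultdict(dict)
    for a, b, capacity, cost in link_list:
        shared = {"capacity": capacity, "cost": cost}
        links[a][b] = shared  # same dict both ways: capacity is shared
        links[b][a] = shared
    placed = []
    for src, dst, size, profit in sorted(requests, key=lambda r: -r[3]):
        path, unit_cost = cheapest_path(links, src, dst, size)
        if path is None or unit_cost * size >= profit:
            continue
        for a, b in zip(path, path[1:]):
            links[a][b]["capacity"] -= size
        placed.append((src, dst, path, profit - unit_cost * size))
    return placed

if __name__ == "__main__":
    demo_links = [("A", "B", 10, 1.0), ("B", "C", 10, 1.0), ("A", "C", 5, 5.0)]
    demo_requests = [("A", "C", 5, 100.0), ("A", "C", 8, 50.0)]
    for src, dst, path, net in place_requests(demo_links, demo_requests):
        print(src, dst, path, net)
```

A real answer to the question as posed would be closer to a multi-commodity flow formulation with side constraints, which is exactly why it is hard.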
If you have a company whose profit is mostly determined by the capacity of its network, any advantage you have in this is going to pay out hugely. For a Netflix, Amazon, Bing, Microsoft's online Office and other apps, Google, Yahoo, any online advertising company, any online voip company, ... having a functioning algorithm like this means a vast advantage over competitors.
The big issue is that it's much easier for a capable programmer to become a devOps engineer than it is for operational people to become devOps engineers. Not that we have any kind of surplus of capable programmers at all, but I doubt that even if they have no choice many sysadmins will become good programmers.
So the issue is, this is about automating network/system operations the way Ford automated car production: if you work in operations, one of 3 things is going to happen:
1) you move to development/engineering, and spend most of your time writing and debugging programs
2) you become a robot, a call center employee, the interface between a computer system and external parties; in other words, your job is reduced to "you want fries with that?" level
3) you're fired (which is how it will go for a lot of people)
For the vast majority of people in operations it will mean a gradual move to option 2, and after a few years option 3 if you didn't manage to get recruited for engineering ... Of course this does mean that if you do succeed it will mean an increase in salary, potentially a big one. But most people won't succeed (which is of course the reason for the salary increase in the first place).
And that is of course the big issue. The IT workforce is going to drop massively if these guys succeed, and consist entirely of call-center employee
It can be summed up like this: If the problem will arise in an area he isn't responsible for, and if implementing it that way regardless is cheaper, faster or less problematic for his department, he will implement it that way.
Running the traffic through the internet instead of the LAN? Why not? It's not going to be the development's problem (since that's operation's problem), and it's faster to do it that way, so it's done that way. A user interface that makes users beg to break the programmer's fingers and make him a consultant? Not the programmer's problem, it's the user's problem, and if he can implement it faster with a shabby interface, it will be done that way.
The problem isn't that the people wouldn't know that these problems will surface. Often they know that quite well. The problem is that this is not a criterion, while the time they spend on solving the problem is. Hence, whatever lets him get it faster off his table will be the way it is done.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
This sounds more like a blunder on the IT-Sec department's part than anyone else's.
Security is important (hey, I can't speak against my trade, can I?), but the moment it keeps people from being able to do their work, they will start looking for workarounds around the security system. And this is about the WORST thing that you could possibly have as an IT-Sec. It's akin to an inside job, only without the intent to actually cause trouble.
But intent doesn't really matter when the data is lost...
...and I'll even try to remember to check back in case people respond.
Assumption: the trading application being critiqued is the same one that was there when I was an IT consultant at Morgan Stanley. I left in August 2000.
I know the application well. It was developed by a department headed up by Vinny (whose name is withheld because...I'm senile and don't remember it). I worked in the department that wrote the messaging infrastructure that was used by every application on the sell side of the firm.
If the application is the same one then the mere fact that it was still in use when Mr. Fried left is a testament to the application's effectiveness. Would they do it better now if they were writing it from scratch? Of course. But Morgan Stanley has 3,000-4,000 IT staff in its ranks so they could easily do so if the application were as bad as he says. And the messaging infrastructure...well, I have no love for the original authors (no hard feelings Steve and Arthur...I'm lying of course), but that subsystem was extremely robust, predated TIBCO and Talarian, and provided more functionality than those two products until after they were on the market for several years.
cults are stupid.
if you want to name yourself something stupid, i don't care. Stupid people do stupid shit because they are stupid.
Be seeing you...
I don't want to be an asshole.
But if you don't adapt, you might lose your job. Especially in IT.
Is that something people are surprised about?
New things are always on the horizon
Lots of people are going to lose their jobs, because only the better ones will be retained. Managing systems manually is a skill that will go the way of car building skills. That's a lot of people.
And IT is just about the only sector that is supposed to be adding jobs to the economy. It will not go down smoothly once this starts happening.
Sorry, English is not my first language
It could also be that we end up being a lot more effective. That is after all the whole point of IT. It's never made much sense to me why people did the 'manual labor' in IT.
But if I look at education in my home country, the Netherlands in Europe, I can tell you there aren't even enough people studying programming to fill the available positions (well, I don't know the real numbers, but that is what they keep telling us in the local IT press, and looking at the people that get hired I'm not all that surprised).
So maybe there is room for people to be retrained.
I don't know, we'll see.