Should Developers Have Access To Production?
WHiTe VaMPiRe writes "Kyle Brandt recently wrote an editorial exploring the implications of providing developers access to the production servers of a Web site. He explores the risk introduced by providing higher level access as well as potential compromise solutions."
Whenever an error occurs that I can't replicate in a dev environment, I'm always SO tempted to hop into prod and start adding in some output statements.
Yeah, it's probably a good thing I don't have access to prod.
LOL! No.
All we do is run scripts and get in the way of developers trying to "git'r done!".
No. It just encourages sloppy development practices.
Would you want to drive over a bridge that wasn't actually designed and engineered, but rather they just piled some stuff up and will fix it if it collapses? Or have a surgeon chopping you open with the idea that they'll figure it out as they go? So why would we want developers to work with the expectation that they get to intervene at the last instant to resolve their failures?
It is my experience that giving developers access to production gives you a production environment that looks like it has been vandalized. They mean well and are trying to make the best application possible, but they need their own development lab, and their own staging / production lab.
No.
If needed there should be a mechanism to automate bug reports in a meaningful way, as most professional software has.
If you want to have control over your production code, you need to have assurance that it is not changing in an uncontrolled fashion. Allowing developers to have access to production locations makes it all too easy for this to happen. Read-only access allows developers to see the running code and perform file comparisons which can be useful in troubleshooting. They should never need more than this.
And in some cases, even read access can be risky -- I've seen production web sites with resources linking back to development server URIs. It's a good idea to firewall your production servers in such a way that it is not possible for them to reach resources on development servers. This shouldn't prevent developers from being able to read the files on the production server, though.
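For what it's worth, the kind of file comparison described above can be sketched in a few lines of Python. This is only an illustration: the two config snippets and the `diff_lines` helper are made up for the example, and in practice the "production" side would be read from a read-only copy of the deployed file.

```python
import difflib


def diff_lines(expected: list[str], deployed: list[str]) -> list[str]:
    """Return unified-diff lines between the released text and the deployed text."""
    return list(
        difflib.unified_diff(
            expected,
            deployed,
            fromfile="release",
            tofile="production",
            lineterm="",
        )
    )


# Hypothetical contents: what was released vs. what is actually running.
release = ["timeout = 30", "debug = false"]
production = ["timeout = 30", "debug = true"]

drift = diff_lines(release, production)
print("\n".join(drift))
```

An empty result means the deployed file matches the release; anything else is drift worth asking the admins about.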
You see? You see? Your stupid minds! Stupid! Stupid!
One would hope that developers have a greater level of professionalism than that. However if it turns out that they aren't mature and professional then they shouldn't be surprised when they end up losing their jobs and facing criminal charges.
Everyone agrees that developers should never have access to production...Unless they're the developer, in which case it's different.
It's a good practice to keep them separated, but in the end it's just a pissing contest. The server admins don't want some filthy dev messing with their stuff, and I can appreciate that.
However, admins often lack appreciation of some dev-specific issues, and their ignorance can lead to problems down the line.
In the end, it's better practice to have everyone work together sensibly than to throw down inflexible rules that cause more trouble than they prevent.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
So, yes.
This is why there is a change control process, and a testing environment.
If you're doing it wrong, you're asking for trouble.
The price is always right if someone else is paying.
There's no correct answer to this question. It depends on the size of the organization and the nature of the system. I've worked in different companies that have been on either side of where I thought the line should be. The line is drawn in a very different place for a 20 employee company than it is for a 20,000 employee company.
The day developers can write code that compiles the first time, then yes, otherwise, jesus, no.
I work as an Oracle DBA for a mid-size company, and I provide a day-old cleaned copy of production in a different environment/box, and it does the trick.
Hi, I Boris. Hear fix bear, yes?
Yep. I did this for a few of my applications. When the business would call and had a major problem that needed to be resolved immediately, I would use it instead of the 1+ hour of time and 5 additional people involved.
Was it "right"? No, but when the Production support/engineers are worthless, what is one supposed to do?
I think it's helpful in analyzing real-world data and getting an idea about real system loads, testing issues to see if they are in the wild today, etc. For a good developer, it makes life much easier.
In a very healthy development ecosystem all this data is replicated and there is never any need for a developer to touch prod. In the development ecosystems that exist in the real world though, most are very unhealthy, frustrated by ham-fisted security, process flaws, red-tape, inconsistency, and incompetence ranging from scattered to mostly cloudy.
The answer is, do you have the class of developer that knows what not to do and desires to play nice, or do you have the usual?
As a developer I can tell you that it's impossible to test programs properly and thoroughly without access to production data. However, developers should NOT be granted access to production logins/sites - production data should be copied into development work areas so that developers have an appropriate "sandbox" in which to work/test.
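A rough sketch of what "copy production data into a sandbox" can mean in practice, using Python's built-in `sqlite3`. Everything here is an assumption for illustration: the `customers` table, its columns, and the `scrub_copy` helper are invented, and a real scrubbing job would have to cover every sensitive column, not just email.

```python
import sqlite3


def scrub_copy(prod: sqlite3.Connection, dev: sqlite3.Connection) -> None:
    """Copy customers from prod to dev, replacing real emails with placeholders."""
    dev.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, plan TEXT)"
    )
    for cid, _email, plan in prod.execute("SELECT id, email, plan FROM customers"):
        # Keep the row shape and the non-sensitive columns, drop the real address.
        dev.execute(
            "INSERT INTO customers VALUES (?, ?, ?)",
            (cid, f"user{cid}@example.invalid", plan),
        )
    dev.commit()


# Stand-in for the production database (in memory, made-up rows).
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, plan TEXT)")
prod.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "alice@realcorp.com", "gold"), (2, "bob@realcorp.com", "free")],
)
prod.commit()

dev = sqlite3.connect(":memory:")
scrub_copy(prod, dev)
dev_rows = dev.execute("SELECT id, email, plan FROM customers ORDER BY id").fetchall()
print(dev_rows)
```

The developer gets realistic row counts and data shapes to test against without ever holding a production login.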
I don't think he meant malicious backdoors. I read that as backdoors to allow debugging/etc.
And the consensus is ... NO
Who let this question through? It doesn't even seem controversial. I am not aware of any good reason to routinely give developers access to production.
Censorship is obscene. Patriotism is bigotry. Faith is a vice. Slashdot 2.0 sucks.
I completely agree with this, but I also have an undying commitment to my customers and when it's my name on the application and as the primary POC, you better believe I'm going to do everything I can to support my end users in any way I can.
Yes, I could've lost my job or got in trouble. In the end, I would never condone it or defend it as reasonable, but I just couldn't stand to be shackled and ultimately cost the business more money by having an entire department sitting around with their thumbs you know where because of a stupid process.
I dislike blog postings on Slashdot as a rule - they can get a Slashbox like everybody else - but the arguments made in the article are well-reasoned if somewhat short on detail. How do developers troubleshoot in a production environment? The article acknowledges that troubleshooting in production is necessary and mentions the installing of software, but installing software alone changes the environment (generally a bit of a no-no for debugging, due to Heisenbugs) and debugger hooks can pose a potential security threat (a big no-no for sysadmins). Further, there was no discussion as to whether developers should be the ones troubleshooting - first rule of testing is that you should never rely on programmers to test their own code. They're way too close to it. Either have testers or have programmers test other programmers' code. It is the only way to ensure that there's proper coverage of sufficient corner-cases.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
As a general case, I say that no, developers should not be given access to production. While giving us access to production might seriously speed up the resolution of an issue, in my experience, it always eventually introduces more problems than it fixes. It also tends to create an environment where testing is devalued because it creates the perception that any issues can be quickly resolved in production. This encourages management to compress timelines and causes the dev team to waste a lot of time fighting fires.
The best environments I've worked in have a fully replicated "break fix" mirror of production that can be used to test and rapidly deploy emergency production fixes outside of a normal test cycle if absolutely necessary.
Biggest issue my cow-orkers and I have is that the sysadmin *claims* that the dev box and production box have the same packages, configuration, etc. but in reality, they don't. Most often we find out when we ask for production stuff to be copied over to the dev site to test errors, etc. and just loading it - which works on the live site - generates errors on the dev site.
Don't blame me, I voted for Kodos
That may very well be true for a small shop, but in a small shop you may not actually have dedicated systems for a complete Dev cycle anyway (Dev, Q/A, Prod) and if you do, it may be acceptable to let the devs have access to production systems (they may even be the only people capable of maintaining them). In a large environment review processes should prohibit backdoors - if not, then you need to fix the review process. But somewhere in-between are those businesses where it can be difficult to balance system integrity and security with development access.
I use error handling to give me as complete a picture as possible of stuff on production. I don't want any more access to production than I absolutely need.
~~ Behold the flying cow with a rail gun! ~~
Yes, certainly developers should have access to their production machines.
No, they shouldn't be allowed to do anything they want with them.
Troubleshooting application breakdowns is much easier for the developer to do. Thus, the access should be limited to logging data, etc. Unless the admin worked on the application itself, diagnosing those kinds of issues through someone else can be extremely difficult at best.
Non impediti ratione cogitationus.
If you are a small software shop then I can see reasons for allowing your small technical staff to have access to production. It's all well and good saying that only the admin of that server should have access and there's a full rollout procedure in place to be followed only on certain days, certain times; but even when I've seen that sort of structure in place there are times when it's useful for the developers to have access to production. Nothing is perfect and we'd all love to have multitudes of staging servers, replicating the typical load and uses of production, but for a hell of a lot of (non-critical, I'd add) systems that just doesn't happen.
There simply is no one rule that fits all. Sometimes I wish we had extremely rigorous rules & regulations in place - I'd probably get to go home a hell of a lot earlier. I'm not suggesting you start chucking exceptions all over your checkout code on live but I think you should assess your own situation (and staff for that matter).
jaymz
With some fore-thought and some discipline an application can be developed with very robust logging techniques. It takes development time, but there is nothing cooler than asking the production guys to turn the logging detail up for a few packages and seeing tons of data in the logs. It's not perfect as you can't log every variable at every moment but it certainly does help.
I understand some shops can't or won't modify the logging levels on production servers.
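As a minimal sketch of "turning the logging detail up for a few packages", here is how it might look with Python's standard `logging` module. The package names `app.billing` and `app.search` and the `set_package_level` helper are made up; how the ops side would actually trigger the change (config reload, admin endpoint, signal) is left open.

```python
import logging

logging.basicConfig(format="%(name)s %(levelname)s %(message)s")

# Hypothetical application packages; everything under "app" is quiet by default.
logging.getLogger("app").setLevel(logging.WARNING)
billing_log = logging.getLogger("app.billing")
search_log = logging.getLogger("app.search")


def set_package_level(package: str, level: int) -> None:
    """Change the log level for one package without touching the others."""
    logging.getLogger(package).setLevel(level)


billing_log.debug("cart total recalculated")   # suppressed: app is at WARNING
set_package_level("app.billing", logging.DEBUG)
billing_log.debug("cart total recalculated")   # now emitted
search_log.debug("index lookup")               # still suppressed
```

Only the package under investigation gets noisy, so the rest of the production logs stay readable.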
Blar.
I work in an environment where the devs fix bugs before adding features, so the code is stable almost all the time. I have less than 1 callout a week that's caused by something a dev has done to the code.
We hire the best devs, and work in an environment where fixing bugs is more important than adding features. The result is that our devs get full access to production, and even offer to provide support in order to ensure that they're the ones that are woken up if something they've broken falls over OOH.
I've been at my current company long enough that I'd forgotten there were places where devs and ops didn't trust each other.
If you are running into any issue that requires the developers to have access to production then you have much bigger problems than access control. Developers should need access to development servers only (which really should just be their local box or a setup identical to the supported configuration if you need to test things like clustering or different platforms). Developers should not even require access to testing environments. If you have valid contracts and adequate testing then the only issues that should get to prod are environmental issues, things that can be handled by administrators.
On the other hand, denying your developers access to anything, be it production servers, IM access, youtube, is just asking for them to circumvent the system. So your developers should never need access to production servers, but I wouldn't waste time trying to lock them out of it, or else they will work around those locks if it turns out that they do need it (because your process failed).
Is that font illegible to anyone else? I had to turn Readability on, it was so bad. Who the heck thought it was a good idea?
I am a developer. Our environment team is practically incompetent. I have to go on-call during DR tests because it is too complicated to restore an image and double click an icon. God forbid they have to install an App Server, or configure 35+ JBoss instances (default is 1 instance per box) to start, tune for memory usage or performance or both, etc. Just last week they decided to upgrade the OS to 2008R2. No need to test anything, right? Sure, all of the code is 32-bit, but that won't have any implications, will it? Trust me, I would much rather not have access to prod, but as the saying goes, "Better us drunk, than them sober."
Developers should have read-only access to production. In this way, they can investigate what is happening but should on no account have any ability to alter anything.
Speaking as a developer, I want/need read-only access in production. All too often I need to dig out information while troubleshooting, and most commonly I don't know what all bits I'll need when I start. If it were easy to identify exactly what I'd need to find the problem, I usually already know what the problem is. The hard ones are the ones I can't replicate in development and I only have a starting point, something that won't identify the problem but might help me narrow down where to look next. In those cases the only place I can look is production (since I can't make it happen in a controlled development environment) and I can't give the admins a list of what I'll need (because I need to dig through logs and config files before I'll know what I need to look for next). And if we've gotten to this point, it's probably a priority problem impacting production so it needs to get fixed Right Bloody Now.
OTOH, while I may need to look at production, I don't need and don't want the ability to modify production except by going through the admins. This, of course, also requires admins who can follow basic instructions like "Look at config file FOO. Find the line in section X that starts with Y. Its value should be XYZZY followed by the number 1. Change that 1 to a unique number for that machine/instance. Repeat this for every machine/instance." But all too often the response is "That's too complicated. Can you just give us config files to install?" And of course when I ask for the current config files, so I can be sure I'm not overwriting any other modifications to them (which may have happened since the admins control them and do modify them), I get "We can't do that, they've got production passwords in them." Now all I can do is throw up my hands and go "Whatever."
I develop mostly internal apps for a very large company, but even for the external ones I'm the one who moves files to production. It's not that way for every department, but for ours it works. Better than waiting half a month to get a typo fixed.
It would be nice if I worked with support people who knew what they were doing. I don't have access to certain environments but if something goes wrong I'm supposed to fix it somehow. But then again, I work in a robot clone environment where software development is some sort of alien concept. I need a new job.
Under no circumstances are any units in a company to have even contact with each other, much less share work product. This leads to unacceptable things, like collaborations over lunch and generally helping each other out and making a more efficient company. If we have a more efficient company, that may mean we have to lay off even more employees, and this cannot happen in this economy because we'd then pass reporting requirements for layoffs and be subject to higher FICA taxes.
Just because you're paranoid doesn't mean they aren't out to get you
It's not necessarily a case of the admins versus the developers; it's more a matter of practicing good data governance.
Our developers used to have direct access to all of the production databases. This was bad enough, but because of this the organization permitted them to directly "clean up" databases (meaning they wrote to tables directly), so we had data that was being changed without any real ability to know who did it. The DBAs hated it and the developers were extremely uncomfortable doing it but it happened anyway. We eventually had a real process audit and the auditors had a field day.
Needless to say we changed. I hope.
Before I would have said "at least read access", since in my experience the bug reports are usually very inadequate and you need to know exactly what the user was seeing and any settings/configuration made in production. Write access was already rather iffy before, and now with most servers being virtualized the best way would be a fast track to create a new clone of production for the ugly cases. We used virtualization heavily at a client I was at, they originally had two environments, test and production. We did a major upgrade, and at most we had 5 environments:
1) The old production, ready to be resurrected in case of OMG problems
2) The old test, used to verify upgrade results (not old prod as we didn't want people making changes there by accident)
3) The new production, obviously
4) The new QA, where the customer was doing regression testing
5) The new test, where we kept working on the next delivery
Being able to suddenly scale up to five environments - eventually down to two again - was brilliant. They cloned it, changed a few IPs and away it went...
Live today, because you never know what tomorrow brings
The author/owner of an application should be on the hook for keeping it running and for its failures. To separate these responsibilities creates perverse incentives and encourages fire and forget development with no thought to future maintenance and troubleshooting. At the same time, to discourage the practice of 'keeping things going by kicking them', access should result in a detailed audit trail, which would be necessary anyways for regulatory compliance.
This doesn't work very well without other arrangements in place, namely, a standardized version control and deployment system, hardware as a service and fleet maintenance systems, and a heterogeneous service-based architecture.
Realities just a bunch of bits.
Super simple question: yes, they should have read-only access.
Unless you are concerned about privacy issues, but then you probably solved those for your sysadmins too, so no biggie.
In my experience, programmers with production access cause more visible problems but have substantially higher productivity (6x to 8x).
My company has previously had unbelievably tight controls put in place as a result of SOX which added a 45 day overhead to any change (except emergency changes) regardless of size (which means small projects were no longer approved- only home runs).
Now we are going to SAP. All that is gone for now-- productivity is off the charts. I'm sure they will start locking it down after we get the first production environment settled but it is nice to be productive again.
She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
I found it's best to have the admins restore a copy of Prod to test or dev then reproduce it there and fix it. Updating directly to prod or debugging against prod should always be a last resort.
I am one of the few people who can run correct code the first time round. I am also proficient enough in OS matters to be able to circumvent access to locked down resources. So I don't care what this post says, I'm doing it myyyyy waaaaay.
I work with world class developers and an equally competent team of operations folks. The amount of disconnect between the 2 sets of folks is amazing. The developers black box stuff out of their consideration (e.g. setting up load balancers, with or without affinity, not littering certificates all over the place, the amount of privileges a service needs etc.). The operations folks ignore other aspects (a cache that's hard to build could be lost after a process recycle, not version controlling their ad-hoc queries/sql jobs etc.)
Even if I take out considerations of giving developers access to customer sensitive data, the mere fact that most developers assume that a complete clean reinstall is as trivial as going back to a previous VM image (wrt time considerations) makes me pause and not provide them access. Add to the fact that developers talk in logical terms (regardless of scale) while operations talks in physical terms (actual machine names, drives etc.) and watching them communicate is like watching 2 blind men describe an elephant to you.
Our team makes it mandatory for developers to request clean, concise information from operations, who procure it on their behalf. Yes, it is slow, and yes, it makes the developers batch their queries together, but I can't imagine doing it any other way right now.
Not only no, but hell no.
Sorry, was it a trick question?
=~ s,(.*),<sarcasm>$1</sarcasm>,g if any_point_you_wish();
I'm a sysadmin who used to work at a web development company. As a one man team I managed ~40 servers, 8 of them being production web servers hosting 200-300 sites each.
Web developers should not be allowed anywhere NEAR a production server. The last time I let one onto one, I spent the next day and a half fixing what he broke.
On the flip side, sometimes developers will just flat out need access. In this case, at least in my experience, a clone does the job just as well. You just need to have a couple servers sitting around specifically for development use, and then have a way to clone machines to this hardware in short order. In my years of experience I have yet to come across a problem that absolutely needed to be tackled on a production server.
Can anyone tell me why 99% of
Mister Potato Head! Mister Potato Head! Back doors are not secrets!
I worked at a small company where I was the sole developer, and had access to the production system. I was able to make changes and roll them out quickly, and only once or twice did I screw something up (and I was able to fix it right away). The problem is that users started coming to me instead of the sysadmin when they had problems. Then the sysadmin/tech support guy got all butt-hurt about it and declared that he would no longer support anything I wrote. As a result, I ended up having to spend way too much time teaching users how to use their computers (half the time it didn't even have to do with any code I wrote) and didn't have enough time to perform my primary job function.
I'm sure you'll say I should have just refused to provide tech support. But when you work for a small company where half the employees are either family members or personal friends of the person who signs your paycheck, that's not always a possibility.
(learned recently in my BS-Information System Security program)
I worked as an IT auditor for a very big public accounting firm. Reviewing IT controls was a key part of the financial audit (and more so now with Sarbanes-Oxley).
If I found developers had access to production, it was automatically a "no reliance" finding.
This means the financial applications are so inherently untrustworthy that the financial auditors would have to review original source documents for validation.
"No reliance" meant the audit became much more expensive as a result.
Also - if the auditors can't rely on the financial reports, should management?
If your children ever found out how lame you are, they'd murder you in your sleep
Even when the suggestion of "would you like root on this internal box?" was put to me, my answer was always "No". I write code. Others test it. Admins deploy it.
People specialize for a reason. If you want half-assed administration, give root to a developer. If you want half-assed code, let admins write software. If you want half-assed testing, have admins and/or developers do it.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
If I found out a developer changed something in a system I tested without it going through the proper process...
Let's just say I would be very interested to hear why they shouldn't go back and rerun everything again on THEIR dime. (at the very least) In fact, we DID do just that to someone who let a revision slip into their UUT because a developer felt it would fix something and make it perform better.
It wasn't too expensive of a mistake, just $250,000 to rerun that portion of the test. Although that was just the physical cost of performing the test. I don't even want to know how much it cost in labor especially considering it was a 22 day test.
Even if the change was removed, how do I know that without physically verifying checksums (do I even trust it anymore since their CM process is obviously flawed)?
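To make the checksum point concrete, here is a small Python sketch of checking a unit under test against a manifest recorded when the revision was approved. The file names, contents, and the `changed_files` helper are invented for the example; a real check would hash the files on disk and keep the manifest somewhere the developers cannot write to.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of a blob."""
    return hashlib.sha256(data).hexdigest()


def changed_files(manifest: dict[str, str], current: dict[str, bytes]) -> list[str]:
    """Names that are missing, unexpected, or whose digest differs from the manifest."""
    names = set(manifest) | set(current)
    return sorted(
        name
        for name in names
        if name not in current or manifest.get(name) != sha256_of(current[name])
    )


# Hypothetical approved revision and the manifest recorded at sign-off.
approved = {"fw.bin": b"rev-A", "cfg.ini": b"mode=1"}
manifest = {name: sha256_of(data) for name, data in approved.items()}

# Someone slips a "small fix" into the firmware after sign-off.
tampered = dict(approved)
tampered["fw.bin"] = b"rev-B"

print(changed_files(manifest, approved))
print(changed_files(manifest, tampered))
```

A non-empty list before the test starts is the cue to stop and ask questions, which is a lot cheaper than a rerun.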
Out of modpoints but really liked a post? 1BDkF6TtmmeZ3yqXbz9yhdYVqRYnwFoXDj
I'm a developer, and I have to say... HELL NO!
.. no one should have access to production servers for anything other than pure upgrades and, if necessary, to read logs and inspect monitoring programs. If your organisation can afford a decent test server with the same basic hardware as the production servers, you should simply clone the production environment to it.
In reality of course there are always mysterious problems that only seem to happen when a system is put into actual use. To catch as many of those as possible, and without having to resort to panic changes in your production environment, both the test and production servers should have a decent set of monitoring software and the ability to produce as many logs as possible.
If you do all this and still have problems, developers might have to look at the production servers. But any changes should be added to the test server first, both to test them and to make sure that the test server is always updated when the production servers are. Once the fix is ready, whoever is responsible for the production servers should approve the changes and make the update. This also makes sure at least two people know about any change made to the production environment.
If your organisation is really small I don't really think it will matter if your developers have access or not. Then you should just give access to whoever you feel can handle it on a person to person basis.
It wasn't strictly a 'backdoor' but a few years ago when our ADABAS DBAs decided to lock the developers out of the production db I wrote a suite of direct-calls ADABAS/Natural programs so I could diagnose what was going on in the live db (in ADABAS the database to use is just another parameter, you see). It worked so well that people were coming to me for information instead of using the applications designed for the purpose, and I ended up rewriting my tools in regular Natural for the users.
Once I was a four stone apology. Now I am two separate gorillas.
Of course developers should have some level of access to the production environment. No matter how good your test environment is, it's not going to match the live server in load, or what's in cache, or the concurrent access to some resource, etc.
Our process was to have one person with access, investigating whatever problem via the SQL command line, or the Rails console (let the RoR jokes commence), with another person watching, to make sure they were doing select * and not update or delete. Even then we'd execute stuff in a transaction or sandbox so that we weren't making any permanent changes, although changes to memcache generally can't be rolled back so easily.
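The "run it in a transaction and throw the changes away" habit can be sketched with Python's built-in `sqlite3`. This is an illustration only: the `orders` table and the `investigate` helper are made up, and an in-memory SQLite database stands in for the real production database. As noted above, anything outside the database (memcache, for instance) is not covered by the rollback.

```python
import sqlite3

# Stand-in for the production database (in memory, made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("open",), ("open",), ("shipped",)],
)
conn.commit()


def investigate(db: sqlite3.Connection) -> int:
    """Try a change inside a transaction, look at the result, always roll back."""
    try:
        db.execute("UPDATE orders SET status = 'cancelled' WHERE status = 'open'")
        row = db.execute(
            "SELECT COUNT(*) FROM orders WHERE status = 'cancelled'"
        ).fetchone()
        return row[0]
    finally:
        # Discard everything done in this session, whatever happened above.
        db.rollback()


would_cancel = investigate(conn)
still_open = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'open'"
).fetchone()[0]
print(would_cancel, still_open)
```

The second pair of eyes is still worth having; the rollback only protects you from the statements you meant to undo.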
I've seen admins who are adamant that dev not be allowed to change anything change psql configurations on a whim, crippling DB performance. And then blame dev for poor response times. That's so not cool.
Having lived with this scenario for about a year, I'd say that the biggest issue is not whether devs have access to production, but rather HOW they have access to production, and what the change-management process is.
Access to allow them to push updates from a proper repository to the live server: OK
Full SSH/root access to the production boxes: HELL NO.
I can't think of how many times my head nearly exploded because a dev pushed some massive freaking update to a live server and torched the box. We're talking uploading an entirely new app, without proper stress-testing, to a live in-use server. End result... leaks, bugs, and the live site explodes. Sometimes it can take hours or DAYS to figure out why resource usage suddenly jumps through the roof, and all the time the way the dev talked the system had been in place for months (being fairly new to that company, I wasn't really sure myself).
Oh look... thousands of SQL queries locking the tables due to unoptimized database tables... OK, let's sift through a few THOUSAND files trying to find where that little query is running from...
Again, the big issue wasn't with devs having access to production. It was with devs having privileged access to production, and either
a) Messing with installed modules/software/whatever... that's an admin's job
b) Installing updates without checking, and without notifying other departments.
Similar issues ensued with even just basic updates in cases where there were unchecked bad links to internal-only testing URLs, etc. Of course, when stuff was broken, nobody would be willing to take ownership for the mistake, which meant that IT spent excessive time tracking down the changes to fix them, and the offender(s) had gone home for the day several hours earlier while the admins were spending tons of (salary... unpaid) OT tracking down why the server crashed-and-burned.
A change-management system does fix a lot of that. Something as simple as "git" should be enough for many situations, and easy enough for people to understand. It allows you to revert a major code-screwup, isolate when bugs occurred, and find which people are submitting unapproved updates that break servers. If you still need to restrict dev access to a production machine, then only allow admins to do the "git pull", upon approval of the edits made by dev.
There's plenty of arrogance to go around between development and sysadmins, and often some overlap between the two roles. Having a change-management system and proper access controls adds a big factor of accountability and rollback to things, as well as helping to prevent people from clobbering each other's updates.
http://www.reddit.com/r/blog/comments/d33x7/reddit_is_hiring/c0x7aq5
I tend to feel the same way. I don't feel like developers who don't understand their production environment, or system architects who don't understand the code, can be fully successful. As a lead developer for a fairly high traffic site, I thought a lot about how the code and hardware interacted, and what the various limits were and how things degraded.
If only the operations folks could handle the design and implementation of indexes and otherwise handle optimization / speed related issues, then the devs would not need access to production servers. There seems to be no frictionless way to balance.
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
...when the Production support/engineers are worthless, what is one supposed to do? Send out resumes?
I've abandoned my search for truth; now I'm just looking for some useful delusions.
Like people have said it all depends on your situation. What's horrible is if you have a system admin that doesn't understand that.
Suffice it to say, I did leave the company because I couldn't take it. The company itself was phenomenal, but they hired and promoted from within the retail side of the business..so we wound up with a lot of people who really had no place in IT, but due to their tenure weren't going to be leaving the position for who knows how long.
If you can't take the time to make an ECO, you've got no business mucking with the production server.
I do not think developers need access to the back-end of production servers. Being able to read the production logs, and having production code that can spew meaningful errors, should be enough.
Be able to set a debug=true, or --verbose flag in the code, that spews a lot of information, as needed.
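A minimal sketch of that kind of switch, assuming a Python service and the standard `argparse` and `logging` modules. The flag name `--verbose` comes from the comment above; the logger name `app` and the function `build_logger` are illustrative only.

```python
import argparse
import logging


def build_logger(argv=None):
    """Return a logger whose level depends on a --verbose command-line flag."""
    parser = argparse.ArgumentParser(description="hypothetical production service")
    parser.add_argument("--verbose", action="store_true",
                        help="emit debug-level diagnostics")
    args = parser.parse_args(argv)

    logger = logging.getLogger("app")  # "app" is an illustrative name
    # Quiet by default; debug detail only when someone asks for it.
    logger.setLevel(logging.DEBUG if args.verbose else logging.WARNING)
    return logger
```

Calling `build_logger(["--verbose"])` turns debug output on, and `build_logger([])` leaves only warnings and errors, so the extra detail can be requested for one run without touching the code.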
For a mid to large environment, web server type stuff, it is my opinion to go a route like this:
Ideally have the developer develop as its own version tag (cvs or such) on their work station in isolation, then move to a dev environment to vet out any gotchas.
Once vetted, then have the frozen tagged updates applied to the staging environment, which should be 'as close to as what is on production' as possible. If all goes well, great: becomes part of next release candidate.
Put tested release on a subset of production.
Once that seems to be doing well, then migrate production wide. (Certainly developers should have read access to server logs, as a good prod environment can even send prod logs to be duplicated to somewhere 'safe' for analysis.)
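The last two steps (a subset of production first, then production wide) can be sketched roughly as below. This is only an illustration: `deploy` and `healthy` are hypothetical callables supplied by the caller, and the 10% default is an assumption, not something stated above.

```python
def canary_subset(hosts, fraction=0.1):
    """Pick the first slice of production hosts to receive the release."""
    if not hosts:
        return []
    count = max(1, int(len(hosts) * fraction))
    return sorted(hosts)[:count]


def rollout(hosts, deploy, healthy, fraction=0.1):
    """Deploy to a canary subset, then to the rest only if the canaries stay healthy.

    `deploy` and `healthy` are caller-supplied callables (hypothetical here).
    Returns the list of hosts that received the release.
    """
    canaries = canary_subset(hosts, fraction)
    for host in canaries:
        deploy(host)
    if not all(healthy(host) for host in canaries):
        return canaries  # stop: the release never reaches the remaining hosts
    rest = [h for h in sorted(hosts) if h not in canaries]
    for host in rest:
        deploy(host)
    return canaries + rest
```

The point of the shape is that a bad release is seen by a handful of machines rather than by the whole farm.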
Uh, Linux geek since 1999.
There have been MANY times when there were issues that ONLY occurred in production where we had to point our local development environments to production databases, servers etc. to debug the issue. 9 times out of 10 it's an environmental issue. Something an admin changed on a server setting or some person changed data in the database that caused the issue. But without being able to debug against production you wouldn't find it.
The Truth is a Virus!!!
Having worked in a small shop where the devs had access to production, I can tell you it is not a good idea. Just in case you had doubts. A few are responsible, and do know better than to screw around with the production system. But many just don't understand.
We'd install the latest version, and test it and find minor problems which they'd try to fix on the spot, by editing the source code right there on the production servers! Even when it works, which isn't nearly often enough, that's not a good process. If there's some problem, the live site stays down while they figure it out. Sometimes they'd have to give up and revert to the old version, which had to be rebuilt from source (took about 20 minutes) as the update process they were using threw all the old stuff away. If that sounds insane, there was a reason for that. It was an attempt to make the devs keep their code clean. Had too many cases where the only reason a new version worked was because it was installed over an old version that had maintained various files, states, caches and what have you, and which would crash in their absence, i.e., on a fresh install. Still, it was so easy to force a fresh start while keeping the old stuff, just in case, that trashing it never did make good sense. After all that fun, they'd try to copy their fixes back to their development environments, and that never went smoothly either.
This also meant we had to have the build environment on the production servers, so they could compile the code there. Adds more complexity, and makes updating take even longer as the production servers are given the additional burden of compiling everything. Was quite usual to have downtimes of 30 minutes when everything went smoothly. When they didn't, well... an hour or more was common. One thing I was tasked with was setting up a suitable message to be served to web site viewers, complete with estimates on when we'd be back up. The developers demanded some handy way (through a web interface of course) to set the estimated time. What a waste of effort that was. I set things up so that updates could be done nearly instantaneously, with code already compiled elsewhere. Don't need estimates when it's always under 1 second, right? All the production servers had to do was install which I set up so it happened while continuing to operate on the old version, and then flip a switch to change to the new version. No more lengthy builds on the production machines. And definitely no more hacking on the source on the production boxes! If there was a problem, the switch could be as quickly unflipped. If there was a big problem, say something that corrupted the databases, they could be rolled back or restored, which took a bit longer but nowhere near 20 minutes. The switch would have been truly instantaneous except for some very hastily and poorly designed daemons the devs never found time to replace or redo so they'd restart quickly. As it was, only that part of the switch was not instantaneous.
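One common way to get a "flip a switch" deployment like the one described is a symlink that points at the active release directory. The comment does not say how the switch was actually implemented, so the sketch below is an assumption: it relies on POSIX rename semantics, and `activate`, `release_dir` and `current_link` are illustrative names.

```python
import os


def activate(release_dir, current_link):
    """Point `current_link` at `release_dir` in one atomic rename.

    Returns the previously active target (or None) so the caller can roll back
    by calling activate() again with that value.
    """
    previous = os.readlink(current_link) if os.path.islink(current_link) else None
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an interrupted switch
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, current_link)  # rename over the old link; atomic on POSIX
    return previous
```

The new version is installed next to the old one while the old one keeps serving, and unflipping is the same call with the previous target. Long-running daemons still have to be restarted to pick up the new code, which matches the one slow part mentioned above.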
By all means, let them look at logs and such. Here, remote logging can be a help. But when they try to treat production like another sandbox, draw the line. If a problem can't be reproduced in a test environment, the first thing to do is NOT to go running to the production environment to poke around, it's to figure out why the test environment isn't reproducing a problem and fixing that. One other difficulty to note: sometimes debugging messages were spammed to syslog, or other files. Have to be careful about that sort of thing, or the production servers could very quickly run out of hard drive space trying to hold millions of debugging messages. Logrotate can be overwhelmed. If hundreds of debug "tweets" are being generated every second, can take only a week to run even the largest hard drives out of space. Even without that, the software was a timebomb. It had enough leaks it would crash and burn in about 2 months if not restarted, and only the weekly updates masked this. Fixing such problems was, understandably, a low priority.
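One way to keep debug output from eating the disk is to cap it at the source instead of relying on an external rotation job. A rough sketch, assuming Python's standard `RotatingFileHandler`; the size limits and the function name `capped_debug_logger` are illustrative.

```python
import logging
from logging.handlers import RotatingFileHandler


def capped_debug_logger(path, max_bytes=50 * 1024 * 1024, backups=4):
    """Return a logger whose files on disk stay within roughly
    max_bytes * (backups + 1), however chatty the debug output gets."""
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger("debug." + path)  # one logger per file path
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)  # call once per path, or handlers pile up
    logger.propagate = False  # keep debug spam out of the root logger
    return logger
```

The oldest messages are discarded once the cap is reached, so the trade-off is losing history rather than losing the server.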
Intellectual Property is a monopolistic, selfish, and defective concept. It is "tyranny over the mind of man"
I am a developer and I would never want production access and cannot understand why any sane developer would. I do not want there to be any chance that the script I run to clean out the Customers table on the development database could ever be accidentally run on production. Forgetting if it encourages sloppy practices, even if your development practices are excellent, any sane developer would always want to be able to say with absolute certainty that they did nothing to hose production, even mistakenly, because they simply do not have access. It must be the sysads fault.
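Where developers do hold production credentials, a destructive script can at least refuse to run unless it is told it is somewhere safe. This is a weaker guarantee than having no access at all, and the sketch is only an illustration: the `APP_ENV` variable and the names below are assumptions, not a standard.

```python
import os


class ProductionGuardError(RuntimeError):
    """Raised when a destructive script is started outside a known-safe environment."""


def require_non_production(env=None):
    """Raise unless the (hypothetical) APP_ENV variable names a non-production environment."""
    env = os.environ if env is None else env
    name = env.get("APP_ENV", "").strip().lower()
    # Fail closed: an unset or unrecognised value is treated like production.
    if name not in {"dev", "development", "test", "staging"}:
        raise ProductionGuardError(
            f"refusing to run destructive script in {name or 'unknown'!r}")
    return name
```

A cleanup script would call `require_non_production()` as its first statement, before opening any database connection.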
I've been in two bad situations.
1. Management was too cheap to have adequate development systems. They expected us to make up good test data on the fly. We never did good testing as a result.
2. We had SAP. The accountants had access to the production system (of course). They were allowed to develop their own programs on the production system (dangerous). I was merely the programmer, so I had no access to the production system, or the stuff they had already written (extraordinarily stupid).
I'm not asking for access to live data or permission to run code on live data, but don't give me bullshit data, or keep me in the dark on production code. Hey, wait, isn't this what they call a mushroom farm?
My other car is a 1984 Nark Avenger.
Never allow access to PROD. They will do their testing in PROD, then "promote" those changes to DEV. Seen it a million times. Have a QA/Test instance, script a nightly LUN snapshot from PROD, and let 'em have at it.
If I screw up, people can't get the correct pills. It's fun to make other people live dangerously. :-p
FTFY. Well, for certain values of "pharmacy benefit management system". If your production hacking can botch scrip fulfillment, please say what company you're working for so I can try to avoid it like the plague it is.
I don't know if Blue Cross Blue Shield has fixed this but, as of a few weeks ago (and this probably has existed for a while), living in EST has made it impossible for scrips to be fulfilled via insurance between midnight and 3AM. This is because, according to the late night pharmacist who is familiar with the issue, the servers are in PST and won't allow fulfillment from anything but the "current day" regardless of time zone. Too bad the devs there don't understand time zone adjustments / UTC/GMT. Yet again, non-profit environments don't tend to attract the swiftest of folk in general.
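The class of bug being described can be reproduced in a few lines. This is a hypothetical reconstruction based only on the second-hand account above, not the insurer's actual logic; the function names are made up, and fixed UTC offsets are used to keep it self-contained (a real system would use named zones so daylight saving is handled).

```python
from datetime import date, datetime, timedelta, timezone

# Fixed offsets for illustration only; they ignore daylight saving time.
PACIFIC = timezone(timedelta(hours=-8), "PST")
EASTERN = timezone(timedelta(hours=-5), "EST")


def accepts_server_local(fill_date, now_utc):
    """Buggy check: compares the pharmacy's date with the server's Pacific date."""
    return fill_date == now_utc.astimezone(PACIFIC).date()


def accepts_zone_aware(fill_date, now_utc, pharmacy_zone):
    """Compares the pharmacy's date with 'today' in the pharmacy's own zone."""
    return fill_date == now_utc.astimezone(pharmacy_zone).date()


# 01:30 Eastern on 15 January is still 22:30 on 14 January in Pacific time.
now = datetime(2010, 1, 15, 6, 30, tzinfo=timezone.utc)
fill = date(2010, 1, 15)
rejected_by_bug = not accepts_server_local(fill, now)
accepted_when_aware = accepts_zone_aware(fill, now, EASTERN)
```

For the three hours after Eastern midnight the two zones disagree about what day it is, which is exactly the window the comment reports.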
Boot the server to a ramdisk. That way you know it is byte for byte identical. Put all configuration in svn and distribute it using cfengine or similar. You get guaranteed identical performance. An OS image can be as little as 100MB using a normal distribution.
For a one man show the answer is self evident.
For a small web company developing "brochure-ware" - probably more efficient.
For a small team it's ideal to have individual sandboxes - with one sandbox listed as "staging". Assign the lead developer to turnover code to production. Individual developers have access but are told not to touch anything. They will typically sift through live environment making sure it matches what is in their sandbox, looking at logs, etc.
For a mid-size team you need one person for maintenance (which includes monitoring nightly builds, responding to code turnover requests, managing automated testing). Even more critical if the code you write is compiled, fragile, or highly sensitive. - Individual developers don't have access to the live box - maybe the team lead will.
For large teams or small team "units" part of a large production shop : Several layers of "staging and testing" will exist. Code turnovers are mostly automated. Developers don't have access. Automated rollbacks are possible from a robust code management system.
The key is discipline. If you find yourself modifying live code - you're not disciplined. It means you're not willing to insert logging code and would rather pollute the production environment. There should never be a need to copy from production back to a sandbox (that is what version control software is for!) And version control files should never live on the production server (i.e. in Subversion you never do a checkout of code on the production server - you do an export instead).
Even with controls in place, there may be a tendency to "develop on production by proxy". Which means instead of re-creating the problem in development, the developer is saying "here try this, here try this, here try this". The team lead should recognize this and put a stop to it.
-CF
I'm a developer and the sysadmin for the webservers that run my code. I work at a mid-size manufacturing company and I'm kind of surprised to see there aren't more people filling both roles..
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
While it would be lovely to have a perfect sys admin they don't exist. I would say about 1 in 10 problems in production are due to the admins not doing their part. I test like crazy but often times I find the following
- Crazy rules in production but not staging
- Mapped folders on production but not in staging or development
- Databases or Database permissions incorrectly configured
- Caching doing weird things one does not see in staging
This only gets worse if you have load balanced servers etc.
I believe as a rule no developer should "develop" in production but having access to the production environment depends on your work environment and who would know best. Sys admins NEVER KNOW BEST when it comes to websites. They know their hardware but rarely do they know how to troubleshoot issues and almost always blame your app. However, at the same time developers often don't know enough about the technology their applications are running on so they instantly blame the sys admin.
Ideally, in a large scale deployment you have someone with the knowledge of both (that's me) who can identify and troubleshoot common issues. That individual would have access to production. Additionally, in smaller development shops not having access to production is just stupid. It doesn't make sense.
- Source Control is a great idea, but rarely used due to difficulty of use. Sometimes it's not an option because someone has to support that.
- Financial limitations can reduce the ability to have shared development/staging areas. Working on your own machine is useless when testing. It does NOT test working in a production environment and thus... sometimes production is the test field.
We can play the blame game but in reality who has access to production should be limited to those trusted enough to not do stupid things without backing themselves up. Sysadmins should keep a backup of the production site at all times, developers should not mess with production unless it's urgent.
Every piece of specialized software will fail occasionally. Not talking highly productized software with a gazillion beta-testers, but the ERP integration layers and other type of software that has been more or less written for a single, or for a few customers.
No developer, and no amount of synthetic testing can ever cover all the possible angles in these highly complex systems the way the real world can. This is especially noticeable for integration systems, highly dependent on external environment.
Some of this software is highly mission-critical, when it stops, business stops. In these cases, dev-access is probably the sound way to troubleshoot, and get things running smoothly again.
I usually work for large companies with QA/Staging processes. When someone suggests I poke the production servers, I REFUSE to even be given any password related to those. The argument being, we have 3 steps before an application goes live, if there is an issue, it's either a bug that hasn't been caught early enough or there's a support group who has the authorization to help in investigating.
If a developer must access production servers, something in the bug detection process failed and it's way too dangerous to have anyone probe them. Also, in many organisations, the data is sensitive enough to not have the common human being even have a glance at it.
I do work mainly with LAMP stack apps, and one major step that we've taken is to work more CI magic into our workflow. I *love* Hudson, and have it setup to do everything from typical testing duties, to jobs for pulling sanitized production databases back for testing. The cool thing is that I can give some developers access to certain Hudson jobs, and let them trigger the production dumps whenever they want.
I've even taken to setting up jobs that will spin up a VM that gets set up with puppet and then loads the app with the latest production dump, with parameters for the name of the environment. Now developers can even build their own testing/staging environments with a click, and everyone gets hassled a lot less, and production sits a lot safer.
Can developers recreate the issue in the development environment? If yes, the answer is no. Stop.
Can developers recreate the issue in a test environment loaded with production data? If yes, the answer is no. Stop.
If developers cannot obtain information in development and test systems then absolutely they need access to the production system. But that access is to diagnose, not debug! There is a difference. Logging may need to be enabled and production may slow. Generally speaking, if the administrators have done a proper job of speccing and maintaining the system in the first place, the system will survive additional logging being enabled.
Do they need carte blanche access? Generally not. But if you have really good developers who are capable of using good judgment and/or have a good relationship with the production guys, there's generally no reason things can't be discussed and worked out.
Really, this is a question of simple problem solving skills and relationships. Is an article really required?
Having access to and mutilating the environment are two completely different things. Treating developers as hostiles by server admins doesn't create the friendliest work environment.
There is a big difference between a bug and the reason why the bug occurred - having access to a production environment is paramount to understanding the underlying issue.
In most newspaper sites the headline / lede / seo field / first graf is usually programmatically brought in as the META description for SEO purposes (unless specifically overridden). It's a fairly common assumption that this field would be pure text, so the need to sanitize it gets overlooked. Of course, it's also a fairly common consequence that some silly editor eventually breaks the site by putting HTML code in fields they weren't meant to house. You'd be surprised how many (even big) media sites fail to sanitize these fields.
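For reference, the missing sanitization step is small. A minimal sketch using only Python's standard library; the function name `meta_description` and the 160-character limit are assumptions for illustration, not taken from any particular CMS.

```python
import html
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text content and discards every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def meta_description(raw, limit=160):
    """Turn an editor-supplied lede into a safe value for a META description."""
    parser = _TextOnly()
    parser.feed(raw)
    parser.close()  # flush any text still buffered by the parser
    text = " ".join("".join(parser.parts).split())  # collapse whitespace
    # Escape last, so quotes and ampersands cannot break out of the attribute.
    return html.escape(text[:limit], quote=True)
```

Note that this keeps the text inside any tag, including a stray script element, and only removes the markup around it.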
Onto my point: having HTML (or faulty HTML for that matter) inside a HEAD description field seems like a bug. Sure, you can replicate the error by copying the environment and fix it by stripping anything unexpected out. However - that may not be the root of the problem. Thus, developers end up putting bandaids on a system and treating symptoms rather than curing the problem.
When copying this environment to reproduce the issue, one might simply grab the part that's affected, ignoring the user - CMS preferences which would actually end up telling you what the problem was. What a developer SHOULD do instead is poke around the environment, notice that this was a common occurrence with particular users, talk to the editors in question and shadow them for awhile to understand why this sort of thing keeps repeating.
In many cases, I've seen editors take advantage of programming or security holes to produce far richer and personalized web content than the original design or system permitted. By treating this problem out of the production environment context, the editorial side is completely cut out of the feedback and has no say in the matter and their creative outlet disappears with no dialogue. This in turn breeds hostility and distrust between technology and content.
I can assure you that in reality there is minimal dialog between developers, designers, project managers - and - editors. In my past experience access to the production environment was the only means of lateral communication between abstract technology and persons who use it. In my cases, I've been able to provide editors with features they hesitated to request to circumvent problems while still tightening various holes. This, in turn, improved everyone's day.
Look at it from a philosophical view: you don't want developers to be "writin' them codes" in a vacuum of non-editorial space. They need to be a bit more intimately involved with the entire publishing cycle from start to end and not be ticket-solving space cadets when it comes to solving problems.
Developers: "Yes" Everyone Else: "No"
We test the new blockware logic on test racks before installing it in the plant but the devs have full access to the running control system of the refinery plant.
We can at any time 'monitor' the values of all function blocks in the whole plant, and change configurations on the live system.
While this sounds iffy at best, it is the only way to do it. You can only test things so much before they have to be put into the live system. Replicating the live system is nearly impossible without spending silly amounts of money. So far no major accidents have happened but there is no guarantee.
In a perfect wooooorld and so on .
... in a split decision, vi wins. Oh wait ... wrong holy war!
Que Deus te de em dobro o que me desejas
[May God give you double that which you wish for me]
If you don't have a development server that closely replicates the installed state and environment of the production server, you're doing it wrong.
You've built inaccuracies into your development and test system, differences from the production system that will result in differences in the completed code which will then encounter those differences and behave improperly on the production server.
I.e., you're building bugs in deliberately.
Stop that.
Now, I understand, sometimes scalability is an issue. You can't replicate a whole server farm for the devs. But you can isolate that variable and design for it from the first day. It's the other variables that you don't anticipate and don't have visibility into that you need to keep from varying.
Question and Answer? Or QA as Quality Assurance? :P
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
Clones generally only duplicate installs and not data. The bug may be caused by a slow corruption of data on the prod server. It may only show up after weeks or months of use. It may only happen when certain data happens at a certain time (interaction with cron jobs). Sure one can clone the data too but when the data is multi-TB that will take time and money. Even at this point one can not clone the data flow into the server. Going into a prod server and adding logging lines should be allowed. (Sorry but the "log everything" solution is not a solution as one could create TB of logs in a single day on a prod server)
Access for debugging purposes: Yes
Bug Fixes: No (they need to be vetted first. If it is critical the vetting should have very high priority)
I told the new IT director that we needed a change management system.
What I thought it meant was a piece of software that would:
1. Back up the old website
2. Take the new website from a Staging folder on the Dev machine and move it to production, after hours,
3. Do some basic functionality testing (does the website even load, etc.)
4. Provide a one-button restore function if the shit hit the fan the next morning.
What he thought it meant:
1) Fill out a piece of paper advising everyone that the application was going to change and who they could blame if it didn't work...two weeks ahead of time.
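The four steps in the first list fit in a short script. This is a bare sketch under loud assumptions: the site is a plain directory tree, `smoke_test` is a hypothetical callable that returns True when the site looks alive, and the copy is not atomic, so it suits the "after hours" window mentioned above and nothing busier.

```python
import shutil


def deploy_with_restore(staging, live, backup, smoke_test):
    """Back up `live`, copy `staging` over it, and restore the backup if smoke_test fails.

    Returns True if the new version stayed in place, False if it was rolled back.
    """
    shutil.rmtree(backup, ignore_errors=True)
    shutil.copytree(live, backup)        # 1. back up the old website
    shutil.rmtree(live)
    shutil.copytree(staging, live)       # 2. move the staged site into production
    if smoke_test(live):                 # 3. basic functionality check
        return True
    shutil.rmtree(live)
    shutil.copytree(backup, live)        # 4. the one-button restore
    return False
```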
When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
But I do need access to an exact copy of Production. One of my big annoyances was when I couldn't replicate a problem, then found out the data I was working with was different from the Live data that caused/revealed the problem. Or some supporting software had been changed in Production and I was testing with an obsolete version.
I don't want to mess with (or mess up) the Production stuff, but if my stuff is different from theirs, I can't guarantee that I can find their problem.
The Mother of All Bad Deployments over at Digg today. Everything is broken. Good thing they didn't publish during peak usage. Oh, wait...
The more developers work in production, the more they can ONLY work in production.
I'm all for read access (the more eyeballs the better), but actual access to change anything is a train wreck. The devs will forget to check the changes in to the source repo, or they'll check them in differently (bad copy/paste), or they'll check them into the wrong branch/tag. Regardless, the next release that goes out silently adds the bug back into production.
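That kind of silent divergence can at least be detected with read-only access, by comparing what is deployed against a clean export of the released tag. A minimal sketch, assuming both trees are reachable as local directories; `tree_digest` and `drift` are illustrative names.

```python
import hashlib
import os


def tree_digest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                digests[os.path.relpath(path, root)] = hashlib.sha256(handle.read()).hexdigest()
    return digests


def drift(released, deployed):
    """Return sorted relative paths that differ between a release export and a live tree."""
    a, b = tree_digest(released), tree_digest(deployed)
    # A path counts as drift if it is missing on either side or its hash differs.
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty result means production matches what was released; anything else is a hand edit or a stray file that the next release would overwrite.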
And if developers think it's difficult to fully clone a prod environment configuration into dev now, wait until they try to do it after developers have been hacking on it directly for a while.
Pretty soon every release is a train wreck requiring tons of post-release tweaking and hammering to get it in place. Every release is a stressful mess as you're all crossing your fingers because you really have no idea what you are actually changing and no way to find out.
Just don't do it. Hire a good build engineer/release manager/software configuration manager that can sort out, automate, and track environment management well enough that yes, you can reliably clone an accurate representation of production in a matter of minutes. He'll cost you about as much as a good sr developer, but the savings across the board will easily dwarf his salary.
My
1. No matter how perfect your setup is, there will always be variations between prod and the most meticulously setup prod clone, shadow, backup, standby, etc... that will cause some (hopefully not many) bugs to present themselves in one environment and not in others
2. Data changes behavior -- Even if the web apps are identical, a small change to a database table could cause breakage in ways you never imagined. This can be controlled through proper layering and decoupling, but it always seems to creep up when you don't even notice it
Developers should never have direct access to production. Production issues should usually be solved as follows:
1. If a support person gets the defect (either directly finding it or from a customer/user) then they look up in their knowledge base to make sure that the company doesn't already know about the problem and have a reasonable solution.
If the problem hasn't been logged and is outside of the technical competence of the support staff, it usually gets thrown on IT / Operations support
2. Operations support should once again verify that the problem hasn't been addressed before. Often problems will just continue to re-occur. Some problems just never get solved for one reason or another. Some problems -could- be solved, but the fix is just easier than a large amount of investment in making the problem go away. Often 'reboot the server' is a good enough solution for problems with low frequency occurrences and high complexity solutions.
3. Operations doesn't know the problem, so it's time to poke around the logs, system resources, network connections, etc... A good network and systems management solution makes this problem moot. I've seen tons of production problems end up being the result of rarely used disks getting full with nobody bothering to alarm on them. Finally, ops throws their hands up in the air and are like 'WTF.. this is a development issue'.
4. This step is the hand-off between operations and development. I say hand-off, but in reality, this is when the two teams need to start COLLABORATING to find a solution. Too often either operations chucks the problem onto development's lap, or else development takes the problem and ignores operations until they find what they think the problem is or give up. The best solution in dealing with development related production problems is for both teams to work together and use shared knowledge and experiences of both groups to diagnose and resolve the issue. I've played on both teams over the years and I know quite well that Developers too often want to assume they know everything about deployments, servers, environments, etc.. Often that knowledge is diluted and ends up blinding them to issues that could be diagnosed at least in part with a network administrator / operations support. The worst situation is for this step to get into the cycle of blaming one another for the defect. Tough to diagnose problems too often flame into blame games which always exacerbate the time to find, diagnose, and resolve an issue.
5. Development asks for information X, Y, and Z. If there's enough information to gather an accurate diagnosis, a solution or a workaround is devised. If it's a workaround, the problem and the solution should be logged somewhere so that all parties (support/admin/development) can see what problems are outstanding and, if the problem comes up again, the workaround to address it. If the solution is a new release / patch / etc.. the fix should go through the general release cycle as usual (though much faster for more severe problems). Some problems don't and may never have evident solutions. The trick here is to find an adequate workaround that keeps stakeholder impact to the minimum.
Developers should always have access to:
- Server / Database logs (Just don't put anything confidential within logs that can compromise customer / user privacy)
- Production IP's, and other data that is unique per release environment
Developers shou
Bye!
Long answer, it depends on the situation.
I (and a couple of my coworkers) have access to production servers, but we don't develop on prod. End of story. We have other devs who do not have access to prod. Dev is for dev, prod is for prod and don't let anyone without the discipline to keep that rule have access to prod.
LK
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
Reading a lot of comments it looks like there's a wide variety of definitions for some of the job titles and roles people are discussing here, so I'll list how I see them:
* System Admin - Person(s) responsible for the hardware and supporting (OS, Web service, code language and client libs, JVMs, etc) software. They do not in any way support the applications running on said system and would be incapable of debugging or supporting an *application* problem even with a gun to their head. Most can only describe 2-3 sentences of what the applications even do. They do not report to or answer directly to the application teams. They also do NOT install application code.
* Database Admin - Only want to address roles here. At every location, the actual application data stored in the database is NOT the role or responsibility of the admin. It belongs to the application team and any changes are their job and their accountability. The DBA only deals with schemas, packages, procedures, scripts, access roles and grants, etc. DBAs should NOT MANIPULATE DATA. Asking or allowing them to do so opens up a never ending blame game and is counterproductive. If you want to create some title and role within the application team where all data manipulation funnels through, that's the way to do it.
* Implementation Specialist (Code Migration) - Trained monkeys who are supposed to follow a set of pre-delivered instructions for deploying application changes. In my experience their technical knowledge is limited, they cannot verify copy/paste correctly, and screw up (transferring ZIP files in ASCII instead of binary) more than they succeed. I don't feel this position is even necessary. The PROCESS is necessary and it can be performed by anyone, even a developer, as long as they switch their role hats before starting and are held accountable for accurately following the deployment instructions given.
* Production support - They act both in a technical and relationship role, being the contact point between the customer (internal or external) and the application team when issues arise. Generally have read-only access to production. They are able to debug many problems and resolve a few, but definitely not all of them. They do not participate in any part of the development lifecycle processes.
* Developers - Not going to discuss or debate any pre-production roles here since it's irrelevant to the topic. Developers are the only ones I would be confident could debug ANY problem. They are going to need some reasonable level of access to production, logging, or information if you want to have an application that can maintain high availability and recover quickly from any type of outage.
If your definitions to these roles differ significantly, then my answer for your company's situation would change.
Depending on the size of the application and the team allocated to run it, I've performed up to 4/5 roles and was pretty much the 5th as well since the Sys Admin only could barely squeak by supporting Windows 2k and definitely had zero knowledge of any of the supporting software. Are you going to hire someone to do 6 hours of work per month just to separate the responsibilities? Of course not. So the OP's generalized question is open to a million different interpretations because of all the different variables that weren't specified.
My most recent application team recently went through our production lockdown after finally migrating over an application suite purchased from another company. Developers and Prod Support have read-only access. Database passwords used by the applications have been restricted down to just a couple individuals. When changes need to be made to either the application or data, an Emergency ID is checked out to a requesting individual with the appropriate access level,
To me this is the crux of the question: I have seen developers who were perfectly capable of managing individual servers, thanks to long experience and having had to perform DBA duties at the scale of the systems they have dealt with. I have also seen developers who know a language but complain about the SQL server when their queries run like batch processes, because they don't understand how the systems they are writing generic SQL against actually work.
The first type I would probably allow access to the production system, provided it was not widely outside the developer's experience and did not have an uptime requirement that meant it had to be strictly controlled and tested. The second type is exactly why I would never allow a lot of developers near production systems, small scale or not.
Source control is difficult to use? Where in the world are you working? Even the smallest of shops have access to free source control, which requires minimal administration. How much maintenance do you think is needed in, say, a Mercurial repository?
Hell. No.
I'm a developer as well as a sysadmin and I NEVER tweak anything in production and I have full access to it.
I have an exact copy of my production environment for development and I do all my tweaking/test deployments there.
In fact nothing gets deployed to production until everything has been checked in development.
My previous job had dev/qa/prod environments where the devs had full access to development and it was so bad that we had to virtualize it for them just so we could revert back to a pristine snapshot whenever they jacked up the dev server.
We don't give our fixes to a trained monkey we give them to System Administers
- you have Ministers that systematize ads?
You can't handle the truth.
No. Do not give developers access to the production machine ever, except me...just this once..
The Dunning-Kruger effect explains most posts on
Marketing should have limited access to THEIR OWN data. Never has any other department required so many restores...
I have systems which go out to site, and sometimes, I need root access to them because we simply don't have access to the hardware in our R&D lab, and need to test code on a production system.
Note "intensive" purposes and "begs the question".
It's deliberate fingernails squeaking on the blackboard.
I'd fire you for introducing security holes and bypassing all procedures.
Modify access? Absolutely not. Read access? Yes.
Because who is management going to run to at 3:00 in the morning when something isn't working? Not the system administrator - the developer. Let them see the logs. Let them see what files/versions/timestamps were deployed. Let them see what else is running on the box. And let them do it without giving instructions over the phone to some admin who is sharing their console over GoToMeeting or something. Or this is going to take all freaking night.
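For what it's worth, the "what files/versions/timestamps were deployed" part needs nothing beyond read access. A minimal sketch, assuming the application lives under a single directory tree that the developer account is allowed to read (the function name `deployment_manifest` is made up for illustration):

```python
import hashlib
import os
import time


def deployment_manifest(root):
    """Return sorted (relative_path, size, mtime_utc, sha256) tuples for files under root.

    Read-only: files are only stat'ed and opened for reading, never modified.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            # Format the modification time in UTC so listings from two boxes line up.
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(info.st_mtime))
            entries.append((os.path.relpath(path, root), info.st_size, stamp, digest))
    return sorted(entries)
```

Run it against the production tree and against the build you think was deployed, then diff the two lists. A mismatched hash tells you which file differs without anyone touching the box.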
As a side note, how many times has a developer been dragged in to troubleshoot why "their program quit working" only to find that the real problem was something like OS updates were applied without being tested, or a new virus scanner was installed, or system X was installed on the same box?
Marketing should be walled off from everything but a large, caged enclosure in a sub-basement, with the only other external access being to a restroom door marked, "Beware of Hungry Jaguar."
@Mindless Drivel: 100% of Twitter posts ever Tweeted.
The people running test environment and production environment are usually different groups. Communication needs to be clear in both directions so that the software levels are the same.
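Keeping the software levels the same is easier to argue about when both groups can produce a list. A rough sketch, assuming each side can export its installed components as a name-to-version mapping (the package names and versions below are invented purely for illustration):

```python
def version_drift(test_versions, prod_versions):
    """Return {name: (test_version, prod_version)} for every mismatch.

    None in either slot means the component is missing from that environment.
    """
    drift = {}
    for name in sorted(set(test_versions) | set(prod_versions)):
        test_v = test_versions.get(name)
        prod_v = prod_versions.get(name)
        if test_v != prod_v:
            drift[name] = (test_v, prod_v)
    return drift


# Hypothetical inventories; real ones would come from each team's own tooling.
test_env = {"openssl": "1.0.1", "appserver": "7.2", "jdbc-driver": "4.0"}
prod_env = {"openssl": "1.0.1", "appserver": "7.1", "antivirus": "9.3"}

for name, (test_v, prod_v) in version_drift(test_env, prod_env).items():
    print(f"{name}: test={test_v} prod={prod_v}")
```

Anything that shows up in the output is a conversation the two groups need to have before the next deployment, not at 3:00 in the morning.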
Blar.
Everything isn't a webapp/DB solution. Our networks are highly segregated for security. Who is in charge of making sure that 'different DB' has relevant and correct data in it? Who makes sure that the cloned prod image doesn't have sensitive info in it? Shops serious about security would laugh you out of the building for suggesting what you have posted.
Blar.