Should Developers Have Access To Production?
WHiTe VaMPiRe writes "Kyle Brandt recently wrote an editorial exploring the implications of providing developers access to the production servers of a Web site. He explores the risk introduced by providing higher level access as well as potential compromise solutions."
Whenever an error occurs that I can't replicate in a dev environment, I'm always SO tempted to hop into prod and start adding in some output statements.
Yeah, it's probably a good thing I don't have access to prod.
Marketing definitely should not have access to production. They ... they wiped out the site a couple of months ago while trying to update an image.
LOL! No.
All we do is run scripts and get in the way of developers trying to "git'r done!".
No. It just encourages sloppy development practices.
Would you want to drive over a bridge that wasn't actually designed and engineered, but rather they just piled some stuff up and will fix it if it collapses? Or have a surgeon chopping you open with the idea that they'll figure it out as they go? So why would we want developers to work with the expectation that they get to intervene at the last instant to resolve their failures?
If you deny developers access to production servers, then don't be surprised if your web sites end up with some backdoors. Not saying it is right or wrong, but you're pissing in the wind if you think you can cut off the developers.
It is my experience that giving development access to production gives you a production environment that looks like it has been vandalized. Developers mean well and are trying to make the best application possible, but they need their own development lab and their own staging / production lab.
No.
If needed there should be a mechanism to automate bug reports in a meaningful way, as most professional software has.
hell no.
If you want to have control over your production code, you need to have assurance that it is not changing in an uncontrolled fashion. Allowing developers to have access to production locations makes it all too easy for this to happen. Read-only access allows developers to see the running code and perform file comparisons which can be useful in troubleshooting. They should never need more than this.
And in some cases, even read access can be risky -- I've seen production web sites with resources linking back to development server URIs. It's a good idea to firewall your production servers in such a way that it is not possible for them to reach resources on development servers. This shouldn't prevent developers from being able to read the files on the production server, though.
You see? You see? Your stupid minds! Stupid! Stupid!
Everyone agrees that developers should never have access to production...Unless they're the developer, in which case it's different.
It's a good practice to keep them separated, but in the end it's just a pissing contest. The server admins don't want some filthy dev messing with their stuff, and I can appreciate that.
However, admins often lack appreciation of some dev-specific issues, and their ignorance can lead to problems down the line.
In the end, it's better practice to have everyone work together sensibly than to throw down inflexible rules that cause more trouble than they prevent.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
So, yes.
This is why there is a change control process, and a testing environment.
If you're doing it wrong, you're asking for trouble.
The price is always right if someone else is paying.
There's no correct answer to this question. It depends on the size of the organization and the nature of the system. I've worked in different companies that have been on either side of where I thought the line should be. The line is drawn in a very different place for a 20 employee company than it is for a 20,000 employee company.
The day developers can write code that compiles the first time, then yes, otherwise, jesus, no.
I work as an Oracle DBA for a mid-size company, and I provide a day-old cleaned copy of production in a different environment/box, and it does the trick.
Hi, I Boris. Hear fix bear, yes?
I think it's helpful in analyzing real-world data and getting an idea about real system loads, testing issues to see if they are in the wild today, etc. For a good developer, it makes life much easier.
In a very healthy development ecosystem all this data is replicated and there is never any need for a developer to touch prod. In the development ecosystems that exist in the real world though, most are very unhealthy, frustrated by ham-fisted security, process flaws, red-tape, inconsistency, and incompetence ranging from scattered to mostly cloudy.
The answer is, do you have the class of developer that knows what not to do and desires to play nice, or do you have the usual.
As a developer I can tell you that it's impossible to test programs properly and thoroughly without access to production data. However, developers should NOT be granted access to production logins/sites - production data should be copied into development work areas so that developers have an appropriate "sandbox" in which to work/test.
And the consensus is ... NO
Who let this question through? It doesn't even seem controversial. I am not aware of any good reason to routinely give developers access to production.
Censorship is obscene. Patriotism is bigotry. Faith is a vice. Slashdot 2.0 sucks.
I dislike blog postings on Slashdot as a rule - they can get a Slashbox like everybody else - but the arguments made in the article are well-reasoned if somewhat short on detail. How do developers troubleshoot in a production environment? The article acknowledges that troubleshooting in production is necessary and mentions the installing of software, but installing software alone changes the environment (generally a bit of a no-no for debugging, due to Heisenbugs) and debugger hooks can pose a potential security threat (a big no-no for sysadmins). Further, there was no discussion as to whether developers should be the ones troubleshooting - first rule of testing is that you should never rely on programmers to test their own code. They're way too close to it. Either have testers or have programmers test other programmers' code. It is the only way to ensure that there's proper coverage of sufficient corner-cases.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
As a general case, I say that no, developers should not be given access to production. While giving us access to production might seriously speed up the resolution of an issue, in my experience, it always eventually introduces more problems than it fixes. It also tends to create an environment where testing is devalued because it creates the perception that any issues can be quickly resolved in production. This encourages management to compress timelines and causes the dev team to waste a lot of time fighting fires.
The best environments I've worked in have a fully replicated "break fix" mirror of production that can be used to test and rapidly deploy emergency production fixes outside of a normal test cycle if absolutely necessary.
Biggest issue my cow-orkers and I have is that the sysadmin *claims* that the dev box and production box have the same packages, configuration, etc. but in reality, they don't. Most often we find out when we ask for production stuff to be copied over to the dev site to test errors, etc. and just loading it - which works on the live site - generates errors on the dev site.
Don't blame me, I voted for Kodos
I use error handling to give me as complete a picture as possible of stuff on production. I don't want any more access to production than I absolutely need.
~~ Behold the flying cow with a rail gun! ~~
Yes, certainly developers should have access to their production machines.
No, they shouldn't be allowed to do anything they want with them.
Troubleshooting application breakdowns is much easier for the developer to do. Thus, the access should be limited to logging data, etc. Unless the admin worked on the application itself, diagnosing those kinds of issues through someone else can be extremely difficult at best.
Non impediti ratione cogitationus.
If you are a small software shop then I can see reasons for allowing your small technical staff to have access to production. It's all well and good saying that only the admin of that server should have access and there's a full rollout procedure in place to be followed only on certain days, certain times; but even when I've seen that sort of structure in place there are times when it's useful for the developers to have access to production. Nothing is perfect and we'd all love to have multitudes of staging servers, replicating the typical load and uses of production, but for a hell of a lot of (non critical, I'd add) systems that just doesn't happen.
There simply is no one rule that fits all. Sometimes I wish we had extremely rigorous rules & regulations in place - I'd probably get to go home a hell of a lot earlier. I'm not suggesting you start chucking exceptions all over your checkout code on live, but I think you should assess your own situation (and staff, for that matter).
jaymz
With some fore-thought and some discipline an application can be developed with very robust logging techniques. It takes development time, but there is nothing cooler than asking the production guys to turn the logging detail up for a few packages and seeing tons of data in the logs. It's not perfect as you can't log every variable at every moment but it certainly does help.
I understand some shops can't or won't modify the logging levels on production servers.
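For anyone who hasn't built this before, here is a rough sketch of the "turn the logging detail up for a few packages" idea using Python's standard logging module. The package name shop.checkout and the helper set_package_log_level are made up for illustration; how the production guys would actually trigger it (admin endpoint, config reload, signal) is left out and would depend on the shop.

```python
import logging


def set_package_log_level(package: str, level_name: str) -> int:
    """Raise or lower the log level for one package and everything below it.

    Child loggers (e.g. "shop.checkout.cart") inherit the level from the
    package logger unless they set their own.
    """
    level = logging.getLevelName(level_name.upper())
    if not isinstance(level, int):
        # getLevelName returns a string such as "Level CHATTY" for unknown names
        raise ValueError(f"unknown log level: {level_name!r}")
    logging.getLogger(package).setLevel(level)
    return level


# Hypothetical usage: only the checkout package becomes chatty,
# the rest of the application keeps its normal level.
set_package_log_level("shop.checkout", "debug")
logging.getLogger("shop.checkout.cart").debug("cart contents: %s", ["sku-1"])
```

The point of doing it per package is that the logs stay readable and the performance hit stays local to the code under suspicion.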
Blar.
I work in an environment where the devs fix bugs before adding features, so the code is stable almost all the time. I have less than 1 callout a week that's caused by something a dev has done to the code.
We hire the best devs, and work in an environment where fixing bugs is more important than adding features. The result is that our devs get full access to production, and even offer to provide support in order to ensure that they're the ones that are woken up if something they've broken falls over OOH.
I've been at my current company long enough that I'd forgotten there were places where devs and ops didn't trust each other.
If you are running into any issue that requires the developers to have access to production then you have much bigger problems than access control. Developers should need access to development servers only (which really should just be their local box or a setup identical to the supported configuration if you need to test things like clustering or different platforms). Developers should not even require access to testing environments. If you have valid contracts and adequate testing then the only issues that should get to prod are environmental issues, things that can be handled by administrators.
On the other hand, denying your developers access to anything, be it production servers, IM access, youtube, is just asking for them to circumvent the system. So your developers should never need access to production servers, but I wouldn't waste time trying to lock them out of it, or else they will work around those locks if it turns out that they do need it (because your process failed).
Is that font illegible to anyone else? I had to turn Readability on, it was so bad. Who the heck thought it was a good idea?
Sysadmin: It's working now; what did you change?
Dev: nothing.
Sysadmin: (sigh)
I am a developer. Our environment team is practically incompetent. I have to go on-call during DR tests because it is too complicated to restore an image and double click an icon. God forbid they have to install an App Server, or configure 35+ JBoss instances (default is 1 instance per box) to start, tune for memory usage or performance or both, etc. Just last week they decided to upgrade the OS to 2008R2. No need to test anything, right? Sure, all of the code is 32-bit, but that won't have any implications, will it? Trust me, I would much rather not have access to prod, but as the saying goes, "Better us drunk than them sober."
Developers should have read-only access to production. In this way, they can investigate what is happening but should on no account have any ability to alter anything.
Speaking as a developer, I want/need read-only access in production. All too often I need to dig out information while troubleshooting, and most commonly I don't know what all bits I'll need when I start. If it were easy to identify exactly what I'd need to find the problem, I usually already know what the problem is. The hard ones are the ones I can't replicate in development and I only have a starting point, something that won't identify the problem but might help me narrow down where to look next. In those cases the only place I can look is production (since I can't make it happen in a controlled development environment) and I can't give the admins a list of what I'll need (because I need to dig through logs and config files before I'll know what I need to look for next). And if we've gotten to this point, it's probably a priority problem impacting production so it needs to get fixed Right Bloody Now.
OTOH, while I may need to look at production, I don't need and don't want the ability to modify production except by going through the admins. This, of course, also requires admins who can follow basic instructions like "Look at config file FOO. Find the line in section X that starts with Y. Its value should be XYZZY followed by the number 1. Change that 1 to a unique number for that machine/instance. Repeat this for every machine/instance.". But all too often the response is "That's too complicated. Can you just give us config files to install?". And of course when I ask for the current config files, so I can be sure I'm not overwriting any other modifications to them (which may have happened since the admins control them and do modify them), I get "We can't do that, they've got production passwords in them.". Now all I can do is throw up my hands and go "Whatever.".
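One possible middle ground for that kind of instruction is to hand the admins a small script instead of a whole config file, so they run it against their own copy and nobody has to ship production passwords around. The sketch below assumes FOO is an INI-style file, which the comment above doesn't actually say; the function name set_instance_id is invented, and the section, key and prefix are just the placeholders from the example.

```python
import configparser
import io


def set_instance_id(config_text: str, section: str, key: str,
                    prefix: str, instance_number: int) -> str:
    """Return config_text with one key changed to prefix + instance_number.

    Every other section and key is carried over unchanged, so local
    modifications made by the admins are not overwritten.
    Note: configparser drops comments when it writes the file back out.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(config_text)

    current = parser.get(section, key)
    if not current.startswith(prefix):
        # Refuse to touch a value that doesn't look like what we expect.
        raise ValueError(f"[{section}] {key} is {current!r}, expected {prefix}N")

    parser.set(section, key, f"{prefix}{instance_number}")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical file contents, using the placeholder names from the comment.
original = "[X]\nY = XYZZY1\npassword = not-shown-to-devs\n"
print(set_instance_id(original, "X", "Y", "XYZZY", 7))
```

The script only needs to know the section, the key and the expected prefix, so the developer never sees the rest of the file.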
I develop mostly internal apps for a very large company, but even for the external ones I'm the one who moves files to production. It's not that way for every department, but for ours it works. Better than waiting half a month to get a typo fixed.
It would be nice if I worked with support people who knew what they were doing. I don't have access to certain environments but if something goes wrong I'm supposed to fix it somehow. But then again, I work in a robot clone environment where software development is some sort of alien concept. I need a new job.
Under no circumstances are any units in a company to have even contact with each other, much less share work product. This leads to unacceptable things, like collaborations over lunch and generally helping each other out and making a more efficient company. If we have a more efficient company, that may mean we have to lay off even more employees, and this cannot happen in this economy because we'd then pass reporting requirements for layoffs and be subject to higher FICA taxes.
Just because you're paranoid doesn't mean they aren't out to get you
It's not necessarily a case of the admins versus the developers; it's more a matter of practicing good data governance.
Our developers used to have direct access to all of the production databases. This was bad enough, but because of this the organization permitted them to directly "clean up" databases (meaning they wrote to tables directly), we had data that was being changed without the ability to really know who did it. The DBAs hated it and the developers were extremely uncomfortable doing it but it happened anyway. We eventually had a real process audit and the auditors had a field day.
Needless to say we changed. I hope.
Before I would have said "at least read access", since in my experience the bug reports are usually very inadequate and you need to know exactly what the user was seeing and any settings/configuration made in production. Write access was already rather iffy before, and now with most servers being virtualized the best way would be a fast track to create a new clone of production for the ugly cases. We used virtualization heavily at a client I was at, they originally had two environments, test and production. We did a major upgrade, and at most we had 5 environments:
1) The old production, ready to be resurrected in case of OMG problems
2) The old test, used to verify upgrade results (not old prod as we didn't want people making changes there by accident)
3) The new production, obviously
4) The new QA, where the customer was doing regression testing
5) The new test, where we kept working on the next delivery
Being able to suddenly scale up to five environments - eventually down to two again - was brilliant. They cloned it, changed a few IPs and away it went...
Live today, because you never know what tomorrow brings
The author/owner of an application should be on the hook for keeping it running and for its failures. To separate these responsibilities creates perverse incentives and encourages fire-and-forget development with no thought to future maintenance and troubleshooting. At the same time, to discourage the practice of 'keeping things going by kicking them', access should result in a detailed audit trail, which would be necessary anyway for regulatory compliance.
This doesn't work very well without other arrangements in place, namely a standardized version control and deployment system, hardware as a service, fleet maintenance systems and a heterogeneous service-based architecture.
Realities just a bunch of bits.
Super simple question: yes, they should have read-only access.
Unless you are concerned about privacy issues, but then you probably solved those for your sysadmins too, so no biggie.
In my experience, programmers with production access cause more visible problems but have substantially higher productivity (6x to 8x).
My company has previously had unbelievably tight controls put in place as a result of SOX which added a 45 day overhead to any change (except emergency changes) regardless of size (which means small projects were no longer approved- only home runs).
Now we are going to SAP. All that is gone for now-- productivity is off the charts. I'm sure they will start locking it down after we get the first production environment settled but it is nice to be productive again.
She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
I found it's best to have the admins restore a copy of Prod to test or dev then reproduce it there and fix it. Updating directly to prod or debugging against prod should always be a last resort.
I am one of the few people who can run correct code the first time round. I am also proficient enough in OS matters to be able to circumvent access to locked down resources. So I don't care what this post says, I'm doing it myyyyy waaaaay.
I work with world class developers and an equally competent team of operations folks. The amount of disconnect between the 2 sets of folks is amazing. The developers black box stuff out of their consideration (e.g. setting up load balancers, with or without affinity, not littering certificates all over the place, the amount of privileges a service needs etc.). The operations folks ignore other aspects (a cache that's hard to build could be lost after a process recycle, not version controlling their ad-hoc queries/sql jobs etc.)
Even if I take out considerations of giving developers access to customer sensitive data, the mere fact that most developers assume that a complete clean reinstall is as trivial as going back to a previous VM image (wrt time considerations) makes me pause and not provide them access. Add to that the fact that developers talk in logical terms (regardless of scale) while operations talks in physical terms (actual machine names, drives etc.), and watching them communicate is like watching 2 blind men describe an elephant to you.
Our team makes it mandatory for developers to request clean, concise information from operations, who procure it on their behalf. Yes, it is slow, and yes, it forces the developers to batch their queries together, but I can't imagine doing it any other way right now.
Not only no, but hell no.
I'm a sysadmin who used to work at a web development company. As a one man team I managed ~40 servers, 8 of them being production web servers hosting 200-300 sites each.
Web developers should not be allowed anywhere NEAR a production server. The last time I let one onto one, I spent the next day and a half fixing what he broke.
On the flip side, sometimes developers will just flat out need access. In this case, at least in my experience, a clone does the job just as well. You just need to have a couple servers sitting around specifically for development use, and then have a way to clone machines to this hardware in short order. In my years of experience I have yet to come across a problem that absolutely needed to be tackled on a production server.
Can anyone tell me why 99% of
I worked at a small company where I was the sole developer, and had access to the production system. I was able to make changes and roll them out quickly, and only once or twice did I screw something up (and I was able to fix it right away). The problem is that users started coming to me instead of the sysadmin when they had problems. Then the sysadmin/tech support guy got all butt-hurt about it and declared that he would no longer support anything I wrote. As a result, I ended up having to spend way too much time teaching users how to use their computers (half the time it didn't even have to do with any code I wrote) and didn't have enough time to perform my primary job function.
I'm sure you'll say I should have just refused to provide tech support. But when you work for a small company where half the employees are either family members or personal friends of the person who signs your paycheck, that's not always a possibility.
(learned recently in my BS-Information System Security program)
I worked as an IT auditor for a very big public accounting firm. Reviewing IT controls was a key part of the financial audit (and more so now with Sarbanes-Oxley).
If I found developers had access to production, it was automatically a "no reliance" finding.
This means the financial applications are so inherently untrustworthy that the financial auditors would have to review original source documents for validation.
"No reliance" meant the audit became much more expensive as a result.
Also - if the auditors can't rely on the financial reports, should management?
If your children ever found out how lame you are, they'd murder you in your sleep
If you have a sysadmin staff that can provide a clean test environment no more than 24 hours out of date, you should treasure them. It's very rare to have sufficient skilled staff and duplicate hardware in the Real World [tm]. It's extremely commonplace to pretend to have a valid test environment, but after 40 years in the industry I can tell you it's usually just self-delusion or a sham for fooling auditors.
Even when the suggestion of "would you like root on this internal box?" was put to me, my answer was always "No". I write code. Others test it. Admins deploy it.
People specialize for a reason. If you want half-assed administration, give root to a developer. If you want half-assed code, let admins write software. If you want half-assed testing, have admins and/or developers do it.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
If I found out a developer changed something in a system I tested without it going through the proper process...
Let's just say I would be very interested to hear why they shouldn't go back and rerun everything again on THEIR dime. (at the very least) In fact, we DID do just that to someone who let a revision slip into their UUT because a developer felt it would fix something and make it perform better.
It wasn't too expensive of a mistake, just $250,000 to rerun that portion of the test. Although that was just the physical cost of performing the test. I don't even want to know how much it cost in labor especially considering it was a 22 day test.
Even if the change was removed, how do I know that without physically verifying checksums? (And do I even trust it anymore, since their CM process is obviously flawed?)
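For what it's worth, the checksum verification itself can be scripted. A minimal sketch, assuming the approved build can be reduced to a manifest of file path to SHA-256 digest; the helper names build_manifest and find_drift are invented here, and a real CM process would also have to protect the manifest itself from tampering.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large binaries don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_drift(root: Path, approved: dict[str, str]) -> list[str]:
    """List files that were changed, added or removed since approval."""
    current = build_manifest(root)
    return sorted(
        name
        for name in approved.keys() | current.keys()
        if approved.get(name) != current.get(name)
    )
```

Record build_manifest(...) when the unit under test is signed off, then run find_drift(...) against that record before and after the test run; an empty list means nothing on disk moved.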
Out of modpoints but really liked a post? 1BDkF6TtmmeZ3yqXbz9yhdYVqRYnwFoXDj
I'm a developer, and I have to say... HELL NO!
.. no one should have access to production servers for anything other than pure upgrades and, if necessary, to read logs and inspect monitoring programs. If your organisation can afford a decent test server with the same basic hardware as the production servers, you should simply clone it to the production environment.
In reality of course there are always mysterious problems that only seem to happen when a system is put into actual use. To catch as many of those as possible, and without having to resort to panic changes in your production environment, both the test and production servers should have a decent set of monitoring software and the ability to produce as many logs as possible.
If you do all this and still have problems, developers might have to look at the production servers. But any changes should be added to the test server first, both to test them and to make sure that the test server is always updated when the production servers are. Once the fix is ready, whoever is responsible for the production servers should approve the changes and make the update. This also makes sure at least two people know about any change made to the production environment.
If your organisation is really small, I don't really think it will matter if your developers have access or not. Then you should just give access to whoever you feel can handle it, on a person to person basis.
Why would you need 'cover' if something expected happens?
I do not want to work there...
Of course developers should have some level of access to the production environment. No matter how good your test environment is, it's not going to match the live server in load, or what's in cache, or the concurrent access to some resource, etc.
Our process was to have one person with access, investigating whatever problem via the SQL command line, or the Rails console (let the RoR jokes commence), with another person watching, to make sure they were doing select * and not update or delete. Even then we'd execute stuff in a transaction or sandbox so that we weren't making any permanent changes, although changes to memcache generally can't be rolled back so easily.
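A rough sketch of the "execute stuff in a transaction so nothing sticks" part, using Python's built-in sqlite3 as a stand-in for the real production database. The orders table and the rollback_only helper are invented for illustration. As noted above, this only covers the database; anything touched in memcache or other external systems is not undone.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rollback_only(conn: sqlite3.Connection):
    """Yield a cursor and roll back everything it did when the block exits."""
    cursor = conn.cursor()
    try:
        yield cursor
    finally:
        # Roll back unconditionally, whether the block succeeded or raised.
        conn.rollback()
        cursor.close()


# Stand-in for production data (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders (status) VALUES (?)", [("paid",), ("stuck",)])
conn.commit()

with rollback_only(conn) as cur:
    # Try out a "fix" to see what it would do.
    cur.execute("DELETE FROM orders WHERE status = 'stuck'")
    remaining_inside = cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

remaining_after = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(remaining_inside, remaining_after)
```

Inside the block the delete is visible to the session; once the block exits the row is back. The second person watching is still worth having, since a rollback does nothing for locks held or load generated while the session is open.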
I've seen admins, who are adamant that dev not be allowed to change anything, change psql configurations at a whim, crippling DB performance. And then blame dev for poor response times. That's so not cool.
As a former developer, I always wanted access to the prod systems because it made my life easier. Then there was an outage and I was blamed. A month later, it was determined to be an intermittent hardware issue that caused the issue, but that month people wanted me fired. Since that time, I've built into my system multiple log levels and traceback messages that would get me to the line of code where an issue happens.
Now, as a technical architect, I do not allow developers any access to prod systems - NEVER. Not even to install the program. See, when developers have access, they don't get forced to write documentation and the docs aren't validated by another person. Run support teams need to understand the programs and if the dev team does all the work, they won't.
Now you have a single point of failure - no business wants that. Yes, it is more effort and less efficient to split these roles, but you'll sleep better at night. Trust me.
Having lived with this scenario for about a year, I'd say that the biggest issue is not whether devs have access to production, but rather HOW they have access to production, and what the change-management process is.
Access to allow them to push updates from a proper repository to the live server: OK
Full SSH/root access to the production boxes: HELL NO.
I can't count how many times my head nearly exploded because a dev pushed some massive freaking update to a live server and torched the box. We're talking uploading an entirely new app, without proper stress-testing, to a live in-use server. End result... leaks, bugs, and the live site explodes. Sometimes it can take hours or DAYS to figure out why resource usage suddenly jumps through the roof, and all the while, the way the dev talked, the system had been in place for months (being fairly new to that company, I wasn't really sure myself).
Oh look... thousands of SQL queries locking the tables due to unoptimized database tables... OK, let's sift through a few THOUSAND files trying to find where that little query is running from...
Again, the big issue wasn't with devs having access to production. It was with devs having privileged access to production, and either
a) Messing with installed modules/software/whatever... that's an admin's job
b) Installing updates without checking, and without notifying other departments.
Similar issues ensued with even just basic updates in cases where there were unchecked bad links to internal-only testing URLs, etc. Of course, when stuff was broken, nobody would be willing to take ownership of the mistake, which meant that IT spent excessive time tracking down the changes to fix them, and the offender(s) had gone home for the day several hours ago while the admins were spending tons of (salary... unpaid) OT tracking down why the server crashed-and-burned.
A change-management system does fix a lot of that. Something as simple as "git" should be enough for many situations, and easy enough for people to understand. It allows you to revert a major code-screwup, isolate when bugs occurred, and find which people are submitting unapproved updates that break servers. If you still need to restrict dev access to a production machine, then only allow admins to do the "git pull", upon approval of the edits made by dev.
There's plenty of arrogance to go around between development and sysadmins, and often some overlap between the two roles. Having a change-management system and proper access controls adds a big factor of accountability and rollback to things, as well as helping to prevent people from clobbering each other's updates.
http://www.reddit.com/r/blog/comments/d33x7/reddit_is_hiring/c0x7aq5
I tend to feel the same way. I don't feel like developers who don't understand their production environment, or system architects who don't understand the code, can be fully successful. As a lead developer for a fairly high traffic site, I thought a lot about how the code and hardware interacted, and what the various limits were and how things degraded.
If only the operations folks could handle the design and implementation of indexes and otherwise handle optimization / speed related issues, then the devs would not need access to production servers. There seems to be no frictionless way to balance.
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
Like people have said it all depends on your situation. What's horrible is if you have a system admin that doesn't understand that.
I don't understand release management philosophies that keep the people with the most domain knowledge away from their apps in production. Let's face it, shit happens, and having access to quickly implement a fix or workaround to minimize or prevent customer disruptions is vital.
Virtual machines are easily cloned.
In reality all those procedures get thrown out the window as soon as the customer starts complaining loud enough or the budget gets tight enough.
The project I'm working on right now is a prime example. The customer has a data aggregation system running on live data feeds. They wanted full fail-over capacity, which currently isn't working. There are four replica environments, none of which is actually a full 1:1 replica of the production systems (and I'm not just talking hardware here, the software is out of sync as well). Now we needed to add some functionality, but it had to happen fast and cheap. Of course the existing codebase is the worst I've seen to date and I keep tripping over bug after bug.
Long story short, not only am I the developer, I also ended up solely responsible for testing and test reports (a role which is normally cleanly separated at our company, for good reason) and for doing deployment on machines I hardly know anything about (I am not a Linux admin to start with; I hobby around a bit, but that's it). The day before the live deployment deadline we were still discussing basic system architecture, since apparently no architect had actually had anything to do with the design I was given six weeks ago...
I kept telling them (management) that the deadline was not doable, but the customer just complained loud enough, and we kept right on truckin' until, twelve hours before deployment, management finally woke up and realized it wasn't gonna work... By that time just about every rule and procedure we have had been broken.
And now I'm stuck as the only guy still on the project, still trying to build what we originally promised, fixing a gazillion bugs in the existing codebase, doing testing, etc...
(posting anonymously, you never know who is reading)
If you can't take the time to make an ECO, you've got no business mucking with the production server.
:(){
I do not think developers need access to the back-end of production servers. Being able to read the production logs, and having production code that can spew meaningful errors, should be enough.
Be able to set a debug=true, or --verbose flag in the code, that spews a lot of information, as needed.
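A minimal sketch of that kind of switch, assuming a Python service and the standard argparse and logging modules. The logger name "app" and the handle_order function are made up for illustration; the point is only that the verbose path is already in the shipped code and gets turned on by a flag instead of by editing production.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    """Command-line parser with a single --verbose switch."""
    parser = argparse.ArgumentParser(description="Example service entry point")
    parser.add_argument("--verbose", action="store_true",
                        help="emit debug-level diagnostics")
    return parser


def configure_logging(verbose: bool) -> logging.Logger:
    """Return the (hypothetical) 'app' logger at DEBUG or WARNING level."""
    logger = logging.getLogger("app")
    logger.setLevel(logging.DEBUG if verbose else logging.WARNING)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger


def handle_order(logger: logging.Logger, order_id: int, quantity: int) -> bool:
    """Hypothetical request handler that logs enough context to diagnose failures."""
    logger.debug("handle_order start order_id=%s quantity=%s", order_id, quantity)
    if quantity <= 0:
        # A meaningful error: says what failed, for which record, and why.
        logger.error("rejecting order_id=%s: quantity must be positive, got %s",
                     order_id, quantity)
        return False
    logger.debug("handle_order ok order_id=%s", order_id)
    return True
```

An admin would start the process with --verbose (or flip the equivalent config value) while the problem is being chased, then turn it back off; the developer only needs read access to the resulting log.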
For a mid to large environment, web server type stuff, it is my opinion to go a route like this:
Ideally have the developer develop under their own version tag (cvs or such) on their workstation in isolation, then move to a dev environment to vet out any gotchas.
Once vetted, have the frozen tagged updates applied to the staging environment, which should be as close to what is on production as possible. If all goes well, great: it becomes part of the next release candidate.
Put tested release on a subset of production.
Once that seems to be doing well, then migrate production wide. (Certainly developers should have read access to server logs, as a good prod environment can even send prod logs to be duplicated to somewhere 'safe' for analysis.)
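The last two steps (subset of production first, then production wide) can be sketched roughly like this. The deploy and healthy callables and the 10% canary fraction are assumptions for illustration, not something from the comment above; in real life they would wrap whatever deployment tool and monitoring the site already uses.

```python
from typing import Callable, List, Sequence


def staged_rollout(servers: Sequence[str],
                   deploy: Callable[[str], None],
                   healthy: Callable[[str], bool],
                   canary_fraction: float = 0.1) -> List[str]:
    """Deploy to a small subset first; continue only if every canary is healthy.

    Returns the list of servers that actually received the release.
    """
    if not servers:
        return []
    # Always update at least one server in the first wave.
    canary_count = max(1, int(len(servers) * canary_fraction))
    canaries, rest = servers[:canary_count], servers[canary_count:]

    updated: List[str] = []
    for server in canaries:
        deploy(server)
        updated.append(server)

    # Stop here if any canary looks bad; the rest of production is untouched.
    if not all(healthy(server) for server in canaries):
        return updated

    for server in rest:
        deploy(server)
        updated.append(server)
    return updated
```

With ten servers and the default fraction, one box gets the release first and the other nine only follow if its health check passes.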
Uh, Linux geek since 1999.
...and even then, with extreme caution
There have been MANY times when there were issues that ONLY occurred in production, where we had to point our local development environments to production databases, servers, etc. to debug the issue. 9 times out of 10 it's an environmental issue: something an admin changed in a server setting, or some person changed data in the database, that caused the issue. But without being able to debug against production you wouldn't find it.
The Truth is a Virus!!!
I am a developer and I would never want production access and cannot understand why any sane developer would. I do not want there to be any chance that the script I run to clean out the Customers table on the development database could ever be accidentally run on production. Setting aside whether it encourages sloppy practices, even if your development practices are excellent, any sane developer would always want to be able to say with absolute certainty that they did nothing to hose production, even mistakenly, because they simply do not have access. It must be the sysadmin's fault.
I've been in two bad situations.
1. Management was too cheap to have adequate development systems. They expected us to make up good test data on the fly. We never did good testing as a result.
2. We had SAP. The accountants had access to the production system (of course). They were allowed to develop their own programs on the production system (dangerous). I was merely the programmer, so I had no access to the production system, or the stuff they had already written (extraordinarily stupid).
I'm not asking for access to live data or permission to run code on live data, but don't give me bullshit data, or keep me in the dark on production code. Hey, wait, isn't this what they call a mushroom farm?
My other car is a 1984 Nark Avenger.
Never allow access to PROD. They will do their testing in PROD, then "promote" those changes to DEV. Seen it a million times. Have a QA/Test instance, script a nightly LUN snapshot from PROD, and let 'em have at it.
Hardware is cheap.
Developer time is less cheap.
Maintenance costs kill you.
Yes, you should have a test environment - several, probably, but you should engineer the system properly up front so that you have knowledge of what the production limitations are. You should also instrument your whole application so that developers don't need to add debugging code to the live system - they just need to look at the instrumentation and they should know what all their variables are doing and how many resources everything takes.
If you're failing to do that, you're failing at maintenance.
If I screw up, people can't get the correct pills. It's fun to make other people live dangerously. :-p
FTFY. Well, for certain values of "pharmacy benefit management system". If your production hacking can botch scrip fulfillment, please say what company you're working for so I can try to avoid it like the plague it is.
I don't know if Blue Cross Blue Shield has fixed this but, as of a few weeks ago (and this probably has existed for a while), living in EST has made it impossible for scrips to be fulfilled via insurance between midnight and 3AM. This is because, according to the late night pharmacist who is familiar with the issue, the servers are in PST and won't allow fulfillment from anything but the "current day" regardless of time zone. Too bad the devs there don't understand time zone adjustments / UTC/GMT. Yet again, non-profit environments don't tend to attract the swiftest of folk in general.
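To illustrate the kind of bug described (as relayed by the pharmacist, not anything known about the actual system), here is a small sketch. The function names are invented, and fixed UTC offsets are used for EST and PST as a simplification that ignores daylight saving time.

```python
from datetime import date, datetime, timedelta, timezone

# Fixed offsets for illustration only; real code would use a tz database.
EST = timezone(timedelta(hours=-5), "EST")
PST = timezone(timedelta(hours=-8), "PST")


def server_local_date(moment: datetime, server_tz: timezone) -> date:
    """Calendar date on the server's wall clock at the given instant."""
    return moment.astimezone(server_tz).date()


def naive_is_current_day(fill_date: date, moment: datetime,
                         server_tz: timezone) -> bool:
    """Hypothetical buggy check: compare the pharmacy's date to the server's date."""
    return fill_date == server_local_date(moment, server_tz)


def tz_aware_is_current_day(fill_date: date, moment: datetime,
                            pharmacy_tz: timezone) -> bool:
    """Compare against the date in the pharmacy's own time zone instead."""
    return fill_date == moment.astimezone(pharmacy_tz).date()


# 1:30 AM on the 26th in EST is still 10:30 PM on the 25th in PST.
moment = datetime(2010, 8, 26, 1, 30, tzinfo=EST)
fill_date = date(2010, 8, 26)
rejected_by_naive_check = not naive_is_current_day(fill_date, moment, PST)
accepted_by_aware_check = tz_aware_is_current_day(fill_date, moment, EST)
```

With a three-hour offset the naive check rejects every fill between midnight and 3AM Eastern, which matches the window the pharmacist described.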
No, developers should not have access to production except under very specific circumstances where it is truly a need (such as critical, production-only bugs that have been escalated). This is partially a stability concern, but even more largely a security concern. The ideal is that most system admins should not be able to access production data without going through the software, and software developers should not be able to access production without going through the admins. This separation of powers tries to ensure that a developer and an admin must be working together at minimum to compromise the system.
Stability is kind of a minor issue in the whole debate. If you have good developers and bad admins, the developers won't be doing stuff they shouldn't in production and will avoid stability issues that would be caused by the sloppy admin. Conversely, if you have sloppy developers, even the best system admins won't be able to keep the system stable. Ideally both do their jobs well and only limited developer intervention is ever necessary, if at all.
For full disclosure, I am a senior developer at a fairly small company that gets double dipped into admin duties as I have training and experience in both areas, but I try to keep as much separation as I can and hand off as much as possible to our dedicated admin staff when I can. At the end of the day, if there is ever a security breach, it isn't a fun place to be in when you have the most access to the system of anyone.
Boot the server to a ramdisk. That way you know it is byte-for-byte identical. Put all configuration in svn and distribute it using cfengine or similar. You get guaranteed identical performance. An OS image can be as little as 100 MB using a normal distribution.
For a one man show the answer is self evident.
For a small web company developing "brochure-ware" - probably more efficient.
For a small team it's ideal to have individual sandboxes - with one sandbox listed as "staging". Assign the lead developer to turnover code to production. Individual developers have access but are told not to touch anything. They will typically sift through live environment making sure it matches what is in their sandbox, looking at logs, etc.
For a mid-size team you need one person for maintenance (which includes monitoring nightly builds, responding to code turnover requests, managing automated testing). Even more critical if the code you write is compiled, fragile, or highly sensitive. - Individual developers don't have access to the live box - maybe the team lead will.
For large teams or small team "units" part of a large production shop : Several layers of "staging and testing" will exist. Code turnovers are mostly automated. Developers don't have access. Automated rollbacks are possible from a robust code management system.
The key is discipline. If you find yourself modifying live code - you're not disciplined. It means you're not willing to insert logging code and would rather pollute the production environment. There should never be a need to copy from production back to a sandbox (that is what version control software is for!) And version control files should never live on the production server (i.e. in Subversion you never do a checkout of code on the production server - you do an export instead).
Even with controls in place, there may be a tendency to "develop on production by proxy". Which means instead of re-creating the problem in development, the developer is saying "here try this, here try this, here try this". The team lead should recognize this and put a stop to it.
-CF
...but read-only and with sensitive data obfuscated. Some bugs are just not replicatable in dev and/or not enough information is present to start with a lead. I work for a top 3 global game developer and have access to salary info (obfuscated of course).
Tired of my customary (Score:1)
The question is what type of developers you have.
If you have professional and experienced developers - the question should be : should they give access to production to anyone else.
If you have newbies out of college, or someone who did not mature into a responsible professional even with years of experience - you should not give them access.
I'm a developer and the sysadmin for the webservers that run my code. I work at a mid-size manufacturing company and I'm kind of surprised to see there aren't more people filling both roles..
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
While it would be lovely to have a perfect sys admin they don't exist. I would say about 1 in 10 problems in production are due to the admins not doing their part. I test like crazy but often times I find the following
- Crazy rules in production but not staging
- Mapped folders on production but not in staging or development
- Databases or Database permissions incorrectly configured
- Caching doing weird things one does not see in staging
This only gets worse if you have load balanced servers etc.
I believe as a rule no developer should "develop" in production but having access to the production environment depends on your work environment and who would know best. Sys admins NEVER KNOW BEST when it comes to websites. They know their hardware but rarely do they know how to troubleshoot issues and almost always blame your app. However, at the same time developers often don't know enough about the technology their applications are running on so they instantly blame the sys admin.
Ideally, in a large scale deployment you have someone with the knowledge of both (that's me) who can identify and troubleshoot common issues. That individual would have access to production. Additionally, in smaller development shops not having access to production is just stupid. It doesn't make sense.
- Source control is a great idea, but rarely used due to difficulty of use. Sometimes it's not an option because someone has to support it.
- Financial limitations can reduce the ability to have shared development/staging areas. Working on your own machine is useless when testing. It does NOT test working in a production environment and thus... sometimes production is the test field.
We can play the blame game but in reality who has access to production should be limited to those trusted enough to not do stupid things without backing themselves up. Sysadmins should keep a backup of the production site at all times, developers should not mess with production unless it's urgent.
Every piece of specialized software will fail occasionally. Not talking highly productized software with a gazillion beta-testers, but the ERP integration layers and other type of software that has been more or less written for a single, or for a few customers.
No developer, and no amount of synthetic testing, can ever cover all the possible angles in these highly complex systems the way the real world can. This is especially noticeable for integration systems, which are highly dependent on the external environment.
Some of this software is highly mission-critical, when it stops, business stops. In these cases, dev-access is probably the sound way to troubleshoot, and get things running smoothly again.
Most shouldn't but that doesn't apply to me.
I usually work for large companies with QA/Staging processes. When someone suggests I poke the production servers, I REFUSE to even be given any password related to those. The argument being, we have 3 steps before an application goes live; if there is an issue, it's either a bug that hasn't been caught early enough or there's a support group who has the authorization to help in investigating.
If a developer must access production servers, something in the bug detection process failed and it's way too dangerous to have anyone probe them. Also, in many organisations, the data is sensitive enough to not have the common human being even have a glance at it.
I do work mainly with LAMP stack apps, and one major step that we've taken is to work more CI magic into our workflow. I *love* Hudson, and have it setup to do everything from typical testing duties, to jobs for pulling sanitized production databases back for testing. The cool thing is that I can give some developers access to certain Hudson jobs, and let them trigger the production dumps whenever they want.
I've even taken to setting up jobs that will spin up a VM, which gets set up with puppet, and then load the app with the latest production dump, with parameters for the name of the environment. Now developers can even build their own testing/staging environments with a click, and everyone gets hassled a lot less, and production sits a lot safer.
Can developers recreate the issue in the development environment? If yes, the answer is no. Stop.
Can developers recreate the issue in test, loaded with production data? If yes, the answer is no. Stop.
If developers cannot obtain the information in development and test systems, then absolutely they need access to the production system. But that access is to diagnose, not debug! There is a difference. Logging may need to be enabled and production may slow. Generally speaking, if the administrators have done a proper job of speccing and maintaining the system in the first place, the system will survive additional logging being enabled.
Do they need carte blanche access? Generally not. But if you have really good developers who are capable of using good judgment and/or have a good relationship with the production guys, there's generally no reason things can't be discussed and worked out.
Really, this is a question of simple problem solving skills and relationships. Is an article really required?
Having access to and mutilating the environment are two completely different things. Treating developers as hostiles by server admins doesn't create the friendliest work environment.
There is a big difference between a bug and the reason why the bug occurred - having access to a production environment is paramount to understanding the underlying issue.
In most newspaper sites the headline / lede / SEO field / first graf is usually programmatically brought in as the META description for SEO purposes (unless specifically overridden). It's a fairly common assumption that this field would be pure text, so the need for sanitization gets overlooked. Of course, it's also a fairly common consequence that some silly editor eventually breaks the site by putting HTML code in fields that weren't meant to house it. You'd be surprised how many (even big) media sites fail to sanitize these fields.
Onto my point: having HTML (or faulty HTML for that matter) inside a HEAD description field seems like a bug. Sure, you can replicate the error by copying the environment and fix it by stripping anything unexpected out. However - that may not be the root of the problem. Thus, developers end up putting bandaids on a system and treating symptoms rather than curing the problem.
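For reference, the "stripping anything unexpected out" band-aid looks roughly like the sketch below. It uses only Python's standard html and re modules; the function names and the 160-character limit are assumptions for illustration, and the regex tag stripping is deliberately crude (a real CMS would more likely use a proper HTML parser).

```python
import html
import re

# Crude tag matcher: anything between '<' and the next '>'.
_TAG_RE = re.compile(r"<[^>]*>")


def meta_description(field_text: str, max_len: int = 160) -> str:
    """Turn an editor-supplied field into text that is safe inside content="..."."""
    text = _TAG_RE.sub("", field_text)   # drop anything that looks like a tag
    text = html.unescape(text)           # normalize entities so they aren't escaped twice
    text = " ".join(text.split())        # collapse whitespace and newlines
    text = text[:max_len]                # truncate before escaping so no entity is cut
    return html.escape(text, quote=True)


def render_meta(field_text: str) -> str:
    """Build the META tag with the sanitized description."""
    return '<meta name="description" content="%s">' % meta_description(field_text)
```

This treats the symptom only. It keeps the HEAD well-formed, but says nothing about why a particular editor keeps pasting markup into that field in the first place.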
When copying this environment to reproduce the issue, one might simply grab the part that's affected, ignoring the user - CMS preferences which would actually end up telling you what the problem was. What a developer SHOULD do instead is poke around the environment, notice that this was a common occurrence with particular users, talk to the editors in question and shadow them for awhile to understand why this sort of thing keeps repeating.
In many cases, I've seen editors take advantage of programming or security holes to produce far richer and personalized web content than the original design or system permitted. By treating this problem out of the production environment context, the editorial side is completely cut out of the feedback and has no say in the matter and their creative outlet disappears with no dialogue. This in turn breeds hostility and distrust between technology and content.
I can assure you that in reality there is minimal dialog between developers, designers, project managers - and - editors. In my past experience access to the production environment was the only means of lateral communication between abstract technology and persons who use it. In my cases, I've been able to provide editors with features they hesitated to request to circumvent problems while still tightening various holes. This, in turn, improved everyone's day.
Look at it from a philosophical view: you don't want developers to be "writin' them codes" in a vacuum of non-editorial space. They need to be a bit more intimately involved with the entire publishing cycle from start to end and not be ticket-solving space cadets when it comes to solving problems.
Where I work as an Admin we filter out that crap for the developers. If it's an environment issue then we resolve it on our side. If it's a code issue we decipher the who/what/where/when and why from the 'I got an error' and get our developers a more precise idea of where the problem lies. If our developers wanted to get the phone call at 2:30am when a system is down I am sure they could work out getting access... but so far they have declined, and you know what, they kind of appreciate the fact that they can focus on writing code (i.e. doing their jobs) and less on Jane's IE6 SP1 not displaying an application correctly.
"Well look, I already told you! I deal with the goddamn customers so the engineers don't have to! I have people skills! I am good at dealing with people! Can't you understand that? What the hell is wrong with you people?"
Developers: "Yes" Everyone Else: "No"
We test the new blockware logic on test racks before installing it in the plant but the devs have full access to the running control system of the refinery plant.
We can at any time 'monitor' the values of all function blocks in the whole plant, and change configurations on the live system.
While this sounds iffy at best, it is the only way to do it. You can only test things so much before they have to be put into the live system. Replicating the live system is nearly impossible without spending silly amounts of money. So far no major accidents have happened but there is no guarantee.
In a perfect wooooorld and so on .
... in a split decision, vi wins. Oh wait ... wrong holy war!
Que Deus te de em dobro o que me desejas
[May God give you double that which you wish for me]
If you don't have a development server that closely replicates the installed state and environment of the production server, you're doing it wrong.
You've built inaccuracies into your development and test system, differences from the production system that will result in differences in the completed code which will then encounter those differences and behave improperly on the production server.
I.e., you're building bugs in deliberately.
Stop that.
Now, I understand, sometimes scalability is an issue. You can't replicate a whole server farm for the devs. But you can isolate that variable and design for it from the first day. It's the other variables that you don't anticipate and don't have visibility into that you need to keep from varying.
Question and Answer? Or QA as Quality Assurance? :P
Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
I worked as a developer at a Fortune 500 for about half a decade.
This is strictly my personal opinion based on observation as a cog in the machine, responsible for writing one small part of processing transactions:
We always had these fantastic procedures that made a lot of sense on paper : Developers normally had no access to production, but could request a 'readonly' id to review logs, and in an emergency could get management approval for a 'firecall' ID to make carefully supervised changes after discussion on a conference call if absolutely needed to resolve outages. It looked great on paper, controlled and thought out and even reasonable.
In practice, using the emergency change request procedure was political suicide because it gave another department's vice president ammo to hold over your department's vice president, so your department's vice president made it completely clear to his teams that his department didn't have any cowboys who needed to fix things on production or as emergency changes, no way no how not on his watch.
This led to an utterly dysfunctional environment where poor architectural decisions were intentionally made based on the knowledge that avoiding an emergency change was more important than any other consideration. Logs were made unsecure and exposed on message queues or in uncontrolled db tables or on uncontrolled network shares so that they could be reviewed without needing to request and justify a special id. For some reason there was a loophole in the process where database changes were often not treated to the same scrutiny as deploying a new binary, so code was moved out of the app layer into the database for no other reason than it would be easier to get a database change approved without informing a vice president. Nobody piped up about needing to get this loophole closed because it was the only way available to keep a system running.
Ditto configuration changes - code was moved into xml files because they were 'configuration' and it was easier to get approval for an admin to edit a config file than to deploy a new binary. For some reason I could never fathom, you could do configuration changes without requiring QA, while you couldn't do new code without requiring QA. Code tended to be defined as a binary file, so whole chunks of logic got moved into more xml. I assume the theory was that QA tested different configurations before certifying the code, but this was definitely not the practice. QA verified that if they pushed a button an appropriate answer showed up on screen, and nothing beyond that, and no negative testing, and no alternate configurations, and no checking that the right data was in the database, or that two people pushing the same button at once didn't blow everything up.
The business would have actually had a safer more stable more reliable system if the xml 'configuration' (code) was at least checked by a compiler, but then Vice Presidents had to get involved if something didn't work as expected.
The best part was interfacing with external systems. Interfacing with external systems often doesn't work in production exactly like it does in test, but if you can't touch production to fix something that isn't working... Eventually every communication with an external system was replaced with a new system whose sole job was to speak SOAP to external systems, put things into an internal queue, pull things from the internal queue, and then speak more/different SOAP to internal systems. Reason? The queues could be tweaked and the SOAP messages changed without approval from a VP. This increased perceived reliability and was then considered so successful that the communicate-with-external-systems system was now tasked with communicating with all internal systems too, and a similar parallel communicate-with-internal-systems system was also developed. There were now too many different systems to figure them out, so another system was given the task of routing messages to the right system and trying to translate into a common
We don't allow devs to write to production... and that makes them happy.
Often a pointy-hair type will impress upon workers "if you can, then you must," and that can include making cockamamie changes IMMEDIATELY.
The devs can hide behind the wall I've erected, saying "sorry, no-can-do, I'm not allowed and I'm locked out, so there's no point in threatening me about it." Then the pointy-hair types have to explain to the sysadmins (who are in a different department, so they can't be easily intimidated) why something has to go out without being tested.
It's bureaucracy, it's red tape, but it also puts reins on running the product off a cliff just because someone panicked and is used to throwing their weight around to get what they want.
Another good way to slow down the urgent pointy-hair types is to ask for the request "in writing, please."
Clones generally only duplicate installs and not data. The bug may be caused by a slow corruption of data on the prod server. It may only show up after weeks or months of use. It may only happen when certain data happens at a certain time (interaction with cron jobs). Sure one can clone the data too but when the data is multi-TB that will take time and money. Even at this point one can not clone the data flow into the server. Going into a prod server and adding logging lines should be allowed. (Sorry but the "log everything" solution is not a solution as one could create TB of logs in a single day on a prod server)
Access for debugging purposes:Yes
Bug Fixes: No (they need to be vetted first. If it is critical the vetting should have very high priority)
I told the new IT director that we needed a change management system.
What I thought it meant was a piece of software that would:
1. Back up the old website
2. Take the new website from a Staging folder on the Dev machine and move it to production, after hours,
3. Do some basic functionality testing (does the website even load,etc.)
4. Provide a one-button restore function if the shit hit the fan the next morning.
What he thought it meant:
1) Fill out a piece of paper advising everyone that the application was going to change and who they could blame if it didn't work...two weeks ahead of time.
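For what it's worth, the four-step tool from the first list is not much code. A rough sketch, assuming the site is just a directory tree and using Python's standard shutil and pathlib; the function names, the directory arguments and the smoke_test callable are all made up for illustration, and the copy is not atomic.

```python
import shutil
from pathlib import Path
from typing import Callable


def restore(backup: Path, production: Path) -> None:
    """Step 4: the one-button restore. Put the backed-up site back in place."""
    if production.exists():
        shutil.rmtree(production)
    shutil.copytree(backup, production)


def deploy_with_rollback(staging: Path, production: Path, backup: Path,
                         smoke_test: Callable[[Path], bool]) -> bool:
    """Back up, deploy from staging, smoke test, and roll back on failure.

    Returns True if the new site is live, False if the old one was restored.
    """
    # 1. Back up the old website (replacing any previous backup).
    if backup.exists():
        shutil.rmtree(backup)
    shutil.copytree(production, backup)

    # 2. Move the new website from staging into production.
    shutil.rmtree(production)
    shutil.copytree(staging, production)

    # 3. Basic functionality test (e.g. "does the index page even exist").
    if smoke_test(production):
        return True

    # 4. Smoke test failed: restore the backup straight away.
    restore(backup, production)
    return False
```

The "after hours" part is left to whatever scheduler runs the job, and restore() can also be called by hand the next morning since the backup directory stays in place.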
When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
But I do need access to an exact copy of Production. One of my big annoyances was when I couldn't replicate a problem, then found out the data I was working with was different from the Live data that caused/revealed the problem. Or some supporting software had been changed in Production and I was testing with an obsolete version.
I don't want to mess with (or mess up) the Production stuff, but if my stuff is different from theirs, I can't guarantee that I can find their problem.
Slow down, cowboy! It has been 4 hours since you last posted. You must wait another few hours.
We have a pretty solid separation between devel, test, review/deploy & general admin. All of the developers and testers have Xen environments which are exact copies of our live servers. We use Subversion to track multiple branches for code changes. We set up automated tests and checks that can be run in short order on any proposed change to our live environment trunk. This process began two years ago... and the speed of coding to deployment is almost as good as before, but the quality and reliability of the code far exceeds what it was before.
HOWEVER: as a QA manager who used to have access to the live system, but does not have access anymore... I find it most frustrating when the cause of a bug is not due to the structure of the program, but rather some exotic input or unexpected database content. Because of the blind wall between me and the live data, I find that the source of these errors cannot be determined and subsequently resolved in a timely manner. Especially when the people who have access to the live datasets do not properly understand how the data is being used in the code.
CONCLUSION: It is not a problem keeping coders away from the live/deployed code. A well configured 'dev box' will suffice. However, seeing/debugging the live dataset (CGI input, database content) that flows through their code is what they will miss.
The Mother of All Bad Deployments over at Digg today. Everything is broken. Good thing they didn't publish during peak usage. Oh, wait...
The more developers work in production, the more they can ONLY work in production.
I'm all for read access (the more eyeballs the better), but actual access to change anything is a train wreck. The devs will forget to check the changes in to the source repo, or they'll check them in differently (bad copy/paste), or they'll check them into the wrong branch/tag. Regardless the next release that goes out silently adds the bug back into production.
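Read access is also enough to catch this kind of drift mechanically. A minimal sketch, assuming the release and the deployed site are both plain directory trees that can be read; the function names are invented, and only the standard hashlib and pathlib modules are used.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    manifest: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def find_drift(release: Dict[str, str],
               deployed: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the tagged release against what is actually on the server."""
    return {
        # Present in both, but edited in place on the server.
        "modified": sorted(p for p in release
                           if p in deployed and release[p] != deployed[p]),
        # Shipped in the release but gone from the server.
        "missing": sorted(p for p in release if p not in deployed),
        # On the server but never part of the release.
        "unexpected": sorted(p for p in deployed if p not in release),
    }
```

Run against an export of the release tag and the production document root, any non-empty list is a change that never went through the repo and will be silently lost (or reintroduced as a bug) at the next release.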
And if developers think it's difficult to fully clone a prod environment configuration into dev now, wait until they try to do it after developers have been hacking on it directly for a while.
Pretty soon every release is a train wreck requiring tons of post-release tweaking and hammering to get it in place. Every release is a stressful mess as you're all crossing your fingers because you really have no idea what you are actually changing and no way to find out.
Just don't do it. Hire a good build engineer/release manager/software configuration manager that can sort out, automate, and track environment management well enough that yes, you can reliably clone an accurate representation of production in a matter of minutes. He'll cost you about as much as a good sr developer, but the savings across the board will easily dwarf his salary.
My
1. No matter how perfect your setup is, there will always be variations between prod and the most meticulously setup prod clone, shadow, backup, standby, etc... that will cause some (hopefully not many) bugs to present themselves in one environment and not in others
2. Data changes behavior -- Even if the web apps are identical, a small change to a database table could cause breakage in ways you never imagined. This can be controlled through proper layering and decoupling, but it always seems to creep up when you don't even notice it
Developers should never have direct access to production. Production issues should usually be solved as follows:
1. If a support person gets the defect (either directly finding it or from a customer/user) then they look up in their knowledge base to make sure that the company doesn't already know about the problem and have a reasonable solution.
If the problem hasn't been logged and is outside of the technical competence of the support staff, it usually gets thrown on IT / Operations support
2. Operations support should once again verify that the problem hasn't been addressed before. Often problems will just continue to re-occur. Some problems just never get solved for one reason or another. Some problems -could- be solved, but the fix is just easier than a large amount of investment in making the problem go away. Often 'reboot the server' is a good enough solution for problems with low frequency occurrences and high complexity solutions.
3. Operations doesn't know the problem, so it's time to poke around the logs, system resources, network connections, etc... A good network and systems management solution makes this problem moot. I've seen tons of production problems end up being the result of rarely used disks getting full with nobody bothering to alarm on them. Finally, ops throws their hands up in the air and are like 'WTF.. this is a development issue'.
4. This step is the hand-off between operations and development. I say hand-off, but in reality, this is when the two teams need to start COLLABORATING to find a solution. Too often either operations chucks the problem onto development's lap, or else development takes the problem and ignores operations until they find what they think the problem is or give up. The best solution in dealing with development related production problems is for both teams to work together and use shared knowledge and experiences of both groups to diagnose and resolve the issue. I've played on both teams over the years and I know quite well that Developers too often want to assume they know everything about deployments, servers, environments, etc.. Often that knowledge is diluted and ends up blinding them to issues that could be diagnosed at least in part with a network administrator / operations support. The worst situation is for this step to get into the cycle of blaming one another for the defect. Tough to diagnose problems too often flame into blame games which always exacerbate the time to find, diagnose, and resolve an issue.
5. Development asks for information X, Y, and Z. If there's enough information to gather an accurate diagnosis, a solution or a workaround is devised. If it's a workaround, the problem and the solution should be logged somewhere so that all parties (support/admin/development) can see what problems are outstanding and if the problem comes up again, the workaround to address it. If the solution is a new release / patch / etc.. the fix should go through the general release cycle as usual (though much faster for more severe problems). Some problems don't and may never have evident solutions. The trick here is to find an adequate workaround that keeps stakeholder impact to the minimum.
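The full-disk case from step 3 is cheap to alarm on. A minimal sketch, assuming a plain list of mount points and a 90% threshold (both are placeholders, and a real monitoring system would page someone instead of returning a list):

```python
import shutil


def percent_used(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a total of zero; treat them as empty.
        return 0.0
    return 100.0 * usage.used / usage.total


def disks_over_threshold(paths, threshold_percent=90.0):
    """Return (path, percent_used) for every path at or above the threshold."""
    alerts = []
    for path in paths:
        pct = percent_used(path)
        if pct >= threshold_percent:
            alerts.append((path, round(pct, 1)))
    return alerts
```

Run from cron against every mount, including the rarely used ones, this would flag the disk before anyone has to go digging through logs.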
Developers should always have access to:
- Server / Database logs (Just don't put anything confidential within logs that can compromise customer / user privacy)
- Production IPs, and other data that is unique per release environment
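On keeping confidential data out of logs that developers can read: one option is to scrub records before they are written. The sketch below uses a `logging.Filter` from Python's standard library; the class name and the two regular expressions are illustrative assumptions, and real patterns would depend on what your application actually handles.

```python
import logging
import re

# Illustrative patterns only: an email address and a 13-16 digit card number.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")


class RedactingFilter(logging.Filter):
    """Mask emails and card-like numbers in a record before it is emitted."""

    def filter(self, record):
        message = record.getMessage()  # message with its arguments merged in
        message = EMAIL.sub("[email]", message)
        message = CARD.sub("[card]", message)
        record.msg = message
        record.args = ()  # arguments are already merged into msg
        return True  # keep the record, just scrubbed
```

Attached with `handler.addFilter(RedactingFilter())`, it scrubs every record that handler writes, so the log files can be opened up read-only with less worry.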
Developers shou
Bye!
Long answer, it depends on the situation.
I (and a couple of my coworkers) have access to production servers, but we don't develop on prod. End of story. We have other devs who do not have access to prod. Dev is for dev, prod is for prod and don't let anyone without the discipline to keep that rule have access to prod.
LK
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
Reading a lot of comments it looks like there's a wide variety of definitions for some of the job titles and roles people are discussing here, so I'll list how I see them:
* System Admin - Person(s) responsible for the hardware and supporting (OS, Web service, code language and client libs, JVMs, etc) software. They do not in any way support the applications running on said system and would be incapable of debugging or supporting an *application* problem even with a gun to their head. Most can only describe 2-3 sentences of what the applications even do. They do not report to or answer directly to the application teams. They also do NOT install application code.
* Database Admin - Only want to address roles here. At every location, the actual application data stored in the database is NOT the role or responsibility of the admin. It belongs to the application team and any changes are their job and their accountability. The DBA only deals with schemas, packages, procedures, scripts, access roles and grants, etc. DBAs should NOT MANIPULATE DATA. Asking or allowing them to do so opens up a never ending blame game and is counterproductive. If you want to create some title and role within the application team where all data manipulation funnels through, that's the way to do it.
* Implementation Specialist (Code Migration) - Trained monkeys who are supposed to follow a set of pre-delivered instructions for deploying application changes. In my experience their technical knowledge is limited, they cannot verify copy/paste correctly, and screw up (transferring ZIP files in ASCII instead of binary) more than they succeed. I don't feel this position is even necessary. The PROCESS is necessary and it can be performed by anyone, even a developer, as long as they switch their role hats before starting and are held accountable for accurately following the deployment instructions given.
* Production support - They act both in a technical and relationship role, being the contact point between the customer (internal or external) and the application team when issues arise. Generally have read-only access to production. They are able to debug many problems and resolve a few, but definitely not all of them. They do not participate in any part of the development lifecycle processes.
* Developers - Not going to discuss or debate any pre-production roles here since it's irrelevant to the topic. Developers are the only ones I would be confident could debug ANY problem. They are going to need some reasonable level of access to production, logging, or information if you want to have an application that can maintain high availability and recover quickly from any type of outage.
If your definitions to these roles differ significantly, then my answer for your company's situation would change.
Depending on the size of the application and the team allocated to run it, I've performed up to 4/5 roles and was pretty much the 5th as well since the Sys Admin only could barely squeak by supporting Windows 2k and definitely had zero knowledge of any of the supporting software. Are you going to hire someone to do 6 hours of work per month just to separate the responsibilities? Of course not. So the OP's generalized question is open to a million different interpretations because of all the different variables that weren't specified.
My most recent application team recently went through our production lockdown after finally migrating over an application suite purchased from another company. Developers and Prod Support have read-only access. Database passwords used by the applications have been restricted down to just a couple individuals. When changes need to be made to either the application or data, an Emergency ID is checked out to a requesting individual with the appropriate access level,
To me this is the crux of the question: I have seen developers who were perfectly capable of managing individual servers due to long experience and having to perform DBA duties in the scale of the systems they have dealt with. I have also seen developers who know a language but complain about the SQL server when they write queries that run like batch processes through lack of understanding about the way the systems that they are writing generic SQL into work.
The first type I would probably allow access to the production system - provided it was not widely outside the developer's experience and did not have an uptime requirement meaning that it had to be strictly controlled and tested. The second type is exactly why I would never allow a lot of developers near production systems - small scale or not.
True Story: As a software developer in my early 20's, I had root access to the mainframe that handled all shipping documents for a billion-dollar company in the transportation industry. Of course, I was on-call via pager and was responsible for smooth operation of said system. The Sys Admin kept trying to give me lesser levels of security like SUPER.OPER on a Tandem system rather than SUPER.SUPER (root) and constantly changed the password. But after enough times of me calling him in the middle of the night for the password, he gave up and just told me the new password every time he changed it. I wish I could blame it all on the architecture of our software or his administration, but there were certain core things we were doing that made it difficult to set up security properly. Nothing bad ever came from this (thankfully), although today I think you could get nabbed easily, since security breaches back then weren't as numerous as today. Also, the systems were more closed to the outside world (we're talking nearly pre-internet here).
Source control is difficult to use? Where in the world are you working? Even the smallest of shops have access to free source control, which requires minimal administration. How much maintenance do you think is needed in, say, a mercurial repository?
Hey, Oracle DBA. I aspire to that job someday. There's just...something sickly satisfying about writing good sql or pl/sql. No clue why.
Seriously though, can you please describe your setup (restoring to a dev machine nightly) in whatever level of detail you're comfortable in? I run IT and develop for a small biz (postgres). We dump our database out weekly and rsync the backups. Problem is, the whole DB is 76G. Even the backup is nearly 8G compressed. Even restoring that file on a new dell poweredge with ridiculous memory takes at least 4 hours (that's after copying the file onto disk, and gunzipping it into a pipe so it runs as fast as possible--no GigE LAN bottleneck or anything)
I can't move it over the network to the dev server in another location fast enough to do that in a day. Do you do some sort of fancy export/clustering where you share with a remote location, or is your dev server physically colocated with your prod server and fast/godlike pipe? Or maybe you guys just pay for enough bandwidth to be responsible in your backups...
When I started at the company, our database was only 40 Megs... was a piece of cake. Of course, the DB would do seq scans since it could keep everything in memory cache back then...
Anyway--a minute of your insight would probably be invaluable to me.
Hell. No.
I'm a developer as well as a sysadmin and I NEVER tweak anything in production and I have full access to it.
I have an exact copy of my production environment for development and I do all my tweaking/test deployments there.
In fact nothing gets deployed to production until everything has been checked in development.
My previous job had dev/qa/prod environments where the devs had full access to development and it was so bad that we had to virtualize it for them just so we could revert back to a pristine snapshot whenever they jacked up the dev server.
We don't give our fixes to a trained monkey we give them to System Administers
- you have Ministers that systematize ads?
You can't handle the truth.
No. Do not give developers access to the production machine ever, except me...just this once..
The Kruger Dunning explains most post on
I have systems which go out to site, and sometimes, I need root access to them because we simply don't have access to the hardware in our R&D lab, and need to test code on a production system.
Note "intensive" purposes and "begs the question".
It's deliberate fingernails squeaking on the blackboard.
Well the solution is simple. You should be able to ask your production / systems team to just "restore" the entire live system into a spare machine. All from backup of course. :)
If they cannot do it then the systems guys really do have a problem on their hands
Then reproduce the error on the test system.
Modify access? Absolutely not. Read access? Yes.
Because who is management going to run to at 3:00 in the morning when something isn't working? Not the system administrator - the developer. Let them see the logs. Let them see what files/versions/timestamps were deployed. Let them see what else is running on the box. And let them do it without giving instructions over the phone to some admin who is sharing their console over GoToMeeting or something. Or this is going to take all freaking night.
As a side note, how many times has a developer been dragged in to troubleshoot why "their program quit working" only to find that the real problem was something like OS updates were applied without being tested, or a new virus scanner was installed, or system X was installed on the same box?
The people running test environment and production environment are usually different groups. Communication needs to be clear in both directions so that the software levels are the same.
Blar.
Everything isn't a webapp/DB solution. Our networks are highly segregated for security. Who is in charge of making sure that 'different DB' has relevant and correct data in it? Who makes sure that the cloned prod image doesn't have sensitive info in it? Shops serious about security would laugh you out of the building for suggesting what you have posted.
Blar.