How Fast is Your Turnaround Time?
petrus.burdigala writes "I work for a mid-sized commercial software company (~20 Mloc) and we are frequently challenged by our supervisors to get fixes around the clock. Overall, we manage to get a 'bullet-proof' patch in about 4-5 weeks (from coding->QA->Build/Packaging->shipment), which I consider not so bad. But the other day, we got an urgent request from our support team to come up with a decent fix in 48 hours. I think they're a tiny bit unrealistic. So I wanted to get feedback from my peers: are we doing that bad? It takes months for other software vendors to issue zero-day exploit fixes, are our customers being unreasonable?"
It may just be me but I think that's why they are called "customers"
Excuse me while I gather the virgin sacrifice and assemble the pentagram required to solve your problem
How much of that 48 hour deadline did you waste reading /.?
Get back to work!
You have to serve the client who is paying the bills - and we had a very vocal one (Nik*). We had a running joke about the release du jour. But it wasn't a joke. We literally would push a new build to them every day which contained minor bug fixes. It was maddening! But no one had the balls to stand up to the 800lb gorilla, so the madness continued. As a side-note, they were acting as a beta tester and anyone in the software business knows what that can mean.
What was that exploit again?
You can't talk about Wikipedia's flaws on Wikipedia
For high priority bug fixes, it usually takes 1 to 2 weeks to get a patch out once we determine that a patch is needed.
It depends upon the nature of the problem and the competency of the developers.
If you know enough of the code tree you can tell when first reproducing and examining the failure whether it is a one off mistake or a larger procedural fault.
Single instance stupid errors (doh! moments) can be rectified and put through testing fairly quickly, however if your initial examination uncovered a larger problem then obviously the process will take longer (if at all - consider workarounds).
If the original dev/test team has been replaced over time this becomes a more difficult issue and every bug must go through complete verification simply because the extent or ramifications of the code modification will not be known.
In some instances we have had fixes out of the door the same day an issue was noticed, in others months go by before a final fix is put in place.
liqbase
I can understand a week, but honestly...if you're leaving your customers vulnerable for over a month, they might start looking elsewhere
Exploits should be a high concern for any company
I work for a bank so we don't do box software, but our patches have to meet FTC standards and Federal bank standards.
It is uncommon, but not unheard of to have an 8 hour fix. In cases of customer data vulnerability, legislation has been made such that if we are aware of a problem, we have an automatic injunction against us continuing to do business unless the problem is resolved. So when we have a security flaw, our bank stops working until it is fixed. So yeah 48 hours would have people fired for sure.
Compliance/security are the only two things that can spark a release with less than 72 hours notice though.
But the other day, we got an urgent request from our support team to come up with a decent fix in 48 hours. I think they're a tiny bit unrealistic.
Well, we really can't answer that question without knowing how big the problem is. If it's an embarrassing typo on a dialog box, then 48 hours is reasonable. If it's a Windows Vista security patch, then 48 days would be unrealistic.
-Grey
Silver Clipboard: Time Management Tips
It depends on what you're maintaining and how complicated it is. I've gotten fixes out in 2 or 3 minutes. That doesn't mean I'm fast and you're slow, though. "How fast is your turnaround?" is like "how long does it take to write a computer program?" It's hopelessly vague.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Yeah, your turn around time seems good and yes, the customer's request is beyond industry norm.
That might mean one of three things:
One: Customer is being foolishly optimistic.
Two: The entire industry is bad about turn around time, and can, if pushed, improve it to 48 hours.
Three: Customer needs it really quick and is hoping to get it quicker by asking. They know 48 hours is well beyond the norm, but are hoping you can do it anyway, because the more time it is unpatched the more they are screwed. They know that if you don't ask, you can't get, so they are at least 'asking'.
Me, I think it is a combination of all three. Customer is being a bit optimistic, the industry is bad about turn around time, and also the customer knows it is a bit optimistic but is making the request anyway in hope you will provide amazingly good service.
excitingthingstodo.blogspot.com
Sometimes, customers are unreasonable and if they are, they should be treated with respect and the problem explained to them. Yes, they may be incredulous, but if you hold your ground (if they're being unreasonable), treat them with respect, they will come around.
The fact that the parent was moderated down just shows me that the arrogance, contempt, and stupidity in corporate America is alive and well - especially in IT.
I prefer Flambe as opposed to flamebait.
With a little simplification, you have four parameters: Difficulty, quality, speed and available resources. Whenever you fix three, the fourth follows (with some uncertainty). It is well known that there is a limit on how much you can improve the speed with more resources. So there is an upper limit on speed already. The second problem is that difficulty is unknown when starting such a task. There is no fix for that.
So if these people fix speed and available resources, and difficulty is fixed by the task, quality is determined by these factors. Period. There is no arguing with hard, real limits. If they do also want to specify the result quality, then they have to leave speed open. Again, there is no way around that limitation. In fact they should be happy if the team manages the required quality at all in reasonable time. Not all teams do.
Maybe this will be an argument that is understandable for people with a business background. Engineers should already know this.
Software engineering is engineering. Engineering tasks in general have minimal time requirements. Look at structural engineering: Nobody would try to design and build a full-custom bridge in a week. Instead it takes up to a decade, depending on difficulty. And you can generally not speed things up by increasing the team size.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Overall, we manage to get a 'bullet-proof' patch in about 4-5 weeks (from coding->QA->Build/Packaging->shipment)
Not unreasonable, depending on the size of your release. (How many modules and how many LOC you're changing, the number of change requests or bug reports in the build).
But the other day, we got an urgent request from our support team to come up with a decent fix in 48 hours. I think they're a tiny bit unrealistic.
I think they're smoking crack.
So I wanted to get feedback from my peers: are we doing that bad?
With your regular release schedule, I don't think so.
are our customers being unreasonable?
Yes. That's what they do. If they want a crash development program to get this "patch" out the door that fast, they seriously risk software which does nothing but crash. Really, if they want it that bad, they run the risk of getting it that bad.
You have to ask yourself and your "support team" (sounds more like marketing to me): "Do we wish to ruin a perfectly good reputation for quality and reliability in one hurry-up bashfest followed by weeks of agonizing on-line debugging?" Really, advocate any kind of work-around and risk mitigation response before being pushed into an overly-hasty release that will linger on your reputation like a dead skunk.
Welcome to the Panopticon. Used to be a prison, now it's your home.
A patch (IMHO) is a bug fix to existing code. Given the resources, we should be able to get a PATCH out in a week. However, if you need a new version of the software to address the issue, then we're talking longer development/testing/QA times, in which case 4-5 weeks would not be unreasonable. Bugs should be fixed as soon as they are spotted. If there is need for a whole rewrite then you may want to talk to your staff.
Ask not what you can do for your country. Ask what your country did to you
I know I'm going to end up baiting some developers, but I work for a specialized ASP and see a ton of third party software from a perspective few get...
Normally, the smaller the company the more agile. No surprise. They also get patches out faster too. Also no surprise.
When we look at vendors of equal size, the ones who are really quick at sending out patches are in that situation because their software is more buggy, and they have a *lot* of practice. It never fails.
In response to your question, I would suggest that you should look more at the frequency of patches and less at the duration. Sure, it might not be as fast as your support group wants, but if you start reflexively sending out patches every time someone yells, then your overall product will suffer since you can't possibly do the proper QA to ensure THAT patch you just whipped up doesn't break something else.
That brings me to the age old choice:
Pick 2 of the following:
Speed
Quality
Cost
How much time do you spend on TPS reports?
The last time I did one I forgot the cover page and my 7 bosses all bugged me about it.
Ok, the name might suck, but the company I work at follows the Extreme Programming practice, a kind of agile programming. I have only worked there a few months, and had never heard of XP before, but am now converted. We work in pairs, which instantly adds a whole testing level. Deployments of code are done once every week, but sometimes more in an emergency. We write code test first, then run a build on our machines, then we upload it to a test environment where automatic tests are run. Finally on passing that, it moves to a stage environment where humans test the code, when they are happy a version number is noted, and that is uploaded to live. This means it can take a day for some code to be written, tested and deployed if required. It also means there is continual development, different departments can work on different versions, and then there is a weekly deployment of the latest stable code. It is a very interesting practice, and seems strange at first, but I would highly recommend it for certain types of companies. The company I work for took a few years to convert, and it was slow at first, but now it is an expert and even helps train other companies. It also builds its business upon being one of the quickest responders for code in our region.
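The flow described above (local build, automated tests in a test environment, human sign-off on stage, then live) can be pictured as a chain of gates. This is only a rough Python sketch: `promote`, the gate names, the version strings, and the placeholder checks are all made up for illustration and say nothing about the poster's actual tooling.

```python
from typing import Callable, List, Tuple

# Each gate is (name, check). Names follow the comment; checks are placeholders.
Gate = Tuple[str, Callable[[str], bool]]


def promote(build: str, gates: List[Gate]) -> str:
    """Run a build through the gates in order and stop at the first failure."""
    for name, check in gates:
        if not check(build):
            return f"{build} stopped at {name}"
    return f"{build} deployed to live"


# Hypothetical gates: the first two always pass, the last rejects "-wip" builds.
gates: List[Gate] = [
    ("local build", lambda b: True),
    ("automated tests in test environment", lambda b: True),
    ("human sign-off on stage", lambda b: not b.endswith("-wip")),
]

print(promote("1.4.2", gates))
print(promote("1.4.3-wip", gates))
```

The point of the ordering is that a same-day emergency fix goes through exactly the same gates as the weekly release, only sooner.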
At BSDi, the initial patch (which did have flaws, but it fixed the problem) for the f00f bug was same-day, I believe; might have been next-day, depending on where you're counting from. (Contrary to popular belief, this didn't violate any NDAs.) Now, that was an emergency patch -- it took a while to come up with a patch that fixed the bug without noticeable ill side-effects.
We had a better patch later, but the initial emergency patch was VERY fast.
On the other hand, if the initial bug report is "Sometimes the program hangs, no, I don't know when. Maybe every week or two." -- well, that's gonna be hard. Exploits generally have the advantage that an exploit is by nature at least somewhat reproducible, and the hardest part is often getting a reproducer. I've had it take six hours to develop a usable reproducer, and three minutes to develop a patch.
Release time depends hugely on process and procedure. IMHO, an ideal procedure would have some kind of way to get a Temporary Patch out into the field ASAP when there's an exploit.
My blog: http://www.seebs.net/log/ --- My iPhone/iPad app: http://www.seebs.net/seebsfrac/
48 hours is a tad bit tight. However, I've turned things around in a similar amount of time.
But, the old adage is true: you get what you pay for:
When faced with unreasonable deadlines in the past, I usually voice my opinion once, and just do the best I can. Your higher-ups are probably already quite stressed at this point, and adding stress to the situation doesn't do anything for your career or theirs. Rather, if you make the point that you're doing the impossible, you might just have a little bit more bargaining power when it comes time for raises.
But on the flip side of the coin, if management doesn't learn, and you find yourself constantly asked to do the impossible, you might want to consider employment elsewhere...
The society for a thought-free internet welcomes you.
You really have to supply some more detail to get any useful answer. And what is ~20 Mloc? About 20 million locations?
What kind of software? What classifies an urgent request? Do you make games, and an urgent request means your bug just made front page /.? Do you make internet-facing apps, and an urgent request means your customers just formed a spamming bot-net? Are you in the medical/health care field, and an urgent request means folks might die?
I think a better question is, how do you classify bugs? How do you make that trade-off between fixing a bug ASAP and taking the time to make sure the bug fix is done right?
Who is involved in the decision process? Is it just the technical & regulatory folks? Do you pull in business folks to help gauge customer impact? Do you pull in sales and support to see if they can push a workaround before the final fix is ready?
Those are all better questions than, "How fast do you do this task of unspecified scope."
I love the parent's subject-line analogy.
I'd add that it depends on the product and on the complexity, modularity, readability, and extensibility of the codebase (e.g., if it's highly modular, it's easier to test a fix that's limited to the module/plugin).
I'd suggest that weeks sounds too long for an in the wild update without a security patch - or published workaround limiting your exposure. (eg, "use this method to restrict the IPs that can access it to trusted ones.") But that isn't me saying you're developing too slow, it's me saying that if it's going to take you that long you need to either find alternate solutions or create an architecture that allows you to quickly make access-limiting patches and layered security.
Actually, I'd make that more broad - if they want faster response to patches, what they need to do is to invest a lot on a highly modular, pluggable architecture so you can MAKE rapid changes. It's really a question of how much investment they want to make.
We routinely do same day fixes to certain kinds of things... but certainly the complex things take longer. And I think we're pretty unusual in that regard.
Looking for freelance Actionscript (Flash/Flex) or ColdFusion work and/or freelance developers. Email me, put Slashdot
*15 minutes.
It's bad enough that they directly state they're not really testing patches with a 15 minute turnaround, but the fact that they're making mistakes that can be fixed in 15 minutes speaks loudly as well.
--
Our running joke used to be:
Marketing: We need it real bad!
Engineering: How bad do you need it?
Marketing: <puzzled look>
Engineering: Careful what you wish for... OK, Ops. Ship it!
I've had situations with customers who require a fix as soon as possible, because if the system is down they are losing money. When this situation occurs, we have two goals in mind:
(1) Get the customer up and running again as fast as possible. This is as often as not some sort of workaround that is not pretty, nor is it permanent, but it works. The workaround does not get thorough testing (impossible within the time frame), but the customer is aware of this and willing to accept the risks.
(2) Get the customer a proper, version controlled, patch that they can install to fix the problem permanently. This can take weeks, most of that time being testing. If the customer is insistent we will ship them the proper patch before it is fully tested (again, making them aware of the risks) and continue testing so that we can send the customer some warm and fuzzy news later on (or, if we find a problem, another patch).
Life is like a web application. Sometime you need cookies just to get by.
Show stoppers get immediate attention; whatever it takes. People are losing money because they're DOWN. Fix it now.
Criticals get next attention when show stoppers are out. 48 hours, depending on interdependencies and QA needed to make it work; it's not part of an official stable code tree until later.
Minors are in the next stable branch release; every whatever you can handle.
Nigglies are changed when the stable branch releases.
Don't deviate from your policy, and make sure the sales people KNOW AND UNDERSTAND what this indicates and implies. No exceptions; see above.
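A policy like the one above is easier to hold people to when it is written down as a plain lookup table that sales, support, and engineering all read from. Here is a minimal Python sketch. The four severity names come from the comment; `Severity`, `POLICY`, `response_for`, and the wording of each entry are invented for illustration, not a description of any real tracker.

```python
from enum import Enum


class Severity(Enum):
    SHOW_STOPPER = 1  # customer is down and losing money
    CRITICAL = 2
    MINOR = 3
    NIGGLY = 4


# Hypothetical table: the targets paraphrase the comment above.
POLICY = {
    Severity.SHOW_STOPPER: "immediate, whatever it takes",
    Severity.CRITICAL: "about 48 hours, shipped outside the stable tree",
    Severity.MINOR: "next stable branch release",
    Severity.NIGGLY: "whenever the stable branch releases",
}


def response_for(severity: Severity) -> str:
    """Return the agreed response for a severity; no per-customer exceptions."""
    return POLICY[severity]


print(response_for(Severity.CRITICAL))
```

Keeping the table free of per-customer overrides is the code equivalent of "no exceptions": an argument about urgency becomes an argument about which severity applies, not about the deadline.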
---- Teach Peace. It's Cheaper Than War.
I made them believe it was a hardware problem!
Engineering is the art of compromise.
Maybe the customer is being unreasonable.
Maybe the developer is being unreasonable.
It isn't possible to determine which from either person's viewpoint. You will ALWAYS think that you're right and that the other person is unreasonable.
Which is why you need criteria for bug escalation. Generating an incorrect response on 1 type of transaction for 1 specific scenario that may pop up once a year is far less important than a bug that corrupts the entire database.
And if your product is considered "mission critical", I would expect a data corruption bug to be fixed within 24 hours. Even if it is nothing more than rolling back the recent patches and re-issuing the previous version.
I'm an embedded developer, and when my stuff goes wrong, it can *really* do bad stuff. I've literally pushed fixed firmware to a controller running in a production scan/sort environment within five minutes of finding the bug, because it threatened to completely bring down a huge sort operation (and by huge, I mean 1 million+ pieces that day alone). I've also stayed up all night tracking down a bug crashing a device used by one of our larger (hundreds of millions of dollars per year) customers. Those, though, are the exception, and are driven by the massive financial and PR consequences of not getting it done right now. Throw caution to the wind, code and load if you are reasonably sure what's wrong and the stakes of not fixing it are high enough.
The usual bug fix cycle depends on complexity, impact, and risk. High risk of breaking things and low impact? Generally gets scheduled for the next release (4ish times per year). Low complexity and risk but medium impact? Code today, regression test the rest of the week, push this weekend. On average, mission critical bugs can get fixed in 8 hours or less around here, small to medium stuff is put on a weekly(ish) cycle with *lots* and *lots* of testing, and large stuff gets rolled to the next major release, unless it just can't wait that long.
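The complexity/impact/risk trade-off above amounts to a small decision rule. A rough Python sketch follows, assuming each factor is rated "low", "medium", or "high". The function name `schedule_fix`, the three-level scale, and the exact thresholds are assumptions made for illustration; the comment only gives a few example combinations.

```python
def schedule_fix(complexity: str, impact: str, risk: str) -> str:
    """Pick a release slot from three coarse ratings: 'low', 'medium' or 'high'.

    The thresholds are illustrative assumptions, not the poster's actual rules.
    """
    if impact == "high":
        # Mission-critical: fix now and accept the risk.
        return "hotfix today"
    if risk == "high" or complexity == "high":
        # Likely to break things and not urgent enough to justify that.
        return "next scheduled release"
    # Low complexity and risk, modest impact.
    return "weekly push after regression testing"


print(schedule_fix("low", "medium", "low"))
```

Checking impact first encodes the "unless it just can't wait that long" clause: a high-impact bug jumps the queue even when the fix is risky.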
does business with this company.
We generally get fixes for real bugs out within 24 hours, unless the problem is traceable to the OS, the only factor really out of our immediate control. Even then, we do a quick evaluation to see if we can replace the OS function. Over the years, we've replaced quite a few of them, but rarely within 24 hours.
But we know our code backwards and forwards; I wrote the majority of the current codebase myself, and I can generally get to within a few lines of the problem just by a bug's description... the rest is a matter of minutes and testing. This app is very large - comparable to Photoshop in terms of feature count - but it is also very stable after 15 years of whack-a-bug and a continuous drive to make the internal structure as orderly and regular as possible.
It is my observation that the more programmers you have involved, the slower your turnaround time (for everything from bugs to features) will be. Likewise the larger the entity, the slower it will generally move. Almost every layer of management and corporate compartmenting disease will contribute to slowing down the process.
For the apps that I use that I have had the experience of reporting bugs, it is my general experience that bugs often are never fixed at all. One browser, "Omniweb", truly my favorite in terms of features, has bugs that make it essentially unusable for me. Crashing, slowing, lockups and so on - really serious problems. I've reported them, they never were fixed, in fact the software was never updated. Eventually, I just went back to firefox. Then as Leopard came out, after years of doing nothing, they released a "Leopard version" in which, perhaps, I might find those bugfixes if I looked... but as I say, I have moved on and no longer have any enthusiasm for the product. Slow bug repair (or ignoring them) is synonymous with telling your customers you really don't care what kind of experience they have with your software.
Apple, with all their emphasis on customer experience, does this too. They've had bugs in hand for very long periods where they simply don't address them. If your bug isn't something they think will affect a lot of people, it isn't likely to be fixed. I've not yet purchased Leopard, preferring not to catch early-adopter syndrome bugs myself, but when I do, I would not be the least bit surprised to find you still can't refresh a remote share that's been changed by the remote OS; that the wifi differs hugely in compatibility between PPC and Intel hardware; that mail still hoses the sent mail box based on the return address; that shell fonts are poorly rendered; that shell ANSI compatibility is still broken; that the OS still provides locked-up beachballs at the most inconvenient moments; that the OS still puts the wrong things away on the HD when RAM gets tight, and consequently becomes massively unresponsive... Basically, Apple doesn't have good control of their OS, are unable to respond to bugs in a timely fashion, so much so that they triage out bugs based on report counts, and the common patter is that Apple provides a great customer experience. So while my own experience is that bug fixes are important and can be quick in turnaround, here's Apple showing us that you can make a complete thrash out of the entire bugfix issue and still come out smelling like roses. So is a few weeks too long? Probably not, if you have a good marketing department. :-)
I've fallen off your lawn, and I can't get up.
That's why there are companies who think a minor bug fix, or a small development, changing fonts or simple features, reconfiguring servers, restoring backups etc. is something that doesn't need testing, concentration, at least a little bit of planning, and basic things like version control. So that's quite common in the industry: customers who think they are getting their product for less money because they can force every change as an emergency. They don't realize they are making development more expensive with hacks and constant build, test, and deploy overhead. Simple concepts from lean methodologies like Scrum should be taught to anyone who plans to spend more than someone's monthly wage on software. Everyone can benefit from a healthier development cycle and software will come out better and cheaper. But there are some clients learning to get the benefits of a constant release cycle and, as the poster said, they are getting the beta development cycles for free.
I was thinking about a joke on my subject on the lines of "people only know how to buy tech on Civ", but it's less important and I'll leave it on the jokes backlog.
^[:wq!
From nearly forty years of programming (yes, since the IBM 026 keypunch days), I can tell you with absolute certainty that the more that you do for management, the more that they will want from you. It is not your responsibility to bear all the punishment for the lack of foresight and resource allocation on their part.
Consider this: What would be the managerial response if you asked for a cost of living salary increase and that you needed it within 48 hours? Do you think that they would be willing to work day and night to make that happen?
Working in panic mode is not professional behavior, and it certainly is not conducive to good engineering practices. Furthermore, it is detrimental to long term company survival. Engineers who support continued unreasonable demands have only themselves to blame for enabling poor strategic planning by management.
Sometimes (just sometimes) it's obvious what the bug is, and it's obvious that testing is meaningless. Would you want to hire a company which does meaningless things to please you?
Even if the bug is obvious, it doesn't mean that your fix
1)Works
2)Works correctly for all corner cases
3)Does not have unintended side effects
4)Didn't accidentally include some other changes you were working on before, which are not ready for production.
You still need to QA. Attitudes like yours are why the quality of software is so poor.
I still have more fans than freaks. WTF is wrong with you people?
The customer described a program they wanted (to run on an embedded system). I estimated 3-4 months. They asked for 30 days or less. I explained what they'd get if I banged it out that fast - something that would work most of the time and not lose too much data. They then explained that the program would save them over $1,000,000 a month. If it quit working, they quit saving money, but nothing else bad would happen.
So, I saluted and said I'd try really hard for 3 weeks for the first version, then about three months longer for a version that would work all the time. Which is what happened.
Do you know the impact on this customer of not having the fix that soon? Maybe it's worth it to them...
I work for a large healthcare organization and typically have very fast turn-around times (bugs often get squished within an hour). For clinical applications and other core applications, though, we're much more methodical and careful.
I often explain to the user that I can push changes out immediately, but it introduces certain risks. I then detail the risks they may face, and that if they say to go ahead anyway, at least they'll be aware of what might happen.
Really, it depends on your environment, and what needs to be done.
I'll use one of my web sites as an example. It's all PHP and Perl, so ya, it's programming (I'm sure people will argue this).
Since I wrote all the code, I know it all inside and out. If you say "there's a problem [here]", I know exactly what file to look in, and what code to look for. I've banged out changes, tested them, and put them into production in a matter of minutes.
On a high traffic web site, we had a java applet which was being used by about 25,000 people per day. For little things, I'd change the code, test on all applicable platforms, and roll out the change in a few hours. Even then, the bosses were sometimes displeased with the time it took. Since I was careful to test, I never rolled out bad code, so I was never pushed into the long QA cycles.
Working with one company, things were a lot different. It went something like this.
1) Propose the change to your manager, with supporting documentation.
2) Manager would go to the project coordinator (i.e., customer liaison)
3) project coordinator would go to the customer
4) customer would approve the change.
Up to here was anywhere from an hour to a week. Sometimes the customer would put stipulations on the change, such as "there's a big event happening, or going to happen, don't make the change until X time."
5) document the proposed changes
6) hold a meeting with development, QA, the project coordinator, and management. Discuss the potential changes.
1-3 days later
7) hold another meeting with the same people to rehash the changes.
1-3 days later
8) hold another meeting with the same people to rehash the changes.
9) Write the changes. Make them available to the QA team.
3-7 days later
10) Explain to the QA team that the errors they are experiencing with the fix have nothing to do with the fix, they were preexisting problems with another piece of code.
1-7 days later
11) hold another meeting with development, QA, project coordinator, and management, to explain that the error has been fixed with the supplied changes. The other problems are elsewhere.
1-3 days later
12) hold a strategy meeting to plan on how to fix the other problems.
13) fix the other problems, and break more things.
1-3 days later
14) have QA test the other changes.
14) roll back changes in step 13
15) beta test the previous changes, and notify customer
16) Customer balks at other pre-existing problems.
17) Repeat steps 5 to 15 again, until the customer gets tired of balking.
18) Implement changes.
Then start the process all over with step 1 to fix the other pre-existing problems.
The solution really is...
1) Identify the problem.
2) Gather together the appropriate staff who won't talk outside of your group.
3) Fix, internally test, and implement the resolution.
4) If anyone asks, there was no problem to start with, and you were all really working on steps 5 to 15 of the previous plan on another problem.
Funny how that works.
But, it's a matter of, is it a trivial fix, or something that requires serious rewriting? Did someone miss trapping invalid input in one line, or is it a poor coding practice through all of the code? Is it an included library that simply needs to be upgraded and recompiled?
Serious? Seriousness is well above my pay grade.
I'm in the same position -- I own and operate a small web/internet/custom software company. And certainly, things get published/pushed/shipped with bugs. Welcome to the software world that we know and love. But when a bug creeps up -- when a bug is found by the client, or by the client's customers -- it gets fixed immediately.
And by immediately, I mean between 10 minutes and 10 hours -- if it's going to be fixed at all.
Certainly there are those minor cosmetic bugs that no one cares about -- client included. And there are those other usability bugs that have acceptable work-arounds. Those two get fixed in the next set of upgrades -- if the client ever wants upgrades.
But anything that actually affects the client's on-going business has to get fixed absolutely immediately.
And we're capable of this for a number of reasons:
- we build with "developer empowering" code, so it's easy to make small changes to significant areas.
- we don't have as many bugs as seems to be the average
- in general, much of our code promotes "self-healing" of user data
- sensitive data routes (financial, encryption, security, money, accounting) get extra care during initial development, so fewer bugs are emergencies.
As the owner of a business, I'm with the client on this one. If my web host had a bug that stopped me from writing a cgi script, I'd need it fixed pronto. If my pipe bursts, I need a plumber immediately. If my bank is closed when I need money, it's a problem. Any client whose business is affected by your bug is being very patient if they're willing to wait two days to resume their regular business operation.
You're stalling their business.
That said, client education is very important. That's why I've collected a list of almost 100 news articles of huge bugs in huge companies -- banks, NASA, various militaries, etc -- so when clients say ridiculous things like "I'm paying for software, why does it have bugs", I can point to a billion dollar fighter jet, with four nuclear warheads, and say that it has bugs too. But that's not to get more time, it's just so that they understand there will be bugs, and they'll be fixed right away.
And yes, that's 24/7/52. (I take off the last Friday in July, not that anyone wishes me well for it)
At the risk of getting modded "offtopic" I will say what everyone is thinking and take a hit for the team
IS THERE ANY WAY TO BAN THIS ASSHOLE!!!! (pardon the little pun I threw in)
Goatse was funny 10 years ago but it's really stale.
Make SELinux enforcing again!