App Developers Spend Too Much Time Debugging Errors in Production Systems (betanews.com)

Posted by msmash on Thursday November 3, 2016 @08:00AM from the not-sure-if-bad-thing dept.

According to a new study, 43 percent of app developers spend between 10 and 25 percent of their time debugging application errors discovered in production. BetaNews adds: The survey, carried out by ClusterHQ, found that a quarter of respondents encounter bugs discovered in production one or more times per week. Respondents were also asked to identify the most common causes of bugs. These were: inability to fully recreate production environments in testing (33 percent), interdependence on external systems that makes integration testing difficult (27 percent), and testing against unrealistic data before moving into production (26 percent). When asked to identify the environment in which bugs are most costly to fix, 62 percent selected production as the most expensive stage of app development in which to fix errors, followed by development (18 percent), staging (seven percent), QA (seven percent) and testing (six percent).

inability to fully recreate production environment by MooseTick · 2016-11-03 08:09 · Score: 5, Insightful

This is due to finance cheaping out and not allowing the purchase of an exact "test" system to work on. Also, the rush to production is often more important than checking to be sure it all works.
That said, it's all a risk/reward thing. Maybe it's often better to screw up production here and there than to spend tons of money and time on testing. It all depends on whether you're building software for a web site or a Mars mission. What is the impact of a failure, and is it recoverable?

--
Ninjas don't carry tic tacs

Most common causes of bugs? by Anonymous Coward · 2016-11-03 08:15 · Score: 4, Insightful

How is "management telling people to put it into production as soon as the basic functionality works" not one of the common causes of bugs? At almost every job I've worked at, QA and Engineering would say "We need this much time to test and fix bugs before launch", and management would say "Too bad! Sales already told someone we're launching tomorrow, so we're going live with whatever we have then!"
It isn't the lack of a good test environment, or good test data, it's being told by management that you aren't going to have any time to test...

Been there, done that as an intern... by __aaclcg7560 · 2016-11-03 08:23 · Score: 4, Interesting

I did a six-month contract as a software tester intern after college, where I came across a crash bug on the test server that I could reproduce 100% of the time. My supervisor could not reproduce the bug, and approved the patch for the production server. The production server crashed immediately from the patch. Engineers determined that a major code rewrite was required to fix the underlying problem. The production server was offline for three days and cost the company $250K in lost revenue. My contract wasn't renewed, one-third of the division got laid off after I left, and further budget cuts doomed the project. As for my supervisor, he got promoted into management.

Re:inability to fully recreate production environm by ghoul · 2016-11-03 10:51 · Score: 4, Interesting

Where I used to work, a big telco software firm whose software generates 80% of the phone bills in the US, we had a simple solution to the problem of testing at scale.
We had two identical setups, one for production and one for staging. When UAT was almost over we would deploy to staging and then continue UAT on staging with real-world data until the day of cutover (using Oracle Active-Passive to keep both in sync for the production data while not copying UAT data over to prod).
On cutover day we would change the network switch to point to the new setup and run scripts to delete the data created by UAT.
The nice part was that the old prod setup (a bank of 8 servers with 4 quad-core CPUs each) was now our backup machine. We would switch it to passive and continue to keep it in sync with prod for at least 7 days. If something horrible went wrong with the new setup, changing back to the earlier prod machine was a network switch flip. The scripts were a little more difficult this time around, especially if the software bug had messed up the data, but it was still easy.
Once the new production setup was stable, the old prod was used as staging for the next release.
What this meant is that we did UAT on machines with the same config as the prod machines. It solved a lot of issues, and since we also used the machines as the prod backup during cutover, the cost was taken from the operations budget and not the testing budget.
Our system test and UAT environments were almost as good as prod, but not quite, and most testing and UAT was done there. But the last batch of UAT on the big iron gave good confidence and made cutover day a lot less stressful than it used to be.
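A minimal Python sketch of the switch-flip scheme described above, for illustration only. The `Environment` and `Switch` classes, the in-memory `rows` list standing in for the database, and the `uat:` prefix used to mark UAT data are all assumptions made for this example; they are not the actual Oracle or network setup.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """One of two identically configured setups (hypothetical model)."""
    name: str
    release: str
    rows: list = field(default_factory=list)  # stand-in for the database


class Switch:
    """Stand-in for the network switch that decides which setup is live."""

    def __init__(self, live: Environment, standby: Environment):
        self.live = live
        self.standby = standby

    def write(self, row: str) -> None:
        # Production writes land on the live setup and are mirrored to the
        # standby, imitating active-passive replication.
        self.live.rows.append(row)
        self.standby.rows.append(row)

    def uat_write(self, row: str) -> None:
        # UAT data exists only on the standby and is never copied to prod.
        self.standby.rows.append("uat:" + row)

    def cutover(self) -> None:
        # Delete the data created by UAT, then flip the switch.
        self.standby.rows = [
            r for r in self.standby.rows if not r.startswith("uat:")
        ]
        self.live, self.standby = self.standby, self.live

    def rollback(self) -> None:
        # The old prod setup was kept in sync, so rolling back is another flip.
        self.live, self.standby = self.standby, self.live


old_prod = Environment("A", release="v1")
staging = Environment("B", release="v2")
switch = Switch(live=old_prod, standby=staging)

switch.write("bill-1")          # real production traffic
switch.uat_write("test-bill")   # UAT against real-world data on staging
switch.cutover()                # B is now production, A is the backup
switch.write("bill-2")          # A keeps receiving production data
```

In this sketch `rollback()` is a plain swap because both setups hold the same production rows. It leaves out the harder case mentioned above, where a bug in the new release has already corrupted data and the cleanup scripts have to repair it.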

--
**Life is too short to be serious**

Something is missing here by cerberusss · 2016-11-03 18:55 · Score: 4, Informative

App developer here.
Something is missing here: namely, we spend more time debugging issues found in production because they get reported. Almost every app nowadays has a crash logger that reports all crashes. Libraries like Twitter's Crashlytics are awesome like that. You get all crashes reported to you, including a ring buffer of the last 100 log messages. It's really, really awesome and I've solved problems in production that wouldn't ever be found normally.
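To illustrate the ring-buffer idea, here is a rough Python sketch using only the standard library. It is not the Crashlytics API; `RingBufferHandler` and `build_crash_report` are hypothetical names made up for this example, and a real crash reporter would upload the report rather than keep it in a dict.

```python
import logging
from collections import deque


class RingBufferHandler(logging.Handler):
    """Keeps only the most recent `capacity` formatted log messages."""

    def __init__(self, capacity: int = 100):
        super().__init__()
        # A deque with maxlen silently drops the oldest entry when full.
        self.buffer = deque(maxlen=capacity)

    def emit(self, record: logging.LogRecord) -> None:
        self.buffer.append(self.format(record))


def build_crash_report(exc: BaseException, handler: RingBufferHandler) -> dict:
    """Bundle the error with the log lines that led up to it."""
    return {
        "error": f"{type(exc).__name__}: {exc}",
        "recent_logs": list(handler.buffer),
    }


logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example quiet on the console
handler = RingBufferHandler(capacity=100)
logger.addHandler(handler)

for i in range(150):
    logger.info("step %d", i)  # only the last 100 of these are retained

try:
    1 / 0
except ZeroDivisionError as exc:
    report = build_crash_report(exc, handler)
```

The bounded buffer is what makes this cheap enough to leave on in production: memory use stays fixed no matter how long the app runs, and the report still carries the context right before the crash.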

--
8 of 13 people found this answer helpful. Did you?