Researchers Test Developer Biometrics To Predict Buggy Code
rjmarvin writes: Microsoft Research is testing a new method for predicting errors and bugs while developers write code: biometrics. By measuring a developer's eye movements, physical and mental characteristics as they code, the researchers tracked alertness and stress levels to predict the difficulty of a given task with respect to the coder's abilities. In a paper entitled "Using Psycho-Physiological Measures to Assess Task Difficulty in Software Development," the researchers summarized how they strapped an eye tracker, an electrodermal sensor and an EEG sensor to 15 developers as they programmed for various tasks. Biometrics predicted task difficulty for a new developer 64.99% of the time. For subsequent tasks with the same developer, the researchers found biometrics to be 84.38% accurate. They suggest using the information to mark places in code that developers find particularly difficult, and then reviewing or refactoring those sections later.
Every time I hear about a terrifyingly invasive means of "improving performance" it's targeted at developers. Is it just selection bias, or does the world actually hate us?
the researchers summarized how they strapped an eye tracker, an electrodermal sensor and an EEG sensor
I'm sure being hooked up to something similar to a polygraph doesn't make a developer more stressed at all. Was the fact that they had all this equipment hooked up to them a factor in their statistics?
I'm writing a networking multithreaded program in ASM that needs to use very little memory, hence complicated register magic and blatant violation of calling conventions.
The sensors are going to freak the fuck out.
Now my boss is going to be watching the developer's eye movements instead of testing code... This will not end well.
There is no magic bullet and where this might find the sections of code that your developer finds difficult to understand, it still isn't going to give you any idea about the quality of the code they produce. All you will know is how hard they concentrated when producing it.
I remember when we watched SLOC, but it was of marginal value. Then it was logical edges and complexity which was sometimes useful, but not always. Now they want to use biometrics to figure out how complex I find my code? It won't be any more helpful than complexity was.
Keeping code understandable and bug free has always been about naming identifiers, formatting, comments and using standard patterns and TESTING it as much as possible. All these golden bullets will only end up in your foot if you choose to use them....
"File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
How about instead of spending a small fortune solving the handful of bugs caused by programmer typos, you spend that money on better requirements gathering, keeping specifications from changing constantly, and giving programmers time to actually unit-test and document their code?
If you want some fancy tech so you can write a paper on it, make an "electro-stimulus behavior moderation band", strap it onto clients/managers, and give them fifty thousand volts whenever they say or do something stupid.
When your developers cringe at a project, when they encounter a subroutine or callback that literally makes them groan, you've found exactly what needs to be refactored. If you find a python wrapper around a godforsaken class, or find expletives cursing a dead god's name in a forgotten universe, that's the code that needs your attention. Project managers, section leaders, whoever has direct line-of-sight communication with the dev pit needs to pay more attention.
the problem is 'refactoring' is a lie. as a DevOps (christ i hate that fucking word) engineer, I've been faced with rotting festering codebases for years in my career on a daily basis. the issue is business priorities interfering with good coding practices. I and 2 junior devs might want to go rip up a few thousand lines of horror-code to make everyone more productive, but we get denied. why?:
1. downtime is unacceptable for this application. this code controls so much, does so many things, and is so obscure (say it with me, payments processing subsystem) that to do ANYTHING to it is literally worse than pistol whipping the CEO's daughter.
2. New New NEW! we need to get in those swim lanes and stand up in those scrums nice and straight so we can deliver optimum ROI to our dear customers! who cares if the system crashes 5 times a month because this module is satan's petrified asscrack, google just launched their new $app so our new $cloud_app_pro needs to go live NOW!.
3. we had the resources, but uber elite coders in our ranks were ganked to other projects months ago. they haven't seen the code in 3 months, and we're sure they'll be along to help us again once they put in their 2 weeks and show up in flip flops for the knowledge transfer.
4. you were ganked from the refactor project and are now plugging away at an irritating new web 9.0 cash money matic piece of code that marketing won't stop skullfucking and your boss can't deliver fast enough. Catch this rabbit though and you'll be able to sit down and think through...wait....what was the refactoring project about again? oh christ is that CVS?
what this technology will get used for
efficiency sampling in your dev groups. eye tracking and biometrics will now subtly be included in SCRUM/ITIL/six sigma/devops/management wankfest.
Good people go to bed earlier.
They tested 16 developers and gave statistics with four significant figures. I think you would need to test at least 100,000,000 developers to get such precise measurements. Who do they think they are? Dr. Spock on Star Trek?
We don't see the world as it is, we see it as we are.
-- Anais Nin
The core implication here is that developers are the source of mistakes, and those mistakes must be minimized. Never mind that developers are also a source of productivity and innovation, and that dehumanizing them decreases both.
Should measure the interaction between developers and product managers before coding even starts - see how the developer responds to impossible requests, contradictory requirements, and meaningless buzzword filled descriptions...
Microsoft Research should also track how far the individual is working away from the main office of his company, because that has far more of an effect on bugs than any biometric reading. I recommend that they develop a special laser and a series of geostationary satellites and ground repeater stations. The total round trip time of the laser pulse will be a measure of how buggy the developer's code is.
1) Microsoft Research is wasting an awful lot of money to conclude that the reason why Microsoft's software is so terrible is that it's being written by outside companies in India.
2) Microsoft's well-paid American and European programmers are producing good or at least above average code, as would be expected.
3) Quality evaporates when they use foreign and H1B workers, who are educated in substandard universities, inexperienced in engineering, and/or do not have English superliteracy. [I've discussed elsewhere that basic language literacy is not enough to be a good programmer---you need to have enough language expertise to communicate without any ambiguity whatsoever to write good code because of the essential nature of interpersonal communication. By the same token, if you're writing software for Indians speaking Hindi your entire team should be Hindi-speaking Indians.]
They tested 16 developers and gave statistics with four significant figures. I think you would need to test at least 100,000,000 developers to get such precise measurements. Who do they think they are? Dr. Spock on Star Trek?
Naw, they just used a really accurate ruler, made each measurement 10 times and averaged their results...
You make an excellent point. There is no indication in the fine article about how accurate their results could be statistically, and given their really small sample size it doesn't seem likely 4 significant digits is justified.
"File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
well duh, that vein on my forehead starts to throb and my eye starts to twitch when I read dumb code.
oh wait, that was my commit
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
Using the same methodology, it'd be interesting to analyze differences in performance/productivity of developers. I'd expect to see something like a normal or log-normal curve.
Example?
That's where we're at?
I see this garbage all the time... and I never understand it. Growing up my father ran a factory and was damned good at it. His people showed up on time, did great work, had low scrap and were very productive. How did he manage this amazing feat? Minders that followed you everywhere in the factory? Discreet blood samples taken hourly? No...
They had stats. That's it. You produce X parts per week. Go way above that, you get a bonus; go way below that, you get fired. If everyone is getting bonuses the stats would rise... if everyone was getting low stats they'd first check for procedural problems that might be hindering people's work, and barring that they'd lower their expectations. It worked marvelously well... people would think of ways to go faster and bring them up... because it meant a bonus until everyone else caught on. Anything that made production harder was immediately reported because people wanted their bonus.
Damn near every successful factory in the country works this way. Do the same thing for code. What my father always said was "They could spend 7hrs in the bathroom every day... I don't care... if they can come out and hit 1000% efficiency that last hour to make up the difference it's fine with me. But they better keep in mind their peers are going to eventually figure out how they pulled it off and change the curve."
1) Arrogance. You know that average developers have a hard time with some kinds of code, but you're a superprogrammer, and you don't have those problems. If someone decides later that there's something wrong with your code, well, they should've gotten their requirements straightened out before they told you to go and build it. The only time you lose your cool is when you have to deal with idiot managers, analysts, or users.
2) Complacency. You've been pounding on this code forever, and you just don't care any more. Yeah, there'll be bugs, people will yell, they'll get fixed. That's just the way development goes. Why get worked up about it?
I think the extra bogus precision has something to do with the conversion between Imperial and Metric developers.
(Oh, and something something dark side.)
lol what the f*ck!
:p, everyone should bow down and say "amen you are right" :s. Stupidities
I quote from the Microsoft research paper:
"In this paper, we investigate a novel approach to classify the difficulty of code comprehension tasks using data from psycho-physiological sensors. We present the results of a study we conducted with 15 professional programmers to see how well an eye-tracker, an electrodermal activity sensor, and an electroencephalography sensor could be used to predict whether developers would find a task to be difficult."
15 developers is enough to reach a conclusion???
Note that the paper is about investigating the difficulty of code comprehension, not how bugs are introduced. While the two may be indirectly linked, that is just an assumption, not an established fact; there's no such direct relationship.
Also, each developer codes differently, requires different working conditions, and feels different emotions while coding. How the heck can they assume that EEG results will show any similarity?
This is bullshit research, wasting time and money. Probably some kind of marketing feat that will be used to claim Microsoft no longer makes bugs in its code, or just some sort of conspiracy to turn programmers into zombies hooked to machines.
Lol, blame the programmer for the bug, when bugs occur for a lot more reasons than just one human doing the coding. Building a program is about more than just a programmer.
Anyways, it is a research paper
While I agree the extra precision is misleading, the real problem is that they don't provide confidence intervals. Even a rounded 80% could be misleading if it could fall between 60% and 90% due to sparse data.
On the other hand, if in their report details they said "84.38% with a 99% confidence interval of 72.27% to 94.49%", then the extra precision is no longer necessarily misleading (it is just the calculational result of the model used) and, although it is a little pedantic and redundant, I would have no fundamental problem with it. It might even be argued that it is infinitesimally more precise, allowing the calculations to be confirmed by an independent researcher. However, for presentation in summary form, it would be much better to say "between 72% and 94% with 99% confidence".
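To put rough numbers on the confidence interval point, here is a minimal Python sketch using the Wilson score interval for a proportion. The thread doesn't say how many predictions sit behind the 84.38% figure, so the counts below (27 correct out of 32, which happens to round to 84.38%) are purely a hypothetical stand-in, and the z value is the usual approximate 99% normal quantile. The function name is mine, not anything from the paper.

```python
import math

def wilson_interval(successes, trials, z):
    """Wilson score interval for a binomial proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    ) / denom
    return center - half_width, center + half_width

Z_99 = 2.576  # approximate two-sided 99% normal quantile

# Hypothetical counts, chosen only because 27/32 rounds to 84.38%;
# the real number of predictions is not given in the thread.
low, high = wilson_interval(27, 32, Z_99)
print(f"observed {27 / 32:.2%}, 99% interval about {low:.0%} to {high:.0%}")

# The same observed rate with 100x the data gives a much tighter interval.
low_big, high_big = wilson_interval(2700, 3200, Z_99)
print(f"with 3200 trials: about {low_big:.1%} to {high_big:.1%}")
```

With a few dozen trials the interval spans tens of percentage points, so the trailing ".38" carries no real information; it only starts to mean something once the trial count is in the thousands.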
I'm truly sorry, but an IT union isn't happening, until at least my generational cohort is out of the system.
A. Too many libertarians.
B. Too many people convinced of their own prowess and respect
C. None of us are at much physical risk
D. We get quite a bit more than a living wage, in general
Those factors add up to an insurmountable barrier, even if I personally think the idea is wise.
To be hooked up to some device while we work to measure how likely we are to wind up at the top of the stack rank. It could be completely automated, if it determines you just wrote sucky code it generates a pink slip email and a robot carries you out the door. Instant better code!
I highly doubt you can write a non-trivial C or C++ program without bugs, or really any language for that matter. I'm not talking about thousands of lines either. 100 or so should do it. The fact that you don't mention any kind of requirements spec, perhaps with the aid of some CASE tools, or at least a testing and feedback method, coupled with the fact that you think that you can do it "in your head" makes it clear that you have no idea how to develop code.
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
While this information can be useful to software developers to help them optimize their own performance, it will likely prove detrimental to provide these metrics to managerial or production supervisors, as they typically only choose to intensify the workload to improve efficiency. I would like to see more people tested and less emphasis on productivity, and more emphasis on how we can use this data to improve the experience of coding for the programmer. That would probably result in better code more readily than objectifying the software engineers through the use of data that might not speak to the experience of the vast majority of programmers.
Once again we have some big sister/brother company/government claiming that they can do the impossible with biometric data. They don't address the primary source of the problems, which you lay out in detail.
Why was security skimped on in the code? Funding.
Why did funding get dropped? So that someone could get a bonus.
Who was the person that had the demo code for security? Canned to save budget.
Can't our Outsource code it? Not in their contract or business statement.
None of those issues are the coders fault, and this is the majority of our "shitty" code today. Piles and piles of shit so that someone in the management chain (or several someones) can get bonuses/raises/justify their existence in a company.
I'll give an alternate method of finding better targets for biometric scanning. Randomly sample executive and management emails. If you can win buzzword bingo in 2 or less random emails, you have a valid target. Build a "shifty eye" detector into power point, and there ya go!
-The wise argue that there are few absolutes, the fool argues that there are no probabilities.
They can understand how a toilet is cleaned, how a sale is made, how a 1099 is filled out, how a fire drill works, how a sandwich is put together, how oil is changed, etc... but Coding might as well be a dark art.
Disclaimer: I am in hardware myself and may completely miss the point here. However, our software/firmware folks do agile programming, which involves dividing programming problems into pieces that are assigned to programmers, followed up on large whiteboards, and discussed daily in "scrum meetings" etc. (I may be confusing some concepts here but that is of less importance). The point being that your statement, that programming is some sort of unique dark-art-which-cannot-be-measured-by-managers, appears untrue to me and, honestly, rather pedantic. What these guys are doing is quite measurable. Maybe not by a silly measure like "lines of code", but by the measure of number of problems being solved, having a complexity that apparently every one of them agrees on.
Indeed, the CEO doesn't know the exact details of how this works, but neither does he personally count the number of cleaned toilets.
..how many swear words are in the comments.
Is there any company on earth that treats its customers with more contempt than Microsoft?
Comcast? AT&T? Anyone associated with the MPAA/RIAA?
-The wise argue that there are few absolutes, the fool argues that there are no probabilities.
2) Bugs
Agreed.
When talking with non-developers about developers, I use the simile that developers are like novelists, who work out stories in their heads, and commit those stories to paper.
A novel contains a set of symbols which, taken collectively, and written correctly, form an impressive body of knowledge that can change the world. (Tolstoy's "War and Peace" is my usual example.)
But if the symbols are faulty -- if the book is badly written, if the grammar and spelling are faulty -- then the book will fail to sell, fail to make its point, fail to change the world.
-kgj
*Microsoft* is working on intrusive software to predict buggy code? I can do that without software - just point at the Microsoft campus, and any and all products....
mark
So, Microsoft Research has developed a method to tell when a programmer is in a condition that tends to create bugs. That's nice. What happens with this?
I already know when I'm in a condition that tends to create bugs. It won't help there. It could be passed on to others, such as management.
Now, is management going to take action to reduce the amount of time I'm more vulnerable to causing bugs, by improving the office environment or discouraging overtime or making reasonable deadlines? Or is management going to find this a good evaluation tool and ding my performance reviews if I'm spending too much time in that condition? Is excessive stress going to become a thought crime?
The only way this would be useful is if management recognized that they needed to reduce buggy time and were rewarded partly on that basis. Anybody confident their employer will do that?
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
It's simple: Just install a program on your developers' computers that tracks how often (how many times in general, and for how long) the developer switches focus away from their IDE. If they're constantly googling, looking up reference docs or algorithms, etc., chances are they are doing something that's new, untested, uncharted territory for them. If they're just rattling off hundreds of SLOC at a time, while only needing IntelliSense as an aid, chances are most of it will work on the first attempt.
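For what it's worth, the counting part of that is trivial once you have a log of focus changes. A rough Python sketch, assuming a made-up log format of (timestamp in seconds, window name) entries; actually capturing focus events is OS-specific and not shown, and every name here (FocusEvent, away_from_ide, the window labels) is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FocusEvent:
    timestamp: float  # seconds since the session started
    window: str       # name of the window that just gained focus

def away_from_ide(events, ide_name, session_end):
    """Return (switches away from the IDE, total seconds spent outside it)."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    switches = 0
    seconds_away = 0.0
    for i, event in enumerate(ordered):
        # Each window keeps focus until the next event (or the session ends).
        next_time = ordered[i + 1].timestamp if i + 1 < len(ordered) else session_end
        if event.window != ide_name:
            seconds_away += max(0.0, next_time - event.timestamp)
            # Count a switch only when focus left the IDE itself,
            # not when hopping between two non-IDE windows.
            if i > 0 and ordered[i - 1].window == ide_name:
                switches += 1
    return switches, seconds_away

# Hypothetical five-minute session.
log = [
    FocusEvent(0, "ide"),
    FocusEvent(60, "browser"),
    FocusEvent(90, "ide"),
    FocusEvent(200, "docs"),
    FocusEvent(230, "browser"),
    FocusEvent(260, "ide"),
]
switches, seconds_away = away_from_ide(log, "ide", session_end=300)
print(switches, seconds_away)
```

Whether those two numbers say anything about bugs is a separate question; this only shows the metric is cheap to compute.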
Programmers who use books made of real, physical paper foil this test and should be summarily fired.