Researchers Test Developer Biometrics To Predict Buggy Code
rjmarvin writes: Microsoft Research is testing a new method for predicting errors and bugs while developers write code: biometrics. By measuring a developer's eye movements, physical and mental characteristics as they code, the researchers tracked alertness and stress levels to predict the difficulty of a given task with respect to the coder's abilities. In a paper entitled "Using Psycho-Physiological Measures to Assess Task Difficulty in Software Development," the researchers summarized how they strapped an eye tracker, an electrodermal sensor and an EEG sensor to 15 developers as they worked on various programming tasks. Biometrics predicted task difficulty for a new developer 64.99% of the time. For subsequent tasks with the same developer, the researchers found biometrics to be 84.38% accurate. They suggest using the information to mark places in code that developers find particularly difficult, and then reviewing or refactoring those sections later.
Every time I hear about a terrifyingly invasive means of "improving performance" it's targeted at developers. Is it just selection bias, or does the world actually hate us?
Thank goodness they are using this to double check the right parts of the code instead of trying to do something to make the developer's life a little more conducive to doing good work.
the researchers summarized how they strapped an eye tracker, an electrodermal sensor and an EEG sensor
I'm sure being hooked up to something similar to a polygraph doesn't make a developer more stressed at all. Was the fact that they had all this equipment hooked up to them a factor in their statistics?
I'm writing a networking multithreaded program in ASM that needs to use very little memory, hence complicated register magic and blatant violation of calling conventions.
The sensors are going to freak the fuck out.
Now my boss is going to be watching the developer's eye movements instead of testing code... This will not end well.
There is no magic bullet, and while this might find the sections of code that your developer finds difficult to understand, it still isn't going to give you any idea about the quality of the code they produce. All you will know is how hard they concentrated when producing it.
I remember when we watched SLOC, but it was of marginal value. Then it was logical edges and complexity which was sometimes useful, but not always. Now they want to use biometrics to figure out how complex I find my code? It won't be any more helpful than complexity was.
Keeping code understandable and bug free has always been about naming identifiers, formatting, comments and using standard patterns and TESTING it as much as possible. All these golden bullets will only end up in your foot if you choose to use them....
"File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
How about instead of spending a small fortune solving the handful of bugs caused by programmer typos, you spend that money on better requirements gathering, keeping specifications from changing constantly, and giving programmers time to actually unit-test and document their code?
If you want some fancy tech so you can write a paper on it, make an "electro-stimulus behavior moderation band", strap it onto clients/managers, and give them fifty thousand volts whenever they say or do something stupid.
When your developers cringe at a project, when they encounter a subroutine or callback that literally makes them groan, you've found exactly what needs to be refactored. If you find a python wrapper around a godforsaken class, or find expletives cursing a dead god's name in a forgotten universe, that's the code that needs your attention. Project managers, section leaders, whoever has direct line-of-sight communication with the dev pit needs to pay more attention.
the problem is 'refactoring' is a lie. as a DevOps (christ i hate that fucking word) engineer, I've been faced with rotting festering codebases for years in my career on a daily basis. the issue is business priorities interfering with good coding practices. I and 2 junior devs might want to go rip up a few thousand lines of horror-code to make everyone more productive, but we get denied. why?:
1. downtime is unacceptable for this application. this code controls so much, does so many things, and is so obscure (say it with me, payments processing subsystem) that to do ANYTHING to it is literally worse than pistol whipping the CEO's daughter.
2. New New NEW! we need to get in those swim lanes and stand up in those scrums nice and straight so we can deliver optimum ROI to our dear customers! who cares if the system crashes 5 times a month because this module is satan's petrified asscrack, google just launched their new $app so our new $cloud_app_pro needs to go live NOW!.
3. we had the resources, but uber elite coders in our ranks were ganked to other projects months ago. they haven't seen the code in 3 months, and we're sure they'll be along to help us again once they put in their 2 weeks and show up in flip flops for the knowledge transfer.
4. you were ganked from the refactor project and are now plugging away at an irritating new web 9.0 cash money matic piece of code that marketing won't stop skullfucking and your boss can't deliver fast enough. Catch this rabbit though and you'll be able to sit down and think through...wait....what was the refactoring project about again? oh christ is that CVS?
what this technology will get used for
efficiency sampling in your dev groups. eye tracking and biometrics will now subtly be included in SCRUM/ITIL/six sigma/devops/management wankfest.
Good people go to bed earlier.
They tested 16 developers and gave statistics with four significant figures. I think you would need to test at least 100,000,000 developers to get such precise measurements. Who do they think they are? Dr. Spock on Star Trek?
We don't see the world as it is, we see it as we are.
-- Anais Nin
if (SLOC > 0)
{
    buggy = true;
}
The core implication here is that developers are the source of mistakes, and those mistakes must be minimized. Never mind that developers are also a source of productivity and innovation, and that dehumanizing them decreases both.
Should measure the interaction between developers and product managers before coding even starts - see how the developer responds to impossible requests, contradictory requirements, and meaningless buzzword filled descriptions...
Microsoft Research should also track how far the individual is working away from the main office of his company, because that has far more of an effect on bugs than any biometric reading. I recommend that they develop a special laser and a series of geostationary satellites and ground repeater stations. The total round trip time of the laser pulse will be a measure of how buggy the developer's code is.
1) Microsoft Research is wasting an awful lot of money to conclude that the reason why Microsoft's software is so terrible is that it's being written by outside companies in India.
2) Microsoft's well-paid American and European programmers are producing good or at least above average code, as would be expected.
3) Quality evaporates when they use foreign and H1B workers, who are educated in substandard universities, inexperienced in engineering, and/or do not have English superliteracy. [I've discussed elsewhere that basic language literacy is not enough to be a good programmer---you need to have enough language expertise to communicate without any ambiguity whatsoever to write good code because of the essential nature of interpersonal communication. By the same token, if you're writing software for Indians speaking Hindi your entire team should be Hindi-speaking Indians.]
They tested 16 developers and gave statistics with four significant figures. I think you would need to test at least 100,000,000 developers to get such precise measurements. Who do they think they are? Dr. Spock on Star Trek?
Naw, they just used a really accurate ruler, made each measurement 10 times and averaged their results...
You make an excellent point. There is no indication in the fine article about how accurate their results could be statistically, and given their really small sample size it doesn't seem likely 4 significant digits is justified.
"File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
well duh, that vein on my forehead starts to throb and my eye starts to twitch when I read dumb code.
oh wait, that was my commit
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
"This code was written while stoned"
Using the same methodology, it'd be interesting to analyze differences in performance/productivity of developers. I'd expect to see something like a normal or log-normal curve.
Example?
That's where we're at?
I see this garbage all the time... and I never understand it. Growing up my father ran a factory and was damned good at it. His people showed up on time, did great work, had low scrap and were very productive. How did he manage this amazing feat? Minders that followed you everywhere in the factory? Discreet blood samples taken hourly? No...
They had stats. That's it. You produce X parts per week. Go way above that, get a bonus; go way below that, you get fired. If everyone is getting bonuses the stats would rise... if everyone was getting low stats they'd first check for procedural problems that might be hindering people's work and, barring that, they'd lower their expectations. It worked marvelously well... people would think of ways to go faster and bring them up... because it meant a bonus until everyone else caught on. Anything that made production harder was immediately reported because people wanted their bonus.
Damn near every successful factory in the country works this way. Do the same thing for code. What my father always said was "They could spend 7hrs in the bathroom every day... I don't care... if they can come out and hit 1000% efficiency that last hour to make up the difference it's fine with me. But they better keep in mind their peers are going to eventually figure out how they pulled it off and change the curve."
They tested 16 developers and gave statistics with four significant
figures. I think you would need to test at least 100,000,000
developers to get such precise measurements. Who do they think
they are? Dr. Spock on Star Trek?
Mr. Spock.. "but it couldn't possibly matter less". ;-)
1) Arrogance. You know that average developers have a hard time with some kinds of code, but you're a superprogrammer, and you don't have those problems. If someone decides later that there's something wrong with your code, well, they should've gotten their requirements straightened out before they told you to go and build it. The only time you lose your cool is when you have to deal with idiot managers, analysts, or users.
2) Complacency. You've been pounding on this code forever, and you just don't care any more. Yeah, there'll be bugs, people will yell, they'll get fixed. That's just the way development goes. Why get worked up about it?
Errors happen whether you are stressed and/or find a particular problem challenging, or when you are cruising and/or oblivious to the actual complexity of the problem you are facing.
Not to mention it will make everyone nervous just thinking that their vitals are taken while they work, since this has the potential of becoming a performance goal. Lead: "You need to relax, we end up reviewing every single line of code you write" Poor developer: "Could you point me to the nearest suicide machine?"
Skynet
I think the extra bogus precision has something to do with the conversion between Imperial and Metric developers.
(Oh, and something something dark side.)
>Arrogance
The problem is that it's often justified. The level of skill can vary wildly between the beginners who struggle to learn the simplest things, spamming every forum so much that you can't help but wonder if that's the norm, and people who come up with jaw dropping solutions on the spot like it's natural.
It's easy to spot "stupid" people, and there seem to be tons of them, but it's hard to notice when someone is smarter than you, because good code is the standard and the occasion to write excellent code isn't always there; therefore one would tend to think that he's smarter than the average.
To sum it up, I have no idea what I'm talking about and should really just stop posting.
lol what the f*ck!
:p, everyone should bow down and say "amen you are right" :s. Stupidities
I quote from the Microsoft research paper:
"In this paper, we investigate a novel approach to classify the difficulty of code comprehension tasks using data from psycho-physiological sensors. We present the results of a study we conducted with 15 professional programmers to see how well an eye-tracker, an electrodermal activity sensor, and an electroencephalography sensor could be used to predict whether developers would find a task to be difficult."
15 developers is enough to reach a conclusion???
Note that the paper is about investigating the difficulty of code comprehension, not how bugs are introduced. While the two may be indirectly linked, that link is not an established fact; it is just an assumption, and there's no such direct relationship.
Also, each developer codes differently, requires different working conditions, and feels different emotions while coding. How the heck can they assume that EEG results will show any similarity?
This is bullshit research wasting time and money. Probably some kind of marketing feat that will be used to claim Microsoft no longer makes bugs in its code, or just some sort of conspiracy to turn programmers into zombies hooked to machines.
Lol, blame the programmer for the bug when bugs occur for a lot more reasons than just one human doing the coding. Building a program is about more than just a programmer.
Anyways, it is a research paper
While I agree the extra precision is misleading, the real problem is that they don't provide confidence intervals. Even a rounded 80% could be misleading if it could fall between 60% and 90% due to sparse data.
On the other hand, if in their report details they said "84.38% with a 99% confidence interval of 72.27% to 94.49%", then the extra precision is no longer necessarily misleading (it is just the calculational result of the model used) and, although it is a little pedantic and redundant, I would have no fundamental problem with it. It might even be argued that it is infinitesimally more precise, allowing the calculations to be confirmed by an independent researcher. However, for presentation in summary form, it would be much better to say "between 72% and 94% with 99% confidence".
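For anyone who wants to see how wide such an interval gets with sparse data, here is a minimal sketch using the Wilson score interval for a proportion. Everything specific in it is an assumption for illustration: the function name `wilson_interval` is made up, and the counts (27 correct out of 32 predictions, which happens to round to 84.38%) are hypothetical, since the summary does not say how many predictions the reported accuracy is based on or how the researchers computed it.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, trials, confidence=0.95):
    """Return the (low, high) Wilson score interval for a proportion."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / trials

    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half_width, center + half_width


# Hypothetical counts, NOT taken from the paper: 27 of 32 is about 84.38%.
low95, high95 = wilson_interval(27, 32, confidence=0.95)
low99, high99 = wilson_interval(27, 32, confidence=0.99)
print(f"95% interval: {low95:.1%} to {high95:.1%}")
print(f"99% interval: {low99:.1%} to {high99:.1%}")
```

With only a few dozen observations the interval spans tens of percentage points, which is the point being made above: the two decimal places in the headline figure say nothing about how much the estimate could move.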
All of these metrics do not get to the real issues, and in some cases they lead to people gaming the system, or to people doing just the minimum to not get fired or to avoid having their 6 bosses tell them about each time they mess up.
Also Scope creep, deadlines, and more lead to errors
Sure, it is possible to see which parts of the code are trivial, and which parts require actual thinking. You don't need fancy machinery for that, anyone can see the difference. But try to apply the fancy machinery to a good programmer actually trying to solve a real life problem! Suddenly it is not about what lines they stare most at, it is about what else they google, what they grep for in the library headers, where they go for similar problems, how they test various hypotheses about the problem...
And the really, really good programmers, they do the same kind of troubleshooting. But knowing their stuff that well, they think about those problems before writing the first line of code, and do not run into these kinds of problems so often. Try to find an eye-tracker or any other magic to pinpoint the mistakes those guys just don't make!
They miss the cause of the stress: it is terrible and incomplete specs. The errors are found when the specs are completed AFTER the coding, and sometimes after the code MUST be put into production.
Shove some electrodes into the Project Managers and Spec writers.
All this talk about improving performance is from the people who brought you Windows 8, which is the worst step -backwards- in user productivity and interface in the history of microcomputers.
Let's see, it's a completely new way of interacting with PCs. Ah yes, Win8 is designed to be used effectively only on computers whose primary interaction with users is through a large touch screen. Then this touch-based OS is -forced- onto all new laptops, few if any of them having a touch screen interface.
A completely new user-interface on a machine whose only reason for existence is to improve its user's productivity. Where UI designers have been telling and teaching anyone who would/could listen that user productivity is maximized by allowing users to develop their own custom interface and to move this interface between platforms. As if anyone at Microsoft gives a shit about your productivity. As far as they're concerned, they are the only experts that matter. And they don't know shit about user interfacing.
For example, having a message box that pops up in the middle of the screen and then forces you to bend your wrist into a carpal-tunnel position in order to click on a tiny button to get rid of the message box. You can't just click on any button, with the mouse pointer anywhere, to get rid of the message box. Which is the only thing any user wants to do immediately when they get a stupid fucking message box. This, after 25 years of focused research on interactive GUI operating systems.
Nobody should take anything that the Redmond Retards say seriously. Especially when they claim that it came from expert research. (i.e. blown out some idiot's ass).
Is there any company on earth that treats its customers with more contempt than Microsoft?
To be hooked up to some device while we work to measure how likely we are to wind up at the top of the stack rank. It could be completely automated, if it determines you just wrote sucky code it generates a pink slip email and a robot carries you out the door. Instant better code!
While this information can be useful to software developers to help them optimize their own performance, it will likely prove detrimental to provide these metrics to managerial or production supervisors, as they typically only choose to intensify the workload to improve efficiency. I would like to see more people tested and less emphasis on productivity, and more emphasis on how we can use this data to improve the experience of coding for the programmer. That would probably result in better code more readily than objectifying the software engineers through the use of data that might not speak to the experience of the vast majority of programmers.
Once again we have some big sister/brother company/government claiming that they can do the impossible with biometric data. They don't address the primary source of the problems, which you lay out in detail.
Why was security skimped on in the code? Funding.
Why did funding get dropped? So that someone could get a bonus.
Who was the person that had the demo code for security? Canned to save budget.
Can't our Outsource code it? Not in their contract or business statement.
None of those issues are the coder's fault, and this is the majority of our "shitty" code today. Piles and piles of shit so that someone in the management chain (or several someones) can get bonuses/raises/justify their existence in a company.
I'll give an alternate method of finding better targets for biometric scanning. Randomly sample executive and management emails. If you can win buzzword bingo in 2 or fewer random emails, you have a valid target. Build a "shifty eye" detector into power point, and there ya go!
-The wise argue that there are few absolutes, the fool argues that there are no probabilities.
They can understand how a toilet is cleaned, how a sale is made, how a 1099 is filled out, how a fire drill works, how a sandwich is put together, how oil is changed, etc... but Coding might as well be a dark art.
Disclaimer: I am in hardware myself and may completely miss the point here. However, our software/firmware folks do agile programming, which involves dividing programming problems into pieces that are assigned to programmers, followed up on large whiteboards and discussed daily in "scrum meetings" etc. (I may be confusing some concepts here but that is of less importance). The point being that your statement, that programming is some sort of unique dark-art-which-cannot-be-measured-by-managers, appears untrue to me and, honestly, rather pedantic. What these guys are doing is quite measurable. Maybe not by a silly measure like "lines of code", but by the measure of number of problems being solved, having a complexity that apparently every one of them agrees on.
Indeed, the CEO doesn't know the exact details of how this works, but neither does he personally count the number of cleaned toilets.
..how many swear words are in the comments.
If their skull explodes while stressing out over the code.. bloody 'X' marks the Spock.
Is there any company on earth that treats its customers with more contempt than Microsoft?
Comcast? AT&T? Anyone associated with the MPAA/RIAA?
-The wise argue that there are few absolutes, the fool argues that there are no probabilities.
2) Bugs
The more the MS business units do of this stuff, the more of Bill Gates' hard-stolen warchest they'll be eating into. Yes Microsoft, we want more of this sort of insane shit please! Next time, be sure to sample 1000 developers for at least a week each!
Agreed.
When talking with non-developers about developers, I use the simile that developers are like novelists, who work out stories in their heads, and commit those stories to paper.
A novel contains a set of symbols which, taken collectively, and written correctly, form an impressive body of knowledge that can change the world. (Tolstoy's "War and Peace" is my usual example.)
But if the symbols are faulty -- if the book is badly written, if the grammar and spelling are faulty -- then the book will fail to sell, fail to make its point, fail to change the world.
-kgj
*Microsoft* is working on intrusive software to predict buggy code? I can do that without software - just point at the Microsoft campus, and any and all products....
mark
So, Microsoft Research has developed a method to tell when a programmer is in a condition that tends to create bugs. That's nice. What happens with this?
I already know when I'm in a condition that tends to create bugs. It won't help there. It could be passed on to others, such as management.
Now, is management going to take action to reduce the amount of time I'm more vulnerable to causing bugs, by improving the office environment or discouraging overtime or making reasonable deadlines? Or is management going to find this a good evaluation tool and ding my performance reviews if I'm spending too much time in that condition? Is excessive stress going to become a thought crime?
The only way this would be useful is if management recognized that they needed to reduce buggy time and were rewarded partly on that basis. Anybody confident their employer will do that?
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
It's simple: Just install a program on your developers' computers that tracks how often (how many times in general, and for how long) the developer switches focus away from their IDE. If they're constantly googling, looking up reference docs or algorithms, etc., chances are they are doing something that's new, untested, uncharted territory for them. If they're just rattling off hundreds of SLOC at a time, while only needing IntelliSense as an aid, chances are most of it will work on the first attempt.
Programmers who use books made of real, physical paper foil this test and should be summarily fired.
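A rough sketch of the bookkeeping that idea would need, assuming some other tool already records a log of which window gained focus and when. The `FocusEvent` record, the `summarize_focus` function, and the window names are all hypothetical; capturing real focus changes is OS-specific and is not shown here.

```python
from dataclasses import dataclass


@dataclass
class FocusEvent:
    timestamp: float  # seconds since the start of the session
    window: str       # name of the application that gained focus


def summarize_focus(events, ide="IDE", session_end=None):
    """Return (switches away from the IDE, total seconds spent outside it)."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    switches_away = 0
    seconds_away = 0.0

    for i, event in enumerate(ordered):
        # Each event lasts until the next one, or until the session ends.
        if i + 1 < len(ordered):
            end = ordered[i + 1].timestamp
        elif session_end is not None:
            end = session_end
        else:
            end = event.timestamp  # unknown duration, count as zero

        if event.window != ide:
            seconds_away += end - event.timestamp
            # Count a switch only when focus left the IDE itself, so that
            # hopping from browser to docs is still one trip away.
            if i > 0 and ordered[i - 1].window == ide:
                switches_away += 1

    return switches_away, seconds_away


# Made-up session log for illustration.
log = [
    FocusEvent(0, "IDE"),
    FocusEvent(60, "Browser"),
    FocusEvent(90, "IDE"),
    FocusEvent(200, "Docs"),
    FocusEvent(230, "Browser"),
    FocusEvent(260, "IDE"),
]
switches, away = summarize_focus(log, ide="IDE", session_end=300)
print(f"{switches} trips away from the IDE, {away:.0f} seconds outside it")
```

Note that this only counts what the log can see; as pointed out above, a paper book never shows up in it.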