Coding Styles Survive Binary Compilation, Could Lead Investigators Back To Programmers (princeton.edu)
An anonymous reader writes: Researchers have created an algorithm that can accurately detect code written by different programmers (PDF), even if the code has been compiled into an executable binary. Because of open source coding repositories like GitHub, state agencies can build a database of all developers and their coding styles, and then easily compare the coding style used in "anti-establishment" software to detect the culprit. Despite all the privacy implications this research may have, the algorithm can also be used by security researchers to track down malware authors.
We also discussed an earlier phase of this research.
Going to be lots of false positives on this one.
People have been analyzing writing styles for a long time to try to identify authors. Expecting your coding style to be obfuscated by compiling it has proven to be as wrong as thinking your identity is shielded if you publish under a pseudonym. If you make your code publicly available you really shouldn't have any expectation of privacy.
I'm a consultant - I convert gibberish into cash-flow.
Aren't we being tracked enough as it is?
Why, for fuck's sake, why?
My New Year's resolution will be to remove all my code from all public repositories.
Good luck when your programmer pool is a couple of thousand and your samples consist of the obfuscated, underhanded software that malware creators often produce.
If you RTFA, it seems their sample size was 20 programmers. Occasionally they went up to 100, and they're getting something like 60-80% accuracy. BFD.
Guys - when you've sampled the compiled, optimised binary output (with all debug info stripped) of a million coders all using different compilers on different architectures and are getting at least a 99% accuracy rate, get back to us. In the meantime, I'm sure you'll get some nice marks from your supervisors but I won't be losing any sleep.
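To put rough numbers on the scale argument above, here is a back-of-the-envelope sketch. It is not the researchers' method or evaluation. It assumes a hypothetical screening setup in which every programmer in the pool is checked against one binary, exactly one of them is the true author, and the classifier has a fixed per-person true positive and false positive rate. The function names and the rates are made up for illustration.

```python
def expected_false_positives(pool_size: int, false_positive_rate: float) -> float:
    """Expected number of innocent programmers flagged.

    Assumes (hypothetically) that exactly one person in the pool is the
    true author and everyone else is flagged independently at the same rate.
    """
    return (pool_size - 1) * false_positive_rate


def chance_flag_is_correct(pool_size: int,
                           true_positive_rate: float,
                           false_positive_rate: float) -> float:
    """Fraction of flagged programmers who are the real author, in expectation."""
    true_hits = true_positive_rate  # one real author, flagged with this probability
    false_hits = expected_false_positives(pool_size, false_positive_rate)
    return true_hits / (true_hits + false_hits)


if __name__ == "__main__":
    # Illustrative rates only: 99% true positive rate, 1% false positive rate.
    for pool in (20, 100, 1_000_000):
        flagged = expected_false_positives(pool, 0.01)
        odds = chance_flag_is_correct(pool, 0.99, 0.01)
        print(f"pool={pool:>9}: ~{flagged:.2f} innocent people flagged, "
              f"a flag is right about {odds:.4%} of the time")
```

Under these assumptions, a classifier that looks fine on a pool of 20 flags thousands of the wrong people once the pool reaches a million, which is the base-rate problem behind the "lots of false positives" worry.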