Anonymous No More: Your Coding Style Can Give You Away
itwbennett writes: Researchers from Drexel University, the University of Maryland, the University of Goettingen, and Princeton have developed a "code stylometry" that uses natural language processing and machine learning to determine the authors of source code based on coding style. To test how well their code stylometry works, the researchers gathered publicly available data from Google's Code Jam, an annual programming competition that attracts a wide range of programmers, from students to professionals to hobbyists. Looking at data from 250 coders over multiple years, averaging 630 lines of code per author, their code stylometry achieved 95% accuracy in identifying the author of anonymous code (PDF). Using a dataset with fewer programmers (30) but more lines of code per person (1,900), the identification accuracy rate reached 97%.
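For readers wondering what "identifying an author from coding style" can look like in practice, here is a deliberately tiny sketch. It is not the researchers' method: the five features, the unscaled Euclidean distance, and the names `style_features` and `guess_author` are all assumptions made up for illustration. It only shows the general shape of the idea, which is to turn each code sample into numbers describing layout and naming habits, then pick the known author whose numbers are closest.

```python
import math
import re


def style_features(source: str) -> dict[str, float]:
    """Return a few crude layout and naming features (hypothetical feature set)."""
    lines = source.splitlines() or [""]
    idents = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    n_chars = max(len(source), 1)
    n_idents = max(len(idents), 1)
    return {
        "avg_line_len": sum(len(line) for line in lines) / len(lines),
        "tab_indent_ratio": sum(line.startswith("\t") for line in lines) / len(lines),
        "space_ratio": source.count(" ") / n_chars,
        "underscore_ident_ratio": sum("_" in i for i in idents) / n_idents,
        "avg_ident_len": sum(len(i) for i in idents) / n_idents,
    }


def distance(a: dict[str, float], b: dict[str, float]) -> float:
    """Plain Euclidean distance; features are NOT rescaled in this toy version."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))


def guess_author(known: dict[str, str], anonymous: str) -> str:
    """Return the known author whose sample is stylistically nearest."""
    target = style_features(anonymous)
    return min(known, key=lambda name: distance(style_features(known[name]), target))


# Two made-up authors: one uses tabs and snake_case, one uses spaces and terse names.
known_samples = {
    "alice": "int main() {\n\tint my_count = 0;\n\treturn my_count;\n}\n",
    "bob": "int main()\n{\n    int c=0;\n    return c;\n}\n",
}
mystery = "void f() {\n\tint the_total = 1;\n\tthe_total++;\n}\n"
print(guess_author(known_samples, mystery))
```

With only two authors and a handful of features this proves nothing about real-world accuracy; a serious attempt would need many more features, feature scaling, and a proper classifier trained on far more code per author.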
Can they do it with corporate code where there are naming and style standards in abundance, and code reviews to ensure those guidelines are followed?
I do not fail; I succeed at finding out what does not work.
If your coding is terrible and very newbie-like, they can't single you out, since your code looks like that of the ocean of other terrible coders.
So if you are a paranoid freak, the best way to ensure your safety and keep the government off your back is to write terrible code.
Priest: "Universe from nothing, no laws of physics, sped up time"+ huge discrepancies. Creationism? No. Big Bang Theory
Write a version of a pretty-printer that re-renders your code in a different style.
Have a lexicon of mipelled words for each "personality".
Another lexicon of variable names.
a vs inta vs int_a vs x.
Refactoring and unfactoring for subroutines.
Run the comments through Google Translate and back to English.
Ukrainian
Japanese
Chinese
Synonym and antonym substitution in the comments.
The mind dances at the possibilities to mess with this algorithm.
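A rough sketch of the "lexicon of variable names" idea, assuming the code being disguised is Python. The function name `restyle_identifiers` and the sample lexicon are made up for illustration. It uses the standard `tokenize` module so that only real identifier tokens are renamed, leaving comments and string literals alone, and it keeps the original layout by patching each line in place.

```python
import io
import keyword
import tokenize


def restyle_identifiers(source: str, lexicon: dict[str, str]) -> str:
    """Rename identifiers according to a per-"personality" lexicon (hypothetical helper)."""
    lines = source.splitlines(keepends=True)
    names = [
        tok
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME
        and not keyword.iskeyword(tok.string)
        and tok.string in lexicon
    ]
    # Patch from the last token to the first so earlier column offsets stay valid.
    for tok in reversed(names):
        row, start = tok.start  # rows are 1-based, columns are 0-based
        _, end = tok.end
        line = lines[row - 1]
        lines[row - 1] = line[:start] + lexicon[tok.string] + line[end:]
    return "".join(lines)


original = "a = 1  # a is a counter\nprint(a + int('2'))\n"
personality = {"a": "int_a"}  # made-up lexicon entry
disguised = restyle_identifiers(original, personality)
print(disguised)
```

This renames every matching name token, including attribute names and keyword arguments, so a real tool would need scope analysis to avoid breaking code. It also does nothing about the comments; the round-trip translation and synonym swapping would be separate passes.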
She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.