Anonymous No More: Your Coding Style Can Give You Away
itwbennett writes: Researchers from Drexel University, the University of Maryland, the University of Goettingen, and Princeton have developed a "code stylometry" that uses natural language processing and machine learning to determine the authors of source code based on coding style. To test how well their code stylometry works, the researchers gathered publicly available data from Google's Code Jam, an annual programming competition that attracts a wide range of programmers, from students to professionals to hobbyists. On data from 250 coders over multiple years, averaging 630 lines of code per author, their code stylometry achieved 95% accuracy in identifying the author of anonymous code (PDF). Using a dataset with fewer programmers (30) but more lines of code per person (1,900), the identification accuracy rate reached 97%.
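For a rough feel of what attributing code by style can look like, here is a toy sketch. It is not the researchers' method, which the summary does not spell out: the four features, the nearest-profile matching, and every name in it (style_features, attribute, KNOWN) are assumptions made up for illustration.

```python
import math
import re


def style_features(source: str) -> list[float]:
    """Turn source text into a few style ratios, each between 0 and 1."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    n = max(len(lines), 1)
    names = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    m = max(len(names), 1)
    return [
        # Share of lines indented with a tab.
        sum(ln.startswith("\t") for ln in lines) / n,
        # Share of lines holding only an opening brace.
        sum(ln.strip() == "{" for ln in lines) / n,
        # Share of identifiers containing an underscore.
        sum("_" in name for name in names) / m,
        # Share of lines with an operator squeezed between two characters.
        sum(bool(re.search(r"\S[=+\-*/<>]\S", ln)) for ln in lines) / n,
    ]


def centroid(vectors: list[list[float]]) -> list[float]:
    """Average several feature vectors into one author profile."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def attribute(sample: str, known: dict[str, list[str]]) -> str:
    """Return the known author whose profile is closest to the sample."""
    target = style_features(sample)
    profiles = {
        author: centroid([style_features(s) for s in samples])
        for author, samples in known.items()
    }
    return min(profiles, key=lambda a: math.dist(profiles[a], target))


# Hypothetical training samples, one per made-up author.
KNOWN = {
    "alice": ["int main()\n{\n\tint total_sum=0;\n\treturn total_sum;\n}\n"],
    "bob": ["int main() {\n    int totalSum = 0;\n    return totalSum;\n}\n"],
}

if __name__ == "__main__":
    unknown = "void run_all()\n{\n\tint item_count=3;\n\tprint_it(item_count);\n}\n"
    print(attribute(unknown, KNOWN))
```

A serious attempt would need far more features and a real classifier, and accuracy figures like those above could only come from proper evaluation on a large dataset.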
1985 Hugo Winner
Really, the fact that coding style is recognizable was so well known it made it into pop culture 30 years ago.
Also, on the smaller sample size the program might just be recognizing the parts of the style that come from the corporate standards. It would be interesting to see if it could recognize code from people who all work at the same company.
You obviously haven't had to work in an environment where code has to be certified. I can tell you from firsthand experience that coding in an RTCA DO-178B environment or similar means strict adherence to some very pedantic coding requirements. You'll find this type of development in avionics systems (both civilian and military) as well as other industries like medical electronics where code safety is literally life-and-death.
Outside of that type of environment, I do agree with you. You'd be lucky if even half of the developers have seen a company coding standard. You'd be hard-pressed to find any developers who really adhere to it even when they know the document exists. But in those small niche markets, you'd be surprised at how strictly they adhere to arbitrary coding standards (whether or not they really impact code quality or safety).