Slashdot Mirror


Anonymous No More: Your Coding Style Can Give You Away

itwbennett writes: Researchers from Drexel University, the University of Maryland, the University of Goettingen, and Princeton have developed a "code stylometry" that uses natural language processing and machine learning to determine the authors of source code based on coding style. To test how well their code stylometry works, the researchers gathered publicly available data from Google's Code Jam, an annual programming competition that attracts a wide range of programmers, from students to professionals to hobbyists. Looking at data from 250 coders over multiple years, averaging 630 lines of code per author, their code stylometry achieved 95% accuracy in identifying the author of anonymous code (PDF). Using a dataset with fewer programmers (30) but more lines of code per person (1,900), the identification accuracy rate reached 97%.

7 of 220 comments

  1. No Kidding by invid · · Score: 4, Insightful

    I can usually tell who wrote the code in the office by whether or not they put a space after their ifs: if(i == 0) vs if (i == 0); where they put their brackets, whether or not they replace their tabs with spaces, how they deal with bools: if (!var) vs if (var == false) and several other telling signs. There are so many combinations of variations that no two programmers in the office (about 12 of us) have the same style.
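
The habits listed in this comment can be counted mechanically. What follows is only a minimal sketch, assuming C-style source held in a string; the function name `layout_features` and the feature names are hypothetical, and this is not the researchers' feature set or classifier, just an illustration of how such layout habits could become numbers that a classifier might use.

```python
import re


def layout_features(source: str) -> dict:
    """Count a few layout habits in C-style source (illustrative only)."""
    lines = source.splitlines()
    return {
        # "if(" versus "if (" 
        "if_no_space": len(re.findall(r"\bif\(", source)),
        "if_space": len(re.findall(r"\bif \(", source)),
        # opening brace at the end of a statement line versus on its own line
        "brace_same_line": sum(
            1 for ln in lines
            if ln.rstrip().endswith("{") and ln.strip() != "{"
        ),
        "brace_own_line": sum(1 for ln in lines if ln.strip() == "{"),
        # tab versus space indentation
        "tab_indent": sum(1 for ln in lines if ln.startswith("\t")),
        "space_indent": sum(1 for ln in lines if ln.startswith(" ")),
        # "(!var)" versus "var == false"
        "bool_negation": len(re.findall(r"\(\s*!\s*\w+\s*\)", source)),
        "bool_compare_false": len(re.findall(r"==\s*false\b", source)),
    }


# Two made-up snippets that do the same thing in different styles.
sample_a = "if(i == 0) {\n\tdone = true;\n}\nif(!ready) {\n\treturn;\n}\n"
sample_b = (
    "if (i == 0)\n{\n    done = true;\n}\n"
    "if (ready == false)\n{\n    return;\n}\n"
)

features_a = layout_features(sample_a)
features_b = layout_features(sample_b)
```

The two samples produce different counts for every feature, which is the point of the comment: even a handful of layout choices already separates two authors.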

    --
    The Moore-Murphy Law: The number of things that will go wrong will double every 2 years.
  2. Re:Demonstrates the need... by Anonymous Coward · · Score: 5, Insightful

    This is why people need to follow style guides, so that all source code is styled the same.

    There's a damn good chance 95% of coders are not criminals, nor would they care if someone identified their code.

    That said, this will become a legal nightmare when this kind of profiling can be used to frame another coder.

    And with the laws wanting to treat any "hacker" as a potential terrorist these days, the consequences of even being accused can be rather severe to deal with.

  3. Re:Welcome to the party by Virtucon · · Score: 4, Insightful

    It's all about style. Writing software is very creative and it needs to have the author's fingerprints on it somewhere. If corporations don't like that they can suck the source code into a parser and spit out perfectly mundane crap that loses the intonation and the thoughts the original developer had for it.

    --
    Harrison's Postulate - "For every action there is an equal and opposite criticism"
  4. Re:Demonstrates the need... by Impy the Impiuos Imp · · Score: 5, Insightful

    You want scary? The same can be applied to general text on the Internet, tying posters on different sites together, including anonymous (not your real name avatar) to a site with your real name.

    Which the NSA probably has churning away on its databases. Which probably does little more than add confirmation of said links from watching and recording all traffic to any and all of a billion IP addresses.

    And I, for one, welcome our new panopticon overlords who won't abuse it, not one of their thousand agents, because they're supposed to check a got-a-warrant box on a piece of paper before choosing to abuse it.

    --
    (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
  5. Most programming isn't new code by jgotts · · Score: 3, Insightful

    Most programming isn't writing new code. Most programming is working on someone else's crap you inherited. Invariably, you're going to be using that person's style or else the result will look like garbage.

    There is also the problem that most non-trivial code is worked on by multiple people at the same time.

    Writing some code from scratch as an assignment is a very artificial exercise nowadays, unless you're in a classroom setting. Therefore, you're going to get a signature from a programmer doing atypical work.

  6. Re:Up next, automatic intelligence rating... by lgw · · Score: 4, Insightful

    For lack of mod points let me just say: beautiful!

    It's like this in any engineering discipline:
    * The apprentice doesn't do things by the book, for he thinks himself clever
    * The journeyman does everything by the book, for he has learned the world of pain the book prevents
    * The master goes beyond the book, for he understands why every rule is there and no longer needs the rules

    Or put another way - the apprentice thinks he knows everything, the journeyman knows how little he knows, the master knows everything in the field, and still knows how little he knows.

    --
    Socialism: a lie told by totalitarians and believed by fools.
  7. Re:Can they do it with corporate code? by war4peace · · Score: 3, Insightful

    *raising hands slowly* Is there a problem, Coding Officer?

    --
    ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)