Slashdot Mirror


Anonymous No More: Your Coding Style Can Give You Away

itwbennett writes: Researchers from Drexel University, the University of Maryland, the University of Goettingen, and Princeton have developed a "code stylometry" that uses natural language processing and machine learning to determine the authors of source code based on coding style. To test how well their code stylometry works, the researchers gathered publicly available data from Google's Code Jam, an annual programming competition that attracts a wide range of programmers, from students to professionals to hobbyists. Looking at data from 250 coders over multiple years, averaging 630 lines of code per author, their code stylometry achieved 95% accuracy in identifying the author of anonymous code (PDF). Using a dataset with fewer programmers (30) but more lines of code per person (1,900), the identification accuracy rate reached 97%.

220 comments

  1. Can they do it with corporate code? by msobkow · · Score: 5, Interesting

    Can they do it with corporate code where there are naming and style standards in abundance, and code reviews to ensure those guidelines are followed?

    --
    I do not fail; I succeed at finding out what does not work.
    1. Re:Can they do it with corporate code? by Marginal Coward · · Score: 4, Funny

      It seems like using the applicable features of the corporate version control system would be a lot easier - and possibly even better than 95% accurate.

    2. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 1

      If your corporate code base has commits from anonymous developers, you're doing something wrong. If it doesn't, and you need this sort of analysis to determine who wrote a section of code, you're doing something wrong.

    3. Re:Can they do it with corporate code? by Penguinisto · · Score: 2

      That's what "git blame" is for...

      /me ducks and runs like hell...

      --
      Quo usque tandem abutere, Nimbus, patientia nostra?
    4. Re:Can they do it with corporate code? by TitusC3v5 · · Score: 1

      It's not just limited by corporate code. Good luck doing this on pep8 Python.

      --
      And the masses cried out, "09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0!"
    5. Re:Can they do it with corporate code? by dark.nebulae · · Score: 2

      I've always found that even with style guidelines in place, developers will still leave their fingerprints all over it.

      Some devs will be verbose in their comments, some less. Some devs will embrace IoC where others shun it. Some devs will create a single method with all code in it, some will refactor the heck out of it with many methods. Heck, devs can't even agree sometimes on what should be public, protected, and private (and rarely will style guidelines dictate this kind of thing).

    6. Re:Can they do it with corporate code? by MouseTheLuckyDog · · Score: 1

      They are talking about the corporate code as a baseline to compare to the anonymous code.

    7. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 0

      You are kidding me right? No corporation (I don't care who it is) actually ensures that every 'standard' they have is followed, nor should they.

      So the quick answer is yes.

    8. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 0

      It's not just limited by corporate code. Good luck doing this on pep8 Python.

      What a ludicrous idea. If my code's most distinctive feature is its style guide, then I feel like an illiterate buffoon.

    9. Re:Can they do it with corporate code? by grimmjeeper · · Score: 4, Informative

      You obviously haven't had to work in an environment where code has to be certified. I can tell you from first hand experience that coding in an RTCA DO-178B environment or similar has some pretty strict adherence to some very pedantic and strict coding requirements. You'll find this type of development in avionics systems (both civilian and military) as well as other industries like medical electronics where code safety is literally life-and-death.

      Outside of that type of environment, I do agree with you. You'd be lucky if even half of the developers have seen a company coding standard. You'd be hard pressed to find any developers who really adhere to it even when they know the document exists. But in those small niche markets, you'd be surprised at how strictly they adhere to arbitrary coding standards (whether they really impact code quality or safety or not).

    10. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 0

      I wonder if there are a significant number of professional developers who (illegally) work under an assumed name.

    11. Re:Can they do it with corporate code? by jellomizer · · Score: 4, Interesting

      Perhaps not as well. If people are following the coding standards for the organization then the code for the most part looks far more similar.

      When I am working with a development team, I will tend to adjust my unique style to better match what everyone else is doing. Even if it means doing coding methods that I will normally disagree with.

      If the code tends to use a bunch of GOTOs instead of procedures or classes, I will use those GOTOs too, not for my benefit, but for the people who will maintain my code later on, so they won't have to change their mindset and debugging strategies to see what the program is doing when they make future corrections.

      I will go full Object Oriented if the group of people that I am working with do their coding full OO.

      My personal style would be more procedural, than OO. Not due to lack of knowledge or not realizing OO advantages and disadvantages. But if I am to code on my own, I code in the way that My Mind handles the requirements, and how I feel would be easier for me to change and fix my code in the future.

      I think this method is best for ID based on personal code, vs group corporate code, where a lot of your particular style is hidden.

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    12. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 0

      You are kidding me right? No corporation (I don't care who it is) actually ensures that every 'standard' they have is followed, nor should they.

      You've never worked at a AAA game studio.

    13. Re:Can they do it with corporate code? by ShanghaiBill · · Score: 1

      If it doesn't, and you need this sort of analysis to determine who wrote a section of code, you're doing something wrong.

      With pair programming, you may have two programmers sharing a keyboard, and alternating writing chunks of code.

      I can usually look at a section of code, and reliably know which of my coworkers wrote it, even when they follow the style guidelines. Do they use an if-else chain, or a switch statement? Do they use #define's or prefer enums? Bitfields, or masks? Often I can tell who wrote it just by looking at the comments. Some people are neurotic about grammar and using complete sentences. Others prefer minimally concise fragments.

    14. Re:Can they do it with corporate code? by AK Marc · · Score: 1

      Even if they build up a database of 100% of written code, how can they identify me if I only copy and paste code from others?

    15. Re:Can they do it with corporate code? by rtb61 · · Score: 1

      Just curious, how are larger companies going with algorithm libraries and variable naming rules to ensure maximum reusability of code (variables named by function rather than by application)? Any change, or is most of it still done from scratch? Any fancy algorithm databases with search functions based upon algorithm descriptors and software engineering? Also things like software language translators, or the same algorithms stored in different languages? Any shift away from writing code toward assembling algorithms that can be expanded or reduced and snapped together?

      --
      Chaos - everything, everywhere, everywhen
    16. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 0

      Hey now, RC doesn't pay you to surf slashdot! Get back to your test cases! :)

    17. Re:Can they do it with corporate code? by grimmjeeper · · Score: 1

      RC doesn't pay me at all. I haven't worked there for over 15 years now.

    18. Re:Can they do it with corporate code? by wolrahnaes · · Score: 1

      Similarly I was thinking this would probably be defeated by a "minifier", obfuscator, or anything along those lines. There are dozens to choose from for most languages and it would be trivial for anyone attempting to remain anonymous to use them on their releases.

      If you want the code to remain usable, there are tools to enforce a standard style instead, in which case just set it up with rules based on a popular project if your language of choice doesn't have a specific style. At that point you're down to comments and variable names. Don't get fancy with either and I'd bet the identifiability would go down significantly.

      --
      I used to get high on life, but I developed a tolerance. Now I need something stronger.
    19. Re:Can they do it with corporate code? by war4peace · · Score: 3, Insightful

      *raising hands slowly* Is there a problem, Coding Officer?

      --
      ...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
    20. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 0

      Who cares if they can do it with corporate code? I doubt that's the hard problem. Much harder: they haven't yet shown they can do it with an executable file, just source code.

    21. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 0

      You are kidding me right? No corporation (I don't care who it is) actually ensures that every 'standard' they have is followed, nor should they.

      Corporations don't but some managers do.

    22. Re:Can they do it with corporate code? by rubycodez · · Score: 1

      "legal" of course meaning adhering to rules written and ratified by a group of power and money grubbing politicians in the pockets of large corporations.

    23. Re: Can they do it with corporate code? by Anonymous Coward · · Score: 2, Funny

      Drats! I was.sure that.everyone else wrote.stuff.like "if(user == 'dumbfuck"){exit 666};

    24. Re:Can they do it with corporate code? by bhcompy · · Score: 2

      Why is it illegal?

    25. Re:Can they do it with corporate code? by Gorobei · · Score: 1

      Can they do it with corporate code where there are naming and style standards in abundance, and code reviews to ensure those guidelines are followed?

      I was starting to wonder about that, then realized we at $BIGCORP are already generating ASTs from your input buffer, unifying those trees with a bunch of patterns, and telling your editor to flag questionable constructs. You type "if not foo in x" and 50ms later you get a proposed improved snippet. It's pretty rare to see quirky style in our codebase.
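A minimal sketch of the kind of check described above, assuming Python source and the standard `ast` module. The function name `find_not_in` is made up for illustration and says nothing about how any real in-house tool works. It parses the input, looks for the `not foo in x` shape in the tree, and reports the lines where `foo not in x` could be proposed instead.

```python
import ast


def find_not_in(source):
    """Return line numbers where `not a in b` appears (candidate for `a not in b`)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # `not foo in x` parses as UnaryOp(Not, Compare(foo, [In], [x])).
        if (isinstance(node, ast.UnaryOp)
                and isinstance(node.op, ast.Not)
                and isinstance(node.operand, ast.Compare)
                and len(node.operand.ops) == 1
                and isinstance(node.operand.ops[0], ast.In)):
            hits.append(node.lineno)
    return sorted(hits)


sample = "if not foo in x:\n    pass\nif foo not in x:\n    pass\n"
print(find_not_in(sample))
```

Only the first `if` in the sample is flagged; the second already uses the `not in` operator, which parses as a single comparison.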

    26. Re:Can they do it with corporate code? by aliquis · · Score: 1

      Or what about in the real world, when the number of "coders" is 1 000 times larger?

      It's likely 1 out of how many?

      Also, what if everyone just replaced all function and variable names with a, b, c, d ... in order of occurrence and put it all on one line?
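The renaming idea can be sketched in a few lines, assuming Python source and Python 3.9+ for `ast.unparse`. The names `short_names`, `Renamer` and `strip_names` are invented for this example, and the renamer is deliberately naive: it only renames function names, arguments and variables the file itself assigns. Python cannot be squeezed onto one line, so only the naming half of the suggestion is shown.

```python
import ast
import itertools
import keyword
import string


def short_names():
    """Yield a, b, ..., z, aa, ab, ... skipping Python keywords."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            name = "".join(letters)
            if not keyword.iskeyword(name):
                yield name


class Renamer(ast.NodeTransformer):
    """Rename functions, arguments and assigned variables in order of first occurrence."""

    def __init__(self):
        self.fresh = short_names()
        self.mapping = {}

    def _rename(self, old):
        if old not in self.mapping:
            self.mapping[old] = next(self.fresh)
        return self.mapping[old]

    def visit_FunctionDef(self, node):
        node.name = self._rename(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._rename(node.arg)
        return node

    def visit_Name(self, node):
        # Only touch names this file defines; builtins such as len() are left alone.
        if isinstance(node.ctx, ast.Store) or node.id in self.mapping:
            node.id = self._rename(node.id)
        return node


def strip_names(source):
    tree = Renamer().visit(ast.parse(source))
    return ast.unparse(tree)  # comments are already dropped by the parser


sample = '''
def average(values):
    # mean of a list
    total = 0
    for v in values:
        total += v
    return total / len(values)
'''
print(strip_names(sample))
```

The output still has the same loop, the same accumulator and the same division, which is why renaming alone leaves structural features in place.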

    27. Re:Can they do it with corporate code? by Mr Z · · Score: 2

      Did you read the part in the article where they're actually doing the matching based on the ASTs (abstract syntax trees), and so are able to identify authors even after the code goes through an obfuscator? Relevant quotes:

      Their real innovation, though, was in developing what they call “abstract syntax trees” which are similar to parse trees for sentences, and are derived from language-specific syntax and keywords. These trees capture a syntactic feature set which, the authors wrote, “was created to capture properties of coding style that are completely independent from writing style.” The upshot is that even if variable names, comments or spacing are changed, say in an effort to obfuscate, but the functionality is unaltered, the syntactic feature set won’t change.

      Accuracy rates weren’t statistically different when using off-the-shelf C++ code obfuscators. Since these tools generally work by refactoring names and removing spaces and comments, the syntactic feature set wasn’t changed so author identification at similar rates was still possible.

      Regarding the first quote: The author of the article probably didn't realize that ASTs aren't a new thing; it's just this application of ASTs that's new. ASTs are as old as the hills. I learned about them from the Dragon Book, and by the time that was written they were old hat.
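To make the quoted claim concrete, here is a toy sketch, assuming Python and its standard `ast` module rather than the C++ tooling the researchers used. The helper `node_type_counts` is a made-up name, and a plain histogram of node types is far cruder than the paper's feature set. It only shows that renaming identifiers and deleting comments leaves a syntax-tree feature untouched, while actually restructuring the code changes it.

```python
import ast
from collections import Counter


def node_type_counts(source):
    """Histogram of AST node types: a crude stand-in for syntactic features."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


original = '''
def total(values):
    # add everything up
    result = 0
    for v in values:
        result += v
    return result
'''

# Same code after an "obfuscator" renamed everything and stripped the comment.
obfuscated = '''
def a(b):
    c = 0
    for d in b:
        c += d
    return c
'''

# Same behaviour, genuinely different structure.
rewritten = '''
def total(values):
    return sum(values)
'''

print(node_type_counts(original) == node_type_counts(obfuscated))
print(node_type_counts(original) == node_type_counts(rewritten))
```

The first comparison holds and the second does not: names and comments never reach the node-type histogram, but swapping a loop for a call does.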

    28. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 0

      If you do pair programming you have even bigger problems. I can't think of anything more unproductive and annoying.

    29. Re:Can they do it with corporate code? by s.petry · · Score: 1

      It's not just these types of environments that are strict. Well established companies have the same practices, because the only way to have controlled growth is to adhere to a set of standards. Sure, standards change over time but not quickly. For posterity, controlled does not imply restricted.

      --

      -The wise argue that there are few absolutes, the fool argues that there are no probabilities.

    30. Re:Can they do it with corporate code? by Dashiva Dan · · Score: 1

      I can tell who wrote it just by looking at the comments

      Yeah, my first thought on this was "how accurate would it be if you a) stripped out comments, and b) ran through a code formatter (many code editors auto-formatting to a standard on the fly)"

      I think including comments is basically cheating, as they're super distinguishable. You can tell what code I've worked on cause I consistently type "teh", spell words like "colour" with my local spelling, etc. But recognising just the actual code itself, that's more impressive.

      --
      "lt;dr" is the correct response to most of my posts.
    31. Re:Can they do it with corporate code? by hcs_$reboot · · Score: 1

      Indeed. During a Google Code Jam contest, one has to be fast and the prog has to be fast also! During the contest, a lot of devs 1. don't use the language they would normally use for other programs 2. use tons of Defines to accelerate typing 3. don't care at all about readability, maintenance, code-style and the like. That makes the whole program unique in a way, a kind of signature, but hard to read. That identification algo would have a much harder time to identify devs based on corporate programs.

      --
      Slashdot, fix the reply notifications... You won't get away with it...
    32. Re: Can they do it with corporate code? by Anonymous Coward · · Score: 0

      like the ternary operator, use spaces or tabs, structure their logging for analyzing the logs vs analyzing a single invocation, white space preferences around operators, ability to pick multi character variables that can be seen in errors and grep the source to find them, do they code defensively against the user and test the edge cases, do they have tests, does their code suggest they have ever used the debugger or are they happy printing to the logfiles.

    33. Re: Can they do it with corporate code? by Anonymous Coward · · Score: 1

      I've already narrowed you down to a web developer

    34. Re: Can they do it with corporate code? by Anonymous Coward · · Score: 0

      He has a public website, yanno. It's not like it's hard to find details on him. He's going about this privacy thing the wrong way.

    35. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 0

      And this, folks, is why Duke Nukem took Forever.

    36. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 0

      I think the purpose of doing this is for fighting copyright infringement rather than playing the blame game.

      For example, I know when a website has stolen my javascript code or my html code because I know how distinct my coding styles are. I have no problem proving it.

      But the same thing applies to comments on websites. I make no effort to disguise my commenting style, but I do know what aspects of my written language are unique to me and I can identify comments I made years ago anonymously on slashdot.

      Your average person has a bias towards certain lengths of words and certain pronouns when assigning anthropomorphic properties to inanimate objects. For less intelligent people, the absence of spell-checking and the overuse of colloquialisms and slang, particularly regional slang, are a dead giveaway.

      More than a decade ago, I correctly identified the author of a "fake" persona on livejournal, but to prove it to the other person I actually put a webbug in my comment on their post so I could verify their ip address. I then confronted the individual and they absolutely didn't know how to respond to computer forensics.

      The thing is, there's not a lot of practical use for de-anonymizing people unless you are seeking blame, or seeking restitution.

    37. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 0

      I prefer the "not-invented-here avoidance" principle.

      Many many many many... many... times you start a new project, or join someone's project, only to see everyone write their own damn framework, factory, wrapper, over whatever exists or was written by someone else. Why the hell do people do this? Because they want an abstraction layer between their coding preferences and the "inferior" preferences of the code they have to interface with.

      This is the dumbest thing programmers do. Next to abstracting every little thing into classes for the sake of them existing. I can agree with GET/PUT type of functions in classes, along with constructors/destructors, because these put things neatly into containers that make sense. What I can't get on board with is the constant "classname extends classname" and overloaded functions. These things confuse the hell out of people because their only purpose 80% of the time is to work-around something that another coder did earlier, instead of revising the function. These functions are supposed to exist in libraries to retain compatibility rather than creating a mess for every possible mistake made earlier.

      Measure twice, cut once.

    38. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 0

      Still, even using rigorous code/style standards such as MISRA, many different programming styles still exist. For example, do you use if/else if/else if or a switch statement? Do you use while(1) or for(;;) for the task loop on an RTOS? Do you prefer enums over defines? Do you use #ifdef or if(ENABLE_SOMETHING)? (The compiler optimizes it out if ENABLE_SOMETHING is defined to 0.)
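The choices listed above can be turned into a rough per-file fingerprint. The sketch below assumes C source passed in as a string. The feature names in `PATTERNS` and the function `style_features` are hypothetical, and plain regexes will also match inside comments and string literals, so treat the counts as approximate.

```python
import re

# Hypothetical feature names mapped to rough regexes; illustrative only.
PATTERNS = {
    "switch": r"\bswitch\s*\(",
    "else_if": r"\belse\s+if\b",
    "while_1": r"\bwhile\s*\(\s*1\s*\)",
    "for_ever": r"\bfor\s*\(\s*;\s*;\s*\)",
    "define": r"^[ \t]*#[ \t]*define\b",
    "enum": r"\benum\b",
    "ifdef": r"^[ \t]*#[ \t]*ifdef\b",
}


def style_features(c_source):
    """Count how often each style choice shows up in a piece of C source."""
    return {
        name: len(re.findall(pattern, c_source, flags=re.MULTILINE))
        for name, pattern in PATTERNS.items()
    }


sample = """
#define MODE_IDLE 0
#ifdef ENABLE_LOG
#endif
void task(void) {
    for (;;) {
        if (mode == 0) { } else if (mode == 1) { }
    }
}
"""
print(style_features(sample))
```

Two files that both pass a style checker can still produce very different vectors here, which is the point being made about MISRA-compliant code.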

    39. Re: Can they do it with corporate code? by Anonymous Coward · · Score: 0

      Because the 1% says so. Do you have anything to say about it, citizen? We're watching you.

    40. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 0

      Meanwhile, you seem to assume that all AAA game studios enforce coding standards.

    41. Re:Can they do it with corporate code? by RabidReindeer · · Score: 1

      I could do it with corporate code without any analytical software at all.

      One guy I know consistently introduced bugs because he didn't understand assembly language (ironically, he was an assembly language bigot).

      Another caused people to complain because he never coded a subroutine where he could simply cut-and-paste code. And that was in a shop with all sorts of standards.

      Then there are the comments (or lack of them) and their distinctive, but not always professional observations.

      So definitely.

    42. Re:Can they do it with corporate code? by RabidReindeer · · Score: 2

      A sonnet has strict rules, too.

      But I'd wager that someone could tell one of Shakespeare's from one of yours.

    43. Re:Can they do it with corporate code? by Kiwikwi · · Score: 1

      Can they do it with corporate code where there are naming and style standards in abundance, and code reviews to ensure those guidelines are followed?

      Presumably, yes. Style guides are 95% formatting, and if one RTFA (I know, I know), they look only at the structure of the parsed AST, not variable names, comments and whitespace. From the article:

      Accuracy rates weren’t statistically different when using off-the-shelf C++ code obfuscators. Since these tools generally work by refactoring names and removing spaces and comments, the syntactic feature set wasn’t changed so author identification at similar rates was still possible.

      Since they look at code structure, they've even found identifying patterns that survive compilation and end up in the binary.

      This is one of the coolest data mining results I've seen in quite a while.

    44. Re:Can they do it with corporate code? by Anonymous Coward · · Score: 0

      We don't do avionics or anything else that is anywhere close to life and death, but we have coding standards that are required of the developers (it isn't just life and death; no user likes software to quit working unexpectedly, and we don't want our servers hacked because of bad coding). There was initially a lot of squawking ("how can I express my inner feelings") but after a couple of years even those who had resisted most were compliant. Still, having done lots of code review, I can generally tell which of the programmers wrote a section of code (without looking over at the left where Crucible identifies them).

      That said, I think this study is bollocks. In short, the smaller the number of coders in consideration the easier it is to determine which programmer wrote any given section of code. Analyzing 1000 programmers and identifying code only reached 95% and this can only be expected to go down when the set size increases. Most programmers do not have an individual style so much as fall into a style category. And as far as I can tell programmers are not consistent (I try to be, but on review have found my code sometimes deviating significantly in simple ways).

      Put another way, consider facial recognition technology. If you have a small set of individuals under consideration then a small amount of learning permits high precision identification. Expand the set (and it doesn't take very much) then misidentification becomes the norm.

      Work along the lines of this study may be used as supplementary evidence to establish the identity of a programmer, but is not likely to ever result in what TFS insinuates (particularly the headline)

    45. Re:Can they do it with corporate code? by jellomizer · · Score: 1

      Most companies don't.
      If it gets to a point where your program is changing its programming language, chances are the entire workflow process will be evaluated, and it will be coded from the start. If there isn't a change in workflow, then there isn't a good reason to change how the program is written, and they will just code the legacy system in the style of the time.

      However, if your old COBOL or Fortran system is being migrated to a newer platform, the new workflow means a lot of the cool tricks from back then may be simplified down to a built-in language class, so that a module that took weeks to perfect may be as easy as x.dothis()

      --
      If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    46. Re: Can they do it with corporate code? by HornWumpus · · Score: 1

      My code tell would be comments threatening to break all of other coder's fingers. In extreme cases toes also, so the bastard can't code with his feet.

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
    47. Re: Can they do it with corporate code? by Anonymous Coward · · Score: 0

      LOL. what a fucking joke. you must be a systems or embedded programmer. get off your fucking high horse. i've seen all types of coders from app coders to kernel coders copy, refactor and reuse code all over stackexchange.

      so once again, fuck off with this i am better than you attitude. you're a prick.

  2. Next thing you know by Anonymous Coward · · Score: 0

    Next thing you know, they'll be able to use this text to determine my real /. login.

    1. Re:Next thing you know by Anonymous Coward · · Score: 2, Funny

      Why would they even bother with an algorithm to process your ramblings? Every time I see you post, I instantly think "oh here's this jerk again".

    2. Re:Next thing you know by Anonymous Coward · · Score: 0

      The real question though is whether I'm actually talking to myself and why...

    3. Re:Next thing you know by Mordok-DestroyerOfWo · · Score: 1

      I hate following your rambling, Anonymous Coward. Sometimes you get extremely schizophrenic and contradict yourself!

      --
      "Never let your sense of morals prevent you from doing what is right" - Salvor Hardin
    4. Re:Next thing you know by Anonymous Coward · · Score: 0

      No I donut.

    5. Re:Next thing you know by Anonymous Coward · · Score: 0

      Obviously, I am not.

      Banana.

    6. Re:Next thing you know by Anonymous Coward · · Score: 0

      I agree completely. No, I couldn't disagree more.

      Here's what I keep wondering: why do they let me keep posting here when I get marked to "-1" so consistently. It's almost like they keep me around as sort of post-er boy for exactly the sort of comments they don't want here...

    7. Re:Next thing you know by Anonymous Coward · · Score: 0

      Hammock

    8. Re:Next thing you know by Anonymous Coward · · Score: 1

      Ever since that corpus callosotomy, I try to remember to type in nice things with my left hand but then my right hand logs in and mods it down...

  3. Demonstrates the need... by JonSchell · · Score: 1

    This is why people need to follow style guides, so that all source code is styled the same.

    1. Re:Demonstrates the need... by Anonymous Coward · · Score: 5, Insightful

      This is why people need to follow style guides, so that all source code is styled the same.

      There's a damn good chance 95% of coders are not criminals, nor would they care if someone identified their code.

      That said, this will become a legal nightmare when this kind of profiling can be used to frame another coder.

      And with the laws wanting to treat any "hacker" as a potential terrorist these days, the consequences of even being accused can be rather severe to deal with.

    2. Re:Demonstrates the need... by Impy the Impiuos Imp · · Score: 5, Insightful

      You want scary? The same can be applied to general text on the Internet, tying posters on different sites together, including linking an anonymous avatar (not your real name) to a site with your real name.

      Which the NSA probably has churning away on its databases. Which probably does little more than add confirmation of said links from watching and recording all traffic to any and all of a billion IP addresses.

      And I, for one, welcome our new panopticon overlords who won't abuse it, not one of their thousand agents, because they're supposed to check a got-a-warrant box on a piece of paper before choosing to abuse it.

      --
      (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
    3. Re:Demonstrates the need... by Anonymous Coward · · Score: 1

      The trouble with unbridled capitalism is that the government always ends up working on behalf of the powerful to preserve their status. De-embiggening government just means that less democratic power structures take its place. In the worst case, you have the state owning everything on behalf of its puppet-masters, like a single giant business ("state capitalism").

      As in all mature things, the solution is balance - an educated citizenry which trusts people to get on with their own thing except when they end up with too much power. But who in power wants an educated citizenry? That part must be preserved from the bottom.

    4. Re:Demonstrates the need... by grimmjeeper · · Score: 1

      This is why people need to follow style guides, so that all source code is styled the same.

      Why does all code need to be styled the same?

      I can see a need in a safety critical environment like avionics or medical devices that needs strict adherence to rules to ensure that the code has been written correctly and with as few bugs as possible. But what difference does it make outside of that kind of environment? I mean, so what if there's a thousand different coding standards in the Chrome source? What difference does it really make?

    5. Re:Demonstrates the need... by Anonymous Coward · · Score: 0

      For many it is a problem of 'being in the zone'. Suddenly having to change styles pops them out. As they quickly start worrying about what it looks like. It does not fit their idea of 'code smell'. Even though the code is just fine. It is just different and that makes them do unnecessary work.

      When working in someone else's code I treat it as if I am visiting their house. I don't flop on the couch and crack open a cool one. I respect their property and write in the same style as them.

    6. Re:Demonstrates the need... by Anonymous Coward · · Score: 0

      No. What you need is an IDE or editor which allows code to be reformatted to a developer's particular tastes (which, after all, is what a style guide basically is - one developer or a group of developers saying "my personal taste is better than your personal taste"). Style is subjective. You want the code to look that way? Fine, you can reformat my code when you check my code out, and I'll do the same when I check your code out.

      I stopped caring about the minutiae of coding style a few years back, as far as I'm concerned if you follow the following basic principles then your code is fine by me:
      1. Use consistent indentation, and be consistent about how you name identifiers.
      2. Comment your code to the level that someone who understands the business domain can follow what the heck you are doing
      3. Think about how you design your interfaces, and how other people will use your classes.
      4. Refactor (at a minimum apply the DRY principle, personally that's where I stop. Some people refactor more than that, and that's fine too.) ... and of course, that's just my subjective opinion too - mainly based around my personal experience that design issues are far harder to deal with than simple implementation bugs.

    7. Re:Demonstrates the need... by Anonymous Coward · · Score: 0

      The trouble with unbridled capitalism is that the government always ends up working on behalf of the powerful to preserve their status.

      Is there a successful implementation of a different system where this is not the case?

    8. Re:Demonstrates the need... by EvilIdler · · Score: 1

      I wonder how this works for Go, where style is stricter, and people tend to use a formatting tool. Only the comments and naming schemes left to identify by, I guess.

    9. Re:Demonstrates the need... by harperska · · Score: 1

      Even when following a coding style guide 100%, there is still generally enough leeway to allow for plenty of personal style. There's the words you use to name things, use of whitespace and grouping of statements, basically everything about a piece of source code that's lost if you compile and then decompile a program. Just like the prose from two different authors is distinct from one another, even if they go through the same copy editor to fit a publisher's style guide. And if your corporate style guide requires your code to be indistinguishable from decompiled code, you need to find a new job.

    10. Re: Demonstrates the need... by Anonymous Coward · · Score: 0

      this. thank you for this insightful comment. I find myself doing the same thing.

      I also add a comment saying that I wrote it and it's not original from the author. If it's a bug fix I'll throw it to the author's upstream and let them patch it.

  4. Lol? by Anonymous Coward · · Score: 0

    Really? Code Jam? The clusterfuck of copy/paste crap?

    Go to something that has standardized formatting and enforced styled code, like Linux or even Qt and repeat your experiments.

    1. Re:Lol? by TWX · · Score: 2

      Heh. If it's effective in a clusterfuck of copy/paste, then it should be really effective when the bulk of the code is original...

      Sounds like the solution is to use an entirely different language than the bulk of one's work is in, if one wants to anonymously write malicious or otherwise legally complicated code.

      --
      Do not look into laser with remaining eye.
    2. Re:Lol? by Penguinisto · · Score: 1

      That kind of depends on the stylesheets, pre-compiler style enforcement routines, and the fact that a shit-ton of corporate code is often improved incrementally by multiple authors.

      'course, there's still the comments that you could use, but who does that?

      --
      Quo usque tandem abutere, Nimbus, patientia nostra?
  5. Useless by Anonymous Coward · · Score: 1

    Who releases source code without their name?
    Let me know when you can determine the author from just the binary...

  6. Up next, automatic intelligence rating... by TWX · · Score: 4, Funny

    ...based on the quality of that code...

    --
    Do not look into laser with remaining eye.
    1. Re:Up next, automatic intelligence rating... by halivar · · Score: 4, Funny

      goto blah;
      ^^ Idiot.

      // If you don't know why this is here, don't fuck with it.
      goto blah;

      ^^ Code guru.

    2. Re:Up next, automatic intelligence rating... by Tablizer · · Score: 1

      But readable code is often preferred over clever code by team members.

    3. Re:Up next, automatic intelligence rating... by lgw · · Score: 4, Insightful

      For lack of mod points let me just say: beautiful!

      It's like this in any engineering discipline:
      * The apprentice doesn't do things by the book, for he thinks himself clever
      * The journeyman does everything by the book, for he has learned the world of pain the book prevents
      * The master goes beyond the book, for he understands why every rule is there and no longer needs the rules

      Or put another way - the apprentice thinks he knows everything, the journeyman knows how little he knows, the master knows everything in the field, and still knows how little he knows.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    4. Re:Up next, automatic intelligence rating... by Anonymous Coward · · Score: 1

      And the guru (guru > master) knows he knows nothing in his field, but still knows more than the corporate enterprise architect

    5. Re:Up next, automatic intelligence rating... by halivar · · Score: 1

      It's like jazz. You have to know know rules before you can break them.

    6. Re:Up next, automatic intelligence rating... by halivar · · Score: 1

      And, I accidentally repeated repeated a word.

    7. Re:Up next, automatic intelligence rating... by c · · Score: 1


      try { ...
            throw BlahException("blah");
      } catch(Exception& blah) { ...
      }
      ^^ Idiot.

      --
      Log in or piss off.
    8. Re:Up next, automatic intelligence rating... by ihtoit · · Score: 1

      if I were the programmer (I'm not, not since primary school when I programmed the TURTLE to draw stuff on large sheets of cartridge paper) I'd be dropping //remarks in everywhere. Back to when I did TURTLE programming, I got berated for wasting time on comments but when it came down to 1000+ lines of code, it was nice to know which draw routines drew what part of the image. My TURTLE St. Paul's Cathedral was 7,700+ lines of code, probably 3/4 of that was comments. If it were stripped of comments it'd probably have ended up way less than 2,000 lines but nobody (not even me) would've had the first clue about what drew what.

      --
      Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
    9. Re:Up next, automatic intelligence rating... by ranton · · Score: 1

      This doesn't seem so far-fetched. I'm not sure the field of natural language processing is that far away from being able to create metrics which would determine the skill of a developer by looking at their code. It could then be used by employers during the hiring process and during reviews.

      While that may sound like a nightmare scenario (and it very well could be), a more intelligent software system may even be able to show why it thinks the code is bad, and give an interviewer or reviewer the chance to ask why something was done. Taking 10,000 lines of code and narrowing it down to 100 lines that could help make the determination between good employee or bad employee could be useful.

      The big trick is how to train the system, since you would have to identify good and bad coders for supervised training. I doubt unsupervised training could do anything more than cluster like minded developers together. Although even that is useful, since you could identify a dozen good programmers manually and then have the system identify hundreds more by finding similar coding styles.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
    10. Re:Up next, automatic intelligence rating... by gstoddart · · Score: 1

      // exception was found
      // beyond here be dragons, run
      // make your escape now
      goto blah;

      ^^ code master

      --
      Lost at C:>. Found at C.
    11. Re:Up next, automatic intelligence rating... by Anonymous Coward · · Score: 0

      You have been promoted from grammar nazi to grammar master.

    12. Re:Up next, automatic intelligence rating... by russotto · · Score: 2

      The guru knows the novice knows more than the corporate enterprise architect, but won't let on lest the novice get a more-swelled head.

    13. Re:Up next, automatic intelligence rating... by Anonymous Coward · · Score: 0

      And, I accidentally repeated repeated a word.

      It's all good. I feel the jazz.

    14. Re:Up next, automatic intelligence rating... by danknight48 · · Score: 1

      goto blah;
      ^^ Idiot.

      // If you don't know why this is here, don't fuck with it.
      goto blah;

      ^^ Code guru.

      Yep, I hate them as well; I've only ever had to use one once in coding. But there is a very rare case where goto is actually needed: breaking out of nested loops.
      http://pastebin.com/FBQMDBme

    15. Re:Up next, automatic intelligence rating... by Anonymous Coward · · Score: 0

      // If you don't know why this is here, don't fuck with it.
      goto blah;
      ^^ Code guru.

      A real code guru would write why the goto is there, instead of acting superior.

    16. Re:Up next, automatic intelligence rating... by cellocgw · · Score: 1

      And you still got the song wrong. It goes "to know, know, know you, is to love, love, love you..." See? Ya gotta repeat twice (yeah I'm a grammar pedant: saying it 3 times is repeating twice :-) ).

      --
      https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
    17. Re:Up next, automatic intelligence rating... by atownsley · · Score: 1

      From someone who got his MSCS at Drexel, this is used to avoid someone copying code from the Internet and submitting it as their own. Sure, people have tried to rename variables and method / function names...but from what I have heard, they have been caught. The problem is the student didn't understand what they were doing, so structurally the code was the same, they just tried to change things. I don't know all the details of how it works, but the comparison is done at multiple levels (i.e. source, obj, and exe), and due to compiler optimizations patterns do emerge....

      None of the professors (at least in my experience) have a problem with students referencing outside source, figuring out what it does, and then writing THEIR OWN CODE to solve the problem.

      To lgw's post...
      * If you are a stupid apprentice trying to pass someone else's code as your own, it will catch you.
      * If you are a journeyman reading the book, trying to understand the concepts and use an internet code example for reference, and then write your own code based on your readings with the code example for guidance, you are probably going to be fine
      * If you are a master, why do you need examples...you are a master...go code the damn thing yourself...

    18. Re:Up next, automatic intelligence rating... by Altus · · Score: 1

      Yeah but it's really about all the words you don't repeat.

      --

      "In America, first you get the sugar, then you get the power, then you get the women..." -H. Simpson

    19. Re:Up next, automatic intelligence rating... by Anonymous Coward · · Score: 0

      A decent effort young disciple. You didn't get a reference to the season in but not bad otherwise. I would have also made the Goto line part of the haiku, but that's more down to personal coding style.

    20. Re:Up next, automatic intelligence rating... by ebvwfbw · · Score: 1

      You can't tune a tuna fish, eh?

  7. Let's analyze the cyberspying code. by SeaFox · · Score: 1

    Using this technique, can they tell us if the NSA did write the Regin Malware now?

    1. Re:Let's analyze the cyberspying code. by Anonymous Coward · · Score: 0

      Fuck yeah they can!!.... But obviously they are the ones who wrote the code so they are not saying anything

    2. Re:Let's analyze the cyberspying code. by blackomegax · · Score: 1

      I want to see it run Regin against sections of code in gnu/linux/systemd and see if the same NSA shills wrote any of it.

  8. What about Bitcoin? by Anonymous Coward · · Score: 5, Funny

    Can we use this to find Satoshi?

  9. They might be cheating by Anonymous Coward · · Score: 0

    Are they just reading...

    // sample.c - takes slashdot comments and replaces troll comments with additional troll comments
    // by John Smith
    // v1.0.1

    or changelog.txt ?

  10. Shouldn't be hard to foil by SlideRuleGuy · · Score: 1

    With coding standards to follow, and tools that uniform-ify your code, it should be easier to anonymize it than with regular prose. And regular prose is apparently trivial to anonymize: see "Practical Attacks Against Authorship Recognition Techniques" by Michael Brennan and Rachel Greenstadt.

  11. This is true for /. comments, too. by Anonymous Coward · · Score: 1

    This has always been obvious.

    It's true for comments here, too. Only apk can craft a true apk comment. Others have tried, but they're never quite like the genuine thing.

    But we should be careful with such analysis, too. In some cases it can be totally wrong.

    There is a Slashdot-like site called Soylent News. There was once a guy over there who would claim that different posters were actually the same person, even when they weren't, and in some cases couldn't have been (one of the people he accused had died earlier).

    How did he "know" they were the same people? He said he had a "complex" algorithm that used bzip2 and a comparison of the size of the compressed comment text. Of course, his allegations were correct about 0% of the time.
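
    For anyone wondering what a compression-size comparison of that sort might look like, here is a minimal sketch. The guy's actual algorithm was never shown, so this is an assumption: it uses the well-known normalized compression distance with Python's bz2 module, and the function names are made up for illustration. On short comments the bzip2 header overhead swamps the signal, which may be part of why results like his were worthless.

```python
import bz2


def compressed_size(text: str) -> int:
    """Size in bytes of the bzip2-compressed UTF-8 encoding of text."""
    return len(bz2.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance (hypothetical helper).

    Lower values mean the two texts share more structure that the
    compressor can exploit when they are concatenated.
    """
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)
```

    A low distance only says the two texts compress well together. Shared quotes, topics, or boilerplate will do that just as readily as a shared author.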

    1. Re:This is true for /. comments, too. by ihtoit · · Score: 1

      there's a wiki site (can't remember the name) that takes great joy in posting accusations without attribution or evidence, and when called on them the Admins sit there and claim that the person who posted the slander is now the same person trying to get a retraction based on some sort of magic ring with a seekrit style decoder. Even when called out to post the evidence they claim to hold, they just dive straight in to claiming knowledge they can't possibly have for various reasons not least of which said claimed evidence not existing outside their imaginations.

      --
      Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
    2. Re:This is true for /. comments, too. by Anonymous Coward · · Score: 0

      Talk about unique style... Damn dude, did you run that paragraph through an obfuscator or something? I can't even parse it.

    3. Re:This is true for /. comments, too. by Anonymous Coward · · Score: 0

      Others have tried, but they're never quite like the genuine thing.

      confirmation bias

      APK

      This wouldn't have happened had you used the APK Hosts File Engine++.

      ...apk

  12. okay by Anonymous Coward · · Score: 0

    Newfags can't triforce

    1. Re:okay by lgw · · Score: 1

      Newfags can't triforce

      Slashdot supports too few entities to do this right, and forget about UTF8. But you can get sorta close.

        *
      * *

      Unless someone can do better?

      --
      Socialism: a lie told by totalitarians and believed by fools.
  13. No Kidding by invid · · Score: 4, Insightful

    I can usually tell who wrote the code in the office by whether or not they put a space after their ifs: if(i == 0) vs if (i == 0); where they put their brackets, whether or not they replace their tabs with spaces, how they deal with bools: if (!var) vs if (var == false) and several other telling signs. There are so many combinations of variations no two programmers in the office (about 12 of us) have the same style.
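
    A rough sketch of counting those tells automatically, assuming C-like source and a handful of regular expressions. The feature names are made up for illustration, and this is far cruder than what the paper describes; it only looks at surface layout.

```python
import re


def style_features(source: str) -> dict:
    """Count a few layout habits in C-like source text (a crude heuristic)."""
    lines = source.splitlines()
    return {
        # "if (" versus "if(" spacing habit
        "if_with_space": len(re.findall(r"\bif \(", source)),
        "if_no_space": len(re.findall(r"\bif\(", source)),
        # "if (!var)" versus "if (var == false)" habit
        "negation_test": len(re.findall(r"\bif\s*\(\s*!", source)),
        "false_compare": len(re.findall(r"==\s*false\b", source)),
        # tabs versus spaces for indentation
        "tab_indented_lines": sum(line.startswith("\t") for line in lines),
        "space_indented_lines": sum(line.startswith(" ") for line in lines),
    }
```

    Running everyone's code through the same auto-formatter would flatten most of these counts, which is why layout alone is a weak signal.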

    --
    The Moore-Murphy Law: The number of things that will go wrong will double every 2 years.
    1. Re:No Kidding by Anonymous Coward · · Score: 0

      Just out of curiosity, why don't you have style guidelines in place at your office?

    2. Re: No Kidding by Anonymous Coward · · Score: 0

      Because we use JavaScript, which has no standard style, even within a single code base.

    3. Re:No Kidding by leonardluen · · Score: 1

      i could do the same. not only that but i could often also tell who had originally trained that person, because part of the trainer's style often leaked into their style.

      i work at a university and we hire 100 level CS students. so we generally assumed they knew nothing and trained them from scratch.

    4. Re:No Kidding by Anonymous Coward · · Score: 0

      Because it doesn't matter?

    5. Re:No Kidding by invid · · Score: 1

      Actually they have recently introduced style cop, which enforces some things, but it ignores a number of discernible quirks.

      --
      The Moore-Murphy Law: The number of things that will go wrong will double every 2 years.
    6. Re:No Kidding by ThatsDrDangerToYou · · Score: 1
      Yeah, about that... I start twitching whenever my boss types: MyFunction (arg1, arg2) and so on. Who puts a space after the function name before the '('? People who must die, of course.

      OK, calming down now.. 1.. 2.. 3.. 4.. 5..

      No, I'm OK, really.

      I had an old boss who was a code style nazi. He was an asshole. And actually, my current boss is very cool, even if he codes like that.

    7. Re:No Kidding by Anonymous Coward · · Score: 0

      We use our IDE and the "auto format code" option to fix spacing and bracket placement before we check it in.

      Checkmate

    8. Re:No Kidding by Marginal+Coward · · Score: 1

      I once worked on a project that had a handful of developers, where each developer was in charge of the code for one of the software subsystems of the project. We didn't have much of a coding standard there - only about one page - but we ended up with a consensus coding style in the project that everybody could live with. Even so, you could always tell who wrote what by the personality shown around the edges of the coding style of a given module, function, or even over just a few lines.

    9. Re:No Kidding by Anonymous Coward · · Score: 0

      Kind of matters. I often find my code to be more readable (to me) than the code created by my subordinates. I'm pretty consistent with my use of white space, where I put my commas, how I indent, where I place my brackets, how I space out lines, and where I group lines together without spacing. I can tell at a glance where I'm declaring variables, doing a loop, making decision points, doing something complicated that requires close reading of the code, etc. When I read others' code, not so much. It's usually a problem of inconsistency: too much white space around things that don't matter, not enough around things that do, goofy situations where brackets are floating and it's visually unclear what they are connected to.

      Example: I find this much easier to read at a glance:

      if(!$variable) { doSomethingInteresting($variable); }

      than this:

      if( $variable == false )

      {

                doSomethingInteresting($variable);

      }

      Maybe it's just me.

    10. Re: No Kidding by Anonymous Coward · · Score: 1

      You should use an automatic JavaScript style tool, then. One of my favorites has a funny name - it's called "Obfuscator."

    11. Re:No Kidding by Anonymous Coward · · Score: 0

      It does matter. It makes reading the history of a change far harder than it should be because changes in syntax are jumbled in with changes in formatting. I rely on git blame and gitk several times a week to figure out why particular code is the way it is - I don't want to have to trawl through random formatting changes too.

    12. Re:No Kidding by AK+Marc · · Score: 1

      If the whitespace is meaningless, it should be eliminated (carriage returns excepted). However, I can understand people who add in meaningless whitespace, as sometimes a + b is easier to read than a+b, even if they are interpreted the same.

    13. Re:No Kidding by OSULugan · · Score: 1

      Should use if (false == var) to avoid incidental issues of assignment. A good compiler or static analyzer will catch the inadvertent assignment and flag it for repair, but it is a good practice to be in for code that you don't run through such a tool.

    14. Re:No Kidding by CannonballHead · · Score: 1

      So, you don't indent code? Or if you do, at what point is the indent meaningless (how many spaces/tabs) ... ? No spaces after semicolons? Or before/after braces? Or ...

      Readability should count as meaningful. It helps. And the compiler strips it out anyways, right, so ultimately it doesn't matter, just like comments, except in helping understand the code.

      I may be misunderstanding something completely in what you said... but I don't get why you would say it should be removed. Maybe in javascript for network performance reasons or something, but you should just minify or something in that case, because of variable and function name length and all that...

    15. Re:No Kidding by R3d+M3rcury · · Score: 1

      Actually, the one I hate is:

      if ($variable == false) {
            doSomethingInteresting($variable);
      }

      and one of my co-workers does:

      if ($variable == false)
            {
            doSomethingInteresting($variable);
            }

      Of course, my code is beautiful and everyone else's is terse and ugly and everyone should write code the same way that I do. Try suggesting that to a group of programmers and see how far it gets you. Generally, it's not worth the argument--you will waste tons of everyone's time trying to come up with an agreement.

      As the thread suggests, one advantage to different coding styles is that you can generally tell who wrote what and, if there seems to be a bug, you can track them down and tell them to fix it in that ugly mess. In our office, we have the rule that if you go around changing code style, you now own that code and are responsible for it. About the only issue we've run into is that people's styles evolve over time. So the guy right out of school may have a certain style that changes as he is exposed to more styles.

      My favorite story was when someone was trying to push variable naming standards. If it was a C string, the variable name should begin with "sz" (for string, zero terminated). I suggested that instead of doing that, maybe we should just put a dollar-sign at the end. Laughter ensued and that ended that.

    16. Re:No Kidding by PRMan · · Score: 1

      And in Visual Studio, I hit Ctrl+K Ctrl+D all the time, which puts my code into "Standard" Microsoft format. If everyone did this, I imagine the analyzer would drop to 50% or lower.

      --
      Peter predicted that you would "deliberately forget" creation 2000 years ago...
    17. Re:No Kidding by ihtoit · · Score: 2

      coding to book (sans comments) will kill the process of identifying authors stone dead, I think. If everybody's "Hello World!" was identical, how do you tell the difference?

      --
      Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
    18. Re:No Kidding by disambiguated · · Score: 1

      Use a diff tool that can ignore formatting changes. I'm a fan of Beyond Compare, but there are plenty of others.
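
      If you don't have such a tool handy, the idea can be sketched in a few lines of Python with difflib. This is only an illustration, not how Beyond Compare works: it assumes that throwing away all whitespace is acceptable, which also merges tokens like "int x" and touches string literals, and because it is line-based a brace moved to its own line still shows up as a change. The function names are made up.

```python
import difflib
import re


def normalize(line: str) -> str:
    """Drop all whitespace so pure spacing changes compare equal."""
    return re.sub(r"\s+", "", line)


def substantive_changes(old: str, new: str) -> list:
    """Return ndiff lines that differ by more than whitespace."""
    # Skip blank lines, then compare whitespace-free versions of the rest.
    old_lines = [normalize(l) for l in old.splitlines() if l.strip()]
    new_lines = [normalize(l) for l in new.splitlines() if l.strip()]
    return [
        d for d in difflib.ndiff(old_lines, new_lines)
        if d.startswith(("- ", "+ "))  # keep removals and additions only
    ]
```

      Feeding it two versions that differ only in spacing and blank lines gives an empty list, so only real edits are left to read through.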

    19. Re:No Kidding by disambiguated · · Score: 2

      Style guidelines should be about avoiding pitfalls of the language, using appropriate idioms, and not making life miserable for maintainers, not about where you put spaces and braces.

    20. Re:No Kidding by phantomfive · · Score: 1

      As the thread suggests, one advantage to different coding styles is that you can generally tell who wrote what and, if there seems to be a bug, you can track them down and tell them to fix it in that ugly mess. In our office, we have the rule that if you go around changing code style, you now own that code and are responsible for it. About the only issue we've run into is that people's styles evolve over time. So the guy right out of school may have a certain style that changes as he is exposed to more styles.

      git/cvs/svn/mercurial blame can tell you who wrote whatever code. Please tell me you are using some kind of source repository.......

      --
      "First they came for the slanderers and i said nothing."
    21. Re:No Kidding by AK+Marc · · Score: 1

      Indent isn't meaningless. But there's no reason to double-space an indent. It carries a reading meaning, related to nesting of code.

      Code "feels" smaller when it's compact. Also, having a single spacing method uniform across everyone makes for easier cut-and-paste sharing. Having one person space things differently than another will result in decreased readability.

    22. Re:No Kidding by wasteoid · · Score: 1

      if (false == var) prevents accidentally assigning false to var if you forget to use double equals

    23. Re:No Kidding by burbilog · · Score: 1
      I can usually tell who wrote the code in the office by whether or not they put a space after their ifs: if(i == 0) vs if (i == 0); where they put their brackets, whether or not they replace their tabs with spaces, how they deal with bools: if (!var) vs if (var == false) and several other telling signs. There are so many combinations of variations no two programmers in the office (about 12 of us) have the same style.

      Can you do the same after indent -kr?..

    24. Re:No Kidding by Anonymous Coward · · Score: 0

      RTFA

    25. Re:No Kidding by Anonymous Coward · · Score: 0

      actually it also needs to dictate spaces and braces. Maybe you don't do code review or maintenance, but anything that facilitates that process saves the organization time. We do a lot of perl coding and use perl tidy with a defined set of options that all code is required to adhere to (in addition to other standards).

      Maybe you feel a need to express your individuality. If so, do it in your spare time and don't annoy the reviewers and maintainers.

  14. Not my Frankencode... by Anonymous Coward · · Score: 0

    ... a patchwork of open-source freebies.

    1. Re:Not my Frankencode... by Tablizer · · Score: 3, Funny

      ... a patchwork of open-source freebies.

      So, what's it like to work for FaceBook?

  15. That explains it by Tablizer · · Score: 2

    I suppose all those "// damn U bill gates!" comments gave me away

    1. Re:That explains it by Anonymous Coward · · Score: 0

      function calcMD5(str)
      { // rar oh no I am evolving
          return calcFreakMD5(str)
      }

      // delicious cookies, you must have some
      function getCookie( name ) { ...

      // finish me! Why won't you finish me father!
      function buildmenu_tabcontrol(id,tabarray,w,h,c,bc,bo){...

      There are so many stupid things littering my code.
      Some of it is just abandoned for so long it begins to cry for help.
      Some were made because I was lazy.

      // Hey, why not?
      function makeDIV(id, classname){...

      Hell naw am I typing that out all the time. Oh hey it is snowing.

  16. Welcome to the party by meerling · · Score: 2

    When I was a kid in the 80s we figured out we could identify who wrote a particular piece of software by looking at its code. We used those individualistic and identifiable features in the argument over programming being an art or a science when we wanted to support the art side.

    1. Re:Welcome to the party by Virtucon · · Score: 4, Insightful

      It's all about style. Writing software is very creative and it needs to have the author's fingerprints on it somewhere. If corporations don't like that they can suck the source code into a parser and spit out perfectly mundane crap that loses the intonation and the thoughts the original developer had for it.

      --
      Harrison's Postulate - "For every action there is an equal and opposite criticism"
    2. Re:Welcome to the party by Anonymous Coward · · Score: 0

      you are not an individual snowflake. Your code is to serve a function, not be a work of art. Come back when you have your source code presented in juried art shows.

  17. John Varley Press Enter by Crashmarik · · Score: 3, Informative

    1985 Hugo Winner

    Really, the fact that coding style is recognizable was so well known it made it into pop culture 30 years ago.

    Also, on the smaller sample size the program might just be recognizing the parts of the style that come from the corporate standards. It would be interesting to see if it could recognize code from people who all work at the same company.

  18. Vernor Vinge probably beat him to it by Crashmarik · · Score: 1

    But I can't recall an instance.

    1. Re:Vernor Vinge probably beat him to it by AJWM · · Score: 1

      Vinge is considered one of the fathers of cyberpunk because of his "True Names", which did precede Varley's chilling (and Hugo-winning) "Press Enter[]" (1981 vs 1985).

      On the other hand, Varley's much earlier (1976) "Overdrawn at the Memory Bank" was also one of the seminal works of the field.

      Been a while since I've read it, but the warlocks (hackers) in "True Names" would never have let their identity (true name) be determined from their coding styles.

      --
      -- Alastair
  19. Source of Future Data by Ronin+Developer · · Score: 1

    I guess we can expect that source code repositories will be scanned and processed. And, for code written by multiple authors, the modified code (from commits) will be scanned and indexed as well.

    But, I bet they will never figure out who writes the malware recently attributed to the three letter agencies. They should, however, be able to figure out which agency writes the stuff if they get a copy of the source code or maybe even from decompiling the binary.

    Additionally, if written from .NET, the CLR code can be reflected back to VB, C# or any other .NET language to retrieve the source code.

    1. Re:Source of Future Data by Anonymous Coward · · Score: 0

      Unless it went through a good obfuscator. Sure, you get some code back in the original language, but it will not resemble the original source and will be almost impossible to follow.

    2. Re:Source of Future Data by Shados · · Score: 1

      Back in the days of .NET 1~2, decompiling via Reflector or whatever other tool got you back pretty good stuff. Today, there's a LOT more sugar, from LINQ to async/await and everything in between. If you go back to the original language, good decompilers sometimes infer what the original sugar was from the output following certain conventions and patterns...but moving that to another language will give you unreadable garbage.

      Reading F# in C# , this>but,worse>

    3. Re:Source of Future Data by Shados · · Score: 1

      Bah, formatter messed things up. The last line was me joking about the crazy nested generic chains that F# types end up looking like in a language that doesn't support the same syntax sugar.

  20. The key to this system being used is, ...... by Selur · · Score: 1

    "The key to this system being used is, of course, first obtaining the code stylometries for a wide range of developers. The authors didn't address how, say, a database of programmers’ styles would be compiled. Also, to identify the author of a piece code would require access to the source code, and not just executables, though the authors mention there is some evidence that style is preserved in binaries."
    -> so once you post to github and similar 'they' can link every code you ever write to you,....

  21. Was discussed at 31c3 by YoungManKlaus · · Score: 1

    so you are a good month late with the news

    1. Re:Was discussed at 31c3 by ihtoit · · Score: 1

      are the podcasts/videocasts out for that yet?

      --
      Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
    2. Re:Was discussed at 31c3 by YoungManKlaus · · Score: 1

      of course, since like 2 days after the conference ended. http://media.ccc.de/browse/con...

    3. Re:Was discussed at 31c3 by ihtoit · · Score: 1

      silly me. Thanks for the link anyway :)

      --
      Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
  22. All I can say is... by Anonymous Coward · · Score: 0

    BWAHAHAHAAAA!!!!

  23. Bad Coders Can't Be Identified by TrollstonButterbeans · · Score: 3, Interesting

    If your coding is terrible and very newbie like, they can't single you out since your code is similar to the ocean of other terrible coders.

    So if you are a paranoid freak, the best way to ensure your safety and keep the government off your back is to write terrible code.

    --
    Priest: "Universe from nothing, no laws of physics, sped up time"+ huge discrepancies. Creationism? No. Big Bang Theory
    1. Re:Bad Coders Can't Be Identified by ThatsDrDangerToYou · · Score: 1

      Ah, my work here is done!

    2. Re:Bad Coders Can't Be Identified by Anonymous Coward · · Score: 0

      Most of my coworkers already follow this advice.

  24. Quick! by Anonymous Coward · · Score: 0

    Grab the systemd source and see how much of it seems to be written by Satan.

  25. Oblig XKCD by Krazy+Kanuck · · Score: 2

    Not that many of us actually use comments.... http://xkcd.com/1421/

    1. Re:Oblig XKCD by Anonymous Coward · · Score: 0

      I wanted to go to Iceland!

      They needed the money and might have some "special" packages (like entering a volcano without the protective suit).

      Alas, the travel guy didn't have any option to go to Iceland (I'm in Brazil) and now they probably recovered... and Iceland must be expensive again... Oh, well...

  26. Most programming isn't new code by jgotts · · Score: 3, Insightful

    Most programming isn't writing new code. Most programming is working on someone else's crap you inherited. Invariably, you're going to be using that person's style or else the result will look like garbage.

    There is also the problem that most non-trivial code is worked on by multiple people at the same time.

    Writing some code from scratch as an assignment is a very artificial exercise nowadays, unless you're in a classroom setting. Therefore, you're going to get a signature from a programmer doing atypical work.

    1. Re:Most programming isn't new code by Anonymous Coward · · Score: 0

      Yep, step 1 when I started up a web dev team was to develop a 'template' site that we could use to derive all our other sites from. Nowadays almost all our sites have the exact same kind of 'lost password' link or contact form, all of which can be traced back to the one developer who worked on the template site.

    2. Re:Most programming isn't new code by Anonymous Coward · · Score: 0

      Mod parent up.

      I have no mod points, but I'll tell you this after 15 years of software development on a single product:
      there is no file that hasn't been touched by at least 10 people; some of the worst offenders may have been touched by 30,
      each one adding or modifying a single line of code.

      You keep the style that the file already had. DO NOT MAKE IT YOUR OWN; it should be readable for ALL.

      Another lesson learned: you don't want people to think it's yours anyway, because they will come running to you with all sorts of crazy questions 10 years down the line.

      Knowing who wrote what only matters when there's a blame game going on, and that is what 'git blame' is for.

  27. What complete and utter bullshit. by MouseTheLuckyDog · · Score: 2

    95% of 250 coders. That means that out of a million programmers they will misidentify 200000.

    I suspect that there are too few variances in style to make any coder's style unique. For example, whether to use braces on a one-line statement after an if in C.

    With a few programmers it's likely to work, but when the possible source of programmers is the world...

    Not to mention emacs, Visual Studio and such enforcing some indentation standards and programming languages enforcing others.

    1. Re:What complete and utter bullshit. by Anonymous Coward · · Score: 0

      There are other things

      like if (xyz) or if (xyz != 0). Spaces between braces. How you name a variable: Hungarian or not, camel case, all lower, first letter upper. Do you like one-liner if statements or do you break them into multiple lines? Do you put an empty line with spaces between two bits of code or do you leave the line completely empty? Do you align your comments on the right of a statement, or not, or put them before the line? What sort of brace style do you like? Do you put all statements in an if condition in parens? Do you put the comparator on the left or the right, or back and forth? etc etc etc

      There are hundreds of little things that give you away. I could usually tell who in my office wrote some code. Variable names usually gave them away.

      Pretty printers and such can mask much of that though as you point out. But not completely.

      Your math is also off. It is 50k. 1000000 * (1 - 0.95) That means it got 950k correct. That is not bad.
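
      A rough sketch of how a few of those tells could be counted, in Python over a C-like snippet. The feature names, the regexes, and the two sample snippets are illustrative assumptions on my part, not the feature set the researchers used (the article says theirs includes syntactic features, which plain regexes like these cannot see).

```python
import re

def style_features(source):
    """Count a handful of surface-level style habits in C-like source text."""
    lines = source.splitlines()
    return {
        # "if (x != 0)" as opposed to "if (x)"
        "explicit_zero_compare": len(re.findall(r"if\s*\([^)]*!=\s*0\s*\)", source)),
        # opening brace alone on its line vs. at the end of a statement line
        "brace_on_own_line": sum(1 for ln in lines if ln.strip() == "{"),
        "brace_on_same_line": sum(
            1 for ln in lines if ln.rstrip().endswith("{") and ln.strip() != "{"
        ),
        # distinct camelCase and snake_case identifiers
        "camel_case_names": len(set(re.findall(r"\b[a-z]+(?:[A-Z][a-z0-9]*)+\b", source))),
        "snake_case_names": len(set(re.findall(r"\b[a-z]+(?:_[a-z0-9]+)+\b", source))),
    }

# Two hypothetical authors solving the same trivial task.
sample_a = (
    "int countItems(int itemCount)\n"
    "{\n"
    "    if (itemCount != 0)\n"
    "    {\n"
    "        return itemCount;\n"
    "    }\n"
    "    return 0;\n"
    "}\n"
)
sample_b = (
    "int count_items(int item_count) {\n"
    "  if (item_count) {\n"
    "    return item_count;\n"
    "  }\n"
    "  return 0;\n"
    "}\n"
)

features_a = style_features(sample_a)
features_b = style_features(sample_b)
print(features_a)
print(features_b)
```

      Even on an eight-line function the two vectors differ in every entry, which is the point: each habit is weak evidence on its own, but they add up.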

    2. Re:What complete and utter bullshit. by Anonymous Coward · · Score: 0

      Or the fact that a lot of coders are self-taught and they'll keep changing their style as time passes ...

    3. Re:What complete and utter bullshit. by Rinikusu · · Score: 1

      Okay, I just woke up from a nap, but could you show your math there? Maybe I'm missing something because I come up with.. 50k, not 200k...

      --
      If you were me, you'd be good lookin'. - six string samurai
    4. Re:What complete and utter bullshit. by Anonymous Coward · · Score: 0

      It doesn't matter. Linearly scaling the accuracy of this application is garbage to begin with.

    5. Re:What complete and utter bullshit. by Ksevio · · Score: 1

      I find the statistics dubious as well - they also dropped the dataset to nearly 1/10 while roughly doubling the code input and the results were 2% better, so it's possible if we follow the trend it will reach the 20% you seem to quote.

    6. Re:What complete and utter bullshit. by Kjella · · Score: 1

      What complete and utter bullshit.

      95% of 250 coders. That means that out of a million programmers they will misidentify 200000.

      You know it's not a contest to come up with the worst bullshit. If you're left with one person 95% of the time when you have 249 possible wrong answers, it's like being left with 4000 people when you have 999999 wrong answers. If all those are too close to tell apart you'll misidentify >99.9%.

      Imagine for example that you wanted to find people by height and weight, as measured to nearest cm and kilo. It might work decently on a small group, but if you scale it up to a million people there'll be a lot of duplicates and then you're just guessing, double the population and you halve the chance of being right.
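
      To put numbers on that, here is a back-of-the-envelope sketch in Python. The function names are mine, and the model is deliberately crude: it assumes the features can only narrow any pool down by the same factor of 250 seen in the study, which is an assumption for illustration and not something the paper claims.

```python
def naive_misidentified(pool_size, accuracy=0.95):
    """Linear extrapolation: pretend accuracy stays fixed as the pool grows."""
    return round(pool_size * (1 - accuracy))

def lookalike_candidates(pool_size, baseline_pool=250):
    """If the features single out 1 coder in baseline_pool, a larger pool
    leaves roughly pool_size / baseline_pool coders who look the same."""
    return pool_size / baseline_pool

def chance_of_right_guess(pool_size, baseline_pool=250):
    """Picking at random among the look-alikes."""
    return 1 / max(1.0, lookalike_candidates(pool_size, baseline_pool))

pool = 1_000_000
print(naive_misidentified(pool))    # 50000
print(lookalike_candidates(pool))   # 4000.0
print(f"{1 - chance_of_right_guess(pool):.3%} misidentified")
```

      The first figure is what you get if you simply multiply the 5% error rate by a million; the last is what you get if the remaining 4000 candidates really are indistinguishable.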

      --
      Live today, because you never know what tomorrow brings
    7. Re:What complete and utter bullshit. by steelfood · · Score: 1

      It's 50,000.

      Or for the study, the 12 people who code exclusively in assembly.

      --
      "If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be."
    8. Re:What complete and utter bullshit. by Anonymous Coward · · Score: 0

      95% of 250 coders. That means that out of a million programmers they will misidentify 200000.

      I can tell that you must be a coder.

    9. Re:What complete and utter bullshit. by Anonymous Coward · · Score: 0

      Do you like one-liner if statements or do you break them into multiple lines? Do you put an empty line with spaces between two bits of code or do you leave the line completely empty? Do you align your comments on the right of a statement, or not, or put them before the line? What sort of brace style do you like? Do you put all statements in an if condition in parens? Do you put the comparator on the left or the right, or back and forth?

      Yes.

    10. Re:What complete and utter bullshit. by secret_squirrel_99 · · Score: 1

      Yes, but when the possible set of coders is everyone in your class, and what you really want to see is whether the same kid wrote 5 other students' submissions, this is perfect, and it is at least one of the obvious use cases.

      --
      If privacy had a tombstone it would read "We did it for your own good" . -- John Twelve Hawks
    11. Re:What complete and utter bullshit. by Anonymous Coward · · Score: 0

      Almost all the "variances" I see mentioned here - whether to include braces on a single-line if statement, where to put the braces, where to include spaces around operators and brackets, whether to indent with tabs or spaces, how much to indent, when to change indentation - are comprehensively covered by any medium-sized company's coding standards. Meaning, 100+ coders will be producing code where these differences just won't be present. In principle, anyway.

      There must be more differences than these? Or did they only perform this "analysis" on code produced by CS students working on their own one-person exercises?

  28. style modifiers? by Anonymous Coward · · Score: 0

    I recall some program years ago that claimed to be able to convert your prose into the style of Hemingway/Dickens/. I wonder how easily this tool could support a similar feature - convert your code to Linus' style! Code like RMS!

    I even had a plan to add a formatter to the CVS to convert all code at checkin to a single style, so that diffs between versions would be guaranteed free of coder style quirk differences (tab size, spacing, brace placement). And I would have gotten away with it, if it weren't for that meddling C++ unparseability!

  29. So you could use this tool to make your code anon. by Maxo-Texas · · Score: 4, Interesting

    Write a version of pretty-printer that rerenders your code into a different style.

    Have a lexicon of misspelled words for each "personality".

    Another lexicon of variable names.
    a vs inta vs int_a vs x.

    Refactoring and unfactoring for subroutines.

    Run the comments through google translate and back to english.
    ukrainian
    japanese
    chinese

    Synonym and antonym substitution in the comments.

    The mind dances at the possibilities to mess with this algorithm.
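
    A minimal sketch of just the variable-name lexicon step, in Python. The function name and the sample lexicon are hypothetical, and the regex approach is naive: it will also rewrite matches inside strings and comments, and it knows nothing about scope, so treat it as an illustration rather than a usable anonymizer.

```python
import re

IDENTIFIER = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")

def rename_identifiers(source, lexicon):
    """Swap every identifier found in the lexicon for its 'personality' name."""
    def swap(match):
        name = match.group(0)
        return lexicon.get(name, name)  # leave unknown names and keywords alone
    return IDENTIFIER.sub(swap, source)

# Hypothetical lexicon for one "personality".
personality = {"a": "inta", "i": "x"}

original = "int a = 0; for (int i = 0; i < n; i++) a += i;"
disguised = rename_identifiers(original, personality)
print(disguised)  # int inta = 0; for (int x = 0; x < n; x++) inta += x;
```

    Whole-token matching is why "int" survives even though it contains an "i": the pattern only ever sees complete identifiers.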

    --
    She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
  30. Hah. I write everything in Fortran.. by toonces33 · · Score: 2

    and then use F2C to convert it to C code before I check in.. Try analyzing that!

  31. Spotting GCJ cheating would be an interesting find by jasax · · Score: 1

    Ditto. They also could have researched if submissions in a given (same) GCJ identity have been (or had a high probability of being...) written by two or more different coders...

    The submissions' speed of top ranked coders seen in early stages of the GCJ contest always amazed me (compared, of course, with my turtle sluggishness...)

    ;-)

  32. Re:So you could use this tool to make your code an by toonces33 · · Score: 1

    I can just imagine how unreadable such code would end up being, as any comments would look like they were written by some sort of AI tool.

  33. Obfuscator? Or just translate A-B-A? by RandCraw · · Score: 1

    Of course you could anonymize source code using an obfuscator.

    But maybe the simpler way is to compile Java to bytecode, then decompile it back to Java. I suspect that's as effective as most obfuscators.

  34. It's awesome! by Anonymous Coward · · Score: 0

    Fukkin great man! I get paid extremely well and I get all of the pussy I want. Granted, it's usually overweight, ugly pussy, but still better than what my last job provided.

  35. Code beautifier by mrflash818 · · Score: 1

    Perhaps something like Artistic Style might help.

    http://astyle.sourceforge.net/

    --
    Uh, Linux geek since 1999.
  36. Easy Solution by marciot · · Score: 1

    Someone just needs to write a tool that takes source code and translates it into an obfuscated form that only the CPU can understand. Is anyone working on this type of privacy tool?

    1. Re:Easy Solution by Anonymous Coward · · Score: 0

      Yes, and the solutions are called 'text editors'.

    2. Re:Easy Solution by Anonymous Coward · · Score: 0

      Sadly this trick won't help with a Quine.

  37. emacs by Anonymous Coward · · Score: 0

    I did something similar when I used to teach C. I neglected to realize that emacs and IDEs will often produce identical whitespace for simple programs.

    1. Re:emacs by __aaclcg7560 · · Score: 1

      I had a Java instructor who informed the class that he talked to two students in private because their code was nearly identical except for one small detail: one used the x variable, the other used the y variable. The program was so simple that he couldn't flag the students for cheating.

    2. Re:emacs by ChunderDownunder · · Score: 1

      I once marked CS homework and uncovered cheating for an 'individual' assignment.

      A group of students had debug comments in their code - the giveaway? spelling mistakes.

  38. compiled? original language by Anonymous Coward · · Score: 0

    Not sure if this works as well for compiled code as it does for source.
    I bet there are templates and style-checkers in existence that would make source-code based author identification an issue.

  39. Whitespace, you're doing it wrong by Anonymous Coward · · Score: 0

    Okay, it won't eliminate fingerprinting completely, but using Kolmogorov Style would reduce variation quite a bit.

  40. Re:Obfuscator? Or just translate A-B-A? by Anonymous Coward · · Score: 0

    For good measure, toss in "run bytecode through proguard" between compiling Java to bytecode and decompile it back to Java. :D

  41. Pointless, but no doubt true by Kittenman · · Score: 2

    Wouldn't any programmer worth their salt identify themselves in the comments, or (if not) be logged as the last guy in that code on such-and-such a date, while working on such-and-such a patch number? (E.g. 'kittenman was here, 1/Jan/15, fixing Steve's crap').

    But I hope my code is easily recognizable. I'm proud of it. It may not be the smartest, slickest, quickest there is, but it's mine. And it works.

    --
    "The greatest lesson in life is to know that even fools are right sometimes" - Winston Churchill
    1. Re:Pointless, but no doubt true by Shados · · Score: 1

      People still use these stupid 90s style comments with authors and dates and shit? Really?

      Just use the source control system for that.

  42. harder to read if there is no consistency by Chirs · · Score: 1

    Generally speaking each project has a coding style that most code in the project adheres to, for the simple reason that it's easier to maintain when the code all looks more-or-less similar.

    If one area uses lowercase with underscores, and the other area uses CamelCase, and one area typedefs the heck out of everything while the other is explicit, then for someone coming in and trying to understand the code it makes it harder than necessary to figure out what's going on.

    So if you look at the linux kernel, or glibc, or firefox, or Chrome, or any other similarly large project, there will be some sort of coding style that applies. This is not to say that the style applies blindly. For example there are areas in the kernel where they basically imported a driver that is written in a different coding style. Since that driver is maintained out of the linux kernel tree and is largely self-contained, that was deemed to be acceptable. And even in that case, the driver used an internally-consistent coding style for all the files involved.

    1. Re:harder to read if there is no consistency by ChunderDownunder · · Score: 1

      Coding standard adoption can provoke holy wars but at the end of the day, you're a team. Though idiosyncratic decisions irk me, such as prefixing instance variables with underscore. Any decent editor will make such a distinction between scope via colours.

      Pretty printing tools and style checkers present in any decent editor will enforce coding standards with minimal fuss.

    2. Re:harder to read if there is no consistency by Anonymous Coward · · Score: 0

      "it's easier to maintain when the code all looks more-or-less similar"

      Not if the developers pushing/following said coding style are cretins. And in our autocratic fail up corporate society, that's practically a given.

      I can't tell you how many places I've worked that don't even check for nulls or type safety. At an insurance company in Marietta, one such null traveled 20 layers down to a VBscript that popped up a dialog on a headless server that locked up the entire system. Lots of blind attempts to convert "" to an integer did that too. So did a null reference exception inside of one of the few catch blocks that should've been logging this stuff (God knows where though). There was a veritable flood of showstopper bugs all over the codebase wasting hundreds of man hours every single week. This is just the benign stuff, too, _malicious_ input would've doubled that easily. And what comes up in the meetings? Herp derp why are you typing System.Int32 instead of int or var?

  43. Re:So you could use this tool to make your code an by Anonymous Coward · · Score: 0

    Yeah. Go throw the AI in jail. I dare ya.

  44. There's an easier way... by senedane · · Score: 1

    I just use 'git blame' to figure out who to yell at....

  45. will they show the method? by ihtoit · · Score: 1

    I doubt it. Therefore, this is about as reliable as graphology (handwriting analysis).

    If you take two programmers who code to book standard, how do you tell the difference between them using the same strict problem?

    --
    Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
  46. Here's a great idea... by Lodragandraoidh · · Score: 1

    You can have/use this idea for free:

    Before a system will build said code, have the build system verify the code not only by the public key/code hash, but as a secondary method - the code fingerprint of the author in question.

    This turns a creepy idea into something worthwhile.

    --

    Lodragan Draoidh
    The more you explain it, the more I don't understand it. - Mark Twain
  47. Re:Hah. I write everything in Fortran.. by rubycodez · · Score: 1

    That's one way to make your ForTran run slower

  48. NO I DO NOT! by Anonymous Coward · · Score: 0

    I don't even have mod points today! You're always sitting next to me posting this leftist crap and you take forever to type it!

  49. Fun fact, everything can be used to track you by Anonymous Coward · · Score: 1

    Case in point, I am a guitar player, and so was my college roommate. We didn't necessarily play together much, but we both heard each other play a lot, over the course of years.

    I'd be able to place his playing anywhere.

    For that matter, we used to have a game where we'd try to stump each other by playing clips of guitar players and guessing who they were. This was often improvisational jamming, very obscure recordings from established artists. We usually had to go through 3-4 rounds before someone would get one wrong.

    This isn't really much different than handwriting, speech patterns, writing patterns...

  50. yes you do by Anonymous Coward · · Score: 0

    yeah right, sitting over there with those right fingers on the mouse while I'm forced to use the touchpad!

  51. You're sure one to talk by Anonymous Coward · · Score: 0

    Which one of us pushed down the caps lock key when I was typing in the subject line up there? HEY STOP IT!

  52. Righteous prick! by Anonymous Coward · · Score: 0

    Hey, stop hitting yourself! uiopuip[';iulpol,.]]p??=` Stop hitting yourself! jkl;jkl;89uiopnm,.jkl Stop hitting yourself! mkpmkplkjklnnnnNNN BWAHAHA!

  53. Re:So you could use this tool to make your code an by physicsphairy · · Score: 1

    "Hey, you notice some odd grammar, word choice, and spelling variance in this code?"
    "Oh yeah, must be Maxo-Texas. That's his anonymization software."

  54. Re:So you could use this tool to make your code an by steelfood · · Score: 1

    If you did this every time, you'd be identified as the guy who runs his code through Google Translate prior to release.

    Non-normal behavior is the easiest to single out. In order to avoid detection, you basically have to become noise. And if you're the only one, then even that is a pattern.

    Sure, you could run some things through Google Translate and leave some things alone, but that'd be the equivalent of having two online personas.

    --
    "If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be."
  55. Not news by Anonymous Coward · · Score: 0

    I noticed years ago that I could identify which of my coworkers wrote a piece of code simply by the style.

  56. Re:So you could use this tool to make your code an by Maxo-Texas · · Score: 1

    Absolutely- if you were the only one using the tool.

    --
    She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
  57. Anonymous? by Anonymous Coward · · Score: 0

    You mean like the people who wear masks of Guy Fawkes?

  58. this perltidy sure gets around by wardk · · Score: 1

    seems to be a very prolific coder

  59. Re:Hah. I write everything in Fortran.. by Anonymous Coward · · Score: 0

    It runs a lot faster on platforms that don't have a Fortran compiler.

  60. Obfuscator? Or just translate A-B-A? by Anonymous Coward · · Score: 0

    Of course you could read the article first, and see that "Accuracy rates weren’t statistically different when using an off-the-shelf C++ code obfuscators. Since these tools generally work by refactoring names and removing spaces and comments, the syntactic feature set wasn’t changed so author identification at similar rates was still possible."

  61. Truecrypt? by omnichad · · Score: 1

    Time to run this against the 7.2 version of Truecrypt.

  62. Re:So you could use this tool to make your code an by Anonymous Coward · · Score: 0

    The article reveals that: "Accuracy rates weren’t statistically different when using an off-the-shelf C++ code obfuscators. Since these tools generally work by refactoring names and removing spaces and comments, the syntactic feature set wasn’t changed so author identification at similar rates was still possible."

    It would be interesting to see what it would take to reduce the probability of identification. I should probably RTFP, tho. Might be in there.
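
    A small sketch of why that would be, using Python's own ast module as a stand-in for a C++ parser (my substitution, since the study was on C++). The two snippets and the function name are made up for illustration; the second snippet is the first with names shortened and comments and spacing stripped, which is roughly what the quoted obfuscators do.

```python
import ast
from collections import Counter

def node_type_counts(source):
    """Count syntax-tree node types, ignoring names, comments, and layout."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

original = '''
def total(values):
    # sum the positive entries
    result = 0
    for v in values:
        if v > 0:
            result += v
    return result
'''

# Same code after hypothetical name-mangling and whitespace/comment removal.
obfuscated = "def a(b):\n c=0\n for d in b:\n  if d>0:c+=d\n return c\n"

same_shape = node_type_counts(original) == node_type_counts(obfuscated)
print(same_shape)  # True
```

    The text looks nothing alike, but the tree has the same shape, so any feature computed from the tree is untouched. Changing that would mean restructuring the code itself, not just relabeling it.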

  63. Re:So you could use this tool to make your code an by Maxo-Texas · · Score: 1

    aye!

    If everyone used it then we'd all be spartacus.

    What I was implying also in my parent post was using the tool the article is about to confirm your code had reached the ambiguous level.

    --
    She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
  64. Re:So you could use this tool to make your code an by Maxo-Texas · · Score: 1

    That's a good point. I also mentioned arbitrarily factoring and refactoring subroutines, and I did not state clearly enough that I was suggesting using the tool mentioned in the article to confirm your code was giving a false result.

    --
    She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.
  65. Re:Hah. I write everything in Fortran.. by rubycodez · · Score: 1

    Wrong, the machine code emitted by one of the industry heavyweight Fortran compilers will kick the ass out of C's

  66. Good Example of Reverse Chronological Chauvinism by Doctrinsograce · · Score: 1

    Duh. They are, like, just seeing this today? We knew this back in the seventies... and I am sure that earlier programmers knew it too.

  67. Fairly low tech by ebvwfbw · · Score: 1

    Used to be able to tell which student's code I was looking at towards the end of a semester, in the 1980s. No need to look at who submitted it. From time to time I'd find one student's work turned in by someone else. That would result in an inquiry and usually an action against that student. Ye old dumpster dive.

    Years later I would do code reviews. In hardly any time I could tell you who wrote it, even if they had departed the company. Certain people do certain things predictably.