Slashdot Mirror


More Than Half of GitHub Is Duplicate Code, Researchers Find (theregister.co.uk)

Richard Chirgwin, writing for The Register: Given that code sharing is a big part of the GitHub mission, it should come as no surprise that the platform stores a lot of duplicated code: 70 per cent, a study has found. An international team of eight researchers didn't set out to measure GitHub duplication. Their original aim was to try to define the "granularity" of copying -- that is, how much files changed between different clones -- but along the way, they turned up a "staggering rate of file-level duplication" that made them change direction. Presented at this year's OOPSLA (part of the late-October Association for Computing Machinery SPLASH conference in Vancouver), the University of California at Irvine-led research found that out of 428 million files on GitHub, only 85 million are unique. Before readers say "so what?", the reason for this study was to improve other researchers' work. Anybody studying software using GitHub probably seeks random samples, and the authors of this study argued duplication needs to be taken into account.
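
To make "file-level duplication" concrete, here is a minimal sketch of one way to count exact copies: hash each file's bytes and treat files with the same digest as duplicates. This is an assumption for illustration only; the summary does not say how the researchers detected duplicates, and the function names and toy corpus below are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_by_content(files):
    """Group file paths by the SHA-256 digest of their bytes.

    `files` maps a path to that file's raw content (bytes).
    Paths sharing a digest are treated as exact duplicates.
    """
    groups = defaultdict(list)
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(path)
    return dict(groups)


def duplication_rate(files):
    """Fraction of files that are redundant copies of another file."""
    if not files:
        return 0.0
    unique = len(group_by_content(files))
    return 1 - unique / len(files)


# Hypothetical toy corpus: an upstream repository and two forks.
corpus = {
    "upstream/main.c": b"int main(void) { return 0; }\n",
    "fork_a/main.c": b"int main(void) { return 0; }\n",  # untouched copy
    "fork_b/main.c": b"int main(void) { return 1; }\n",  # patched
    "fork_b/README": b"patched fork\n",
}

print(f"toy corpus: {duplication_rate(corpus):.0%} duplicate files")

# Same arithmetic on the counts quoted in the summary:
# 85 million unique files out of 428 million.
print(f"summary counts: {1 - 85 / 428:.1%} of files are not unique")
```

On the toy corpus this reports 25% duplicates, and the quoted file counts work out to roughly 80% non-unique files. A researcher drawing a random sample could keep one path per digest from group_by_content before sampling, so that heavily forked files are not over-represented.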

3 of 115 comments

  1. Re: How could more than half be duplicate? by joelsherrill · · Score: 3, Informative

    Even then, the original code may not be on GitHub. Projects like GCC, RTEMS and FreeBSD have the original code somewhere other than GitHub. So all of the code there for these and other projects is not original.

  2. Pull requests by manu0601 · · Score: 4, Informative

    No surprise here; this is how this stupid thing works: in order to submit a one-line bugfix, one has to fork the repository, patch, commit, and open a pull request.

  3. Re: no surprise by MightyYar · · Score: 5, Informative

    And the only way to push a change back to a repository you don't control! You fork, push your change to your fork, then create a pull request. This is by design - I have no idea why this is in any way a surprise.

    --
    W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.