Slashdot Mirror


Your Java Code Is Mostly Fluff, New Research Finds

itwbennett writes In a new paper (PDF), researchers from the University of California, Davis, Southeast University in China, and University College London theorized that, just as with natural languages, some — and probably, most — written code isn't necessary to convey the point of what it does. The code and data used in the study are available for download from Bitbucket. But here's the bottom line: Only about 5% of written Java code captures the core functionality.

4 of 411 comments

  1. Fluff piece makes readers overlook core point by Anonymous Coward · Score: 2, Interesting

    Well, no. They're doing none of that.

    From a quick skim through the paper, they more or less conclude that Java program text compresses really well, since it's full of redundancy, scaffolding, and so on, and so forth. I'd say they need quite a few words to beat around the bush and imagine all sorts of more or less related things, but this is the core of their findings.
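
    To make the "compresses really well" point concrete, here is a minimal sketch that measures how far DEFLATE shrinks a pile of getter boilerplate. This is only an illustration of the redundancy argument, not the paper's methodology; the class name CompressionRatioSketch and the sample getter text are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper, not taken from the paper or the article.
final class CompressionRatioSketch {

    // Returns compressed size divided by original size; lower means more redundant text.
    static double ratio(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        if (input.length == 0) {
            throw new IllegalArgumentException("text must not be empty");
        }
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        long compressed = 0;
        // Drain the compressor; only the byte count matters here, not the bytes.
        while (!deflater.finished()) {
            compressed += deflater.deflate(buffer);
        }
        deflater.end();
        return (double) compressed / input.length;
    }

    public static void main(String[] args) {
        // Made-up sample: the same trivial getter repeated 200 times.
        String getter = "    public int getValue() {\n        return this.value;\n    }\n";
        StringBuilder boilerplate = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            boilerplate.append(getter);
        }
        System.out.printf("ratio = %.3f%n", ratio(boilerplate.toString()));
    }
}
```

    Identical repeated getters are an extreme case. Real projects vary their scaffolding, so expect a less dramatic ratio on actual source trees.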

    This finding is fairly obvious and well known, certainly when Java is compared to certain other languages, but now it comes in a light science sauce made with questionable methodology. That last bit, again, is from skimming.

    The piece written around it is equally fluffy and even the things mentioned to "improve" on this mostly involve writing more code, of which we already have a lot containing a large percentage of this "chaff".

    The real question is whether or not this scaffolding is a waste of time. One might say obviously yes, yet the market says no:

    There's a large market for (mediocre and therefore easily replaceable) Java programmers, and by extension a lot of money in grinding out this scaffolding, since without it Java programs are not complete and therefore won't do anything.

    Another point: There is also a large market for PHP "programmers" grinding out execrable code in an execrable language, with lots of padding to make up for obvious deficiencies in the fabric of the language. In PHP such things are very rarely the result of deliberate design choices, as they quite possibly are in Java; instead they are usually the result of some incompetent code contributor missing a point or other while adding yet another misfit misfeature.

    There are other languages around that more easily facilitate much more concise code (such as Lisp, mentioned as 'List' in the paper) but those aren't half as popular.

    Thus, if there is wisdom in markets and crowds, then this chaff must add some desirable property to the services of (mediocre) programmers. Therefore the obvious follow-up to noting that this programming language is rather verbose, namely the search for expressivity, is not something the market puts a premium on.

    IMO these people were having a good time crunching source in some number crunching tool and are mostly in search of more funding. This too is not unusual in that environment. IOW, dime-a-dozen study trotting out a well-known fact for great funding. What else is new in academia?

  2. Nonsense by Anonymous Coward · Score: 4, Interesting

    the code written, in the summer of 2012 the researchers downloaded 1,000 of the most popular Java projects from Apache, Eclipse, GitHub, and SourceForge. From that they got 100 million lines of Java code and tossed out simple methods (those with less than 50 tokens).

    So they tossed methods that were written well (methods that only do one thing)? So if you wrote a simple two-line validation of an input field (field must be populated, field must match regex), they tossed that as chaff?
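
    For reference, the kind of small validation method in question might look like the sketch below. The class name FieldValidator, the method name, and the username rule are all hypothetical, picked only to show the shape: populated check plus regex check, nothing else.

```java
import java.util.regex.Pattern;

// Hypothetical example of a short, single-purpose validation method.
final class FieldValidator {

    // Assumed rule for illustration: 3 to 16 letters, digits, or underscores.
    private static final Pattern USERNAME = Pattern.compile("[A-Za-z0-9_]{3,16}");

    // Field must be populated. Field must match the regex.
    static boolean isValidUsername(String field) {
        return field != null && !field.isEmpty()
                && USERNAME.matcher(field).matches();
    }
}
```

    A method like this comes in well under the 50-token cutoff quoted above, so it would have been among those left out of the analysis.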

    1. Re:Nonsense by lgw · Score: 5, Interesting

      So they tossed methods that were written well (methods that only do one thing)? So if you wrote a simple two-line validation of an input field (field must be populated, field must match regex), they tossed that as chaff?

      Why the Hell should you have to write code over and over to validate that a reference isn't null, or an int is positive, or other such cases? Surely that's all part of the interface contract anyhow, right? For that matter, why is "allowed to be null" the default rather than an exceptional special case? Why isn't there a simple operator that decorates a parameter as "nullable" with a single character?

      Why not simply

      public Foo foo;

      No getter or setter needed, by default it can't be null. For those odd cases where null actually means something useful, then just write:

      public Foo? foo;

      This goes double for C#, where "?" is already established as the "nullable" decorator.

      Worth noting that many Java coders use Lombok to effectively achieve this already, just with auto-generated getters and setters, since we lack the courage and/or authority to just have public members instead of pointless getters and setters.

      And, above all else, give us a way to declare that the returned value can't be null, and auto-throw if it is, so the caller never has to check!
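
      Until the language offers something like that, the closest plain-Java approximation I know of is to check once at construction and make the nullable case explicit in the type. A rough sketch, using only java.util.Objects and java.util.Optional; the class names Foo and NonNullHolder are placeholders, and this is the manual boilerplate the proposed syntax would replace, not a substitute for it:

```java
import java.util.Objects;
import java.util.Optional;

// Placeholder type standing in for the Foo in the declarations above.
final class Foo {}

// Hypothetical holder: one field that can never be null, one that may be absent.
final class NonNullHolder {

    // Non-null by construction: the check runs once, in the constructor.
    private final Foo foo;

    // The odd case where absence means something is spelled out in the type.
    private final Optional<Foo> maybeFoo;

    NonNullHolder(Foo foo, Foo maybeFoo) {
        // Throws NullPointerException with this message if foo is null.
        this.foo = Objects.requireNonNull(foo, "foo must not be null");
        this.maybeFoo = Optional.ofNullable(maybeFoo);
    }

    // Callers never need a null check on this return value.
    Foo foo() {
        return foo;
    }

    Optional<Foo> maybeFoo() {
        return maybeFoo;
    }
}
```

      That is roughly twenty lines to say what "public Foo foo;" and "public Foo? foo;" would say in two, which is rather the point.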

      --
      Socialism: a lie told by totalitarians and believed by fools.
  3. Re:Makes sense to me by Maxo-Texas · Score: 2, Interesting

    I read the article but not the study, and the article says that some unspecified amount of the "fluff" is required to execute the code.

    So I'm thinking

    int a; is being treated as "fluff"

    And perhaps even

    public void longersubroutinename (int longerparametername) is mostly treated as "fluff".

    i.e.
    p v srA (int l) is much shorter. The rest is just fluff. Sure it names it human readable, but it's fluff!

    But I'm only guessing here. The percentage seems way too high to be anything sensible tho.

    I'm betting their "essential 5%" would be illegible and wouldn't execute / lacks declarations of variables or something else goofy.

    --
    She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.