Your Java Code Is Mostly Fluff, New Research Finds
itwbennett writes In a new paper (PDF), researchers from the University of California, Davis, Southeast University in China, and University College London theorized that, just as with natural languages, some — and probably, most — written code isn't necessary to convey the point of what it does. The code and data used in the study are available for download from Bitbucket. But here's the bottom line: Only about 5% of written Java code captures the core functionality.
Well, no. They're doing none of that.
From a quick skim through the paper, they more or less conclude that java program text compresses really well, since it's full of redundancy, scaffolding, and so on, and so forth. I'd say they need quite a few words to beat around the bush and imagine all sorts of more or less related things, but this is the core of their findings.
This finding is fairly obvious since well-known, certainly compared to certain other languages, but now in some light science sauce made with questionable methodology. That last bit again from skimming.
The piece written around it is equally fluffy and even the things mentioned to "improve" on this mostly involve writing more code, of which we already have a lot containing a large percentage of this "chaff".
The real question is whether or not this scaffolding is a waste of time. One might say obviously yes, yet the market says no:
There's a large market for (mediocre and therefore easily replacable) java programmers, and by extension a lot of money in grinding out this scaffolding, since without java programs are not complete and therefore won't do anything.
Another point: There is also a large market for PHP "programmers" grinding out excreable code in an excreable language, with lots of padding to make up for obvious deficiencies in the fabric of the language -- as in PHP such things are very rarely the result of deliberate design choices, as they not unlikely are in java, instead usually the result of some incompetent code contributor missing a point or other while adding yet another misfit misfeature.
There are other languages around that more easily facilitate much more concise code (such as lisp, mentioned as 'List' in the paper) but those aren't half as popular.
Thus, if there is wisdom in markets and crowds, then this chaff must add some desirable property to the services of (mediocre) programmers. Therefore, the obvious follow-up on noting that this here programming language is rather verbose, the search for expressivity, is not something the market puts a premium on.
IMO these people were having a good time crunching source in some number crunching tool and are mostly in search of more funding. This too is not unusual in that environment. IOW, dime-a-dozen study trotting out a well-known fact for great funding. What else is new in academia?
the code written, in the summer of 2012 the researchers downloaded 1,000 of the most popular Java projects from Apache, Eclipse, GitHub, and SourceForge. From that they got 100 million lines of Java code and tossed out simple methods (those with less than 50 tokens).
So they tossed methods that were wrtten well. (methods that only do one thing) So if you wrote a simple 2 line validation of an input field. Field must be populated. Field must match regex. They tossed that as chaff?
I read the article but not the study and the article states an unstated amount of the "fluff" is required to execute the code.
So I'm thinking
int a; is being treated as "fluff"
And perhaps even
public void longersubroutinename (int longerparametername) is mostly treated as "fluff".
i.e.
p v srA (int l) is much shorter. The rest is just fluff. Sure it names it human readable, but it's fluff!
But I'm only guessing here. The percentage seems way too high to be anything sensible tho.
I'm betting their "essential 5%" would be illegible and wouldn't execute / lacks declarations of variables or something else goofy.
She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.