Your Java Code Is Mostly Fluff, New Research Finds
itwbennett writes In a new paper (PDF), researchers from the University of California, Davis, Southeast University in China, and University College London theorized that, just as with natural languages, some — and probably, most — written code isn't necessary to convey the point of what it does. The code and data used in the study are available for download from Bitbucket. But here's the bottom line: Only about 5% of written Java code captures the core functionality.
I'll admit I just read the summary article and not the paper itself, but I wouldn't say that this is overly surprising.
Right off the bat, thanks to the preoccupation we Java types seem to have with accessor methods (which, if we're honest, do something besides just set or get a private member variable maybe 1% of the time; why the hell we still do this I don't know) and the frequent necessity for hash, clone, and equals methods, most of which are auto-generated, you end up with a bunch of small methods that do very little but up the code count.
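To make the accessor point concrete, here is a minimal sketch of a hypothetical `Point` class (the class, its fields, and `distanceTo` are invented for illustration and are not taken from the paper). Only one method does any real work; everything else is the kind of boilerplate an IDE will happily generate.

```java
import java.util.Objects;

// Hypothetical example class; all names are illustrative only.
final class Point {
    private double x;
    private double y;

    Point(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // Boilerplate: accessors that only read or write a private field.
    double getX() { return x; }
    void setX(double x) { this.x = x; }
    double getY() { return y; }
    void setY(double y) { this.y = y; }

    // Boilerplate: typically auto-generated equality check.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Point)) return false;
        Point p = (Point) other;
        return Double.compare(x, p.x) == 0 && Double.compare(y, p.y) == 0;
    }

    // Boilerplate: must stay consistent with equals().
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    // The only "core" logic in the class.
    double distanceTo(Point other) {
        return Math.hypot(x - other.x, y - other.y);
    }
}
```

Count the lines: one method carries the intent, and the rest exists so the class plays nicely with collections and other code.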
Beyond that, I think good design usually works out this way. You (or at least I like to) build up in layers, each layer using the previous layer at a higher level, until you get to the top where you have a few seemingly simple bits of code that pull it all together. When you get big complex functions doing a bunch of stuff vs the described small functions adding little bits of functionality along the way, I think you are doing things wrong.
That's not to say people (and this is common in Java) don't go way overboard and end up with huge chains of methods that just pass the buck and complex control structures where you need a debugger to figure out what's going on, but if done right it can make for easily maintained and readable code.
This article uses a lot of words to say absolutely nothing.
In my experience, 80% of my code deals with checking for user error and things like that (e.g. not entering a string where I expect a number, or checking that a socket really exists). This is important functionality, but indeed, it is not 'core'...
If an experiment works, something has gone wrong.
No. This is what happens with a language with an extremely verbose API and extreme boiler-plate requirements. The best Java developer in the universe isn't going to be able to get around this.
Imagine a language with no fluff, no cruft, no boilerplate. Everything is essential and concise. You have something akin to either assembly or too-clever Perl. The fluff is necessary. The fluff provides context, readability, and maintainability.
But I shoot to make 100% of the code I write fluff.
A couple of important points to keep in mind here. First, the MINSET itself is not executable; it's merely the smallest subset of the code which characterizes the core functionality. Some of the other 95% of the code (the chaff) is required to make it run, so it's not useless.
So, we can do a computer transform on it to make it into something a computer can express efficiently, but we ignore the fact that the other 95% of the code is the error checking and other shit which you can't do without.
The whole premise of this "study" has nothing to do with code, how to write it, or what that entails.
I once had a co-worker who kept telling me that lisp or scheme would magically make it so you just wrote a two line program -- something like "getReady; justDoIt".
When I asked him who the hell would write "getReady" and "justDoit", he seemed to think it would be some magic step which sorted itself out. The hard parts don't just magically happen. I can write main() in C which says "getReady(); justdoIt();" -- that doesn't mean that I don't need to implement those parts.
This sounds equally stupid.
Since when have coders started subscribing to wishful thinking where you just wave your hands and the computer does all the hard stuff?
Lost at C:>. Found at C.
Really? Are they just pointing out that source code is meant for human readability, and the actual instructions are more concise? Is anyone surprised by this? Even a quick compression test shows me 80% reduction without even removing the most obviously human-oriented stuff like comments and long variable names.
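For anyone who wants to repeat that kind of quick compression test, a minimal sketch using the JDK's `java.util.zip.Deflater` could look like this. The class name `CompressionRatio`, its helper methods, and the sample text are made up for illustration; this is not the method used in the paper, and the ratio you get depends entirely on what you feed it.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Minimal sketch: estimate redundancy in source text by DEFLATE-compressing it.
final class CompressionRatio {

    // Returns the number of bytes the text occupies after compression.
    static int compressedSize(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Fraction of the original size removed by compression.
    // Can be negative for very short inputs, where the format overhead dominates.
    static double reduction(String text) {
        int original = text.getBytes(StandardCharsets.UTF_8).length;
        if (original == 0) return 0.0;
        return 1.0 - (double) compressedSize(text) / original;
    }

    public static void main(String[] args) {
        // Made-up sample: repetitive accessor boilerplate.
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 50; i++) {
            sample.append("public int getField").append(i)
                  .append("() { return field").append(i).append("; }\n");
        }
        System.out.printf("reduction: %.0f%%%n", 100 * reduction(sample.toString()));
    }
}
```

Keep in mind that a high compression ratio only says the text is statistically redundant; it says nothing about whether that redundancy helps a human reader.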
Can I get some of this research grant money? I've got a theory about sparse matrices mostly containing zeros.
90% of the time is spent executing 10% of the code. But when something goes wrong you want that other 90% of the code to be there so that you don't lose 100% of your work :)
Yes
"In America, first you get the sugar, then you get the power, then you get the women..." -H. Simpson
Did you know that only about 5% of the average house is actually load bearing? The rest is just fluff. Why are we wasting so much valuable material in houses?
If every single program in the universe contains the same boilerplate strings... They are indeed unnecessary. Java is just about the worst for this. Python requires drastically less redundant meaningless fluff.
Indeed.
I love Java, but not even a diehard fanboy will argue that it isn't excessively verbose and loaded with boilerplate code. The amount of code attributed to various getters, setters, and comparison methods alone often eclipses the actual functionality of a class. Not to mention doing just about anything with most Java APIs involves all kinds of intermediary wrapper objects.
It seems like the Java ecosystem is fine tuned for producing a low signal to noise ratio as far as intent of code is concerned. So much of the ecosystem stresses templates, massive IDEs and other automated tools that make the production of thousands of lines of unnecessary boilerplate incredibly easy. Besides, isn't this the nature of Java anyway? It seems like it's designed to produce the most verbose code possible in the hope that if everything is explicit more bugs can be diagnosed since the compiler has more to work with. It's almost a troll article, seriously, it's like the guy is just trying to piss people off.
I have a theory that the truth is never told during the nine-to-five hours. - Hunter S. Thompson
Hmmm, I don't know MakeRocketLauncherGoNow() vs Foo() ... yeah, I think having the code read like sentences makes a lot of sense.
If the onus is on human readability, that simple sentence is more than I've seen many coders put in comments.
Any decent code written to be readable and maintainable has lots of "fluff". That's what makes it readable and easy to maintain.
Much preferable to the mishmash of one line wonders that do ten different functions.
When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
You forgot the MakeRocketLauncherGoNowFactory, the MakeRocketLauncherGoNowFactoryFactory, the MakeRocketLauncherGoNowException, the ...
If I have been able to see further than others, it is because I bought a pair of binoculars.
Ok, here's the deal, sometimes readability is in fact a function of how succinct something is, not how verbose it is. In human (verbal) languages and in cross-cultural communication we refer to this as high-context and low-context language. In code, a parallel could be applied. Succinctness is not a value in itself (read Paul Graham's defense of Lisp vs. Python, I disagree with Graham), but it can often be a good means to an end when context surrounding your identifier choice is clear as freakin' day.
But contrary to python or ruby code, for example, most Java code is not written by hand. No one ever writes import statements for example. Eclipse is so excellent at understanding Java code structure that the writing efficiency is comparable. It brings other benefits too -- I have found re-factoring of large code bases is substantially easier in Java than any other language. This is thanks to the strong structure implied by the language, which can be exploited by tools. In other languages this is prohibited, e.g. Ruby, where every word can mean something different and you can not know until runtime, or C when cluttered with macros.
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
Yes, but the point is silly anyway.
The notion that everything that isn't core functionality is "fluff", gives the impression that it is non-essential.
Let's say I have a weather application that reports meteorological data for a specific zipcode. Let's say that I have a super slick user interface, and I display animated weather graphics in HD.
Fluff?
Not at all. A spartan application which displayed a bunch of plaintext data might have zero downloads. Sexy, eye candy might equate to 20 million downloads.
Which raises the question: What is the actual point of this app? Is it to display weather information?
No. The point of this app is to get downloaded.
So what's "core" again?
This sounds like the same hand-wavy BS that spawned our current infestation of Agile consultants.
They aren't even trying to be scientific here; this is just baldfaced click-bait, likely commissioned by some unproductive company who wants to look like a "thought leader." What are they even defining as "wheat" and "chaff"? Who decides which lines of code are which? Who decides who gets to decide that? What does it even mean to describe what code "does"?
Smart people can disagree about best practices and what constitutes "good" code - ultimately, I think most of it boils down to personal taste rather than any notion of objective correctness or big-picture productivity. Personally, I feel most productive in Java - but that's because of an interlocking mesh of many subtle reasons and has nothing to do with how many bytes my code files take up.
Any decent code written to be readable and maintainable has lots of "fluff". That's what makes it readable and easy to maintain.
In my experience with real-life code bases, the more 'fluff,' the less readable and harder to maintain it becomes. If your hypothetical example of the one-line wonder has a problem, it is easy to rewrite. If a thousand-line program has a problem, it's harder to replace, even if (especially if?) it used many design patterns.
My point there is, the more lines of code you have, the harder it is to maintain. I don't think that's controversial.
Flexibility and maintainability come from well-defined interfaces between sections of code. It doesn't come from adding fluff.
"First they came for the slanderers and i said nothing."
Yes, if you want your code to be human readable and self-documenting. If you want something with little or no fluff, maybe go the assembly language route?
You don't need (and no one does do it) a RocketLauncherFactory if you only have one RocketLauncher type.
However such a Factory is quite useful if you happen to find a 'rocket' used in 'rocket launchers' and you need either an instance or a description of a launcher that actually can use that rocket.
Also factories are quite interesting as you can usually send them 'orders'. That means if you order a few rocket launchers you can make sure when you unwrap them, that there is an assorted set of ammunition in the case as well.
But if you hate Factories and Exceptions ... no one can help you.
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
I wouldn't call all of that eye candy. Dynamic graphics can display a huge amount of information really quickly. Your example, ironically, is where 'eye candy' is really useful.
putting the 'B' in LGBTQ+
I still have visions of layers of adapter classes, which serve absolutely no purpose other than to appease Java.
Those adapter classes exist to make interfaces with lots of methods easier to manage. I've learned and forgotten many languages over my 30 years of programming, but Java is one of those elegant languages that makes programming pleasant. The only thing I truly hate about it is the stupid memory limits imposed by its early life for applets. That one thing makes desktop programming more irritating than it needs to be.
Well, no. They're doing none of that.
From a quick skim through the paper, they more or less conclude that java program text compresses really well, since it's full of redundancy, scaffolding, and so on, and so forth. I'd say they need quite a few words to beat around the bush and imagine all sorts of more or less related things, but this is the core of their findings.
This finding is fairly obvious and well known, certainly when Java is compared with certain other languages, but now it comes in a light science sauce made with questionable methodology. That last bit is, again, from skimming.
The piece written around it is equally fluffy and even the things mentioned to "improve" on this mostly involve writing more code, of which we already have a lot containing a large percentage of this "chaff".
The real question is whether or not this scaffolding is a waste of time. One might say obviously yes, yet the market says no:
There's a large market for (mediocre and therefore easily replaceable) java programmers, and by extension a lot of money in grinding out this scaffolding, since without it java programs are not complete and therefore won't do anything.
Another point: There is also a large market for PHP "programmers" grinding out execrable code in an execrable language, with lots of padding to make up for obvious deficiencies in the fabric of the language -- as in PHP such things are very rarely the result of deliberate design choices, as they not unlikely are in java, instead usually the result of some incompetent code contributor missing a point or other while adding yet another misfit misfeature.
There are other languages around that more easily facilitate much more concise code (such as lisp, mentioned as 'List' in the paper) but those aren't half as popular.
Thus, if there is wisdom in markets and crowds, then this chaff must add some desirable property to the services of (mediocre) programmers. Therefore, the obvious follow-up on noting that this here programming language is rather verbose, the search for expressivity, is not something the market puts a premium on.
IMO these people were having a good time crunching source in some number crunching tool and are mostly in search of more funding. This too is not unusual in that environment. IOW, dime-a-dozen study trotting out a well-known fact for great funding. What else is new in academia?
Why are people always bringing up the die hard fanboy argument?
The least thing to say about it: it is impolite! What do you expect me now to answer? As a non fan boy but serious java developer I have to say ...???
Sorry, I don't need to defend myself all the time why I use a certain thing.
I use a Mac, for good reasons. I use an iPhone for other good reasons. I use Java, but I also use C++, I don't use C, for good reasons.
And after 35 years in the industry I can tell you: I'm very disappointed. The stuff that rules the world is run by marketing. Not by fan boys.
"Not to mention doing just about anything with most Java APIs involves all kinds of intermediary wrapper objects." That is complete nonsense.
Starting a post with a general insult and then making a wrong claim is poor sportsmanship imho.
"all programs can be optimized, and all programs have bugs; therefore all programs can be optimized to one line that doesn't work"
I'm in my right mind and I have the answer to everything!
Most of the "modern" languages seem to have this addiction to overly verbose libraries and obscenely long syntax. Do we really need method names that could constitute a simple sentence?
Long names are fine and even valuable. The real gremlin is in overly-abstracted API's, code generators, verbose XML configuration files, and other tools/libraries that have sacrificed usability while pursuing long feature lists and total control over a particular problem domain.
It is, in a funny way, the opposite usability trajectory that Gnome and many others in the UX crowd followed when they went off and started zealously reducing features in the name of simplicity.
Personally, I think that the underlying design principles should be the same whether you're designing application interfaces to be used by the general public or whether you're designing API's to be used by developers: in both cases you're trying to take something complicated and make it simpler. Sure, add those new/advanced features when you can, but do so in a way that doesn't raise the learning curve for the most common use cases.
-1, Too Many Layers Of Abstraction
Your claim smacks of hyperbole, but that aside I've also had to use code from developers who like to name their methods x(), f1(), t2() you get the idea. I can't tell if they're too lazy to type more than that, or they are striving to make all code fit in a 40-column window (ala GW-Basic), or they hate the idea that anyone else would ever try to read and comprehend their code.
There's got to be a nice balance.
My complaint about perl (and for that matter clojure too now) is that so many symbols have special meaning, and sometimes it is context dependent too. If your code contains $#`'~_ all over the place it makes it hard to read for anyone not intimately familiar with it. Sure, there are some well used conventions like _ for anything or triangle brackets for collections of types, but there comes a point where using a symbol to convey really important and subtle meaning is far harder to read than just putting in a keyword. All I can say is thank god Unicode was not invented earlier or there would have been 1000s of other characters involved.
Nullius in verba
These violent delights have violent ends
And in their triumph die, like fire and powder
Which, as they kiss, consume
->
Boy meets the wrong girl, they die for love.
->
Boy, girl, dead.
->
people.forEach(die)
I mean, sure it gets the job done, but man, might as well just pay someone in India to write and read it.
Seemed like a good idea at the time...Actually, it STILL seems like a good idea after 30 years...
Good quality code should be largely readable in and of itself. Comments should be included only where needed to clear up unavoidable complexity. Consider the following code:
You'll notice only one line in 10 is a comment, but the intent of each line is very clear in and of itself because of the clear choice of function names and variable names. The only line of comment is used to create a human readable explanation of what the if statement is testing for, since it is not necessarily clear from the math itself. (Pardon the lack of indentation, I don't feel like fighting with HTML/Slashdot at the moment)
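A hypothetical sketch in the spirit described, not the poster's own code: the class `CollisionCheck`, the method, and every variable name are invented for illustration. The names carry most of the meaning, and the single comment explains the one test whose intent is not obvious from the math.

```java
// Hypothetical illustration; class, method, and variable names are invented.
final class CollisionCheck {

    static boolean circlesOverlap(double firstX, double firstY, double firstRadius,
                                  double secondX, double secondY, double secondRadius) {
        double horizontalGap = secondX - firstX;
        double verticalGap = secondY - firstY;
        double distanceSquared = horizontalGap * horizontalGap + verticalGap * verticalGap;
        double reach = firstRadius + secondRadius;
        // Circles overlap when the centers are closer than the sum of the radii.
        if (distanceSquared < reach * reach) {
            return true;
        }
        return false;
    }
}
```

Comparing squared distances avoids a square root, which is exactly the kind of step that deserves the one comment.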
I wish I had a good sig, but all the good ones are copyrighted
In an ideal world where everything always works, you could easily ditch 90% of your code that deals with exceptional situations, none of that is core code.
In the real world however, that 90% of extra code isn't nearly enough to catch even half the poop the monkeys will throw at it.
the code written, in the summer of 2012 the researchers downloaded 1,000 of the most popular Java projects from Apache, Eclipse, GitHub, and SourceForge. From that they got 100 million lines of Java code and tossed out simple methods (those with less than 50 tokens).
So they tossed methods that were written well (methods that only do one thing)? So if you wrote a simple 2 line validation of an input field (field must be populated, field must match regex), they tossed that as chaff?
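To see how easily a small validator falls under that cutoff, here is a rough sketch. The `TokenFilter` class and its crude regex lexer are assumptions for illustration only; the researchers' actual tokenizer is not described in the summary and will count differently, so treat the numbers as ballpark.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a "fewer than 50 tokens" filter; not the paper's lexer.
final class TokenFilter {

    // Crude lexer: a token is a run of word characters,
    // or any single character that is neither a word character nor whitespace.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // Threshold quoted in the summary.
    static final int MIN_TOKENS = 50;

    static int countTokens(String source) {
        Matcher matcher = TOKEN.matcher(source);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // True when the method would have been thrown out as "simple".
    static boolean isDiscarded(String methodSource) {
        return countTokens(methodSource) < MIN_TOKENS;
    }

    public static void main(String[] args) {
        String validator =
            "boolean isValid(String s) { return s != null && s.matches(\"[a-z]+\"); }";
        System.out.println(countTokens(validator) + " tokens, discarded: "
            + isDiscarded(validator));
    }
}
```

Even with this generous lexer, which splits operators and string literals into several tokens each, the one-line validator lands well below 50.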
Although fluffy code was nearly ubiquitous in all code samples examined, the researchers found that the best quality code could be found at http://www.ioccc.org/
You forgot the MakeRocketLauncherGoNowFactory, the MakeRocketLauncherGoNowFactoryFactory, the MakeRocketLauncherGoNowException, the ...
Exactly. Check out EnterpriseQualityCoding/FizzBuzzEnterpriseEdition for a "proper" example:
https://github.com/EnterpriseQ...
Exactly right. Ugly one-liners, or even whole functions, eh, I'll pound my head on them for an hour, but I'll figure it out. The time-consuming part is the structure.
You can spend weeks trying to understand the structure of a program before writing a few extra lines of code. That's what really steals your time.
"First they came for the slanderers and i said nothing."
After actually reading a lot of the paper, I'd say the conclusions of the commenting programmers are raw ignorance. It appears some of them read the introduction (abstract) and thought they "knew" what the article was about. If one reads it, one discovers that the goal of the work is to provide a means of doing several interesting tasks (that I'd like to see done in Eiffel and placed into the IDE):
1. Code search: In my universe, is there any code that ________? -- some form of "wheat-keyword" query that can be quickly matched against a database held as metadata about the code universe.
2. Code completion: As I am hand-coding my feature (not just the line or even instruction I am typing), is there some other feature that looks like what I have already typed, so that the remainder of that feature can be applied to the one I am typing to "auto-complete" it?
3. Code reduction: Is there a language subset, such that a reduced keyword language could be hand-coded and the "fluff" or "chaff" filler be computed rather than typed, essentially making for a smaller and more powerful programming language and paradigm (when linked to #1 and #2 above)?
These are very powerful and interesting questions. They are not implying that Java is 5% meaningful and 95% meaningless. They are simply proposing a systematic means of code-reduction in an effort to make tools that do #1, #2, and/or #3 above! Fine article and good find. Thank you for sharing!!!