C Code On GitHub Has the Most "Ugly Hacks"
itwbennett writes: An analysis of GitHub data shows that C developers are creating the most ugly hacks, or are at least the most willing to admit to it. To answer the question of which programming language produces the most ugly hacks, ITworld's Phil Johnson first used the search feature on GitHub, looking for code files that contained the string 'ugly hack'. In that case, C comes up first by a wide margin, with over 181,000 code files containing that string. The rest of the top ten languages were PHP (79k files), JavaScript (38k), C++ (22k), Python (19k), Text (11k), Makefile (11k), HTML (10k), Java (7k), and Perl (4k). Even when controlling for the number of repositories, C wins the ugly-hack-athon by a landslide, Johnson found.
"oh, this is a hack!"
Pretty sure people already assume that every line of Perl code is an ugly hack anyways, so they didn't have to write a comment on it.
C Code EVERYWHERE has the most "ugly hacks"
Regardless, this seems like a pretty crappy study. There are many other phrases, like "kludge" or "XXX", that should have been considered.
Seriously guys. File this one under "NO SHIT" - Of course C is going to have the most ugly hacks. Why? Because it is by design able to access a hell of a lot more than other languages. How many languages have direct hardware access? Or inline ASM code? And does the word "hack" in the code really make it an "ugly" hack? Seriously? I wrote a micro-kernel for an ARM platform about a decade ago, and there was an assload of inline ASM code and direct pointer manipulation to access the underlying hardware; there is no other way to do this. Yeah, I'm sure the word "hack" appeared countless times in my code, because that's the general term we use. That doesn't make it "ugly" or bad by any means.
C coders know an ugly hack when they see one, and when they write one.
I would conjecture that nearly every line of Perl scripts is an ugly hack, so nobody bothers to add a comment... 8-)
How get C coder job.
Experience: 45 years experience with ugly hack.
It doesn't take into account that with Perl and PHP, "ugly hack" is implied.
Fast inverse square root (sometimes referred to as Fast InvSqrt() or by the hexadecimal constant 0x5f3759df) is a method of calculating x^(-1/2), the reciprocal (or multiplicative inverse) of a square root for a 32-bit floating point number in IEEE 754 floating point format.
http://en.wikipedia.org/wiki/F...
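For anyone who hasn't seen it, here is a minimal sketch of the trick in C. It is not the original game source: it assumes float is a 32-bit IEEE 754 single, uses memcpy instead of the famous pointer cast to move the bits around, and the function name fast_inv_sqrt is just a placeholder.

```c
#include <stdint.h>
#include <string.h>

/* Approximates 1/sqrt(x) for positive, finite x.
 * Assumption: float is a 32-bit IEEE 754 single-precision value. */
float fast_inv_sqrt(float x)
{
    uint32_t bits;
    float y;

    /* Copy the float's bit pattern into an integer. */
    memcpy(&bits, &x, sizeof bits);

    /* The hack: the magic constant minus half the bit pattern gives a
     * rough first guess at 1/sqrt(x). */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);

    /* One Newton-Raphson step to refine the guess. */
    y = y * (1.5f - 0.5f * x * y * y);
    return y;
}
```

Calling fast_inv_sqrt(4.0f) should land within a fraction of a percent of 0.5, which was good enough for lighting math back when a real divide and sqrt were expensive.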
Anybody got any better Ugly Hacks to share?
God I love C.
An analysis of science papers shows that the social scientists have the most stupid conjectures. This is based on estimating the relationship between the hypothesis being examined and the data presented.
I realize the analysis is probably a little tongue-in-cheek, but this is probably the worst analysis I've ever seen. Nothing of use was gained...
Perl was last on that list?
For those people who say that Perl coders only write incomprehensible gibberish, I say:
BWA HA HA HA HA!
The whole C language is one beautiful hack, scary at first but once you get to know it in some really messed up sw project you can't help but love it. The balance between freedom and structure is excellent.
Check it out yourself:
https://github.com/search?l=&q=%22ugly+hack%22+created%3A%3E%3D2013-01-01+created%3A%3C2015-05-01&ref=advsearch&type=Code&utf8=%E2%9C%93
#include "complex.h"
then it says ugly hack for uClibc, since it does not support complex functions properly and they want ./configure && make && make install to fail for such an embedded C library
Sounds more like a copy-paste issue to make configure fail for uClibc than an actual problem with the C language in general.
 #include "complex.h"
+#ifdef __UCLIBC__
+#error ugly hack to make sure configure test fails here for cross until uClibc supports the complex funcs
+#endif
 int
 main () {
#ifndef __INCif_etherh /* Quick and ugly hack for VxWorks */
#else
#include
#endif

// ugly hack because we don't have fscanf
int fscanf(FILE* stream, const char* format, int* value)

/* ugly hack to make it compile on RH 4.2 - WA */

#if !defined(__GNUC__) && !defined(__common_include__) /* ugly hack GRR */
#define __attribute__(x) /* nothing */

/* XXX argh, ugly hack to make stuff compile! */
#define snprintf(BUF, SIZE, ...) sprintf(BUF, __VA_ARGS__)
I for one would love to see some examples of such "ugly hacks", and also how it should/could be done in a not so ugly manner.
Every single page has many occurrences of the same "ugly" hack. If the folks who did the study had an ounce of legitimacy, they would have filtered out all those duplicates. If they had actually been competent, they would have done an in-depth study of all those "ugly hacks". Of course, at that point the article would have been worthless, but hey, they got their first page on /. ...
These numbers should be weighted to the amounts of code in the various programming languages on GitHub. There may be lots of C "codefiles" with the "ugly hack" string in them, but there probably is a lot of C code overall on GitHub, too.
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
Yeah. I went there.
"Ugly Hack" very often means the programmer has done a smart thing, if not an exactly correct thing. Although sometimes an ugly hack is just an ugly hack.
The numbers do not show anything unless we have a point of reference, such as how many lines of code each of those languages actually contains.
Let's say there are 200 million lines of C code -> 0.1% ugly hacks.
The number of repositories does not sound like a useful basis for comparison unless we also have an average number of lines for each project.
"/* ugly hack to... */" is a modest expression of pride describing concise, functional, readable and elegant C code in the same way as the term "//elegant approach to..." in C++ describes some borderline-insane misapplication of the STL with the incomprehensibility of perl and the verbosity of java.
Nullius in verba
If a solution is stupid and it works, then it's not stupid.
How could "ugly hack" exist in Perl code? Every single line ever written is an elegant beautiful hack, by design.
The only thing worse than a Democrat is a Republican.
I suspect that most search engines classify .h as a C file. In reality it could be either, and far and away the most common practice is for C++ to also use .h for header files.
Only in extremely rare cases have I seen anything other than .h for C++ headers. Once in a great while, I've come across .hxx (there used to be a company called Borland that wrote compilers for Windows shareware developers, and it used .hxx or maybe .hpp, I think). On SGI, I think I've once or twice seen .hh
Yes, and how many files were scanned for each language. A proper analysis should show a comparison of the ratios not the absolute counts.
If I had 100 C files in which 20 had "ugly hack" and 20 java files in which 10 had "ugly hack", obviously Java would be a bigger hit. And why stop at "files" ... perhaps you should be comparing occurrences per line of code or even file sizes.
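To make the arithmetic concrete, here is a tiny sketch in C of the normalization being asked for. The function name hack_rate is made up, and the 100/20 and 20/10 figures are just the hypothetical numbers from the comment above, not real GitHub counts.

```c
/* Fraction of a language's files that contain the phrase.
 * Returns 0.0 when there are no files, to avoid dividing by zero. */
double hack_rate(unsigned long files_with_phrase, unsigned long total_files)
{
    if (total_files == 0)
        return 0.0;
    return (double)files_with_phrase / (double)total_files;
}
```

With those hypothetical numbers, hack_rate(20, 100) gives 0.2 for C and hack_rate(10, 20) gives 0.5 for Java, so the language with fewer absolute hits has the higher rate. The same function works per line of code if you feed it line counts instead of file counts.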
If you're not going to express the survey as a percentage of the total for each language, then this whole conversation is stupid. What if C has the most lines of code in the repository? Then of course it will take the top spot.
WTF! An autoplay video on frontpage? Slashdot, you have reached a new low ...
Is it time for me to finally tick the "Disable Advertising" checkbox?
This just goes to prove what we all know - that it is impossible to write ugly code in C# ;)
Because C is so sparse and clean (or primitive, ymmv!), and people using C tend to be more experienced (almost nobody starts with C anymore - you use it because the job needs it), I find C programmers are a lot more likely to recognize things as ugly hacks and label them. It's partly defensive, because other C programmers are also old and cranky, so you're tagging it with YES I KNOW don't start with me. You don't want to check this in and have, say, Linus think you don't realize compromises were required.
On the other hand, JavaScript people seem to be a lot more 'hey, doing this weird thing works without dying - I'll push it to production.' (YMMV, that's just my experience).
Which makes this all subjective. There are already comments by people who say that anything not done in C is ugly, so how can we tell that these same people didn't pepper others' code with that statement? Many people think code is an ugly hack merely because it wasn't done the way they would code it.
I've seen code written with procedures named for Alice in Wonderland characters and activities. Yet, I've seen that kind of thing defended here as 'creative'. 'Ugly hack' in a comment is a worthless indicator.
Programmers in most other languages don't even notice.
Were these taken in context?
Was this about C hacks, game/os hacks, library work arounds, paradigm compensations, or just poor algorithm implementation hacks?
The usage of comments such as this one is rather open to interpretation. C being considered a root language for many languages today may require "hacks" to get around such issues. Similarly implementations of things such as inline assembly, naked functions and goto statements are often seen as hacks, even though they are all perfectly fine when given certain considerations.
To compound this issue, C is the lowest-level language in the list of languages examined. This is a huge issue, as C becomes the language of choice for development that requires low-level access: for example drivers, ICs, and yes, actual hacking. The problem is that C developers will often need to bend to the will of other systems they have little control over, resulting in "ugly hacks", something that higher-level languages often abstract away. Ever try implementing USB emulation on an AVR microcontroller? How about implementing a red-black tree in C? Basically, this article shows that a phrase shows up more commonly in the language; this phrase effectively means nothing at all. But it may scare a few clueless employers into thinking that C is a poor language choice, when it is one of the most powerful languages out there.
Honestly, I believe further analysis is needed here before such a judgement is considered.
That's because they aren't scientists at all. They're politicos looking to justify ideology.
And it has them for the same reason it has beautiful hacks:
The complete and utter lack of fucking memory management, forcing the development of such hacks.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
I have numerous phrases to describe a shitty solution but I have never used "ugly hack" once.
Apparently I am a rockstar dev on Github or maybe just an ugly hack!
Percentage of files per language would be a more fair comparison. Maybe there's more C code than anything else.
Coming up after the break, new insights into why dogs lick their balls.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Do we know that the comments were added by the author? An alternative explanation is that the C code has had fuller peer review than other languages, and the reviewer has marked these areas for future refactoring.
Good thing we have Intel AMT/Vpro/VT.
We will not check his computer for illegal bytes remotely.
And then we will imprison him.
Maybe C developers are just honest and experienced and name what it is.
I won't accuse Java, with its patterns of patterns, when there is such an easy victim as PHP.
PHP developers start their first line virtually with /* big hack */ and finish the last line with /* this is cruel */.
How often is the comment "ugly hack" part of a longer comment like: "ugly hack for Visual C workaround"?
I think the article is misleading; he should get the number of ugly hacks per number of lines or per number of files.
Maybe there are 100x more JS files than makefiles hosted on GitHub, in which case it's a totally different matter if they add up to only 3x the number of ugly hack occurrences.
I think the reason C came out on "top" is that C programmers are more conscious of when they are doing something that the language doesn't accommodate natively but is flexible enough to get done, and they probably document this for their own sanity later on when revisiting that file.
At some point on every project an implementer encounters a situation where they question whether a different language should have been used because of some intractable problem that ultimately gets solved with a "hack". That's what engineers do: they make things work, sometimes having to use the tools and materials on hand...
It's been my experience that about 50% of the hacks I've encountered or implemented were the result of poor design, 25% the result of language limitations, 25% because the implementer was lazy and did not want to fix the source of the problem, and 25% due to not having time to fix the source of the problem (scheduling).
I wonder if 'C' encourages or has a culture of having more comments than some of the other languages.
And as other posters have hinted at when noting code that's trying to run in different environments, the environment C runs in (standard library, etc.) has varied longer (in time) and more (in versions) than the other languages mentioned. Seriously, is anyone writing new code for Ultrix anymore?
I wanna know!
C coders know when they are using ugly hacks and would take a moment to comment it or name the function with the term ugly hack. They realize it is not elegant and make a note so that future developers do not think it is a reference implementation worthy of replication and emulation. It is basically "this is probably not worth copy/paste, do a fresh implementation".
Other language coders might be using these ugly hacks with pride not knowing anything better.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
It's a sad day.
From, oh, at least a decade ago. My two, um, favourite ugly hacks found in cross-platform C++ production code have been:
A file decompressor, obviously reverse engineered from a disassembly to C then wrapped in a C++ class, with all the variable naming, abused do-for-while-if constructs and readability issues that implies. The main function was a 3000+ line switch statement finite state machine. The FSM was initialized by a goto jumping over half the function's variable declarations into the middle of a multiple-page case statement buried halfway down the switch.
IIRC there were also goto's jumping between bits of class member functions though that particular joyous bit of code reuse may have been found elsewhere.
It had worked reliably for several years before being found. We made one change - the comment "Here be dragons"
and
A C++ class whose humungous private member variable section had two unused variables named _first and _last at the beginning and end of the section and whose constructor had "memset( &_first, 0, ( &_last - &_first ) );" to initialize all the class member variables.
We awarded this developer portability bonus points for not implementing their own faster memset() and adding the extra unused variables.
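Since someone upthread asked how these could be done in a not so ugly manner: a rough sketch, in C rather than the original C++, of clearing a block of state without sentinel members. The struct and its fields are invented for illustration. Note that the memset( &_first, ... ) trick as quoted depends on member layout, and the pointer subtraction counts elements rather than bytes, so it is fragile even when it happens to work.

```c
/* Hypothetical stand-in for the class's pile of member variables. */
struct decoder_state {
    int    width;
    int    height;
    double scale;
    char   name[16];
};

/* Reset every member by assigning a zero-initialized value of the same
 * type. The compiler knows the object's size and layout, so no _first or
 * _last markers and no pointer arithmetic are needed. */
void decoder_state_reset(struct decoder_state *s)
{
    *s = (struct decoder_state){0};
}
```

Keeping the state in its own struct means adding a member later cannot silently fall outside the range that gets cleared.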
Don't get me wrong, these were extremely bright, experienced developers who knew the internals of their chosen platform, dev tools and machine architecture from the metal upwards. All submitted code worked with very few bugs. The code just occasionally needed hitting very hard with the +10 hammer of portability.
Anonymous because I like working in the IT security industry.
Perl is a write-only language.
What was Perl originator Larry Wall thinking?
Perl is an ugly hack so all comments are unnecessary.
C has the most algorithms: https://github.com/search?utf8=%E2%9C%93&q=algorithm&type=Code&ref=searchresults
C has the most computation: https://github.com/search?utf8=%E2%9C%93&q=computation&type=Code&ref=searchresults
C has the most tests: https://github.com/search?utf8=%E2%9C%93&q=test&type=Code&ref=searchresults
C has the most characters: https://github.com/search?utf8=%E2%9C%93&q=characters&type=Code&ref=searchresults
C has the most stuff: https://github.com/search?utf8=%E2%9C%93&q=stuff&type=Code&ref=searchresults
C has the most things: https://github.com/search?utf8=%E2%9C%93&q=things&type=Code&ref=searchresults
C has the most stupid: https://github.com/search?utf8=%E2%9C%93&q=stupid&type=Code&ref=searchresults
C has the most smart: https://github.com/search?utf8=%E2%9C%93&q=smart&type=Code&ref=searchresults
In conclusion, this article is very good and insightful, and in no way a complete waste of time.
I love new hardware. I spend a double digit % of my income on it.
There is something to be said for people that get the full value out of the environmental impact required to produce their hardware. Right now I'm posting on /. I was doing this 10 years ago on, what is today, completely inferior hardware and it worked just fine. If someone kept that hardware alive and useful isn't that good engineering? Are people who restore/maintain classic cars not good mechanics because they should be working on a BMW-i?
How many lines of code per language are they comparing here?
phtt...
Java. That's an ugly hack.
Did they even try to normalize the results to number of files? Lines of code?
First, the counts are lame: searching a source repository also searches through all the version comments. If you filter on only 'C', the first 10 pages are from 2 files in the same project.
The vast majority are just hardcoding things, may be bad practice, but not really ugly hack. Most web scripting languages, PHP, JS, etc. people just do that. No reason to have header files or variable definitions like you would in C or other compiled languages that you wish to maintain well.
True hacks would be like code that traps an error and ignores it because you have no idea why it's doing it, but it otherwise seems to work.
catch(exception){ }
Ever wondered why Pascal had packed arrays of char? It was for the Cyber, running 60 bit words. Did a fine job.
C has the PDP 11 architecture baked into its soul, with 8 bit bytes being part of that.
Worse, C has influenced all modern architectures to live within that crude model. For example, how could we have 64 bit pointers without using the upper 16 bits as tag bits? (No, we are never going to use them in our life times, memory access for that much memory would just be too slow. 32 to 64 bit is not the same as 16 to 32 bit.)
Is that one equal sign, two, three?
Both PHP and JavaScript define one as assignment, two as comparison after implicit conversion of values to the same type, and three as comparison of both type and value. Douglas Crockford, author of JavaScript: The Good Parts, proposes never using == at all in these languages, instead explicitly converting everything before comparing them with ===.
Python handles it differently: one is assignment, two are comparison as defined by the types with fallback to "is" if not defined, and "is" is object identity (similar to Java == on objects). Python allows types to override operator == in much the same way that Java objects have an equals() method, and the built-in types do implicit conversions on numbers but little else.
How much is "duck" + 1 + orange()?
If you're going to overload operator +, there's a safe way and a less safe way. JavaScript performs the safer way of trying to convert things to strings. PHP, on the other hand, has implicit conversions biased toward numbers, which is why it doesn't overload +, instead using a separate concatenation operator.
Here's how it plays out in JavaScript:
1. Add parentheses on the left for infix operators of equal precedence: ("duck" + 1) + orange()
2. If one side is a string and the other anything else, concatenation is used. "duck" + 1 becomes "duck" + "1" which is "duck1".
3. "duck1" + orange() will also use concatenation.
How many languages have direct hardware access? Or inline ASM code?
A strictly conforming C program can't do either. Casting an integer to a pointer and dereferencing it is undefined by the International Standard, even if a particular implementation defines it to perform MMIO (memory-mapped input and output). Nor is asm a standard keyword. GitHub lumps C and C-as-extended-by-popular-implementations into one set.
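For reference, this is roughly what the idiom looks like in practice. It is a sketch only: the helper names mmio_write32 and mmio_read32 are made up, and whether a given address actually maps to a device register is entirely up to the implementation and the platform, not the C standard.

```c
#include <stdint.h>

/* Store a 32-bit value at an integer address treated as a register.
 * Assumption: the implementation documents that this integer-to-pointer
 * conversion yields a usable pointer for the target platform. */
static inline void mmio_write32(uintptr_t addr, uint32_t value)
{
    /* volatile stops the compiler from eliding or merging the access. */
    *(volatile uint32_t *)addr = value;
}

/* Load a 32-bit value from an integer address treated as a register. */
static inline uint32_t mmio_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}
```

On a hosted system you can only try this out safely by passing the address of an ordinary uint32_t variable converted to uintptr_t; on real hardware the address would come from the chip's datasheet.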
It probably also horrifies some programmers who think it's premature optimization (if it's not yet the end of the world then it's too soon).
It's not premature if it's documented properly. Algorithmic optimizations can carry a big O analysis, and micro-optimizations can carry before and after reports from your profiler.
Right now I am working on a project with a 4,000 line function
Even when coding in assembly language for an 8-bit microprocessor, I'd probably extract methods an order of magnitude before 4000 lines.
related classes scattered across multiple projects so they can't compile easily
Create a new project whose purpose is to provide classes to these projects.
If your boss complains about not having time=money for refactoring, try first seeing whether your boss has heard of Dave Ramsey and his Total Money Makeover. If you're not familiar, Mr. Ramsey is a famous proponent of sacrifice to pay down personal debt. Then explain to your boss that your codebase is likewise deep in debt, and dealing with messy code like that is like having to spend a lot of your revenue on paying interest on that debt. Refactoring to pay the principal on your project's technical debt may delay getting the next feature out, but it might help you get the next six features out in the same time that you otherwise would have produced only four.
If you're surrounded by co-workers who keep your team in debt, then you should work with your boss to start teaching them coding practices that will get your team out of debt.
So your boss is part of the problem. Does your boss have a boss?
Failing that: Do you have a Stack Overflow account? You might want to start posting answers there on your own time. By the time you get 1000 reputation or so, you should gain access to Stack Overflow Careers.
Since the entire PHP language is a dirty hack and every line written in it is as well, PHP is the ugly hack champ by a wide margin.