C Code On GitHub Has the Most "Ugly Hacks"
itwbennett writes: An analysis of GitHub data shows that C developers are creating the most ugly hacks, or are at least the most willing to admit to it. To answer the question of which programming language produces the most ugly hacks, ITworld's Phil Johnson first used the search feature on GitHub, looking for code files that contained the string 'ugly hack'. In that case, C comes up first by a wide margin, with over 181,000 code files containing that string. The rest of the top ten languages were PHP (79k files), JavaScript (38k), C++ (22k), Python (19k), Text (11k), Makefile (11k), HTML (10k), Java (7k), and Perl (4k). Even when controlling for the number of repositories, C wins the ugly-hack-athon by a landslide, Johnson found.
Pretty sure people already assume that every line of Perl code is an ugly hack anyways, so they didn't have to write a comment on it.
Regardless, this seems like a pretty crappy study. There are many other phrases, like kludge or XXX, that should have been considered.
Seriously guys. File this one under "NO SHIT" - Of course C is going to have the most ugly hacks. Why? Because it is by design able to access a hell of a lot more than other languages. How many languages have direct hardware access? Or inline ASM code? And does the word "hack" in the code really make it an "ugly" hack? Seriously? I wrote a micro-kernel for an ARM platform about a decade ago, and there was an assload of inline ASM code and direct pointer manipulation to access the underlying hardware, there is no other way to do this. Yeah, I'm sure the word "hack" appeared countless times in my code, because that's the general term we use. That doesn't make it "ugly" or bad by any means.
C coders know an ugly hack when they see one, and when they write one.
I would conjecture that nearly every line of Perl scripts is an ugly hack, so nobody bothers to add a comment... 8-)
It doesn't take into account that with Perl and PHP, "ugly hack" is implied.
Fast inverse square root (sometimes referred to as Fast InvSqrt() or by the hexadecimal constant 0x5f3759df) is a method of calculating x^(-1/2), the reciprocal (or multiplicative inverse) of a square root for a 32-bit floating point number in IEEE 754 floating point format.
http://en.wikipedia.org/wiki/F...
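For anyone who has not seen it, here is a minimal sketch of the trick in C. It assumes float is a 32-bit IEEE 754 single; the name fast_inv_sqrt is made up for this example, and memcpy is used to get at the bit pattern instead of the pointer cast found in the well-known version, so that the sketch stays clear of aliasing problems.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE 754 single. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Approximate 1/sqrt(x) for positive, finite x (hypothetical helper name). */
float fast_inv_sqrt(float x)
{
    assert(x > 0.0f);

    float half = 0.5f * x;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);      /* read the float's bit pattern */
    bits = 0x5f3759df - (bits >> 1);     /* magic constant gives a first guess */

    float y;
    memcpy(&y, &bits, sizeof y);         /* turn the guess back into a float */
    y = y * (1.5f - half * y * y);       /* one Newton-Raphson refinement */
    return y;
}
```

With a single refinement step the result is only an approximation, so compare against a tolerance rather than an exact value.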
Anybody got any better Ugly Hacks to share?
God I love C.
I realize the analysis is probably a little tongue-in-cheek, but this is probably the worst analysis I've ever seen. Nothing of use was gained...
Perl was last on that list?
For those people who say that Perl coders only write incomprehensible gibberish, I say:
BWA HA HA HA HA!
The whole C language is one beautiful hack, scary at first but once you get to know it in some really messed up sw project you can't help but love it. The balance between freedom and structure is excellent.
Check it out yourself:
https://github.com/search?l=&q=%22ugly+hack%22+created%3A%3E%3D2013-01-01+created%3A%3C2015-05-01&ref=advsearch&type=Code&utf8=%E2%9C%93
#include "complex.h", then it says ugly hack for uClibc, since it does not support complex functions properly and they want ./configure && make && make install to fail for such an embedded C library.
Sounds more like a copy-paste issue to make configure fail for uClibc than an actual problem with the C language in general.
#include "complex.h" ... }
+#ifdef __UCLIBC__
+#error ugly hack to make sure configure test fails here for cross until uClibc supports the complex funcs
+#endif
int
main () {
#ifndef __INCif_etherh /* Quick and ugly hack for VxWorks */
// ugly hack because we don't have fscanf
/* ugly hack to make it compile on RH 4.2 - WA */
/* ugly hack GRR */ /* nothing */
/* XXX argh, ugly hack to make stuff compile! */ ...) sprintf(BUF, __VA_ARGS__)
int fscanf(FILE* stream, const char* format, int* value)
#else
#include
#endif
#if !defined(__GNUC__) && !defined(__common_include__)
#define __attribute__(x)
#define snprintf(BUF, SIZE,
Really? Most of those "ugly hacks" are symbol visibility issues... Not worth a Senior position...
I for one would love to see some examples of such "ugly hacks", and also how it should/could be done in a not so ugly manner.
Every single page has many occurrences of the same "ugly" hack. If the folks who did the study had an ounce of legitimacy, they would have filtered out all those duplicates. If they had actually been competent, they would have done an in-depth study of all those "ugly hacks". Of course, at this point, the article would have been worthless, but hey, they got their first page on /. ...
These numbers should be weighted to the amounts of code in the various programming languages on GitHub. There may be lots of C "codefiles" with the "ugly hack" string in them, but there probably is a lot of C code overall on GitHub, too.
"The agriculture ministry is not in charge of Gundam" - Japanese ministry official.
C Code EVERYWHERE has the most "ugly hacks"
Ah, but C also has the most beautiful hacks.
"Ugly Hack" very often means the programmer has done a smart thing, if not an exactly correct thing. Although sometimes an ugly hack is just an ugly hack.
Exactly.... going on my gut of language popularity I'd say the worst is Text.
"/* ugly hack to... */" is a modest expression of pride describing concise, functional, readable and elegant C code in the same way as the term "//elegant approach to..." in C++ describes some borderline-insane misapplication of the STL with the incomprehensibility of perl and the verbosity of java.
Nullius in verba
If a solution is stupid and it works, then it's not stupid.
How could "ugly hack" exist in Perl code? Every single line ever written is an elegant, beautiful hack, by design.
The only thing worse than a Democrat is a Republican.
I suspect that most search engines classify .h as a C file. In reality it could be either, and far and away the most common practice is for C++ to also use .h for header files.
Only in extremely rare cases have I seen anything other than .h for C++ headers. Once in a great while, I've come across .hxx (there used to be a company called Borland that wrote compilers for Windows shareware developers and used .hxx, or maybe .hpp, I think). On SGI, I think I've once or twice seen .hh.
There are 8,700 forks of the Linux kernel. My local copy of 3.17.1 has about 17 Mloc, so the Linux kernel forks alone are over 147 Gloc.
WTF! An autoplay video on frontpage? Slashdot, you have reached a new low ...
Is it time for me to finally tick the "Disable Advertising" checkbox?
FWIW, just my local copy of the linux kernel has 15 occurrences of "ugly hack".
C code is ugly hacks. But how else are you going to write an efficient ring buffer?
Check out my sci-fi/humor trilogy at PatriotsBooks.
Because C is so sparse and clean (or primitive, ymmv!), and people using C tend to be more experienced (almost nobody starts with C anymore - you use it because the job needs it), I find C programmers are a lot more likely to recognize things as ugly hacks and label them. It's partly defensive, because other C programmers are also old and cranky, so you're tagging it with YES I KNOW don't start with me. You don't want to check this in and have, say, Linus think you don't realize compromises were required.
On the other hand, JavaScript people seem to be a lot more 'hey, doing this weird thing works without dying - I'll push it to production.' (YMMV, that's just my experience).
Beauty is in the eye of the beholder.
Is a shortcut reliant on the finer details of a cpu - say for example the finer points of 2s complement arithmetic for type int - elegant (because it's cool) or an ugly hack (because it will probably break on some future architecture and is hard to read for the newbie)?
Depends. Is it wrapped with #if __i386__ || __x86_64__ and followed by a #else clause that contains code without the insane optimization? If so, it is elegant. If not, it is ugly.
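As a sketch of what that looks like in practice (the function name load_le32 and the choice of example are assumptions made here, not anything from the thread): the x86 branch leans on the CPU being little-endian, and the #else branch is the plain version that works anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value from 4 bytes (hypothetical helper). */
uint32_t load_le32(const unsigned char *p)
{
    assert(p != NULL);
#if defined(__i386__) || defined(__x86_64__)
    /* Shortcut: x86 is little-endian, so the bytes are already in order. */
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
#else
    /* Generic version: assemble the value byte by byte. */
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
#endif
}
```

Both branches return the same value for the same input, which is what makes the guarded shortcut safe to keep around.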
I mean, a lot of code is only meant for one platform type. Not writing code compatible with obsolete processors is no great sin.
Which makes this all subjective. There are already comments by people who say that anything not done in C is ugly, so how can we tell that these same people didn't pepper others' code with that statement? Many people think code is an ugly hack merely because it wasn't done the way they would code it.
I've seen code written with procedures named for Alice in Wonderland characters and activities. Yet, I've seen that kind of thing defended here as 'creative'. 'Ugly hack' in a comment is a worthless indicator.
True, but I don't think x86_64 is obsolete just yet...
Is a shortcut reliant on the finer details of a cpu - say for example the finer points of 2s complement arithmetic for type int - elegant (because it's cool) or an ugly hack (because it will probably break on some future architecture and is hard to read for the newbie)?
C originally supported many architectures when it came to number representation - 1s vs 2s complement, 9 vs 8 bit bytes and so on. But the current architecture is a self-fulfilling prophecy: software expects it as it's been the nom for so long, and hardware is built that way because software expects it. It's not going anywhere until we get some architecture that breaks everything anyway, even at the highest levels of abstraction, like a quantum computer. Standardization benefits everyone.
And if you're not comfortable with bit twiddling and bytewise struct layout, C probably isn't for you. Plenty of other languages in the world. Certainly if trivial stuff like (unsigned) -1 as a way to get max int bothers you, you'd probably feel more at home with high-level languages.
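For reference, a small sketch of that idiom (the function name is made up here). Converting -1 to unsigned is defined by the C standard as wrapping modulo UINT_MAX + 1, so the result is the largest unsigned int, which is UINT_MAX rather than INT_MAX, and it does not depend on how negative numbers are represented.

```c
#include <assert.h>
#include <limits.h>

/* The conversion is well defined, so this can be checked at compile time. */
static_assert((unsigned)-1 == UINT_MAX, "(unsigned)-1 is the largest unsigned int");

/* Largest unsigned int without touching limits.h (hypothetical helper). */
unsigned max_unsigned(void)
{
    return (unsigned)-1;
}
```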
Socialism: a lie told by totalitarians and believed by fools.
I think the key difference here is that when someone uses C they want efficient code to some extent. Small, or fast, or both. In other languages the culture is often "do it in the method approved by the sacred elders", and so ugly hacks may be forbidden and the slow/bulky method is preferred, according to the mantra "do not reinvent the wheel because thou are not as wise as the wheel builder". Or the presence of an ugly hack implies that the novice must clearly have been prematurely optimizing, for as the wise men say tomorrow is too soon to optimize.
For example in Python the claim is that there's almost always only one way to do something, which either means ugly hacks are not possible, or else there's a lack of imagination amongst the programmers.
The higher level a language is, the more it seems that the goal is to get stuff done fast rather than efficient or elegant.
Finally, I have actually seen cases where code is labeled an "ugly hack" when it really wasn't a hack at all but rather not as tiny or elegant as the author wanted.
Fair enough. Ideally, you should include a generic version without any hackish optimizations, but it isn't strictly required if you don't think you'll ever change CPUs in the future. Either way, if you're writing code that you know is likely to break on a different architecture because of its unique characteristics, IMO, you should at least make it fail to build on any other architecture than the ones you've tested....
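A minimal sketch of the "fail to build" approach, assuming the code depends on two's complement integers (round_down_to_8 is a made-up example, not anyone's real code). The preprocessor test rejects other representations outright, and the static_assert double-checks the trick itself at compile time.

```c
#include <assert.h>

/* Refuse to build where the assumption below does not hold. */
#if (-1 & 3) != 3
#error "round_down_to_8() assumes two's complement integers"
#endif

/* Compile-time self-check of the bit trick used below. */
static_assert((-3 & ~7) == -8, "mask trick must round negatives down");

/* Round x down (toward negative infinity) to a multiple of 8. */
int round_down_to_8(int x)
{
    return x & ~7;
}
```

On an architecture that fails either check the build stops with a message naming the assumption, instead of producing a binary that misbehaves at run time.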
And it has them for the same reason it has beautiful hacks:
The complete and utter lack of fucking memory management, forcing the development of such hacks.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
Coming up after the break, new insights into why dogs lick their balls.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Do we know that the comments were added by the author? An alternative explanation is that the C code has had fuller peer review than other languages, and the reviewer has marked these areas for future refactoring.
For ugly hacks, you can't beat trying to optimise string ops 8 bit bytes on a 60 bit (Cray) processor - they natively used 6-bit chars, and packed four 15 bit instructions in a word, but required jumps to be aligned on a double word boundary to avoid pipeline stalls. Apart from assembler, I think C is probably the only language that could do it at all.
(I tried it in Fortran and then realised there were better things to do in life).
Sent from my ASR33 using ASCII
Ah, but C also has the most beautiful hacks.
Absolutely; which reminds me of a piece of C code I saw years back, and which I haven't been able to find again - perhaps somebody here would happen to know it. If I remember, it was an algorithm to find the best approximation to a straight line in a bitmap, given the two end points. What I remember is that it featured a rather eye-watering construction of two overlapping switch statements (?) which was syntactically legal, but perhaps shouldn't have been. Anyway, if it rings a bell, please let me know :-)
Maybe C developers are just honest and experienced and name what it is.
I won't accuse Java, with its patterns of patterns, when there is such an easy victim like PHP.
PHP developers start their first line virtually with /* big hack */ and finish the last line with /* this is cruel */.
What I remember is that it featured a rather eye-watering construction of two overlapping switch statements (?) which was syntactically legal, but perhaps shouldn't have been.
Reminds me of this monstrosity. It's not two overlapping switch statements, but a switch entangled with a do ... while loop. If that sounds familiar, you may be able to find your code from the links in the External References section.
CJ
Ah, arrogance and stupidity, all in the same package. How efficient of you. -- Londo Mollari
Is that really something to brag about?
Yes. If you don't understand why, you have no place in engineering.
Ideally, you should include a generic version without any hackish optimizations, but it isn't strictly required if you don't think you'll ever change CPUs in the future.
And then your company upgrades its CPUs while you're long gone, and now they need to figure out who the hell wrote that crappy piece of code that keeps crashing on the new CPU, and some other programmer has to rewrite everything from scratch because they can't figure out how your code works and why it's not doing what it's supposed to be doing.
By the way, that other programmer may just be an older version of you who has completely forgotten what the younger version did there... (not that I have any experience with that, cough)
it's been the nom for so long
And it's everywhere, which makes it the om nom.
systemd is Roko's Basilisk.
well, it turns out that they evaluated the ioccc repository.
world was created 5 seconds before this post as it is.
I remember something similar, called Duff's device. Not two overlapping switch statements (I don't think that's possible), but an intertwined loop and switch. I don't see any references to lines in bitmaps, but it's entirely possible that the same kind of construction was used for that purpose too.
send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}
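The listing above is pre-ANSI C (implicit int, old-style parameter declarations) and always writes to the same *to, as you would for a device register. Here is a sketch of the same construction in ISO C, adapted to copy into an ordinary buffer by incrementing to as well; the name duff_copy and the early return for a zero count are additions for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Copy count shorts using an 8-way unrolled loop entered through a switch. */
void duff_copy(short *to, const short *from, size_t count)
{
    assert(to != NULL && from != NULL);
    if (count == 0)
        return;                       /* the loop body would run once otherwise */

    size_t n = (count + 7) / 8;       /* number of passes through the loop */
    switch (count % 8) {              /* jump into the middle for the remainder */
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

For count = 11 the switch jumps to case 3, copies three elements, then the loop runs one full pass of eight.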
I've wondered if a software VM that runs (slowly) with an unusual architecture wouldn't be helpful for finding lots of C bugs. Old C wouldn't care if it was 60 bit with 9 bit char and middle-endian.
Tom Duff was working on very high speed rendering code for Lucasfilm when he found that.
I wonder if 'C' encourages or has a culture of having more comments than some of the other languages.
And as other posters have hinted at when noting code that's trying to run in different environments, the environment C runs in (standard library, etc.) has varied longer (in time) and more (in versions) than the other languages mentioned. Seriously, is anyone writing new code for Ultrix anymore?
What I remember is that it featured a rather eye-watering construction of two overlapping switch statements (?) which was syntactically legal, but perhaps shouldn't have been
Are you thinking of Duff's Device? It overlaps a switch with a do-while loop: http://en.wikipedia.org/wiki/Duff's_device
I always wonder when I see code like that, which is focused on optimisation, why they use divide and modulo? Aren't they slow? Why wouldn't the coder have used >> 3 instead of the divide by 8, and & 7 instead of the modulo 8? Genuinely asking because these strike me as obvious further speed boosts. Does the compiler automatically make the changes?
Any decent compiler makes those changes automatically. Probably even with optimisations switched off, since these are such a no-brainer.
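To make the equivalence concrete, a small sketch with made-up function names. For unsigned values the two spellings always agree. For signed values they can differ on negative inputs, because division truncates toward zero while right-shifting a negative number is implementation-defined, so for a signed count a compiler typically emits the shift plus a small fixup.

```c
#include <assert.h>

/* Division and modulo spelled the obvious way. */
unsigned blocks_div(unsigned count) { return (count + 7) / 8; }
unsigned rem_mod(unsigned count)    { return count % 8; }

/* The same results spelled with a shift and a mask. */
unsigned blocks_shift(unsigned count) { return (count + 7) >> 3; }
unsigned rem_mask(unsigned count)     { return count & 7; }

/* Spot-check that both spellings agree for one value. */
void check_same(unsigned count)
{
    assert(blocks_div(count) == blocks_shift(count));
    assert(rem_mod(count) == rem_mask(count));
}
```

Writing the division and modulo as such keeps the intent readable and leaves the strength reduction to the compiler.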
C coders know when they are using ugly hacks and would take a moment to comment it or name the function with the term ugly hack. They realize it is not elegant and make a note so that future developers do not think it is a reference implementation worthy of replication and emulation. It is basically "this is probably not worth copy/paste, do a fresh implementation".
Other language coders might be using these ugly hacks with pride not knowing anything better.
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Depends. Is it wrapped with #if __i386__ || __x86_64__ and followed by a #else clause that contains code without the insane optimization? If so, it is elegant. If not, it is ugly.
I would say it would depend even more on whether or not the programmer profiled the code in question prior to making that optimization change. If there is no perceived or noticeable benefit to this optimization, then there is no reason to put something unreadable and platform specific into your code base. I know I worked on a project once where the dev manager wanted to rewrite an entire library in assembly because there was a perceived performance problem in the library. While some more senior devs started the work on that, I profiled the library and saw that it was just one function call that was accounting for most of the performance problems. I also realized that this function call was being made 1200 times per second when we really only needed to make this call once and then we could cache that result. There is no point in putting in any sort of clever hack or optimization until you've identified a need for it. Most modern C/C++ compilers are pretty good at optimizing your code for you.
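The fix described there (call once, cache the result) can be sketched in a few lines of C. Everything named here is hypothetical: slow_lookup stands in for whatever the profiler flagged, and the call counter exists only to show the effect. The sketch is not thread-safe.

```c
#include <assert.h>

static unsigned long slow_calls;   /* counts real invocations, for the demo */

/* Stand-in for the expensive function found by the profiler. */
static int slow_lookup(void)
{
    ++slow_calls;
    return 42;
}

/* Cached wrapper: calls slow_lookup() only the first time. */
int cached_lookup(void)
{
    static int have_value;
    static int value;
    if (!have_value) {
        value = slow_lookup();
        have_value = 1;
    }
    return value;
}

/* Simulate n polls and report how many slow calls actually happened. */
unsigned long poll_and_count(int n)
{
    assert(n > 0);
    for (int i = 0; i < n; ++i)
        (void)cached_lookup();
    return slow_calls;
}
```

This only works when the value really is constant for the lifetime of the process; otherwise the cache needs an invalidation rule.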
And per unique set of lines. Makefiles and directories with JS contain a lot of identical boilerplate code.
Wrapping it in ifdefs doesn't make it elegant. It just makes it not a hack. Instead, it makes it ugly, and correct.
The only reason you would have that code still running on those chips is because it's not forward compatible with something more modern.
Or the bean counters don't want to pay the up-front cost of moving to something more modern. Many financial departments take a 'if it ain't broke' stance on computing hardware (and software!). Many of them don't factor in electricity and maintenance costs; all they see is "$N Dollars for new server hardware? Why? That's ridiculous!"
It's why many COBOL programmers still have a job.
The only reason you would have that code still running on those chips is because it's not forward compatible with something more modern.
Or because those platforms are running fine for the job they are required to do, and thus replacing them would not yield any additional value? Proper engineering and business decisions call for changing things when you have to, when you have a reason, a valid reason.
If it ain't broke, don't fix it is not just an empty slogan.
Otherwise someone would have transitioned to some much cheaper, more recent, commodity hardware, and saved the business a lot of cash.
Why cheaper? What if the hardware is already owned? What if the systems therein are running just fine as expected? If it ain't broke, don't fix it
That's definitely not good engineering, or something to brag about.
If it ain't broke, don't fix it. If you don't get it, you are not cut out for this business.
Do we go about rewriting all those lines of working-fine COBOL code into whatever is in vogue nowadays? Do you change your working-condition car every time a new model comes?
The only time you change something is if you have a reasonable expectation of gains, or if the cost of keeping something that old goes above a certain threshold. Otherwise, you let it be.
You know you are doing it right when your creation becomes legacy, and it runs unchanged for years, many years, providing value and ease of integration. That is, when the thing you created delivers value AND does not give reasons to decommission, then you know you did your job right, and you can brag about it.
The only reason you would have that code still running on those chips is because it's not forward compatible with something more modern.
Or the bean counters don't want to pay the up-front cost of moving to something more modern. Many financial departments take a 'if it ain't broke' stance on computing hardware (and software!).
That should be your reaction by default because change always incurs costs and risks. So you had better have a valid reason to incur new costs and risks. The operation of something old has to be costly, or difficult, or subpar, or the change to something new has to have a real chance of decreasing costs, or the change has to be part of a larger, business-related strategy. Only then do you have a valid reason to change.
Many of them don't factor in electricity and maintenance costs; all they see is "$N Dollars for new server hardware? Why? That's ridiculous!"
The electricity and maintenance costs may appear high, but if they are constant and well known, then they imply minimum risk. You can account for them (and you should if you are running a business, ... otherwise, don't run a business.)
If maintenance costs are limited to hardware support, you can ignore them. If it is software costs in terms of difficult deployments or software upgrades, then yes, that is the threshold over which a change needs to be initiated.
But if you don't have that, then don't fix what isn't broken. Let the cost be constant and the risk known (and ergo manageable.)
It's why many COBOL programmers still have a job.
No. It is because most COBOL systems are running within expected parameters. After running for years, if not decades, they are well known, their features and their glitches. They are battle tested.
And given the large size of these systems, it would be ridiculous to re-write them into something else. Why? Because that implies unknown risks and costs. Risk is something that grows exponentially with the size of the re-write. This is not hand waving, this is a fact.
There is more to software engineering than writing code. It also involves delivering solutions and value and managing risk. Software engineering must be seen as an integral part of a business process. Otherwise, we are not software engineers, but mere code monkeys throwing shit at the wall, packaging and selling whatever sticks.
This is a complete and total lie. There may be one "good" way to do something (for values of good), but there are many ways of doing something.
It's not a "complete and total lie." The Zen of Python, "Python Enhancement Proposal #20", states:
There should be one-- and preferably only one --obvious way to do it.
It's one of the guiding principles of the language's design. Type "import this" into a Python command-line, and PEP20 gets printed out.
It is pitch black. You are likely to be eaten by a grue.
I remember that! It's why the C standard allows a void* to be bigger than an int* - char*s on Cray's needed extra bits for sub-memory-address-addressing.
Self preservation and regenerative biological devices known as the human body.
Yeah, sure I might use limits.h too, but that wasn't always dependably there, and it's one more #include with that many more lines of code to parse when compiling - stuff that used to matter.
A thousand times this.
CLI paste? paste.pr0.tips!
Poorly designed or obsolete languages:
"do not reinvent the wheel because thou are not as wise as the wheel builder"
Rightfully designed, efficient languages:
"do not reinvent the wheel because you can be more efficient, and focus on behavior/app logic"
The perfect case of C following the latter is the Linux Kernel--they do not reinvent a lot of things.
A kernel always reinvents things. New file systems, new paging algorithms, new memory management. Maybe in applications you don't reinvent the wheel, but in low level code you do that often.
I was sure you were going to name a coworker. Presumably an unattractive one with little applicable talent.
Ever wondered why Pascal had packed arrays of char? It was for the Cyber, running 60 bit words. Did a fine job.
C has the PDP 11 architecture baked into its soul, with 8 bit bytes being part of that.
Worse, C has influenced all modern architectures to live within that crude model. For example, how could we have 64 bit pointers without using the upper 16 bits as tag bits? (No, we are never going to use them in our life times, memory access for that much memory would just be too slow. 32 to 64 bit is not the same as 16 to 32 bit.)
Like I said, the minimum requirement is to break the build, so it won't be crashing, because it won't compile. The minimum requirement in a corporate environment is breaking the build plus an explanation of what the code actually does (if it isn't immediately obvious) so that when someone tries to manually enable it on a new architecture and it doesn't work, they know how to rewrite it generically.
For example in Python the claim is that there's almost always only one way to do something, which either means ugly hacks are not possible, or else there's a lack of imagination amongst the programmers.
You're not being fair to Python.
Python's specific mantra (as listed in PEP 20) is: There should be one -- and preferably only one-- obvious way to do it. Emphasis mine.
Python, like any codebase, has a whole spectrum of "clean code" to "ugly hacks", but the richness of the language (and various libraries), provide a much richer foundation to let you avoid those ugly hacks.
(Unlike Perl, where the richness of the language somehow seems to encourage ugly hacks ^_-)
Misleading titles? Inflammatory blurbs? Keep in mind that Slashdot is a tabloid.
Is that one equal sign, two, three?
Both PHP and JavaScript define one as assignment, two as comparison after implicit conversion of values to the same type, and three as comparison of both type and value. Douglas Crockford, author of JavaScript: The Good Parts, proposes never using == at all in these languages, instead explicitly converting everything before comparing them with ===.
Python handles it differently: one is assignment, two is comparison as defined by the types with fallback to "is" if not defined, and "is" is object identity (similar to Java == on objects). Python allows types to override operator == in much the same way that Java objects have an equals() method, and the built-in types do implicit conversions on numbers but little else.
How much is "duck" + 1 + orange()?
If you're going to overload operator +, there's a safe way and a less safe way. JavaScript performs the safer way of trying to convert things to strings. PHP, on the other hand, has implicit conversions biased toward numbers, which is why it doesn't overload +, instead using a separate concatenation operator.
Here's how it plays out in JavaScript:
1. Add parentheses on the left for infix operators of equal precedence: ("duck" + 1) + orange()
2. If one side is a string and the other anything else, concatenation is used. "duck" + 1 becomes "duck" + "1" which is "duck1".
3. "duck1" + orange() will also use concatenation.
How many languages have direct hardware access? Or inline ASM code?
A strictly conforming C program can't do either. Casting an integer to a pointer and dereferencing it is undefined by the International Standard, even if a particular implementation defines it to perform MMIO (memory-mapped input and output). Nor is asm a standard keyword. GitHub lumps C and C-as-extended-by-popular-implementations into one set.
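For readers who have not written this kind of code, a sketch of what the integer-to-pointer pattern looks like. The C standard leaves the result of converting an integer to a pointer up to the implementation, so a real driver relies on the platform's documentation for its register addresses. To keep the sketch runnable anywhere, the "register" here is an ordinary variable and its address is obtained at run time; all names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a device register; real hardware would have a fixed address. */
static volatile uint32_t fake_register;

/* Address of the stand-in register as a plain integer. */
uintptr_t fake_register_address(void)
{
    return (uintptr_t)(volatile void *)&fake_register;
}

/* Write a 32-bit value through an integer address, MMIO style. */
void reg_write(uintptr_t addr, uint32_t value)
{
    assert(addr != 0);
    *(volatile uint32_t *)addr = value;
}

/* Read a 32-bit value through an integer address, MMIO style. */
uint32_t reg_read(uintptr_t addr)
{
    assert(addr != 0);
    return *(volatile uint32_t *)addr;
}
```

The volatile qualifier is what stops the compiler from caching or eliding the accesses, which matters when the other side of the address is hardware rather than memory.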
It probably also horrifies some programmers who think it's premature optimization (if it's not yet the end of the world then it's too soon).
It's not premature if it's documented properly. Algorithmic optimizations can carry a big O analysis, and micro-optimizations can carry before and after reports from your profiler.
Right now I am working on a project with a 4,000 line function
Even when coding in assembly language for an 8-bit microprocessor, I'd probably extract methods an order of magnitude before 4000 lines.
related classes scattered across multiple projects so they can't compile easily
Create a new project whose purpose is to provide classes to these projects.
If your boss complains about not having time=money for refactoring, try first seeing whether your boss has heard of Dave Ramsey and his Total Money Makeover. If you're not familiar, Mr. Ramsey is a famous proponent of sacrifice to pay down personal debt. Then explain to your boss that your codebase is likewise deep in debt, and dealing with messy code like that is like having to spend a lot of your revenue on paying interest on that debt. Refactoring to pay the principal on your project's technical debt may delay getting the next feature out, but it might help you get the next six features out in the same time that you otherwise would have produced only four.
If you're surrounded by co-workers who keep your team in debt, then you should work with your boss to start teaching them coding practices that will get your team out of debt.
So your boss is part of the problem. Does your boss have a boss?
Failing that: Do you have a Stack Overflow account? You might want to start posting answers there on your own time. By the time you get 1000 reputation or so, you should gain access to Stack Overflow Careers.
Degreelessness Mode?
Not everything that can be measured matters; Not everything that matters can be measured.