Simpler "Hello World" Demonstrated In C
An anonymous reader writes "Wondering where all that bloat comes from, causing even the classic 'Hello world' to weigh in at 11 KB? An MIT programmer decided to make a Linux C program so simple, she could explain every byte of the assembly. She found that gcc was including libc even when you don't ask for it. The blog shows how to compile a much simpler 'Hello world,' using no libraries at all. This takes me back to the days of programming bare-metal on DOS!"
*sigh*
Been there, done that... on the PDP-11 in 1979.
Adding a static 11k or so is insignificant for any program which actually does anything useful.
Ok, this is wicked great in theory. Our programs have become bloated. We do have them taking up too much RAM, HD space, and CPU time. But after reading through this in-depth analysis I have to wonder if it's all worth it.
If we're willing to leave behind all pretenses of portability, we can make our program exit without having to link with anything else. First, though, we need to know how to make a system call under Linux.
Or I can just write it the old way, making the file size larger, and not have to concern myself with portability and how to make system calls under Linux. After all, that's what the whole point of this was, right?
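For anyone curious what "making a system call under Linux" looks like from C without going through libc's write(), here is a minimal sketch. It assumes x86-64 Linux and GCC/Clang inline-assembly syntax. On x86-64 the write system call is number 1 (exit is 60), with arguments passed in rdi, rsi and rdx. The name raw_write is hypothetical, and this is not claimed to be the article's code. On other platforms the sketch falls back to glibc's syscall() wrapper, which of course links libc again.

```c
#include <stddef.h>      /* size_t */
#include <unistd.h>      /* syscall() fallback; pipe()/read() for testing */
#include <sys/syscall.h> /* SYS_write */

/* Issue the write system call directly (hypothetical helper).
 * x86-64 Linux ABI: syscall number in rax, args in rdi, rsi, rdx;
 * the `syscall` instruction clobbers rcx and r11. */
static long raw_write(int fd, const void *buf, size_t len)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret; /* bytes written, or -errno on failure */
#else
    /* Fallback through libc: returns -1 and sets errno on failure. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

Calling raw_write(1, "Hello World\n", 12) would print the message to standard output. Ending the program the same way means issuing the exit call (number 60 on x86-64) instead of returning from main.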
Since when does a Hello World program not actually output anything?
I think you missed the point of the article.
The author is trying to highlight that the amount of bloat in modern programs is so rampant that even "Hello World" is excessively oversized for what it accomplishes. How can we as programmers expect fast, efficient, lightweight code when our compilers (even ones as popular as gcc) bloat the program without being asked to?
Why doesn't it fit in TFS?
If God forks the Universe every time you roll a die, he'd better have a damned good memory.
As to the point of this... we recently had a story about how computers had gotten "too big to understand".
And here we have a program, 45 bytes long, for which every single byte has a well-explained purpose. It's getting back to the bare metal and that's what makes it interesting. =)
At the end, the code was assembler, and the compiler wasn't even called - just the linker. I can't say for sure where a C program ends and an assembler program begins, but I'm fairly certain that the last few iterations are assembler, based on the "let's do away with the compiler" suggestion.
Also, "Hello World" programs have to, you know, actually display the message "Hello World" - this is a program that isn't written in C, and doesn't write "Hello World" - care to revisit the title of this entry?
Ken
I understand the point of the article, and everything else mentioned here. I just think that the amount of time spent eliminating 11k from a program in this case is irrelevant because any real application is going to need libc. It's not like she needed to strip it out so it would fit inside a tiny corner of an embedded processor - she's probably running it on a PC with anywhere from 1GB - 4GB of RAM.
But my stupid build process that generates the bloated Hello World is much more maintainable. Now get off my lawn.
Colorless green Cthulhu waits dreaming furiously.
Shouldn't the linker remove unreferenced functions?
I've had this problem with gcc for a while, with C++ code. I was writing some embedded code, and I wanted to use some simple C++. Just by adding a #include of one of the stream libraries, the executable grew by 200k, even though none of it was referenced. The C++ code in iostream is template-generated anyway, so even if the compiler wanted to include the code, it can't until I instantiate it.
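One plausible reason the linker keeps such code: by default it keeps every section of each object file it pulls in, and anything registered to run at start-up counts as referenced. For example, libstdc++'s <iostream> has historically declared a static std::ios_base::Init object in each file that includes it. Whether that explains the 200k above is an assumption. The C sketch below (file and function names hypothetical) shows both sides. The build line assumes GCC or Clang with GNU ld, where -ffunction-sections plus -Wl,--gc-sections lets the linker drop unreferenced functions.

```c
/* Assumed build (GCC or Clang with GNU ld):
 *   cc -Os -ffunction-sections -fdata-sections gc_demo.c -Wl,--gc-sections
 * -ffunction-sections gives each function its own section; --gc-sections
 * lets the linker discard sections that nothing references. */

static int init_runs = 0;

/* Never called by the program: with the flags above its section is a
 * candidate for removal; without them it stays in the binary regardless. */
int unused_helper(int x)
{
    return x * 2;
}

/* A constructor is recorded in the start-up table (.init_array on current
 * GNU toolchains), so the linker treats it as referenced even though no
 * code calls it by name, and everything it calls must be kept as well.
 * Static C++ objects with constructors are handled the same way. */
__attribute__((constructor))
static void module_init(void)
{
    init_runs++;
}
```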
But it really is much simpler. The reason your 'average first year comp sci student' might find it less understandable is that they don't actually understand the bloated version either. Using a high-level language doesn't reduce complexity; quite the opposite, in fact, it greatly increases actual complexity. It simply makes it easier to get something done without understanding it, and thus makes it easier to kid yourself into thinking you know what you are doing, when you don't.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Friends don't let friends enable ecmascript.
Patch the strip utility on Linux, send in the patch and see if it gets accepted. Then let's see a follow-up on that on Slashdot. She's taking a lot of flak here, but there's value in the work. It just needs to be applied in a more practical way.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Yeah, but the 45-byte program doesn't say "Hello World". In fact, there's no example that I can find in TFA that outputs that message or any other. So the summary is incorrect on its face. TFA doesn't show a simpler "Hello World" program; it doesn't show any sort of "Hello World" program at all.
I feel cheated, and tricked into reading an article that didn't do what was advertised.
(It's not the author's fault, of course; the author didn't claim to be writing the sort of program that the summary talked about. Though I was a bit disappointed that only the first few examples were in C. The article was almost entirely about assembly-language programs. So again, I was a bit disappointed, since I was hoping to learn something about making C programs smaller. This was done only in the first example, and it was made smaller by removing its call to write() so it didn't output anything at all. I already understood that I can make programs smaller by removing all functionality. ;-)
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
Hm, if I make a file 'hello.py' with the following content:
print 42
And how big is Python?
Granted, but how big is Linux, which lets you run that ELF?
The fact that helloworld.c compiles to 11k has less to do with bloat than with people generally not caring about 11k. You could get rid of that 11k, but to do so you'd have to make trade-offs that make real programs slower or bigger, or make compilation slower. Very few people would make those trade-offs in the other direction. Those who do either use special-purpose compilers or (more likely) write in assembly.
The cake is a pie
If you're actually programming on "bare metal", you're not really using DOS, are you? After all, DOS is an operating system -- a layer between your code and the hardware.
You mean 50. The rest of the civilized world, those of us not eating tapioca pudding and watching Matlock between shifts of drooling on the keyboard, switched to better languages like C++ and Java.
TFA explains it: main() isn't the true start of the program; _start is. That resides in crt1.o, which fires off a bunch of setup stuff before calling __libc_start_main, which in turn kicks off main(), and off your program goes.
To put it as a car analogy: what she found is that turning the key doesn't just activate the starter; it also activates the airbag system, the traction control, and the radio. And if all you want to do is start the engine to prove that it runs (à la Hello World!), then it's kind of silly to lug around all that extra "unnecessary" crap too.
Or something like that. Sadly I'm a better mechanic than a programmer (4 yrs vs 1 yr), but I'm working on fixing that. :)
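A toy model of the start-up sequence described above may help. This is not glibc's code. The names toy_start_main and hello_main are hypothetical. The real _start and __libc_start_main also deal with the environment, the auxiliary vector, thread-local storage and atexit handlers. The one behaviour the sketch does mirror is that main's return value becomes the program's exit status.

```c
#include <stdio.h>

typedef int (*main_fn)(int argc, char **argv);

/* Stand-in for the user's main(). */
static int hello_main(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    puts("Hello World");
    return 0;
}

/* Rough shape of what _start (crt1.o) and __libc_start_main do. */
static int toy_start_main(main_fn user_main, int argc, char **argv)
{
    /* 1. Library set-up would happen here (stdio, locale, TLS, ...). */
    /* 2. Run the program. */
    int status = user_main(argc, argv);
    /* 3. The real code passes status to exit(), which runs atexit()
     *    handlers and flushes stdio before the exit system call.
     *    This toy only flushes stdout and hands the status back. */
    fflush(stdout);
    return status;
}
```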
Mod parent up. This is all a semantic game about where significant portions of functionality are stored (and thus counted or not). After all, back in the "pre bloatware" days, you'd have had to manage all of the complexities of machine management and I/O yourself. The assembly would have been much larger to achieve the same effect.
Yes, you can make the argument that Linux comes with screen I/O, a scheduler, memory management, etc. already, so that's just overhead, but as others have pointed out, you can say the same thing about bash. It comes everywhere and is just overhead.
STOP . AMERICA . NOW
Try programming a micro-controller and suddenly you'll be facing hardware limits that force you to favor small unreadable code over bigger more maintainable code. There is a solution for it though... comments! Lots of them :D
Doesn't matter anyways because demand paging ensures that only the parts of libc that your program actually uses will be pulled into memory, so all the extra junk will remain on disk.
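The demand-paging point can be observed directly. The sketch below assumes Linux and uses mincore(2), which reports which pages of a mapping are resident in memory. To stay self-contained, it maps anonymous memory rather than libc.so itself; that substitution is an assumption. Shared-library pages are file-backed, but they are likewise brought in on access, possibly with some read-ahead. The helper name resident_after_touch is hypothetical. With transparent huge pages or similar kernel settings, one touch may bring in more than one page, so exact counts are typical rather than guaranteed.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS and mincore() in glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 256

/* Map `npages` pages of anonymous memory, write to the first `touched`
 * pages, and return how many pages mincore() reports as resident.
 * Returns -1 on error. */
static int resident_after_touch(size_t npages, size_t touched)
{
    long psz = sysconf(_SC_PAGESIZE);
    if (psz <= 0 || npages == 0 || npages > MAX_PAGES)
        return -1;

    size_t len = npages * (size_t)psz;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < touched && i < npages; i++)
        p[i * (size_t)psz] = 1;   /* first write faults the page in */

    unsigned char vec[MAX_PAGES];
    int count = -1;
    if (mincore(p, len, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;  /* low bit set means resident */
    }
    munmap(p, len);
    return count;
}
```

An untouched mapping reports no resident pages; touching every page makes all of them resident.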
So did the original - it was launched from the command prompt and the shell was used for the output of the return code. The shell is part of the base OS anyhow, and you can't boot Linux without the shell.
Since the output is the Answer to the Ultimate Question, it necessarily incorporates or encodes every possible output of every possible program, including the string "Hello World!".
The method for extracting the particular output desired is left as an exercise for the reader.
OOP makes people lazy and gives them less of an understanding of what's actually going on.
All that OOP code you write gets translated back into something procedural, you know.
Edward@Tomato - /home/Edward/ man woman
man: no entry for woman in the manual.
"Qua!?"
It's funny that this always comes up in conversations about bloat, because not everyone is programming embedded devices. It's almost like you guys are a sub-subculture of programmers, to the point where many of you come off with a general attitude of superiority, when in fact neither approach is superior, just different depending on the situation.
/rant
Not only more maintainable, but filesystems should use 4 KB allocation blocks, especially on RAIDs, for the performance reasons discussed in this post. This means that on a decently configured modern system, anything under 4 KB will still occupy 4 KB on disk.
- Human knowledge belongs to the world
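The rounding described above is simple arithmetic: a file's data occupies whole blocks. Here is a minimal sketch, assuming a filesystem that allocates fixed-size blocks. It ignores metadata and the inline storage some filesystems use for very small files. The function name is hypothetical.

```c
#include <stdint.h>

/* Round a file size up to a whole number of allocation blocks. */
static uint64_t on_disk_size(uint64_t file_size, uint64_t block_size)
{
    return (file_size + block_size - 1) / block_size * block_size;
}
```

With 4096-byte blocks, a 45-byte file still takes 4096 bytes. An 11 KB (11,264-byte) binary takes three blocks, or 12,288 bytes.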
That is what I said.
That is not a question for which a single good answer can be given, other than "it depends." There are so many variables: just how often, and on how many processors, the extra work happens (which lets you calculate the lost electricity; it is a real and calculable cost). RAM usage also has real costs, including electricity, but calculating the final price tag is far more complicated there. The bottom line is that it is far too much work to track down and calculate to the penny the costs of inefficient code, even in the narrowest sense, so no one does. We just guesstimate, and we work within systems that don't encourage us to account for costs that can be passed on unaccounted for, so we generally pass them on. The natural outcome is that many actors weigh the decision purely in terms of their own personal and immediate costs and benefits (15 minutes of my time vs. a small performance hit to whoever uses the code), without accounting at all for the less personal and less immediate effects.
And often enough that works just fine. But there are cases where it will bite you hard. Knowing which situation is which is important. How are you going to do that if you only know the quick-and-cheap method without understanding the larger picture?
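The electricity part of that calculation is at least easy to write down. The back-of-the-envelope sketch below treats every input as a hypothetical figure you would have to supply or measure yourself. It also assumes a flat power draw attributable to the extra work. Energy in joules is extra CPU seconds times runs times watts, and 1 kWh is 3.6 million joules.

```c
/* Hypothetical cost model: all inputs are caller-supplied assumptions. */
static double extra_energy_cost(double extra_cpu_seconds_per_run,
                                double runs,
                                double watts,
                                double price_per_kwh)
{
    double joules = extra_cpu_seconds_per_run * runs * watts; /* W * s = J */
    double kwh = joules / 3.6e6;                               /* 1 kWh = 3.6 MJ */
    return kwh * price_per_kwh;
}
```

For example, with made-up inputs of 0.5 extra seconds per run, 7.2 million runs, 50 W and $0.10 per kWh, this gives 1.8e8 J = 50 kWh, or about $5. RAM, hardware and cooling costs are, as noted above, much harder to model.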
It's too bad that, with all these things you "heard", you didn't happen to hear that programs are written for environments other than Windows (or Linux, Mac OS, etc.), and for devices other than PCs. It's unfortunate that you are so in the dark that you don't realize there are entire industries that rely on devices with tiny fractions of the memory and processor speed that you ignorantly assume we all have access to. You probably have no idea how often you are affected by devices that run 100 times slower than the desktop PC you gave as an example, or that have 1,000 times less RAM. On some of these devices C is the most advanced language you can get short of writing a compiler or interpreter yourself.
Sure, pissing away storage space and waving a hand at execution efficiency is fine for some circumstances, but sometimes it's a luxury you can't afford. The world of software development is far bigger than the tiny little niche of programming you've been exposed to.
I suggest you use some "real" perspective, and reevaluate what a "real language" is.
"OOP makes people lazy and gives them less of an understanding of what's actually going on."
I've noticed that people who criticise OOP rarely understand what it is and tend to think it's tied to a particular type of language. OO is a way of thinking about a problem at a higher level than functional decomposition. You can code an OO solution in whatever language you like. Done properly, it leads to elegant solutions; e.g., many of the examples in K&R exhibit the features of OO design, and they were created before the term "object orientated" was coined. I assume when K&R used function pointers as elements of a struct they "understood what's going on", right?
"All that OOP code you write gets translated back into something procedural, you know."
Perhaps that's because...you know...OOP is procedural.
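To make the function-pointer point concrete, here is a small sketch of object-style dispatch in plain C. It is not taken from K&R; the shape types and names are hypothetical. A struct carries a pointer to a table of operations, similar in spirit to the vtables C++ compilers commonly generate. Calling through that table is an ordinary indirect function call, which is the sense in which OO code ends up procedural.

```c
/* An "object": data plus a pointer to a shared table of operations. */
struct shape;

struct shape_ops {
    double (*area)(const struct shape *self);
    const char *(*name)(void);
};

struct shape {
    const struct shape_ops *ops;
    double w, h;
};

static double rect_area(const struct shape *s) { return s->w * s->h; }
static const char *rect_name(void) { return "rectangle"; }
static double tri_area(const struct shape *s) { return 0.5 * s->w * s->h; }
static const char *tri_name(void) { return "triangle"; }

/* One operations table per "class". */
static const struct shape_ops rect_ops = { rect_area, rect_name };
static const struct shape_ops tri_ops  = { tri_area,  tri_name  };

/* "Dynamic dispatch" is just an indirect call through the table. */
static double shape_area(const struct shape *s)
{
    return s->ops->area(s);
}
```

A 3-by-4 shape built with rect_ops reports an area of 12, and the same dimensions with tri_ops report 6.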
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.