Smallest Possible ELF Executable?
taviso writes "I recently stumbled across this paper (google cache), in which the author investigates the smallest possible ELF executable on Linux. There's some interesting stuff in it, and it's well worth a read. The author concludes, 'every single byte in this executable file can be accounted for and justified. How many executables have you created lately that you can say that about?'"
I've always wondered what all the glibc overhead (compared to, e.g., uClibc) does. I've never noticed any functional difference when setting up an initrd image using uClibc instead of glibc.
I think there are quite a few. It's seen as a challenge, and does have practical uses. Have a look at Tom's Rootboot disk - it includes a web server, a telnet server, a telnet client, an NFS client, wget, gzip, bzip2, vi, a whole load of network drivers, and a tonne of other stuff, all compressed down onto one floppy disk. The only thing is, I've never quite been able to find the source code for any of it despite spending a small amount of time looking - possibly someone would be able to put me right on that one.
There are also lots of interesting articles on linuxassembly.org.
Andrew
Soon I realized that smaller programs are not the end-all goal of programming. If a slightly bigger program is easier to understand for the next person who modifies/maintains it, then that is the new "Right Thing" for that application... and I realized the efficient programming of the PDP days was a byproduct of necessity more than anything else. It's seldom needed with today's blazing hardware capabilities.
This isn't to say that many of today's programs are over-bloated, but just to reinforce the trade-off between small and easy to understand.
No, I don't think you can make it any shorter even by removing that call. The program is 45 bytes, and the 45th byte is required to be there (a critical part of the ELF header), or else it won't execute at all.
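For anyone who wants to see which header field that 45th byte falls in, here is a rough sketch, assuming the standard 32-bit ELF header layout. The names ELF32_FIELDS, field_offsets and field_at are made up for this illustration; the sketch only maps byte positions to fields and does not check what the kernel actually requires.

```python
import struct

# Standard ELF32 header fields in file order, with struct format codes.
# (Assumption: 32-bit little-endian layout, as on x86 Linux.)
ELF32_FIELDS = [
    ("e_ident", "16s"),
    ("e_type", "H"),
    ("e_machine", "H"),
    ("e_version", "I"),
    ("e_entry", "I"),
    ("e_phoff", "I"),
    ("e_shoff", "I"),
    ("e_flags", "I"),
    ("e_ehsize", "H"),
    ("e_phentsize", "H"),
    ("e_phnum", "H"),
    ("e_shentsize", "H"),
    ("e_shnum", "H"),
    ("e_shstrndx", "H"),
]


def field_offsets(fields):
    """Return {name: (offset, size)} for a packed layout with no padding."""
    offsets = {}
    pos = 0
    for name, fmt in fields:
        size = struct.calcsize("<" + fmt)
        offsets[name] = (pos, size)
        pos += size
    return offsets


def field_at(offsets, byte_index):
    """Return the name of the field covering a zero-based byte index, or None."""
    for name, (start, size) in offsets.items():
        if start <= byte_index < start + size:
            return name
    return None


OFFSETS = field_offsets(ELF32_FIELDS)
HEADER_SIZE = sum(size for _, size in OFFSETS.values())

if __name__ == "__main__":
    print("full ELF32 header size:", HEADER_SIZE)
    # The 45th byte of the file is zero-based offset 44.
    print("field at offset 44:", field_at(OFFSETS, 44))
```

So a 45-byte file stops partway through a full-size header, with its last byte sitting in e_phnum (the program header count).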
Did you bother to read it?
That still included the C stdlib.
From the article, after removing that:
"Now that's tiny! Almost a fourth the size of the previous version!"
The NASM assembler site that he mentions in the article seems /.'d; there's a SourceForge project site instead.
Unless, of course, you're using ReiserFS with tail packing turned on.
The demo scene has always beaten the usual coders. Not long ago we had a national festival with guys coming all the way from Russia. Some demos, mainly Amiga and Spectrum, were impressive. Some 3D effects were shown on machines that lack any kind of acceleration, and they ran at nearly the same speeds we often see on some powerful Pentiums. Besides, the PC demos packed things down to an impossibly small size, with speed, sound, space and color effects that beat many popular games.
I wonder what speed and effects something like Doom III would have if it were written mainly in asm...
And no, you can't change the page size, it's hardware-dependent. Most of the other archs seem to have similar or larger pages, too.
Why do I know this? It's "write your own VM" month in my OS class. Next week we get to start swapping out to disk...
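To put rough numbers on the page point: a minimal sketch, assuming memory is handed out in whole pages, so even a 45-byte executable occupies at least one full page once mapped. The helper names pages_needed and resident_bytes are invented for this example; mmap.PAGESIZE is simply whatever page size Python reports on the machine running it.

```python
import mmap


def pages_needed(size_bytes, page_size=mmap.PAGESIZE):
    """Number of whole pages required to hold size_bytes."""
    if size_bytes < 0 or page_size <= 0:
        raise ValueError("size must be >= 0 and page_size must be > 0")
    # Ceiling division without floating point.
    return -(-size_bytes // page_size)


def resident_bytes(size_bytes, page_size=mmap.PAGESIZE):
    """Bytes actually occupied once rounded up to whole pages."""
    return pages_needed(size_bytes, page_size) * page_size


if __name__ == "__main__":
    for size in (45, 1024, 4097):
        print(size, "bytes ->", pages_needed(size), "page(s),",
              resident_bytes(size), "bytes mapped")
```

The printed figures depend on the local page size, which is why the sketch reads it from the system instead of hard-coding 4096.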
A witty [sig] proves nothing. --Voltaire
A DOS .com file does not have a lower limit. .COM files have no headers, so making a really tiny .com file is not very hard ;) It says more about the crap Turbo Pascal puts in the .com file. A .com file that returns correctly can have just one byte in it: 0xC3 (RET).
*cough* Karma whore *cough cough*
- I seem to remember making some damn small Turbo Pascal .COM files. Under 4096 bytes, IIRC.
The Amiga E language (sort of a Pascal+C+whatever like beast) compiler stuffed a hello world program in 80 bytes or so. Pure executable, no external libraries needed.
The author's list of self-designed languages is definitely worth a look.
http://linuxassembly.org/asmutils.html
Check it out, download it and assemble it.
They create the smallest set of binaries for the basic Linux tools that I have found, and they employ a good portion of the stuff mentioned in this paper.
They make busybox look bloated by comparison.
Another neat trick is to pass "-Wl,--gc-sections" to gcc when linking a static binary -- the linker then tries to weed out all the unused sections of the libraries it links against (this works best when the code was compiled with -ffunction-sections and -fdata-sections).
The last trick I usually use is to link against uClibc or dietlibc rather than glibc. Makes a noticeable difference. RedHat has been working on a library called "newlib" which is supposed to do the same thing as uClibc or dietlibc but better (for embedded stuff).
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
True, but a small ELF still takes up less memory once loaded from disk. Though with 512MB+ in today's systems, people can even afford to run more than one Office XP app.
cl ft.cpp /O2 /link /opt:ref /entry:main
1024 bytes - the linker always pads to the next page.
If you look at the .text section, it's 16 bytes.
A "Hello world" program generated with this API takes only 267 bytes.
You may say, yeah, but how often will you printf more than 1024 bytes? Exactly: practically never. Which is why this sort of crap does not show up in testing and DOES show up when people are trying to crack it.
1) The increase in OS requirements is partially due to the increase in OS functions. XP provides a lot more eye candy than NT, which needs more processor power to handle. You may not think it's a good idea, but most people like it.
2) The increase in OS requirements is mainly due to an increase in software requirements as a whole. An OS is worthless if you can't run anything on it, so you need to set your requirements with software in mind. MS made this mistake with Windows 95. Yes, technically it would run with 4MB of RAM, but that wasn't enough to load anything else. XP's stated minimum isn't the actual minimum, but a practical one when you account for applications.
3) As others have mentioned, compact code comes at the price of maintainability. Sure, I can write a program in 100% assembly, and then if I'm really good, tweak the machine code to make sure it is as efficient as possible. Now try and maintain that. This is hard enough if it's a tiny app, but if it is something large like, say, Mozilla, even the original programmer would find maintenance very difficult and anyone else would find it almost impossible.
4) Along those lines, portability requires that you code in a higher-level language, and often that you make some changes that increase your code size. If you do everything in optimised assembly, well, it's a one-platform thing. I can guarantee that you have to do a massive rewrite of an assembly Windows app if you want to make it run on x86 Linux, just because of the API differences. If you are talking about another hardware platform, then it's a total and complete rewrite.
5) Your 64k demo thing, I'm assuming, refers to the now infamous Farbrausch demos. It is simply stunning what they can get done in 64k, BUT it comes at a huge price. First, there is the memory usage: look at your task manager sometime when one of those is running; they use something like 80MB. Because of their tiny disk usage they have to decompress to memory. Second, their compatibility is horrible: their newer one, FR22, works properly on my system at work but not at home, the only big difference being that at home I have a GeForce 4 and at work I have a GeForce 3. Finally, these things are only made possible by the "bloated" Windows framework, with things like DirectX to simplify low-level access.
6) Most people see little point in trying to make things run well on a 386 when you can get an entire system running at over 1 GHz for about $500.
Wired 1995 Surprize coding compo:
Write the smallest possible .com program that does the following:
1, Input a number from the keyboard, call it N
2, Switch to mode 13h (VGA), draw N 3x3 squares without the central pixel (N * 8 pixels to draw); no square should be adjacent to another.
3, Wait for Enter
4, Exit
Results were
1: Walken/Impact Studios, 48 bytes
1: (ex aequo) Paranoia, 48 bytes
2: KLF, 51 bytes
For info, Walken's version drew the squares at different positions every time his program was run (don't ask me how).
Our own attempt (aegis) came to 52 bytes, but we were disqualified because we did not support the key "0".
ahh... fun...
lone, dfx.
No. The 45th byte of the resultant program is required to be there as part of the ELF header (Linux won't run the program otherwise). The code which generates the value 42 occurs way before the 45th byte of the program in an unused portion of the header. In fact, the return value could be a couple of bytes longer without changing the length of the overall program.
flossie
Write now. Defend liberty
"Ah, kamisama! Ore no atama ni ono ga arimasu yo!" is quite correct. To my ears, the use of "wa" makes the rhythm of the sentence quite lethargic. Though it may not be correct textbook Japanese (the use of the word "yo" already makes it conversational, and thus, some leeway in grammar is allowed), between peers, this usage will be quite acceptable.
Yes, I do speak Japanese fluently :-P