Slashdot Mirror


Smallest Possible ELF Executable?

taviso writes "I recently stumbled across this paper (google cache), where the author investigates the smallest possible ELF executable on Linux. Some interesting stuff, and well worth a read. The author concludes, 'every single byte in this executable file can be accounted for and justified. How many executables have you created lately that you can say that about?'"

21 of 451 comments

  1. umm.... yeah? by Lxy · · Score: 2, Insightful

    Basically what the author is saying is that by default, C is somewhat bloated (you need to include massive libraries just to use one function). Writing system-level calls in assembly can replace the unnecessary bloat of a library that's only being used for one function.

    I remember this trick from when I was learning x86 assembly. I wrote a hello world program in assembly. Assembled, it came to something like 35 bytes. In C++, it took over 10K.
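
    To make the "skip the library" idea concrete, here is a minimal sketch in C rather than assembly. It assumes Linux with glibc's syscall() wrapper, and the function name raw_hello is made up for illustration. It bypasses stdio and asks the kernel to write the bytes directly:

    ```c
    #define _GNU_SOURCE
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Hypothetical helper: print a greeting with a raw write(2) system call
       instead of going through printf/stdio. Linux-specific.
       Returns the number of bytes written, or -1 on error. */
    long raw_hello(void)
    {
        static const char msg[] = "Hello, world!\n";
        /* fd 1 is stdout; sizeof msg - 1 drops the trailing NUL */
        return syscall(SYS_write, 1, msg, sizeof msg - 1);
    }
    ```

    Linking this into an ordinary C program still pulls in the C runtime startup code, so on its own it will not get anywhere near the 35 bytes mentioned above. It only shows the system call that the hand-written assembly version would issue.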

    Now, also see the statement that he is abandoning portability, because he's using Linux-specific system calls. So, in a nutshell, C++ makes big code that's portable, assembly makes tiny code that's static.

    Did I miss something or was this a long-winded article about why assembly is better than C++?

    --

    There is no reasonable defense against an idiot with an agenda
    :wq
  2. Microsoft. Yes, Microsoft by tomoose · · Score: 4, Insightful

    Reminds me of one of Bill Gates' first programs - Micro-Soft's 1975 Altair BASIC. Unfortunately the page I wanted to link to has gone, but this is something from The Register at the time: http://www.theregister.co.uk/content/4/18949.html

    Finally found a web archive of the page I wanted: http://web.archive.org/web/20011031094552/www.rjh.org.uk/altair/4k/index2.html

    A real pity that standards have slipped so much since then.

    (I refuse to post anonymously even though I have mentioned Microsoft in a thread about Small Exes. So there :p )

  3. Efficiancy in OS programming needed by TibbonZero · · Score: 5, Insightful

    We really need more efficient programming in OSes today. Look at the system requirements for OSes over the past few years. It's gone crazy. Check out the requirements for NT Workstation 4.0, Windows XP Pro and Windows 2000 Pro.

    Doesn't something seem messed up? What have we really gained since 4.0 that requires 4x the memory, 3x the processor, and almost 15x the hard drive space? Are USB and FireWire support really that big? And have you ever tried to run XP on the minimum system? It doesn't work so well. I remember being able to tweak a system to run Windows 95 on a 386 with 5 MB of memory and a 45 MB hard drive. It wasn't pretty but it could run. Today if you aren't going 1 GHz+, then they want to leave you behind.

    They are just using really fast hardware as an excuse for bloating the code.
    Even Linux (redhat moreso) is guilty of this.
    Remember when awesome games could fit on a handful of floppies? I think that could fly today if they tried. Look at the demo scene. 64k can do a lot of graphics. The most awesome games like Betrayal at Krondor were only a few floppies. Sure, if you have big hardware use it, but don't waste it. Programmers are just getting slack and including (literally) everything in the world, and not writing anything for themselves. They aren't looking to optimize stuff, just to kick it out and make money (obviously open source isn't guilty of the money or the fast kickout thing)...

    --
    Tibbon
    tibbon.com
    1. Re:Efficiancy in OS programming needed by Zaiff+Urgulbunger · · Score: 2, Insightful

      It'll go full circle at some point. A kind-of-example might be PalmOS, insofar as it is wildly stripped down and slim when compared with WindowsCE. The problem is that it costs a lot to develop really beautifully engineered code. Some day it'll happen though - like a nice, tight office suite that loads ridiculously quickly and does everything you want.

    2. Re:Efficiancy in OS programming needed by coupland · · Score: 5, Insightful

      Unfortunately my take on this situation is a bit more sinister. Your post mentions that NT 5.0 (Windows 2000 Pro) requires 64M of RAM, yet NT 5.1 (Windows XP) requires 128M of RAM. Why the twofold increase for a minor upgrade? Well, consider two things:

      1. Windows XP was released during one of the slowest hardware sales slumps in PC history. All the big players were hoping to see XP spur sales. Not coincidentally, for many people XP required a new PC.
      2. Microsoft can only stand to benefit from these PC sales in the form of OEM licenses.

      Yes I'm cynical but I've always been of the belief that the bloat in XP is engineered, not the simple result of bad programming. To think that the project managers and marketing don't talk about these sorts of things is naive.

    3. Re:Efficiancy in OS programming needed by Fastolfe · · Score: 4, Insightful

      I think a lot of it is due to pressure to get a product out. Developers are relying exclusively nowadays on high-level languages, even in OS design, and those that write the compilers don't spend as much time on getting good, compact, precise and optimized code out of high-level code. Nobody cares. CPU is cheap, hard disk is cheap. Why should they work to make their stuff efficient when they can just claim their product is so advanced it requires twice the resources?

      Part of it also lies on the shoulders of developers. A lot of developers today are simply programmers that learned C in high school. They have little understanding of machine languages, assembly, or the CPU architectures they're coding for. They know just what the high-level languages look like and one or two ways of accomplishing their goal. What they need to know is how their software design decisions actually get implemented by the assembler and executed by the architecture. Memory efficiency never even crosses their mind. Who wants to pay for programmers that actually know their shit when they can just claim their product is so advanced it now requires four times the resources?

      Perhaps this is another area in which OpenSource software can shine some day...

    4. Re:Efficiancy in OS programming needed by Kashif+Shaikh · · Score: 2, Insightful

      Yes I'm cynical but I've always been of the belief that the bloat in XP is engineered, not the simple result of bad programming.

      Woah...aren't "engineering" and "bad programming" synonymous at Microsoft? Look at all the stuff they engineered over the years!

  4. That goes to show those C bigots by Anonymous Coward · · Score: 2, Insightful

    You know -- the bullshitters that say: "Optimizing C compilers write small/better/faster code than hand-tuned assembly."

    Hand-tuned assembler is always faster/smaller/better than C code, except when it comes to portability.

    And this just goes to show that fact again!

    1. Re:That goes to show those C bigots by Waffle+Iron · · Score: 5, Insightful
      Hand-tuned assembler is always faster/smaller/better than C code, except when it comes to portability.

      Maybe in theory. In practice, once your program gets too big to fit it all in your head at once, you're going to run out of the mental energy required to stay ahead of the C compiler (and remain bug-free).

      If you've disassembled the output of a good optimizing compiler lately, you'd see that it usually produces pretty good code. Except for the inner loops of numerical algorithms, I doubt that anyone will consistently be able to produce code that is more than 25% faster than the C compiler.

      The thing is, the compiler is able to spit out this code at thousands of lines per minute all day long. It doesn't get tired. The human programmer is going to get tired of the boredom, and will start creating higher level abstractions in assembly. He'll start using macros. He'll use a simplified parameter passing protocol so that he doesn't have to inline and hand-allocate the registers for every little subroutine call.

      Before long, he's fallen behind, and the C code will run faster overall. And the C program will have taken less time to write, as well.

  5. Re:Small virus catcher (for DOS) by RinkSpringer · · Score: 4, Insightful

    Uhrm, not really. Almost any COM file infecting virus will read the first 3 bytes and check whether they start with a JMP instruction (opcode 0xEB or 0xE9). If they don't, it usually refuses to infect the file.

    Therefore, this file wouldn't be infected by like 99% of all COM-infecting viruses...
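
    The check described above is tiny. As a rough sketch (the function name starts_with_jmp is made up here, and a real scanner or infector would look at more than one byte), it amounts to testing the first opcode byte for a short JMP (0xEB) or a near JMP (0xE9):

    ```c
    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical helper: true if the buffer begins with an x86 JMP opcode,
       either the short form (0xEB, 8-bit offset) or the near form (0xE9). */
    bool starts_with_jmp(const unsigned char *buf, size_t len)
    {
        return len > 0 && (buf[0] == 0xEB || buf[0] == 0xE9);
    }
    ```

    A COM file that starts with anything else, say a MOV (0xB4), fails this test, which is the behaviour the comment relies on.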

  6. On bloatnesses by Ektanoor · · Score: 4, Insightful

    Well this reminds me of the golden days of DOS (not Denial Of Service but Disk Operating System... well, anyway it didn't make a difference). Back then people fought for every bit of code. And assembler was as popular as C or Pascal.

    However, using assembler this way is not the best use of resources. Frankly, this piece of code is only useful if you need some really tiny program and you are running out of space and speed. But, today, 99.99...% of tasks don't need it. The optimal way to use such tricks is to concentrate on tasks that really need "the best and fastest code ever". These are drivers and situations where speed is worth its weight in gold. Usually this is done by injecting the necessary asm directives into C or any other language. Writing everything in pure assembler is impractical and the result may become harder to understand than the Rosetta Stone.

    However the article is making a point - how unoptimised present compilers are. For example, GCC is mostly C written in C. That makes it highly portable, but, if anyone decided to repeat the Turbo Pascal feat (most of its base code was assembler), I know that the binary code would shrink to the impossible. Right now we may not be feeling this drawback as bloat still doesn't clog everything. In the future this situation may change if speed and reliability become higher priorities.

    A note for the bloat FUDders: This is not a reason for Linux distros being bloated. First, learn to be rational about your needs and don't install everything on one box. Second, learn a little bit of administration, maybe some programming, and kick that (mega_kernel) + (some_highly_featured_libs) + (several_useless_apps) out of your box. Then you will know that Linux can help fry eggs on your processor with lightning speed. Till then, keep the flame to yourself and read "Why I switched from Mac to Windows".

  7. i love this by sysrequest · · Score: 2, Insightful

    it may not be all TOO practical, since a lot of people try to ensure that their program runs on multiple architectures and platforms, but I also miss the old (DOS) days when the demo scene tried to optimize their intros to fit half an hour of entertainment into 64k, with full sound blaster support. the registers of the vga cards were abused to no end, lightning fast assembler procedures were optimized either for size, or for speed by unrolling loops, etc.

    while that isn't practical anymore these days, a LOT of code has become very sloppy. More than once have I stumbled over some college kid's C app that was supposed to demonstrate linked lists, and instead, it was using one class with an array.

    programming is an art, like acting. many try and are good enough for some purposes, but only a selected few are masters. sounds pretty damn philosophical, don't read too much into it :)

  8. People are missing the point by Kenneth+Stephen · · Score: 5, Insightful

    Looking through the comments here I see two main threads: (1) Squeezing the last few bytes of overhead out of a program leads to a hard to understand / maintain program and thus is not worth the effort. (2) What's the big deal anyway in this era of 100 GB disks and 2GHz processors?

    While both these criticisms are valid, they miss the point. Firstly, it wasn't the objective of the author to squeeze the last few bytes out of that program to save resources. He was just putting his hard-earned knowledge to use. He was doing it because he could! This is the same motivation as for people who climb mountains: because the mountain is there, and because they can climb it. Indeed, if the author were seriously looking into saving resources, he'd hardly be wasting his time on a trivial program, would he?

    Secondly, one of the author's intentions was to demonstrate the limits to which austerity could be taken. Certainly, this was a trivial program - but the same principles could be used to shrink larger non-trivial programs, and in those cases, the savings could possibly be larger. Of course, in those cases, the largest savings would come from a good optimizing compiler rather than crunching the headers together. More importantly, the author has exposed whole new ideas and lines of possibilities to programmers.
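
    For a sense of what "crunching the headers together" is up against, here is a small sketch that assumes a Linux system with glibc's <elf.h>. The function name is made up for illustration. It adds up the 32-bit ELF file header and a single program header as they would sit if laid end to end, before a single byte of code:

    ```c
    #include <elf.h>
    #include <stddef.h>

    /* Hypothetical helper: bytes taken by one 32-bit ELF file header plus one
       program header when they are stored back to back with no overlap. */
    size_t elf32_header_bytes(void)
    {
        return sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr);
    }
    ```

    That comes to 52 + 32 = 84 bytes, so any file smaller than that has to make the two structures share bytes, which is a trick no compiler or linker will do for you.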

    --

    There is no such thing as luck. Luck is nothing but an absence of bad luck.

  9. Excellent troll! by PurpleFloyd · · Score: 4, Insightful
    Linux software is horribly bloated, like even "ls" is above 30k
    ls is probably statically linked (all necessary libs reside within the executable), so it will function in almost any circumstance where the executable itself is not corrupted. Would you really want to try to repair a broken system without ls? Most critical utilities and shells are available in statically-linked forms (if not, you can do it yourself). While executable size is an important consideration, it isn't the only one. I would rather have a set of basic programs (like ls) that work even if all the lib directories are toasted, than to save a few K here and there, and have a system that could never pull itself back up if broken.
    --

    That's it. I'm no longer part of Team Sanity.
    1. Re:Excellent troll! by vranash · · Score: 1, Insightful

      Actually dude, last time I fscked up my libc install it turned out ls, cp, mv, and most of the other utilities were NOT statically linked. In fact, just compiling a Hello World! program with dynamic linking takes up 5K, and I believe something around 275k statically linked, so indeed Linux's binaries do appear quite bloated, at least on the lower end (gcc 2.95.3 vs 3.2 sees a difference in larger executables; I've seen ~10% change either smaller or larger depending on the program being compiled).

    2. Re:Excellent troll! by Dahamma · · Score: 2, Insightful
      Can't believe this was modded up to 5!

      The first post called Linux software bloated (oh come on, glibc is VERY bloated - try using uclibc or dietlibc - they don't have 100% of glibc's functionality, but for embedded systems it's amazing how much space they save) and it's a troll?? Then he makes a statement that starts with "probably" and turns out to be wrong - Debian and RedHat 'ls' definitely link with several libs. I won't say "probably", but who wants to bet on how many of the other major distros do as well?

      Ok, to put something less whiny in my post - if you're worried about having a functional set of utils for emergency use, install busybox. For ~800k statically linked you get a ton of utils in a multicall binary, along with a shell. Available from the RedHat install CDs, in fact.
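
      For anyone wondering what "multicall binary" means: one executable is installed under many names (usually via links) and picks its behaviour from the name it was invoked as. The sketch below is not busybox's actual code. The applets and the dispatch function are made up to show the idea:

      ```c
      #include <string.h>

      /* Two toy applets standing in for real utilities (hypothetical). */
      static int applet_true(void)  { return 0; }
      static int applet_false(void) { return 1; }

      /* Strip any directory part, so "/bin/true" becomes "true". */
      static const char *base_name(const char *path)
      {
          const char *slash = strrchr(path, '/');
          return slash ? slash + 1 : path;
      }

      /* Choose an applet from the name the binary was invoked as (argv[0]).
         Returns the applet's exit status, or 127 if the name is unknown. */
      int dispatch(const char *argv0)
      {
          const char *name = base_name(argv0);
          if (strcmp(name, "true") == 0)
              return applet_true();
          if (strcmp(name, "false") == 0)
              return applet_false();
          return 127;
      }
      ```

      A real main would just return dispatch(argv[0]). Because every applet shares the same startup code and library routines, the total comes out far smaller than the same tools built as separate static binaries.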

  10. It doesn't save any disk space by Gerry+Gleason · · Score: 5, Insightful
    Below the filesystem size quantum, you don't save anything anymore. You can't allocate less than a page of memory either.

    Beyond some point, the article is really just silliness, interesting or not. Below 512 bytes, you're not going to save anything on any system. Ok, there are filesystems that compress things further for squeezing into flash memory and such, so maybe there are some marginally useful applications, but still the header overlapping is a bit much to be worth considering.
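
    The arithmetic behind that is just rounding up to a whole number of blocks. A minimal sketch, assuming a filesystem that hands out space in fixed-size blocks and does no tail packing or compression (the function name is made up):

    ```c
    #include <stdint.h>

    /* Hypothetical helper: space actually consumed by a file of file_size
       bytes when the filesystem allocates whole blocks of block_size bytes. */
    uint64_t allocated_bytes(uint64_t file_size, uint64_t block_size)
    {
        uint64_t blocks = (file_size + block_size - 1) / block_size; /* round up */
        return blocks * block_size;
    }
    ```

    With 512-byte blocks, a 100-byte file and a 500-byte file both cost 512 bytes, so shaving the first down from the second buys nothing on disk.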

  11. Re:4K Demos by /dev/trash · · Score: 2, Insightful
    I wonder the speed and the effects some Doom III would have if it was written mainly in Asm...

    I wonder if we'd even HAVE Doom III now if it was written in asm.

  12. Small size != More efficient by yorgasor · · Score: 5, Insightful

    Just because a program or executable file is smaller, doesn't necessarily mean it's more efficient. For instance, some compiler optimizations actually produce larger executables. If you unroll a loop, it actually generates code for each iteration of the loop, but saves time because it's faster to keep going forward than to branch backwards to run through the code again.
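
    A hand-written sketch of what unrolling looks like, with made-up function names. A compiler would normally do this for you, and whether it actually runs faster depends on the CPU and the compiler, so treat it as an illustration of the size trade-off only:

    ```c
    #include <stddef.h>

    /* Straightforward loop: one add and one branch back per element. */
    long sum_rolled(const int *a, size_t n)
    {
        long s = 0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Unrolled by four: more code, but only one branch back per four elements. */
    long sum_unrolled4(const int *a, size_t n)
    {
        long s = 0;
        size_t i = 0;
        for (; i + 4 <= n; i += 4)
            s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
        for (; i < n; i++)   /* leftover elements when n is not a multiple of 4 */
            s += a[i];
        return s;
    }
    ```

    Both functions return the same sum. The second is simply longer in exchange for fewer trips through the loop test.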

    Similarly, you can have inline functions that insert the inline function directly into the function calling it. Every function that calls an inline function would get a copy of it, which produces larger code, but saves a lot of time since it doesn't need to push the arguments on the stack, branch to the new function, and return with the value.
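
    In C that looks something like the sketch below (function names are made up). Note that inline is only a hint. The compiler is free to ignore it, or to inline functions that were never marked:

    ```c
    /* Small helper the compiler may paste into each caller instead of calling. */
    static inline int square(int x)
    {
        return x * x;
    }

    /* If square() is inlined, this compiles to roughly a*a + b*b with no
       call/return overhead, at the cost of a copy of the body per call site. */
    int sum_of_squares(int a, int b)
    {
        return square(a) + square(b);
    }
    ```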

    Finally, the biggest speed gains you can get are generally algorithmic in nature. You can do a bubble sort with just a few lines of code. It's a lot simpler code and smaller than the larger and more complicated quick sort or merge sort. I know which one I'd rather wait for with a million items to sort.
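
    To put rough numbers on it, here is a sketch with made-up function names that counts comparisons. The bubble sort always makes n(n-1)/2 of them, while the C library's qsort (whose algorithm the standard does not specify, but which is around n log n in common implementations) makes far fewer on large inputs:

    ```c
    #include <stddef.h>
    #include <stdlib.h>

    /* Plain bubble sort; returns the number of comparisons made. */
    long bubble_sort(int *a, size_t n)
    {
        long comparisons = 0;
        for (size_t pass = 0; pass + 1 < n; pass++) {
            for (size_t i = 0; i + 1 < n - pass; i++) {
                comparisons++;
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                }
            }
        }
        return comparisons;
    }

    static long qsort_comparisons;   /* incremented by the comparator below */

    static int cmp_int(const void *p, const void *q)
    {
        int x = *(const int *)p, y = *(const int *)q;
        qsort_comparisons++;
        return (x > y) - (x < y);
    }

    /* Sort with the C library's qsort; returns the number of comparisons. */
    long library_sort(int *a, size_t n)
    {
        qsort_comparisons = 0;
        qsort(a, n, sizeof a[0], cmp_int);
        return qsort_comparisons;
    }
    ```

    For a million items, n(n-1)/2 is about 5 x 10^11 comparisons, against something on the order of 2 x 10^7 for an n log n sort.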

    So remember, just because something is bigger, doesn't mean it's more bloated, and just because something is smaller doesn't mean it's faster or more efficient.

  13. Re:Interesting topic... by topham · · Score: 4, Insightful

    If you ever found a bug in code optimized to that degree you would NOT want to fix it as it could require a complete rewrite from scratch.

  14. Re:Optimized Executables by aking137 · · Score: 2, Insightful

    Yes thank you, I am. I have used both successfully - download the image and try it. They're there, and they do have very real uses in terms of rescuing machines, which is simply to allow one to transfer files across a network. If I remember rightly, the web server executable is significantly less than 1k in size. And why not have a telnetd if you can fit it into a few kB?

    The whole disk has become so useful that it has virtually removed any need for an MS-DOS disk on the network I'm looking after. Running an rm -rf /mnt/c for a multi-gigabyte FAT32 filesystem takes seconds, as opposed to a deltree c:\*.* from a DOS disk, which can take literally hours, for example.

    Andrew