Slashdot Mirror


Don't Overlook Efficient C/C++ Cmd Line Processing

An anonymous reader writes "Command-line processing is historically one of the most ignored areas in software development. Just about any relatively complicated software has dozens of available command-line options. The GNU tool gperf is a "perfect" hash function generator that, for a given set of user-provided strings, generates C/C++ code for a hash table, a hash function, and a lookup function. This article provides a reference for a good discussion on how to use gperf for effective command-line processing in your C/C++ code."

30 of 219 comments

  1. Speed in options parsing? by tot · · Score: 5, Insightful

    I would not consider the speed of command-line option processing to be a bottleneck in any application; the overhead of starting the program is far greater.

    1. Re:Speed in options parsing? by ScrewMaster · · Score: 3, Insightful

      I'd say the speed of human motor activity is an even greater limiting factor.

      --
      The higher the technology, the sharper that two-edged sword.
    2. Re:Speed in options parsing? by ChronosWS · · Score: 4, Informative

      Indeed, what the hell? Now you have to have another tool and another source file for what is essentially declaring a dictionary in C++, which should be in any good developer's library? Yeesh.

      If you don't like the nasty nested ifs, make the keys in your dictionary the command line options and the values delegates, then just loop through your list of options passed on the command-line, invoking the delegate as appropriate. Eliminates the if, there are no switch statements either, and each of your command line arguments is now handled by a function dedicated to it, bringing all of the benefits of compartmentalizing your code rather than stringing it out in a huge processing function.
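The dictionary-of-delegates idea above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the `Settings` struct, the two option names, and the helper names `optionTable` and `applyOptions` are hypothetical and do not come from the article or the comment.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical settings record; the field names are illustrative only.
struct Settings {
    bool verbose = false;
    bool recursive = false;
};

using Handler = std::function<void(Settings&)>;

// The option -> handler dictionary. Each option gets its own small function.
inline const std::map<std::string, Handler>& optionTable() {
    static const std::map<std::string, Handler> table = {
        {"--verbose",   [](Settings& s) { s.verbose = true; }},
        {"--recursive", [](Settings& s) { s.recursive = true; }},
    };
    return table;
}

// Loop over the arguments, invoke the matching handler, and
// return the arguments that no handler claimed.
inline std::vector<std::string> applyOptions(const std::vector<std::string>& args,
                                             Settings& settings) {
    std::vector<std::string> unknown;
    for (const auto& arg : args) {
        const auto it = optionTable().find(arg);
        if (it != optionTable().end()) {
            it->second(settings);
        } else {
            unknown.push_back(arg);
        }
    }
    return unknown;
}
```

A caller would typically build the vector from `argv + 1` to `argv + argc` and report whatever comes back in the unknown list. The lookup here is an ordinary `std::map`, which is more than fast enough for a handful of options.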

    3. Re:Speed in options parsing? by pete-classic · · Score: 3, Informative

      What a limited point of view. See "man system", for example.

      -Peter

    4. Re:Speed in options parsing? by Anonymous Coward · · Score: 5, Funny

      You're not a real programmer if you won't over-optimize irrelevant parts of your code.

    5. Re:Speed in options parsing? by canuck57 · · Score: 3, Insightful

      I would not consider the speed of command-line option processing to be a bottleneck in any application; the overhead of starting the program is far greater.

      You're just experiencing this with Java, Perl, or some other high-overhead, bloated program. People often pull out a heavyweight that needs a 90 MB VM or a 5-10 MB base library calling the cat's breakfast of shared libraries, and there I would agree. But let's take a look at C-based awk, for example: it is only an 80 KB draw. It runs fast, is nice and general purpose, and does a good job of what it was designed to do. It can be pipelined in and out and used directly on the command line, as it has proper support for stdin, stdout, and stderr. On my system, it takes only 10 disk blocks to load.

      While fewer people are proficient at it, C/C++ will outlast us all as a language. Virtually every commodity computer today uses it in its core. Many others have come and gone, yet all our OSes and scripting tools rely on it. So any doomsday predictions would be premature, and if you want fast, efficient, and lean code, you do C/C++....

    6. Re:Speed in options parsing? by ai3 · · Score: 4, Funny

      You must not have seen the recent proposal for GNU tools options, which will require four dashes instead of two and a minimum of four words per option. Under a UN/EU funded program to ease the transition to intelligent machines, developers are rewarded for implementing full-sentence options and/or prose. But initial experiments showed that many users were unwilling to wait for the parsing of the command "remove-files --recursively-from-root-directory --do-not-ask-for-confirmation-just-delete --i-really-want-this!" just to be 1337, which led to whatever development efforts are mentioned in the article, which I didn't read.

    7. Re:Speed in options parsing? by Maniac-X · · Score: 4, Funny

      Klingon function calls do not have 'parameters' - they have 'arguments.' AND THEY ALWAYS WIN THEM!

      --
      (A)bort, (R)etry, (I)gnore?_
    8. Re:Speed in options parsing? by VGPowerlord · · Score: 3, Insightful

      Not everyone uses the same tab stops.

      I see that as a good reason to use tabs. Don't like how far it's indented? Change how wide your editor displays tabs.
      --
      GLaDOS for President 2016! "Well here we are again. It's always such a pleasure." -- GLaDOS, 2011
    9. Re:Speed in options parsing? by JesseMcDonald · · Score: 3, Informative

      Writing code that writes code--now we're thinking!

      But what could we call this code, a compiler? Nah, I think we need to think of another word for it.

      How about "macro"?

      --
      "The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
  2. Too much by bytesex · · Score: 3, Insightful

    I'm not sure that for the usually simple task of command line processing, I'd like to learn a whole new lex/yacc syntax thingy.

    --
    Religion is what happens when nature strikes and groupthink goes wrong.
    1. Re:Too much by hackstraw · · Score: 4, Insightful

      I'm not sure that for the usually simple task of command line processing, I'd like to learn a whole new lex/yacc syntax thingy.

      The syntax for gperf is not that bad, but it's simply the wrong tool for the job as far as command-line processing goes.

      gperf simply makes a "perfect" hash function for searching a predetermined static lookup table. It provides no mechanism for arbitrary arguments like input filenames or modifiers (like a filter for including/excluding things, or increasing/decreasing something), nor does it check for conflicting options or missing options.

      gperf would give you nothing besides a match of input to a state. gperf would provide nothing for a common commandline like: --include="*.txt" --exclude="*.backup" --with-match="some text|or this text" --limit-input=5megabytes

      getopt or just rolling your own if/else if ladder or switch statement would provide much more flexibility over gperf.

      Now, with parsing a configuration file, gperf might help, but for processing commandline arguments, gperf is simply the wrong tool for the job.
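To make the point about option values concrete, here is a minimal hand-rolled sketch of splitting a `--name=value` argument into its two parts, which is the step a keyword matcher alone does not cover. The `LongOption` struct and the `parseLongOption` function are hypothetical names chosen for this illustration; conflict and missing-option checks would still have to be layered on top.

```cpp
#include <string>

// Hypothetical result type: the option name without the leading dashes,
// plus the text after '=' if there was one.
struct LongOption {
    std::string name;
    std::string value;
    bool hasValue = false;
};

// Parse "--name" or "--name=value". Returns false for anything else
// (plain operands such as file names, "--" on its own, or an empty name).
inline bool parseLongOption(const std::string& arg, LongOption& out) {
    if (arg.size() < 3 || arg.compare(0, 2, "--") != 0) {
        return false;
    }
    const std::string::size_type eq = arg.find('=', 2);
    if (eq == std::string::npos) {
        out.name = arg.substr(2);
        out.value.clear();
        out.hasValue = false;
        return true;
    }
    if (eq == 2) {
        return false;  // "--=value" has no option name
    }
    out.name = arg.substr(2, eq - 2);
    out.value = arg.substr(eq + 1);
    out.hasValue = true;
    return true;
}
```

With this in place, `--include=*.txt` yields the name `include` and the value `*.txt`, and only the name would ever need to go through a lookup table, perfect hash or otherwise.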

      This is like the second or third Slashdot posting from IBM's developerWorks that is simply well-formatted nonsense. Past examples are http://developers.slashdot.org/article.pl?sid=07/04/09/1539255 and http://developers.slashdot.org/article.pl?sid=07/04/09/1539255

      This is silly on both Slashdot's and IBM's part.

  3. Yeah, because getopt(3) is a real bottleneck by V.+Mole · · Score: 4, Insightful

    Does the phrase "reinvent the wheel" strike a chord with anyone?

  4. Re:C++ I get by Anonymous Coward · · Score: 4, Insightful

    I do. On MIPS, ARM, PPC, x86, and all the other embedded stuff. I don't think C will ever die - it's the universal assembler language.

  5. And the standard says... by Anonymous Coward · · Score: 5, Insightful

    Good grief. What a strawman of an example.
    Anyone writing or maintaining command line programs knows that they
    should be using the API getopt() or getopt_long().
    There are standards on how command line options and arguments are to be
    processed. They should be followed for portability and code maintenance.
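For readers who have not used it, a minimal sketch of the POSIX `getopt()` loop looks roughly like this. The option letters (`-v` and `-o file`), the `Flags` struct, and the `parseFlags` helper are assumptions made for the example; only `getopt()`, `optarg`, `optind`, and `opterr` come from `<unistd.h>`.

```cpp
#include <string>
#include <unistd.h>  // POSIX getopt(); not part of ISO C or ISO C++

// Hypothetical result record for this sketch.
struct Flags {
    bool verbose = false;
    std::string output;
    bool ok = true;        // false if getopt() reported an error
    int firstOperand = 0;  // index of the first non-option argument
};

// Parse "-v" (flag) and "-o <file>" (option with an argument).
inline Flags parseFlags(int argc, char* const argv[]) {
    Flags flags;
    optind = 1;  // start scanning at argv[1], even if getopt() ran before
    opterr = 0;  // keep getopt() from printing its own diagnostics
    int c;
    while ((c = getopt(argc, argv, "vo:")) != -1) {
        switch (c) {
            case 'v': flags.verbose = true; break;
            case 'o': flags.output = optarg; break;
            default:  flags.ok = false; break;  // '?': unknown option or missing argument
        }
    }
    flags.firstOperand = optind;
    return flags;
}
```

In a real program this would be called as `parseFlags(argc, argv)` from `main`, and everything from `argv[flags.firstOperand]` onward would be treated as operands such as input file names.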

  6. Re:C++ I get by V.+Mole · · Score: 5, Funny

    There's this little project of which you may have heard: http://www.kernel.org/

  7. Broken handling of vtables in linkers by tepples · · Score: 4, Informative

    Now you have to have another tool and another source file for what is essentially declaring a dictionary in C++, which should be in any good developer's library?

    Due to the brokenness of how some linkers handle virtual method lookup tables, using anything from the C++ standard library tends to bring in a large chunk of dead code from the standard library. I compiled hello-iostream.cpp using MinGW and the executable was over 200 KiB after running strip, compared to the 6 KiB executable produced from hello-cstdio.cpp. Sometimes NIH syndrome produces runtime efficiency, and on a handheld system, efficiency can mean the difference between fitting your app into widely deployed hardware and having to build custom, much more expensive hardware.
  8. Re:Joke? by iangoldby · · Score: 4, Insightful

    Someone found a "new" toy?
    Well I for one won't be using this to process command-line arguments (that's what getopt() and getopt_long() are for), but it is certainly useful to know of a tool that I can use to generate a perfect hash. The next time I need some simple but efficient code to quickly discriminate between a fixed set of strings, I'll know to Google for gperf. (Before I read this article I didn't even know it existed.)
  9. Re:C++ I get by mce · · Score: 4, Interesting

    You, whenever you compile C++ code, as it is compiled to C before machine code (unless you are using an exotic compiler such as the Compaq AXP C++ compiler for TRU64).

    Excuse me???? That was not even true anymore when I started using C++, back in 1992. There are features in the C++ standard that are so extremely difficult to implement correctly in standard-compliant C that it's a complete waste of effort trying to pass via C while compiling. Exception handling comes to mind as the prime example. A failed attempt to support exceptions was the reason why Cfront 4.0 was abandoned. Note that 3.0 was released as early as 1991. The last Cfront-based compiler I had the horror of using was HP's CC. It was superseded by the new native aCC by 1994 at the latest.

    By the way, I used to write C/C++ compilation/optimisation stuff for a living, so I guess I know something about the topic.... :-)

  10. It is if the linker complains about not finding it by tepples · · Score: 4, Informative

    Yeah, because getopt(3) is a real bottleneck

    getopt() is in the header <unistd.h>, which is in POSIX, not ANSI. POSIX facilities are not guaranteed to be present on W*nd?ws systems. It also handles only short options, not long options. For those, you have to use getopt_long() of <getopt.h>, which isn't even in POSIX.

    Does the phrase "reinvent the wheel" strike a chord with anyone?

    If the wheel isn't licensed appropriately, copyright law requires you to reinvent it. Specifically, using software under the GNU Lesser General Public License in a proprietary program intended to run on a platform whose executables are ordinarily statically linked, such as a handheld or otherwise embedded system, is cumbersome.
  11. Re:C++ I get by Enselic · · Score: 4, Informative

    You are wrong about 3):

    The process of building the new engine went much more smoothly than anything we have done before, because I was able to do all the groundwork while the rest of the company worked on TeamArena. By the time they were ready to work on it, things were basically functional. I did most of the early development work with a gutted version of Quake 3, which let me write a brand new renderer without having to rewrite file access code, console code, and all the other subsystems that make up a game. After the renderer was functional and the other programmers came off of TA and Wolf, the rest of the codebase got rewritten. Especially after our move to C++, there is very little code remaining from the Q3 codebase at this point.

    Source: http://archive.gamespy.com/e32002/pc/carmack/


    And 4) as well:

    Historically, compilers for many languages, including C++ and Fortran, have been implemented as "preprocessors" which emit another high level language such as C. None of the compilers included in GCC are implemented this way; they all generate machine code directly. This sort of preprocessor should not be confused with the C preprocessor, which is an integral feature of the C, C++, Objective-C and Objective-C++ languages.

    Source: http://gcc.gnu.org/onlinedocs/gcc-4.2.1/gcc/G_002b_002b-and-GCC.html

  12. All the world is not a PC by tepples · · Score: 5, Insightful

    HOLY SHIT! 194KB BIGGER?! HOW WILL YOU EVER FIND THE SPACE FOR SUCH A HUGE EXECUTABLE?!?!

    I develop for a battery-powered computer with 384 KiB of RAM. In such an environment, what you appear to sarcastically call a "mere couple hundred kilobytes" is a bigger deal than it is on a personal computer manufactured in 2007.
  13. Wrong in so many ways by geophile · · Score: 4, Insightful

    Perfect hash functions are curiosities. If you have a static set of keys, then with enough work you can generate a perfect (i.e. collision-free) hash function. This has been known for many years. The applicability is highly limited, because you don't usually have a static set of keys, and because the cost of generating the perfect hash is usually not worth it.
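As a rough illustration of what "with enough work" means for a static key set, the sketch below brute-forces a multiplier for a simple string hash until no two keys share a slot. This is not gperf's algorithm, which is considerably more refined; the names `trialHash` and `findPerfectMultiplier` are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simple multiplicative string hash, parameterized by a multiplier.
inline std::size_t trialHash(const std::string& key, unsigned multiplier,
                             std::size_t tableSize) {
    unsigned long long h = 0;
    for (unsigned char ch : key) {
        h = h * multiplier + ch;  // unsigned overflow wraps, which is fine here
    }
    return static_cast<std::size_t>(h % tableSize);
}

// Try multipliers 1..maxMultiplier until every key lands in its own slot.
// On success, stores the multiplier and returns true.
inline bool findPerfectMultiplier(const std::vector<std::string>& keys,
                                  std::size_t tableSize, unsigned maxMultiplier,
                                  unsigned& multiplier) {
    if (tableSize < keys.size()) {
        return false;  // pigeonhole: more keys than slots always collides
    }
    for (unsigned m = 1; m <= maxMultiplier; ++m) {
        std::vector<bool> used(tableSize, false);
        bool collision = false;
        for (const auto& key : keys) {
            const std::size_t slot = trialHash(key, m, tableSize);
            if (used[slot]) {
                collision = true;
                break;
            }
            used[slot] = true;
        }
        if (!collision) {
            multiplier = m;
            return true;
        }
    }
    return false;
}
```

The search cost is paid once, ahead of time, which is why the approach only makes sense when the key set never changes. A lookup still has to compare the candidate string against the key stored in the slot, since a string outside the set can hash to an occupied slot.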

    Gperf might be reasonable as a perfect hash generator for those incredibly rare situations when the extra work due to a hash collision is really the one thing standing between you and acceptable performance of your application.

    I thought maybe we were seeing a bad writeup, but no, it's the authors themselves who talk about the need for high-performance command-line processing, and give the performance of processing N arguments as O(N)*[N*O(1)]. I cannot conceive of a situation in which command-line processing is a bottleneck. And their use of O() notation is wrong (they are claiming O(N**2) -- which they really don't want to do, not least because it's wrong). O() notation shows how performance grows with input size. Unless they are worrying about thousands or millions of command-line arguments, O() notation in this context is just ludicrous.

    I don't know why I'm going on at such length -- the extreme dumbness of this article just set me off.

  14. Historically? by ClosedSource · · Score: 3, Insightful

    "Command-line processing is historically one of the most ignored areas in software development."

    This is like saying that walking is historically one of the most ignored areas in human transportation.

  15. devkitARM by tepples · · Score: 3, Informative

    And you do so using MinGW and c++?

    Yes, I do so with devkitARM (a cross-compiling GCC toolchain that is itself compiled with MinGW) and C++.
  16. only relevant to static linking by sentientbrendan · · Score: 4, Informative

    It sounds like the author is statically linking his library and running on an embedded system. It is not surprising in that case that the C++ standard library brings in much more code than the C standard library, but it should be made clear that this is not relevant to desktop developers, pretty much all of whom dynamically link with glibc.

    Again, to be clear, dynamically linking with the c++ standard library is not going to increase your executable size. Please don't try to roll your own code that exists in the standard library. It is a real nuisance when people do that.

    I should qualify that by saying that template instantiations do (of course) increase executable size, but that they do so no more than if you had rolled your own.

  17. Re:C++ I get by mce · · Score: 3, Informative

    Of course C++ exceptions are what I meant. What else would I mean when using the word "exceptions" in this context?

    And yes, C++ exceptions can be expressed in C. After all, C is a glorified assembler and the resulting code from C++ translation is assembler as well. It all depends on the level of abstraction at which the C code is written and on the amount of ugliness/inefficiency you're willing to take on board (and also the trade-off between both of the latter). But that's not the point. The point of this thread is that nowadays it makes no sense to make use of this capability in a C++ compiler. Especially not when considering that a user of a C++ compiler wants more than just a compiler. He also wants a debugger that is able to meaningfully link up the binary and the original C++ source. If you're a C++ compiler vendor, using C as an IL does nothing but complicate your own life. Twice.
  18. This tool is much easier by stupendou · · Score: 3, Interesting

    Try supergetopt instead. Much easier to use and also open source.
    http://www.ibiblio.org/pub/Linux/devel/sugerget-1.1.tgz

    With this code, you simply specify command-line strings and variables in a printf()
    style format.

    E.g. supergetopt( argc, argv,
                      "string1", "%d %d", function1,
                      "string2", "%s", function2 )

    will call function1( int a, int b ) when string1 is on the command line,
    and will call function2( char *s ) when string2 is used on the command line.

    A whole lot easier than gperf, IMHO.

  19. Re:C++ I get by mce · · Score: 3, Informative

    The main problem (but not the only one) is called "object destructors". You have to make sure they are called. All of them, and in the correct order, at all the nested scopes of execution you are in when the exception occurs. And you need to make sure not to call them on any object not yet constructed (always remember that constructors can throw exceptions too) and never to call a destructor twice (I've seen this kind of bug multiple times in multiple compilers). And then there is the fun of exceptions thrown by destructors, not to mention the possibility that it all happens in the middle of constructing or destructing an array of objects.

    All that is why setjmp()/longjmp(), also known as C's non-local goto, don't cut it, which in turn means that you need to complicate function return mechanisms. And just when you think you got that problem sorted out, you need to be aware that C++ functions can call (library) C functions that were never compiled to even know about exceptions but that in turn can call C++ functions that may again throw an exception. The entire construction needs to be able to handle this.

    As I wrote in another post in this thread, it can be done. But it is not easy. Note that the entire object destructor issue also applies within a single scope, which is why life is not as easy as replacing every "throw" statement by "goto end;".
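The destructor bookkeeping described above can be observed from the C++ side with a small sketch. It shows what the language guarantees, not how any particular compiler implements it; the `Tracer` type and the function names are hypothetical and exist only to log the order of events.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Records its own construction and destruction in a shared log.
struct Tracer {
    std::vector<std::string>& log;
    std::string name;
    Tracer(std::vector<std::string>& l, const std::string& n) : log(l), name(n) {
        log.push_back("ctor " + name);
    }
    ~Tracer() { log.push_back("dtor " + name); }
};

inline void innerScope(std::vector<std::string>& log, bool fail) {
    Tracer b(log, "b");
    if (fail) {
        throw std::runtime_error("boom");
    }
    Tracer c(log, "c");  // only constructed, and so only destroyed, when nothing was thrown
}

// Returns the sequence of events for one run.
inline std::vector<std::string> runUnwindDemo(bool fail) {
    std::vector<std::string> log;
    try {
        Tracer a(log, "a");
        innerScope(log, fail);
    } catch (const std::runtime_error&) {
        log.push_back("caught");
    }
    return log;
}
```

With `fail` set to true, the log reads: ctor a, ctor b, dtor b, dtor a, caught. The object `c` never appears because it was never constructed, and both live objects are destroyed in reverse order of construction, across two nested scopes, before the handler runs. That is the behaviour a translation to C would have to reproduce by hand.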

  20. Re:Byte counts when compiled with devkitARM by sholden · · Score: 3, Funny

    It's not pushing it all. It's storage, it's network attached, it's in a box... What I am pushing is the poor little linksys device. It's plugged into 4 USB hard drives (plus a thumb drive, but that's just for booting) which it's running software RAID5 on. Poor little thing, if it could scream I'm sure it would be. Sadly it's the only machine with a C++ compiler on it at home these days...

    Please don't tell the poor thing it's running on MIPS, the ARMv5TE kernel might just freak out and collapse the universe.