Code Generation in Action
Overview Code Generation in Action, CGiA to its friends, is presented in two parts. The first part is four chapters, and covers a code generation case-study, the basic principles of code generation, including the different types of code generation strategies together with reasons why you would or would not use each strategy. The book's chosen toolset for building generators is presented next, and then some walk-through examples of building simple generators wraps up the first part.
The second part is a kind of a cross between a cookbook and a list of engineering solutions. There are nine chapters with the breadth of solutions covered being quite impressive, covering the gamut of generation of user interfaces, documentation, unit tests and data access code. Each chapter presents a couple of solutions within its topic area, often for different technologies within that topic. For example, the user interface chapter covers the generation of Java ServerPages, Swing dialog boxes and then Microsoft MFC dialog boxes. No favouritism here!
What's To Like There's a lot to like with this book. The writing is very clear and of good prose. I found the introduction to be very compelling, and I felt completely drawn in by the opening case-study. The four chapters of part one are a concise case for code generation, and would be very useful information to help persuade co-workers and management of the positive risk/benefit ratio with trying code-generation on a live project.It would be impossible to try enough of any solution from part two in a time-frame short enough to make this review useful, but in the solutions that match my areas of knowledge, I found myself admiring Herrington's straight-forward and pragmatic approach.
What's To Consider There are two aspects of this book that I want to flag. One of these aspects, some will love and others will hate, and that is the choice of generator language for CGiA. The author has chosen to use Ruby as his working language. This is an interesting choice. Ruby is certainly a language that is inspiring a lot of admiration these days (in fact, it's hard to get Dave Thomas to stop talking about it :-), but with the majority of the code-generation examples being for Java-related technologies, I wonder why Java was not selected instead.I also found myself wondering about the lack of discussion of how to integrate these Ruby tools into a typical Java build process. Many developers I know use ant to bring automation and consistency to their builds, yet the book doesn't mention this. (JRuby anyone?) Certainly something to consider for the second edition or future code-generation authors.
SummaryThis is a masterful tome that inspires and delights, although the two issues raised above did cost it a perfect score of ten.
Table Of Contents- Code generation fundamentals
- Overview
- Code generation basics
- Code generation tools
- Building simple generators
- Code generation solutions
- Generating user interfaces
- Generating documentation
- Generating unit tests
- Embedding SQL with generators
- Handling data
- Creating database access generators
- Generating web services layers
- Generating business logic
- More generator ideas
You can purchase Code Generation in Action from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
Richard Hell and Codeoids
Code generation is definitely something programmers of large/complex projects should look into. There's a lot of different forms of it, and I'd be surprised if people haven't used on form or another already.
My personal favorite trick at the moment is describing interfaces in XML, and getting multi-language bindings and docs out of it with a few XSLT scripts. Techniques like that makes script language bindings easy as pie (as do other systems like Swig).
Code generation is a time-saving technique that helps engineers do better, more creative, and useful work by reducing redundant hand-coding. In this world of increasingly code-intensive frameworks, the value of replacing laborious hand-coding with code generation is acute and, thus, its popularity is increasing.
Why not put a little bit about code generation in the review. Even a little blurb like "It builds the mundane portions of coding" would have helped out a bit.
And Lisp has been doing it for years, as has, to a lesser extent, C++ (and Ada) templates. And the good thing about Lisp metaprogramming is that it is in the one language - lisp.
Having 1 language generating another - Ruby and Java - is the recipe for confusion and complexity.
My rule of thumb is if I find myself writing code that bores me silly and thinking "a frigging monkey could be taught to code this piece", I will strongly consider writing a program to write the program for me. Be warned about the maintenance and readability issues though in larger development projects where there are a lot of mediocre programmers around. You can always assign those mediocre programmers to hack on monkey-easy code, but you can't get them to hack on a code generator, so carefully consider the nature of the development organization you are dealing with, and the tradeoff between available "high-value" time and resources vs. "low-value" (i.e. monkey coder) time and resources. This perspective has been brought to you by my pointy-haired side.
if you ever read the "Pragmatic Programmer" book about software developer practices, it mentions that they used code generation for many things. Code generation definitely helps when you have many similar but not same implementation.. But, in Java or any other object oriented languages, inheritence can be used to avoid similar looking code (which is what code generation would do).
But, even here, I think the UI pages, or template based generators definitely help. There was a tool by DevelopMentor which used to generate code for ATL. It was based on templates..
But apparently, it is not. I, for one, wasn't too happy seeing the term "code generator" applied to superficial software that generates HTML/user level code for standard dialog boxes etc. HTML isn't code in the first place, anyway.
Maybe I'm being too fussy about this, but a code generator, traditionally has always meant a part of the compiler back-end which actually translates intermediate code to machine-level instructions. VB and other UI tools generate stubs for the most part...well maybe they are code generators, but I'm just not too happy about the choice of terms.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
The author has chosen to use Ruby as his working language. This is an interesting choice. Ruby is certainly a language that is inspiring a lot of admiration these days..., but with the majority of the code-generation examples being for Java-related technologies, I wonder why Java was not selected instead.
I think the author makes it pretty clear why he chose Ruby instead of Java. Essentially, in order to parse text (which is one of the primary functions in code generators) you would have to write 2 to 3X more code in Java than you would in Ruby. Java is not an optimal text parsing language - first off you have to find a regex engine for it. That leaves you with choosing one of the scripting languages: Ruby, Perl or Python.
Here's what the author says about the cons of using Java for code generation (page 41):
* Java is not ideal for text parsing. (I would agree)
* Strong typing is not ideal for text processing application (again, I would tend to agree, strong typing only gets in your way)
* The implementation overhead is large for small generators. (you'll be writing a lot more java code than you would in Ruby to get the same thing done)
Overall, I'm finding it to be a great book, and the use of Ruby for implementing the examples is a plus as far as I'm concerned.
As far as integrating Ruby into the build process goes, I recall hearing something about a project that uses Ruby to drive Ant.
I became a code-generation convert about two years ago when I first encountered Perl Data Language -- what made PDL development feasible was a code generator by Lukka, Glazebrook, and Soeller: their generator ("PP") writes all the loops and conditionals that do vectorized processing of large arrays.
Of course, PP has its limitations -- the age old complaint of engineers is that they will eventually outgrow any code generation framework -- but it's simple enough to augment the generation language or, in extreme cases, to sidestep the parts of the functionality that you're not using.
I work on a large Java project where we use entity EJBs. Code generation isn't an _option_ here, it's a necessity. We have hundreds of tables (over 500) and each of them has an EJB. Writing out the infrastructure for each and every one by hand would be a huge and boring waste of time. I think the necessity for code generation actually points to a problem with design of EJB itself, but that we're pretty much stuck with.
I worked at a telco where we generated C code on the fly from high level Structured Definition Language for the main call control processing.
It was a great idea... in theory. In theory, it was impossible for the implementation to get out of sync with the detailed design (the SDL). In theory, there's no difference between theory and practice, but in practice there is. Some of the features that we had to add simply couldn't be modelled in SDL, plus there were performance issues, and it produced ugly source.
It was used for fifteen years (yeah, pre-ANSI), but it eventually collapsed under the weight of all of the hacks that were required to work around the limitations. We eventually had to admit that the behaviour of the complete source (generated plus all the stuff around it) was now so different from that defined by the SDL that it was no longer worth putting up with the limitations of the SDL.
In the end, we just took a snapshot of the generated code and set developers free to actually fix it rather than hack around it. At that point, there were only a few people left who even knew SDL, so there were very few tears shed. The rest of us cheered, and the product got significantly cleaner as we refactored the bejeesus out of all the generated C and removed the hacks.
I'd recommend giving code generation a try, but don't be ruled by it. Once the product is mature, if the code generation is limiting you, then don't be afraid to drop it and fix the lower level generated code.
If you were blocking sigs, you wouldn't have to read this.
Actually, it's very much there. It's not a _replacement_ for development, but there are many parts of coding which is benefitted by code generation. I often write tools to write code for me. Once I had to write a color-picker in HTML (VERY repetitive code), so I wrote a code-generator in Emacs Lisp and it saved me several hours. Several implementations of this concept exist:
* Templating (see Alexandrescu's book on Modern C++ design)
* Macros for those in the Scheme/Lisp world (these are GREAT and AWESOME)
* Compile-time programming (only available in LISP as far as I'm aware through the eval-when construct)
* Custom program-generators
And then there's the related concept of partial evaluation that, while excellent, has received very little attention by the commercial sector.
Now, many code-generation facilities could be done better with good libraries, but this isn't universally the case. Delphi is probably the best at putting in libraries/properties what others put in code generators, and their software is much easier and better because of it.
Macros and compile-time programming are two of the best ways to do this, but Scheme and LISP are the only ones that do this reasonably.
Engineering and the Ultimate
Code Generation is for people who don't understand or are too lazy for abstraction
The article that timothy suggested as background reading (here) points out that code generation is most useful when you're forced to use a framework that requires lots of simple-minded "scaffolding-style" code. EJB is the prime example.
In other words, I agree with you --- if code generation is useful, it's probably because the infrastructure you're using was poorly designed. But that doesn't mean you don't have to use it. Managers who have no idea what J2EE is require you to use J2EE. So you use code generation.
A quote from the Sun web site:
J2EE technology and its component based model simplifies enterprise development and deployment.
But now I hear that we need code generation to keep up with all the mundane tasks made necessary by the use of EJBs.
So we build a code generator and we have to maintain that.
This is on top of all the J2EE design patterns you are supposed to do because the world would come to an end if you just accessed a database table using JDBC directly.
Once in a while someone should look at the assertion that it would be harder to maintain a lot of imbedded JDBC code in your application than it would be maintain the 5 or so classes you need for each business object in order to maintain architectural purity.
"We can't solve problems by using the same kind of thinking we used when we created them." -- Albert Einstein
For those who are saying that the term "Code Generator" isn't applicable - it is. Consider a C++ compiler. It may generate asm code, which then gets converted into machine code.
(generic) C++ -> (specific) asm -> executable bits
(obviuosly, the C/C++ compiler doesn't need to generate asm, but it's still code generation if it does)
Code generators just take this a level higher, so the code "life cycle" looks like this:
(generic) Diagram / CG description -> (specific implementation) C++ -> (specific machine) asm -> machine code.
Code generators have a great potential for easing coding and documentation. Just like GCC has many backends to generate code for different processor architectures, the code compilers can have different backends to make source code in different languages (C, C++, Fortran, whatever). Even better, you can run a different translator and get documentation out of the "source" - in HTML, DocBook, XML, or any other format you want.
There are tools to let you make UML diagrams (Google for "Executable UML" for great goodness), and generate real-time C code for an application, a C++ app simulator that runs on a PC, and documentation for the system, all from the same diagram. The tools are expensive (like $15k-$30k), but for large projects, they can be a great savings.
I saw a program called BridgePoint (from Project Technologies), which was able to generate embedded, real-time code that was as efficient (more in some areas, less in others, but it averaged out the same) as hand-optimized code done by expert programmers. It all depends on how goo dyour translator is (and this program lets you write your own).
Some books on the subject:
"Executable UML: A Foundation for Model-Driven Architecture", by Stephen J. Mellor and Marc J. Balcer
"Executable UML: The Models are the Code", by Leon Starr
"Real-Time UML: Developing Efficient Objects for Embedded Systems, Second Edition", by Bruce Powel Douglass
- The Sigless Wonder
While I agree with you in principle - that better abstraction is usually the way to go - that's not always possible in the real world. For example, sometimes you have to produce an API for someone else, or hook up to their API. In those cases, better abstraction isn't always an option. Sometimes your boss says "you're using EJB" and you're stuck with it. In those cases a generator can be a big help.
The cool thing about integrating a generator into your build is that you get the benefits of abstraction without many of the drawbacks. The generator becomes your abstraction, so you can make modifications in one place. Sure, using a generator requires a little more thought than cut-n-paste. So does proper abstraction. The two aren't all that different, and both approaches have their place.
Yes, of course you can do this in lisp.
The point your parent poster was making, is that a lisp program has the full power of the language available at macro expansion time, just prior to compilation. This means you can redefine the syntax of the language at will to create any language you like on top of lisp. Lisp macros should not be confused with c-style macros, which are merely token substitutions, not redefinitions of language grammar.
You only have this in perl in a very crude and hackish sense. You have to write your own parser; you have to write your own code generator; you have to run every piece of code through your home-brew parser/code-generator before you send it to the perl interpreter. Debugging? Heaven help you if any of:
1. Your code written in your new-language-built-on-perl has errors
2. Your parser has errors.
3. Your code generator has errors.
In Lisp, the macro facilities come for free, are part of the standard (so macros are portable). Vendors are responsible for correctness, so debugging is a simple matter of using the built in functions macroexpand and macroexpand-1.
Saying that Perl has the same sort of code generation capabilities as Lisp is rather like saying that, since all languages are Turing equivalent, assembly language has the same macro capabilities as lisp.
The power of a language comes from its expressiveness, the things it lets you do easily without having to resort the 21st century equivalents of a turing machine. With Perl, you only get this level of expressiveness by using convoluted, error prone, home-brew substitutes for real macros.
The .NET Framework has great built-in support for code generation, called the CodeDOM model. It is the mechanism used by all of the Visual Studio IDE wizards.
What is great is how they abstracted out the target language. Once you build a CodeDOM graph in your generator, you can pass it to any language generator (C#, VB.NET, J#, write your own) to create the result.
Generating code from database schemas and the like can be acceptable in my book. However, generating code just because it is repetitive and clunky is *not*.
...
:-).
In theory, good programmers never write repetitive code: they factor out functions, classes. Still, there are things that cannot be factored out easily, patterns of code that just do not map well into functions or cannot be abstracted away in whatever way the language provides.
Some languages are better at abstraction than others, but not many provides the ability to generate code (think of syntactic abstraction). This is where Lisp macros shine. They can hide ugliness I'm forced to do in the name of performance, define new control structures,
Think of the number times you were forced to go out of your way to solve a problem because while it was more correct or efficient it was less easy on the eyes. True macros would have helped. Well, divide that number by 10, the rest was probably your fault
A language needing a code generator for reasons above lacks power and for this lack a thousand small tools will appear all trying to plug some holes.
Code generation is a practical, efficient tool for solving many problems where OO-style abstraction need not enter the picture. One such class of problems is building interfaces and glue code from external specifications.
A few years ago, I wrote a simple code generator that reads the SQL DDL for a large database and generates an object-based interface to the database. Client coders could then use the object-based interface to access the database. The advantages of this approach proved to be numerous:
- Single, authoritative reference specification.
The object interface was always in sync with the reference, which for
this project was the database schema.
- Richer compile-time error detection. The
projection of the schema into the object interface was fully available
to the type system so that many kinds of client errors could be caught
at compile time, not run time.
- Reduced opportunity for errors between subsystem
boundaries. Because the object-based interface was generated
by machine from the actual database -- and not derived from some
programmer's understanding of the database -- there were
fewer opportunities for impedance mismatch across the boundaries
of the application code and the database. (Studies of errors in
complex projects have shown that errors are more common between
subsystem boundaries, and so this benefit is important.)
mcc further states: If you can't think of any such cases, it's because you're thinking too small. Look at the bigger picture. For starters:- When the number of variables affecting the desired code characteristics
is large enough to make hand-coding (at any level of abstraction)
impractical. E.g., FFTW: "FFTW
uses a code generator to produce highly-optimized routines for
computing small transforms."
- When your code must conform to an external reference specification
that changes rapidly enough to make hand coding (at any level of
abstraction) impractical. (See my example above.)
- When the requirement for correctness is so stringent as to make
hand-coding methods impractical, mandating code generation from
a formal specification.
- When you must target an output language whose native abstraction
capabilities are too crude to capture directly the degree of abstraction
that is merited. Believe it or not, most popular OO languages
fall into this category for many commonly occuring problems. Hence the popularity of design patterns. (Compare, e.g., with the abstraction capabilities of modern
functional programming languages like Haskell and O'Caml.)
Make no mistake about it, code generation is a practical, effective tool that every programmer should understand. To dismiss it out of hand is a costly mistake.Easy, automatic testing for Perl.
Amen to LISP/Scheme macros. Java and C++ have reimplemented so many other Lisp ideas you wonder why attention never turned to the preprocessor. After 15+ years of C++/Java and OO we would still all be better off programming in Lisp.
an ill wind that blows no good