Pragmatic Programmers on Designing with Metadata
Bill Venners writes "This week I've published the fourth installment of my interview with Andy Hunt and Dave Thomas, the authors of the best-selling book, The Pragmatic Programmer. In this installment, Dave and Andy talk about their recommended approach to design in which details are pulled out of the code and stored as metadata. This installment of the interview really made me think. Their focus on metadata sounded non-intuitive when I read their book, but in actually talking to them about it, I got the feeling they might be on to something. Check out: Abstraction and Detail."
I don't see what is so special about the idea of separating metadata from the rest of the code. I think this is what experienced programmers often do. (I also think that the authors have not really understood XP, as it is not a style of programming, but a style of working in the first place.) Having to think about what to put in "code" and what to put in "metadata" is really deciding what should be "compiled" and what should be "interpreted". There are two reasons why code should be compiled. Firstly, because compiled code is many times faster than interpreted code. Secondly, it is still one of the best ways of protecting one's investment (in the closed-source development model). I think most of the effort in programming still lies in performance. Many people will deny this, but I think they are simply blind to how much of what we do consists of tricks to give computers a reasonable speed. Many programmers aren't even aware of the memory pyramid and its impact on how programs and operating systems work.
I am not sure where they got their definition of metadata, but it sure isn't what I mean when I say it. From the article:
Uh. He's talking about storing data externally. This is his idea of metadata? Sounds more like 'data' to me. I think these people must be consultants because they must be full of shit.
At another point in the article the guy says he implements a state machine by using a database instead of hardcoding it in the code. Wow. Pretty novel. Next thing he's going to start talking about how OOP and OOD combined with putting the business logic in Prolog is going to affect my ROI with regard to my WYSIWYG.
I'd never think about storing data and loading it dynamically by my program instead of hardcoding it.
I think that storing everything possible in XML or a database is a good thing. Perhaps he is referring to the coding style of outputting everything as XML and using XSLT to convert it as required. This is an incredibly flexible and scalable technique that costs only slightly more initial development time and effort.
For webpages, I can have my application do a SQL select and output the data as XML. The XSLT template will translate that to HTML. To change the look of my site, I edit the XSLT. The application does not have to be modified to change the look of the website.
This could be done for applications too if using a GUI toolkit based on XML such as XUL. Your program does the SQL select, outputs XML, and then XSLT translates that XML to XUL which is displayed as an application.
Maybe I've got the wrong idea from the interview, but what was discussed was rules, not metadata.
Business rules are a well-known aspect of enterprise software development, especially in light of the many old(er) custom-built systems in which the rules were hard-coded. A business rule is "sales tax is 7%", or "customer pays a 1.5% surcharge if payment is more than 2 days late".
Metadata is a partner and also an opposite to a business rule. Metadata is quite simply "data about data". The fact that the value "7%" is "sales tax" is metadata; but the fact that the current value of the sales tax is 7% is not. The age-old concept of a "data dictionary" is an embodiment of what metadata is.
A rules engine is (rather simply) a powerful extension of the practice of declaring constants for significant literals (which are or could be subject to change); quite often one which allows runtime modification of the value rather than requiring a recompile. Rules engines also tend to provide mechanisms for evaluating compliance with the rule, or performing calculations based on rules.
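To make the "constants you can change at runtime" point concrete, here is a minimal sketch using the two example rules above. The key names, the JSON layout and the invoice_total function are all made up for illustration, and I'm assuming the surcharge applies to the pre-tax amount, which the rule as stated doesn't actually say.

```python
import json

# Hypothetical rule set. In a real system this text would come from a
# file or a database table rather than a string literal.
RULES_JSON = """
{
  "sales_tax_rate": 0.07,
  "late_surcharge_rate": 0.015,
  "late_after_days": 2
}
"""

def load_rules(text):
    """Parse the externally stored rules into a plain dict."""
    return json.loads(text)

def invoice_total(amount, days_late, rules):
    """Apply the tax rule, plus the surcharge rule if payment is late.

    Assumption: the surcharge is computed on the pre-tax amount.
    """
    total = amount * (1 + rules["sales_tax_rate"])
    if days_late > rules["late_after_days"]:
        total += amount * rules["late_surcharge_rate"]
    return round(total, 2)

rules = load_rules(RULES_JSON)
on_time = invoice_total(100.0, 0, rules)
late = invoice_total(100.0, 3, rules)

# Changing a rule at runtime needs no recompile, only a new value.
rules["sales_tax_rate"] = 0.08
after_change = invoice_total(100.0, 0, rules)
```

A real rules engine does a lot more than this (compliance checks, rule priorities and so on), but the core move is the same: the code knows that a tax rate exists, and the data says what it currently is.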
i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
To summarize the article.
Any reasonably experienced professional programmer (i.e. anyone who has run into a respecification of a constant occurring in 52 locations throughout 30000 lines of code) would consider this common sense. But, hey, anyone who didn't know this might find the article useful!
There's an effect that I've named (what else) "Hollan's Law": The likelihood of something changing is directly proportional to the intensity of the argument that it never will.
Thus, when we see something in code that looks like it might need to be malleable, it probably will.
Looks like these guys have noticed the same thing.
Ultimately, good programming is about finding the clearest way to express how to do something. It is not much of a stretch to imagine that often this how will take the form of "Imagine a machine that works this way to interpret data... then it is programmed thus... and this 'program' [metadata] makes such a machine do what we want."
You could've hired me.
This is not a new technique, but it is not used as much as it should be: I've used code generation in several projects, and I think it is superior to metadata and property files/databases in cases where the property file would consist of a large number of repetitive fields. For example: a big finite state machine. The code generation tool is always written by hand, which is the tough part. After the code generator is finished, you can leave much of the work to less experienced programmers, or even non-programmers.
:-)
The pros:
- the result is code that you can compile, which is more efficient,
- you catch the problems during compile-time instead of run-time,
- standardized code, which is easy to debug and maintain (worth how many millions to you, as the developer?)
The cons:
- somewhat high initial cost in form of developing the code generator,
- the code generator is sensitive to changing requirements - code generation is best used on requirements that have pretty much settled, and where you can take examples of hand-written code and use them as a model for your code generation template.
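A toy version of the idea, for anyone who hasn't seen it: a generator that turns a transition table into source code for a state machine. Everything here (the TRANSITIONS table, generate_fsm_source, the state and event names) is hypothetical, and a real generator would write the output to a file that goes through the normal build rather than exec it on the spot.

```python
# Hypothetical transition table: (current state, event, next state).
# This is the part a non-programmer could maintain.
TRANSITIONS = [
    ("idle", "start", "running"),
    ("running", "pause", "paused"),
    ("paused", "resume", "running"),
    ("running", "stop", "idle"),
]

def generate_fsm_source(transitions, func_name="next_state"):
    """Emit Python source for a function implementing the table."""
    lines = [f"def {func_name}(state, event):"]
    for state, event, target in transitions:
        lines.append(f"    if state == {state!r} and event == {event!r}:")
        lines.append(f"        return {target!r}")
    # Plain (non-f) string on purpose: the braces must survive into
    # the generated code, where they form an f-string.
    lines.append(
        "    raise ValueError(f'no transition from {state!r} on {event!r}')"
    )
    return "\n".join(lines) + "\n"

source = generate_fsm_source(TRANSITIONS)

# For the sketch only: compile and load the generated code in-process.
namespace = {}
exec(compile(source, "<generated>", "exec"), namespace)
next_state = namespace["next_state"]

state = next_state("idle", "start")   # follows the first table row
state = next_state(state, "pause")    # follows the second table row
```

The generated function is boring, uniform code, which is exactly the point: it is easy to read in a debugger, and a typo in the template shows up everywhere at once instead of hiding in one hand-written branch.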
The article simply says: do not encode "known" constants (such as tax levels, etc.) into the code, but put it in an external XML database.
No, that's not what it says. It says do not encode known behavior into the code, but put it in some more easily changed external data source. Also, it might be XML but it can also be in the code, just code structured more like data than code. (I often write code-data like that, writing default keymaps in the language itself, for instance. It's easier than writing a custom parser if you just use the language itself...)
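Roughly what I mean by a keymap written as code-data, as a sketch only. The key names and the save_file/quit_app actions are invented for the example.

```python
# Hypothetical actions an editor might expose.
def save_file():
    return "saved"

def quit_app():
    return "quit"

# The keymap is ordinary Python, but it reads like a config file:
# one binding per line, no control flow, trivially editable.
DEFAULT_KEYMAP = {
    "ctrl+s": save_file,
    "ctrl+q": quit_app,
}

def handle_key(key, keymap=DEFAULT_KEYMAP):
    """Look up the key and run its action; unbound keys do nothing."""
    action = keymap.get(key)
    if action is None:
        return None
    return action()

# A user override is just another table layered on top of the default.
user_keymap = dict(DEFAULT_KEYMAP)
user_keymap["ctrl+w"] = quit_app
```

No parser, no file format to document, and the language checks the syntax for you. The trade-off is that whoever edits the table has to be trusted to run code.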
Sales tax is used as an example for the interview, but it goes deeper than that. The other example is much more instructive, with the display of financial numbers. Few programmers instinctively write a "displayMonetaryAmount" function that allows them to make one edit to suddenly display negatives in red; it's much more common to always directly dump the value. "displayMonetaryAmount" is likely to be very simple, almost data-like, and easy to change, whereas changing the code everywhere that displays money is virtually impossible to get right.
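Something like the following, as a rough sketch. The FORMAT table, its keys and the HTML output are my own assumptions, not anything from the interview.

```python
# Hypothetical formatting policy, kept as data so that "negatives in
# red" is a one-line change rather than a hunt through the code base.
FORMAT = {
    "currency": "$",
    "decimals": 2,
    "negative_color": "red",
}

def display_monetary_amount(amount, fmt=FORMAT):
    """Return an HTML snippet for the amount, using the policy in fmt."""
    decimals = fmt["decimals"]
    text = f"{fmt['currency']}{abs(amount):,.{decimals}f}"
    if amount < 0:
        color = fmt["negative_color"]
        return f'<span style="color:{color}">-{text}</span>'
    return text

profit = display_monetary_amount(1234.5)
loss = display_monetary_amount(-5)
```

Every screen that shows money calls this one function, so switching negatives to parentheses or a different color means editing FORMAT or these few lines and nothing else.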
There's a lot of value in that approach that is missed out on by a lot of programmers.
Dave and Andy's!
How are you going to keep them down on the farm once they've seen Karl Hungus?
These kinds of approaches inevitably end with the programmer inventing a new, personal language for expressing the rules and writing a small interpreter for it (which he basically did).
The same result and much more can probably be achieved by utilizing a full-featured embedded scripting engine (such as Perl or Python) that is already available. Then again perhaps the entire application can be at least prototyped that way.
One technique for programming with metadata is data dictionaries.
It takes a while to hone the usage of such things, especially WRT handling unexpected requirements, but there is a point where they start to pay off handsomely in my experience.
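For those who haven't used one, a stripped-down sketch of what I mean. The field names and the validate function are invented for illustration, and a real data dictionary would usually live in a database table rather than in the source.

```python
# Hypothetical data dictionary: one row per field, describing the data
# rather than containing it.
DATA_DICTIONARY = [
    {"name": "customer_id", "type": int, "required": True, "label": "Customer ID"},
    {"name": "email", "type": str, "required": True, "label": "E-mail"},
    {"name": "nickname", "type": str, "required": False, "label": "Nickname"},
]

def validate(record, dictionary=DATA_DICTIONARY):
    """Check a record against the dictionary and return error messages."""
    errors = []
    for field in dictionary:
        name = field["name"]
        label = field["label"]
        if record.get(name) is None:
            if field["required"]:
                errors.append(f"{label} is required")
            continue
        if not isinstance(record[name], field["type"]):
            expected = field["type"].__name__
            errors.append(f"{label} must be {expected}")
    return errors

ok = validate({"customer_id": 1, "email": "a@example.com"})
bad = validate({"customer_id": "x"})
```

The same table can drive form labels, report columns and input validation, so adding a field is one new row instead of edits in three places.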
Table-ized A.I.