Slashdot Mirror


Pragmatic Programmers on Designing with Metadata

Bill Venners writes "This week I've published the fourth installment of my interview with Andy Hunt and Dave Thomas, the authors of the best-selling book, The Pragmatic Programmer. In this installment, Dave and Andy talk about their recommended approach to design in which details are pulled out of the code and stored as metadata. This installment of the interview really made me think. Their focus on metadata sounded non-intuitive when I read their book, but in actually talking to them about it, I got the feeling they might be on to something. Check out: Abstraction and Detail."

26 comments

  1. Compiling versus interpreting by Frans+Faase · · Score: 1

    I don't see what is so special about the idea of separating metadata from the rest of the code. I think this is what experienced programmers often do. (I also think that the authors have not really understood XP, as it is not a style of programming, but a style of working in the first place.) Having to think about what to put in "code" and what to put in "metadata" is really deciding what should be "compiled" and what should be "interpreted". There are two reasons why code should be compiled. Firstly, compiled code runs many times faster than interpreted code. Secondly, it is still one of the best ways of protecting one's investment (in the closed-source development model). I think most of the effort in programming still lies in performance. Many people will deny this, but I think that they are simply blind to how much we do because of all the tricks we use to give computers a reasonable speed. Many programmers aren't even aware of the memory pyramid and its impact on how programs and operating systems work.

    1. Re:Compiling versus interpreting by Webmonger · · Score: 1

      Compiling also prevents people with an inadequate grasp of a program's structure from changing it in dangerous ways.

  2. Article is a load of bull by JoeSmack · · Score: 1

    I am not sure where they got their definition of metadata, but it sure isn't what I mean when I say it. From the article:

    When people read our advice in the book about metadata , they tend to imagine very complicated architectures with lots of abstraction. But in reality, it could be very simple. If the sales tax rate is currently 7%, I don't put 7% into the code. I put it into a properties file or the database. The sales tax rate is a detail I abstract out of the code and store externally.

    Uh. He's talking about storing data externally. This is his idea of metadata? Sounds more like 'data' to me. I think these people must be consultants because they must be full of shit.

    At another point in the article the guy says he implements a state machine by using a database instead of hardcoding it in the code. Wow. Pretty novel. Next thing he's going to start talking about how OOP and OOD combined with putting the business logic in Prolog is going to affect my ROI with regard to my WYSIWYG.

    1. Re:Article is a load of bull by hey! · · Score: 1

      He's talking about storing data externally. This is his idea of metadata? Sounds more like 'data' to me.

      Well, of course metadata is data. The important thing is to design out as many assumptions as possible so that they are represented by data rather than code. We used to call this "data driven programming".

      Example:
      Years ago, I had a huge backlog of requests for different data sets out of a database. I abstracted out the common aspects of the requests and built an engine that retrieved the data based on a simple (OK, not that simple) text file. Then I showed the users how to set up the files. Bingo -- three year request backlog reduced to a few weeks.
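
      A minimal sketch of that kind of engine, not the one described above: the request-file format ("columns" and "where" lines), the table contents, and the function names are all invented for illustration. The point is only that a new request means a new text file, not new code.

```python
import csv
import io

# Hypothetical request file a user might write: which columns, which filter.
REQUEST = """\
columns = name, city
where   = state == CA
"""

# Stand-in for the database table; the rows are invented for illustration.
TABLE = """\
name,city,state
Ann,Fresno,CA
Bob,Austin,TX
Cy,Oakland,CA
"""


def parse_request(text):
    """Parse 'key = value' lines into a dict, splitting on the first '='."""
    spec = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition("=")
            spec[key.strip()] = value.strip()
    return spec


def run_request(request_text, table_text):
    """Return the requested columns for every row matching the filter."""
    spec = parse_request(request_text)
    columns = [c.strip() for c in spec["columns"].split(",")]
    field, wanted = [s.strip() for s in spec["where"].split("==")]
    rows = csv.DictReader(io.StringIO(table_text))
    return [[row[c] for c in columns] for row in rows if row[field] == wanted]


result = run_request(REQUEST, TABLE)
```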

      This is really an orthogonal issue to OOD, although it addresses some of the same issues. OOD is about how you solve problems -- data driven programming is about how you define problems. The more abstract you can make a problem, the more reusable code will be.

      Most people have not fully grasped the implications of this -- as your post illustrates.

      --
      Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    2. Re:Article is a load of bull by oliverthered · · Score: 1

      When someone buys x[a fish tank] apply function[sales tax] with input y[7%].

      That's kinda metadata, more a function, but it's often hard to tell the difference between data that describes data and a function.

      input y is looked up as an input for function sales tax.
      list values for y have metadata that says they are possible inputs for sales tax.

      sales tax has metadata that says it can be applied to a product and/or a product has metadata that says salestax can be applied to it.

      It's important not to hard code because there are different tax bands and some products may have no sales tax.
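
      A rough sketch of the tax-band idea, assuming made-up band names and product assignments (only the 7% figure comes from the thread). Each product carries metadata naming its band, and the rate is looked up rather than written into the calculation.

```python
from decimal import Decimal

# Hypothetical tax bands; only the 7% rate appears in the discussion.
TAX_BANDS = {
    "standard": Decimal("0.07"),
    "exempt": Decimal("0.00"),
}

# Metadata about products: which band applies to each (invented examples).
PRODUCT_BAND = {
    "fish tank": "standard",
    "bread": "exempt",
}


def sales_tax(product, price):
    """Look up the product's band, then apply that band's rate to the price."""
    rate = TAX_BANDS[PRODUCT_BAND[product]]
    return (price * rate).quantize(Decimal("0.01"))
```

      Changing a rate or moving a product to another band is then an edit to the tables, not to sales_tax().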

      --
      thank God the internet isn't a human right.
    3. Re:Article is a load of bull by Horny+Smurf · · Score: 0
      Well, if I was writing a program that needed to calculate the 7% sales tax, I wouldn't hard code 7%, I would make it a user preference!


      Trying to use metadata to handle abstraction and make code reusable isn't novel -- MacOS had windows and controls defined as data in a resource, and a couple of tool calls would show the windows and controls. If they had just listed MacOS resource forks, NextStep/OS X property lists, or even Windows resources, most people here would say "duh".

    4. Re:Article is a load of bull by Hognoxious · · Score: 1
      ... 7% sales tax, I wouldn't hard code 7%, I would make it a user preference!
      If that was a standalone program, only covering one jurisdiction, maybe.

      For a multinational, with outlets all over, different rules, mail order etc, that wouldn't work. Trust me, I've been there.
      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    5. Re:Article is a load of bull by Hognoxious · · Score: 1
      Well, of course metadata is data.
      I think the distinction is between data that's to do with the objects the system models (e.g. customers & invoices if it's a billing system) rather than data that makes the system behave in a certain way. The former I would call 'operational', the latter 'configuration'. For me, metadata means schemas and suchlike.

      Some have great difficulty seeing the distinction; a rule of thumb is if end users mess with it it's operational, if only specialists/analysts/etc do it's configuration.

      But really, I don't see that they're saying anything new, it's all in 'Code Complete' which is v. old now. SAP works that way too. What's good in the article is the opposition to XP's philosophy of never building the flexibility before you need it; they are totally correct that with a bit of experience you can predict where the flexibility will be needed, more than enough to break even.

      Refreshing to see something other than "XP is brilliant", if nothing else.
      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    6. Re:Article is a load of bull by hey! · · Score: 1

      For me, metadata means schemas and suchlike.

      Right -- a relational schema is metadata for a relational model. A state machine description is also a schema for a state machine model.

      But really, I don't see that they're saying anything new, it's all in 'Code Complete' which is v. old now.

      Sure, it's an old idea. However it is not one that has been absorbed by most programmers.

      --
      Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    7. Re:Article is a load of bull by Hognoxious · · Score: 1
      Sure, it's an old idea. However it is not one that has been absorbed by most programmers.
      Yeah, I totally agree. Common sense isn't that common.

      I find it interesting to see discussions about OO and inheritance, components and all the rest of the reuse armoury, when 80% of programmers never learned how to use includes properly.

      P.S. I like the sig.
      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  3. wow by GiMP · · Score: 1

    I'd never think about storing data and loading it dynamically by my program instead of hardcoding it.

    I think that storing everything possible in XML or a database is a good thing. Perhaps he is referring to the coding style of outputting everything as XML and using XSLT to convert it as required. This is an incredibly flexible and scalable technique with only slightly more initial development time and effort.

    For webpages, I can have my application do a SQL select and output the data as XML. The XSLT template will translate that to HTML. To change the look of my site, I edit the XSLT. The application does not have to be modified to change the look of the website.

    This could be done for applications too if using a GUI toolkit based on XML such as XUL. Your program does the SQL select, outputs XML, and then XSLT translates that XML to XUL which is displayed as an application.

    1. Re:wow by __past__ · · Score: 1
      But the XSLT is a part of your application. You don't have to go through a complete edit-compile-link cycle that way, but you can have that with a lot of other programming languages, too. (Yes, XSLT is a programming language, just a rather specialized one, with very verbose syntax and few helpful tools like debuggers or profilers).

      This isn't about strictly separating code and data, it's about using different tools for different layers of your app to blur this distinction, which is a good thing. You can build really flexible apps with it. Think ~/.emacs - is it a config file (data), or a program?

  4. Rules, not metadata by Twylite · · Score: 2, Informative

    Maybe I've got the wrong idea from the interview, but what was discussed was rules, not metadata.

    Business rules are a well known aspect of enterprise software development, especially in light of the many old(er) custom-built systems in which the rules were hard-coded. A business rule is "sales tax is 7%", or "customer pays a 1.5% surcharge if payment is more than 2 days late".

    Metadata is a partner and also an opposite to a business rule. Metadata is quite simply "data about data". The fact that the value "7%" is "sales tax" is metadata; but the fact that the current value of the sales tax is 7% is not. The age-old concept of a "data dictionary" is an embodiment of what metadata is.

    A rules engine is (rather simply) a powerful extension of the practice of declaring constants for significant literals (which are or could be subject to change); quite often one which allows runtime modification of the value rather than requiring a recompile. Rules engines also tend to provide mechanisms for evaluating compliance with the rule, or performing calculations based on rules.
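
    A small sketch of that progression from named constants to a rules store, using the two rules quoted above (7% sales tax, 1.5% surcharge after 2 days). The Rules class and its get/set methods are hypothetical, not any particular rules engine's API.

```python
from decimal import Decimal


class Rules:
    """Named values that can be changed at runtime, unlike compiled-in constants."""

    def __init__(self, **values):
        self._values = dict(values)

    def get(self, name):
        return self._values[name]

    def set(self, name, value):
        # Refuse unknown names so a typo cannot silently create a new rule.
        if name not in self._values:
            raise KeyError(name)
        self._values[name] = value


rules = Rules(
    sales_tax_rate=Decimal("0.07"),
    late_surcharge_rate=Decimal("0.015"),
    late_after_days=2,
)


def surcharge(amount, days_late):
    """Calculation based on the rules: surcharge applies only past the grace period."""
    if days_late > rules.get("late_after_days"):
        return amount * rules.get("late_surcharge_rate")
    return Decimal("0")
```

    Calling rules.set("late_after_days", 5) changes the behaviour of surcharge() without touching or recompiling it.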

    --
    i-name =twylite [http://public.xdi.org/=twylite], see idcommons.net
    1. Re:Rules, not metadata by oliverthered · · Score: 1

      We used to do that for e-commerce.
      You could set up different rules in the database, the rules were run as SQL or JavaScript, or a call to an external ActiveX Component.

      There were some user configuration parameters and other bits and bobs so that you could configure everything from a whois to domain name registration to complex multi purchase pricing rules.

      It made for east site design, maximum code reuse and the ability for an end user to configure their area of the system.

      Now, if someone applied that to a filing system ....

      --
      thank God the internet isn't a human right.
    2. Re:Rules, not metadata by statusbar · · Score: 1

      There is more to it, though.

      The 'Metadata' should include the sales tax algorithm, not just the percentage.

      For instance, the crazy Goods and Services Tax (GST) here in Canada has weird rules in it like: 3 donuts are taxable, but 12 donuts are not. Encoding a plain 7% in the 'metadata' is not sufficient.
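
      One way to picture "the algorithm is part of the metadata" is a table that maps a product category to a tax function instead of a number. This is a sketch only: the category names are invented, and the exemption threshold is a placeholder, since the post says only that 3 donuts are taxable and 12 are not.

```python
from decimal import Decimal

RATE = Decimal("0.07")  # the flat rate discussed in the thread


def flat(quantity, unit_price):
    """Plain percentage tax on the line total."""
    return quantity * unit_price * RATE


# Placeholder threshold: the post only says 3 are taxable and 12 are not.
DONUT_EXEMPT_FROM = 12


def donuts(quantity, unit_price):
    """Quantity-dependent rule: large boxes are exempt, small purchases are taxed."""
    if quantity >= DONUT_EXEMPT_FROM:
        return Decimal("0")
    return flat(quantity, unit_price)


# The lookup table holds algorithms, not just percentages.
TAX_ALGORITHMS = {"default": flat, "donut": donuts}


def tax_for(category, quantity, unit_price):
    rule = TAX_ALGORITHMS.get(category, TAX_ALGORITHMS["default"])
    return rule(quantity, unit_price)
```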

      --jeff++

      --
      ipv6 is my vpn
    3. Re:Rules, not metadata by DEBEDb · · Score: 1
      It made for east site design

      ...and West Side storyboard...

      --

      Considered harmful.
  5. Summary by e8johan · · Score: 1

    To summarize the article.

    1. The meta data is data concerning the business functions encoded, not what is commonly referred to as meta data in programming, i.e. data concerning, for example, object hierarchies and such.
    2. The article simply says: do not encode "known" constants (such as tax levels, etc.) into the code, but put it in an external XML database.

    Any somewhat experienced professional programmer (i.e. anyone who has run into a respecification of a constant occurring in 52 locations throughout 30000 lines of code) would consider this common sense. But, hey, anyone who didn't know this might find the article useful!

  6. yup, yup by renehollan · · Score: 1
    Andy Hunt: But for me personally, almost every time I've taken that extra care to make a system flexible, it has saved me.

    There's an effect that I've named (what else) "Hollan's Law": The likelihood of something changing is directly proportional to the intensity of the argument that it never will.

    Thus, when we see something in code that looks like it might need to be malleable, it probably will be.

    Looks like these guys have noticed the same thing.

    Ultimately, good programming is about finding the clearest way to express how to do something. It is not much of a stretch to imagine that often this how will take the form of "Imagine a machine that works this way to interpret data... then it is programmed thus... and this 'program' [metadata] makes such a machine do what we want."

    --
    You could've hired me.
  7. Code generation by Mxyzptlk · · Score: 1

    This is not a new technique, but it is not used as much as it should be: I've used code generation in several projects, and I think it is superior to metadata and property files/databases for cases where the property file would consist of a large number of repetitive fields. For example: a big finite state machine. The code generation tool is always written by hand, which is the tough part. After the code generator is finished, you can leave much of the work to less experienced programmers, or even non-programmers.

    The pros:
    - the result is code that you can compile, which is more efficient,
    - you catch the problems during compile-time instead of run-time,
    - standardized code, which is easy to debug and maintain (worth how many millions to you, as the developer? :-)

    The cons:
    - somewhat high initial cost in form of developing the code generator,
    - the code generator is sensitive to changing requirements - code generation is best used on requirements that have pretty much settled, and where you can take examples of hand-written code and use them as a model for your code generation template.
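
    A toy illustration of the approach for a finite state machine, assuming an invented transition table and a generator that emits Python source. A real generator for a compiled language would have the compiler check its output; here compile() only catches syntax errors in the generated text.

```python
# Hypothetical transition table: (state, event) -> next state.
TRANSITIONS = {
    ("idle", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "start"): "running",
    ("running", "stop"): "idle",
}


def generate_source(transitions):
    """Emit the source of a next_state() function as plain if-statements."""
    lines = ["def next_state(state, event):"]
    for (state, event), target in sorted(transitions.items()):
        lines.append(f"    if state == {state!r} and event == {event!r}:")
        lines.append(f"        return {target!r}")
    # This line is ordinary text here; it becomes an f-string in the generated code.
    lines.append("    raise ValueError(f'no transition from {state} on {event}')")
    return "\n".join(lines) + "\n"


# Generate once, then compile and load the result like hand-written code.
source = generate_source(TRANSITIONS)
namespace = {}
exec(compile(source, "<generated>", "exec"), namespace)
next_state = namespace["next_state"]
```

    The generated next_state() is repetitive, uniform code of the kind nobody wants to write by hand, which is where the "standardized code" benefit comes from.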

    1. Re:Code generation by Frans+Faase · · Score: 1
      Hard coding, storing (meta)data in a database, embedded scripting engines, virtual machines and code generation are all techniques (each with their own pros and cons) for implementing functionality in an execution environment.

      Most programming is still done in low-level programming languages. IMHO, we really need systems/languages in which we can transform high-level specifications into executable code. Tools of this kind would allow you to select any of the above techniques based on what suits you best.

      If now you want to switch between techniques, it simply means you have to start all over again. And I feel that is what the IT industry has been doing in the past twenty years.

      Instead of being engineers we are still craftsmen, each of us working in their own workshop.

  8. Bad Summary by Jerf · · Score: 2, Interesting

    The article simply says: do not encode "known" constants (such as tax levels, etc.) into the code, but put it in an external XML database.

    No, that's not what it says. It says do not encode known behavior into the code, but put it in some more easily changed external data source. Also, it might be XML but it can also be in the code, just code structured more like data than code. (I often write code-data like that, writing default keymaps in the language itself, for instance. It's easier than writing a custom parser if you just use the language itself...)
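
    A sketch of what "code structured more like data" might look like for a default keymap. The key names and the three actions are invented; the keymap itself is an ordinary dictionary literal, so the language's own parser does the work a custom config parser would otherwise do.

```python
def save_file():
    return "saved"


def undo():
    return "undone"


def quit_app():
    return "bye"


# The "data": plain language syntax, no custom parser needed.
DEFAULT_KEYMAP = {
    "ctrl+s": save_file,
    "ctrl+z": undo,
    "ctrl+q": quit_app,
}


def handle_key(key, keymap=DEFAULT_KEYMAP):
    """Run the action bound to the key, or do nothing if the key is unbound."""
    action = keymap.get(key)
    return action() if action else None
```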

    Sales tax is used as an example for the interview, but it goes deeper than that. The other example is much more instructive, with the display of financial numbers. Few programmers instinctively write a "displayMonetaryAmount" function that allows them to make one edit to suddenly display negatives in red; it's much more common to always directly dump the value. "displayMonetaryAmount" is likely to be very simple, almost data-like, and easy to change, rather than changing the code everywhere that displays money, which is virtually impossible to correct.
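
    A minimal sketch of such a function, written here as display_monetary_amount in Python naming. The NEGATIVE_STYLE switch, the parentheses default, and the HTML span used for "red" are assumptions for illustration, not anything taken from the interview.

```python
from decimal import Decimal

# The single place to edit: change to "red" to restyle every call site at once.
NEGATIVE_STYLE = "parentheses"


def display_monetary_amount(amount):
    """Format a monetary value; all display decisions live in this one function."""
    text = f"{abs(amount):,.2f}"
    if amount >= 0:
        return text
    if NEGATIVE_STYLE == "red":
        return f'<span style="color:red">-{text}</span>'
    return f"({text})"
```

    Every screen or report that calls this function picks up a style change automatically, which is the contrast with dumping the raw value at each call site.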

    There's a lot of value in that approach that is missed out on by a lot of programmers.

  9. Mmmmmm by jpsst34 · · Score: 1
    --
    How are you going to keep them down on the farm once they've seen Karl Hungus?
  10. embedded scripting engine by wotevah · · Score: 1
    I think he is talking about code implementing rules which are likely to change so he decides to implement a state machine for it. At this stage I wonder if it wouldn't be simpler to just change the code than to program transitions of a state machine in a database :)
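
    For a sense of what that trade-off looks like, here is a rough sketch of a state machine whose transitions live in a database table. The order states and events are invented, and an in-memory SQLite database stands in for whatever database the interview had in mind.

```python
import sqlite3

# In-memory database standing in for the real one; the rows are invented.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE transitions (state TEXT, event TEXT, next_state TEXT)")
db.executemany(
    "INSERT INTO transitions VALUES (?, ?, ?)",
    [
        ("new", "pay", "paid"),
        ("paid", "ship", "shipped"),
        ("new", "cancel", "cancelled"),
    ],
)


def step(state, event):
    """Look up the next state; the code knows nothing about specific states."""
    row = db.execute(
        "SELECT next_state FROM transitions WHERE state = ? AND event = ?",
        (state, event),
    ).fetchone()
    if row is None:
        raise ValueError(f"no transition from {state!r} on {event!r}")
    return row[0]


def run(state, events):
    """Feed a sequence of events through the machine and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

    Adding a transition is an INSERT rather than a code change, which is the benefit; the cost is that the machine's behaviour is no longer visible by reading the source.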

    These kinds of approaches inevitably end with the programmer inventing a new, personal language for expressing the rules and writing a small interpreter for it (which he basically did).

    The same result and much more can probably be achieved by utilizing a full-featured embedded scripting engine (such as Perl or Python) that is already available. Then again perhaps the entire application can be at least prototyped that way.

    1. Re:embedded scripting engine by tree_frog · · Score: 1
      Funny that. Dave and Andy are well known as leading members of the Ruby language community.

      regards, treefrog

  11. Field dictionaries by Tablizer · · Score: 1

    One technique for programming with metadata is data dictionaries.

    It takes a while to hone the usage of such things, especially WRT handling unexpected requirements, but there is a point where they start to pay off handsomely in my experience.
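
    As a rough illustration, assuming an invented three-field dictionary: each entry describes one field, and generic code (here, a validator) is driven by those descriptions instead of being written per field.

```python
# Hypothetical field dictionary: one entry per field, describing the data.
FIELDS = [
    {"name": "customer", "type": str, "required": True, "label": "Customer"},
    {"name": "quantity", "type": int, "required": True, "label": "Quantity"},
    {"name": "note", "type": str, "required": False, "label": "Note"},
]


def validate(record):
    """Return a list of problems found by checking the record against FIELDS."""
    problems = []
    for field in FIELDS:
        value = record.get(field["name"])
        if value is None:
            if field["required"]:
                problems.append(f"{field['label']} is required")
        elif not isinstance(value, field["type"]):
            problems.append(f"{field['label']} must be {field['type'].__name__}")
    return problems
```

    The same dictionary could also drive form layout or report headings, so adding a field means adding one entry rather than touching each of those places.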