Slashdot Mirror


Evolutionary Database Design

Andre Mermegas writes "Check out this article by everybody's favorite object mentor Martin Fowler on database design. Be sure to take a peek at his wonderful books as well."

22 of 171 comments (clear)

  1. relational databases, woo hoo by ademko · · Score: 5, Insightful

    I think it's nice that people are starting to get interested in relational databases again. They really are the backbone of information systems in business, despite what the industry rags will have you believe.

    The "hype" of object-oriented and XML-driven "databases", although aesthetically prettier, have adverse effects on performance and design. Programmers get lazy, applications become sloppy and performance goes into the toilet.

    1. Re:relational databases, woo hoo by esme · · Score: 3, Informative

      I agree that relational databases are the best solution for most problems -- that's why they're the backbones of most apps these days.

      That said, there are some cases where they fall down. One example that I'm working on right now is organizing a million or so smallish documents. The relational design to store the documents with the same degree of specificity as the XML format they are in is ridiculously complicated. But storing them in an XML database (we're using Xindice, but have looked at Tamino, and a few others) is a lot simpler.

      Another downside of relational dbs is that it's generally pretty difficult to change your schema. A lot of XML databases, on the other hand, can be configured to not enforce a schema at all. So if you're working on a problem that requires experimentation on the basic schema, it can be a lot easier to use an XML database (or even just files on disk) instead.

      -Esme

    2. Re:relational databases, woo hoo by King+of+the+World · · Score: 3, Interesting
      Q. Aside from putting it in as a BLOB, How can anyone put a document such as the one he's written into a relational database?

      A. It can't be done.

      Q. What do you lose by putting a document in as a BLOB?

      A. Granularity. The ability to have the database sort and extract parts of files at the tag level. For example, take a site that has an essay spread over 10 pages. Do you store each section as a database record? That's not clean, as what if we want to break that essay up over 5 pages? It seems rather strange to hardcode into the database presentation logic, so webpage = database record is a workable but inelegant model. Do you store the essay as one BLOB and extract substrings? Extracting substrings is certainly not as faster as an XML database, though it is smarter than 'webpage = database record'. And here we have a scenario where an XML database might suit, and would easily outperform a relational database (yes, Tamino or Excelon do outperform relational databases some of the time! ;).

      #######

      Here's a general rule of thumb, kids. When you read a post bashing another database model, bashing another operating system, or bashing another programming language, just realise that the poster is a jock who refuses to see where the alternative suits, and where the alternative doesn't suit. They're not about creating understanding, they're bashers.. They're not informative.

    3. Re:relational databases, woo hoo by VP · · Score: 5, Informative

      I think many people are confusing Relational databases with SQL database - they are not one and the same. In fact, this site, one of the most vocal proponents of relational databases, states that none of the existing SQL databases is a true relational database. A quote from one of their articles ( "Little Relationship to Relational"):

      "Not only do most practitioners think that SQL DBMSs are relational, but they actually blame the problems due to SQL's violations of, or lack of adherence to relational principles on the relational model itself!"

      In my opinion, there is no reason that an object-oriented environment cannot implement the relational model, and thus be a true relational database.

    4. Re:relational databases, woo hoo by sql*kitten · · Score: 3, Informative

      Q. What do you lose by putting a document in as a BLOB?

      A. Granularity. The ability to have the database sort and extract parts of files at the tag level.


      That's not true. If you've the Oracle documentation to hand, read about interMedia (formerly known as ConText). It gives you extensions to SQL to use XPath-like statements to select from within an XML document in a CLOB.

  2. Working Together... by airrage · · Score: 5, Interesting

    In the projects I've worked, I often find that the DBAs are older men or women and the developers are young. So the friction lies in the fact that the young-guns are doing .NET or Java or XML queing and so the DBA is really at a loss to help "the developer think of things he may have not thought about". Of course, on the table-design side, this maybe true. Secondly, due to the age-difference, "popping over the cube" is also difficult as the DBAs (being more mature shall I say) are less likely to be excited about a new paradigm.

    Case in point, when I read in an Oracle PL/SQL book about Nested Tables, the light bulb in my head went off (or lit up, or whatever). Basically, these nested tables were objects with methods (code behind them), however, could be queried like tables. So, instead of selecting say a person's name, birthdate, and calculating an age, I could select name, birthdate, and age (the age column had code behind it automatically calculating the age). Now the beauty of this is for derived quantities that are only used once, but would be burdensome to store, this was a godsend. However, my DBA completely rejected this idea as too untried and new-fangeled.

    This may sound very arrogant, but I think the developer should manage the DBA, often the DBA is a lone-wolf with too much power. Often the poor programmer has to submit changes with about as much hope they'll get done as one might have submitting universe changes to God Almighty.

    --
    "This isn't a study in computer science, its a study in human behavior"
    1. Re:Working Together... by ergo98 · · Score: 5, Insightful

      This may sound very arrogant, but I think the developer should manage the DBA, often the DBA is a lone-wolf with too much power.

      So instead you'd end up with a developer with too much power...

      The thing about the separation of database design from "front-end" design (which could be middleware or front-end applications) is that in most cases the database scheme and I/O design has a shelf life far longer than most front-ends. i.e. Don't see that new web interface as a new system with a new database, but rather as a "one of many" front ends to the back-end database: i.e. the database is of much greater long term importance than any front-end.

      Regarding your particular scenario: Is it possible that you're looking to shoehorn functionality for your particular front-end into a universal back-end where it might not be appropriate? Will every query on the person table suddenly have have the overhead of calculating the persons age because one page in one obscure part of one front end needs it?

    2. Re:Working Together... by ComputerSlicer23 · · Score: 3
      Hmmm, have you ever heard of views? They do specifically this. I've seen nested tables, but the seem to violite all the rules about Relational design. Oh, I'm a young pup developer, whose read enough to be a DBA w/ no experience.



      create view view_name as
      select name, birthdate, age_from_bday( birthdate ) from base_table_name;

      Where age_from_bday is the function used to calculate the number of years.


      Oracle traditionally has problems with new stuff. Okay, Oracle only has problems with esoteric corner cases with new stuff, but I've run across some of them w/ partitioning. So for production stuff, your DBA might be right on the money. Good DBA's and good SA's get paid big bucks to be ultra-conservative, and say "No". That's because they get paid big bucks so when they say "This will work in a production environment for the next ten years with acceptable downtime", they are correct. This is coming from a developer whose had to do his own DBA, SA and production support work because we can't afford a DBA or an SA. I dream of having another team member to ensure the stability of my production system.


      Kirby

    3. Re:Working Together... by sql*kitten · · Score: 5, Interesting

      Secondly, due to the age-difference, "popping over the cube" is also difficult as the DBAs (being more mature shall I say) are less likely to be excited about a new paradigm.

      I guess you haven't been around the industry too long. You see, this is an industry totally driven by fashions and fads (far more so than even the clothing or entertainment industries). Every year there's a slew of new buzzwords and technologies, each of which promises to be the "silver bullet" and a whole new "paradigm" and none of them ever are. So when some bright-eyed bushed-tailed young hotshot announces that he's discovered the solution to the organization's IT ills, all the "old geezers" just roll their eyes, 'cos they've seen it a dozen times before.

      However, my DBA completely rejected this idea as too untried and new-fangeled.

      The problem with many developers is that they see a shiny new feature, can't wait to use it, and you end up with an application in which a dozen different people have solved the same problem a dozen different ways.

      My attitude is usually that a developer can do anything they want... so long as they're willing to carry a pager that might go off at 3AM, and take responsibility for fixing it before the next business day. Amazing how many times they just wanted to try out a new feature without any real need for it.

      In your specific case, you could have done exactly what you wanted to do with a view.

      This may sound very arrogant, but I think the developer should manage the DBA, often the DBA is a lone-wolf with too much power. Often the poor programmer has to submit changes with about as much hope they'll get done as one might have submitting universe changes to God Almighty.

      Yeah, and the accountants use software, so the developers should manage the accountants! And the salesmen! And the canteen staff! After all, a developer wrote the program that prints their paychecks!

      I personally have spent half an hour rewriting a developer's SQL that took the run time down from 15 hours to 9 seconds. Having said that, I don't know all that much about writing, say, MT-safe C++. That's why we have specialists in the first place. I'll bet dollars to donuts that your DBA knows far more about databases than you do, even if you know many more trendy buzzwords than he does.

  3. This article is short and common sense... by dagg · · Score: 5, Interesting
    But it is great to actually read it. Sometimes common sense things need to be written down just to verify that your techniques really do make sense. There are so many great little tidbits in the article, I'm having trouble picking one out to really comment on. Here's one:
    An important part of this approach is iterative development, where you run the entire software life-cycle many times during the life of a project. Agile processes run complete life cycles in each iteration, completing the iteration with working, tested, integrated code for a small subset of the requirements of the final product. These iterations are short, usually running between a week and a couple of months, with a preference towards shorter iterations.

    A big issue with iterative development is that the QA folks will quickly fall behind and become very anxious. What's the solution to that? Either embrace the QA person to get closer to the real development environment, or if that is impossible, get a new QA person. That's the only way to succeed.

    --
    Sex - Find It
  4. The big picture by oliverthered · · Score: 4, Interesting

    While I believe that 'Agile processes' are the best way to develop software, he appears to be advocating anarchy.

    When you have a lot of people all making little changed everyone starts to loose sight of the Big picture and you run into a Too many cooks spoil the broth.

    I'm sure a lot of people who read this site have seen a lot of code and design, and probably a lot of horrific code and design, well enough said.

    --
    thank God the internet isn't a human right.
    1. Re:The big picture by chromatic · · Score: 3, Insightful
      When you have a lot of people all making little changed everyone starts to loose sight of the Big picture and you run into a Too many cooks spoil the broth.

      Why?

      He's big on automated testing. He's big on frequent, small integrations. Ideally, every developer integrates with the bleeding-edge source code at least once a day. That's why the changes are so little! No one strays too long from the work of everyone else.

      It may seem like anarchy at first glance, but if you read a little closer, you'll see that there are strong behaviors in place to prevent that chaos.

  5. Quality assurance by giel · · Score: 4, Interesting

    Just like the actual users the QA folks should be heavily involved in the actual development cycles. Extreme programming states that every development cycles starts with a functional design and the development of tests for each deliverable. Having these available means that QA is able to keep track of the quality of the deliverables very well.

    IMHO if QA cannot keep track of the big picture they fail as QA, because that is just an important part of their job. On the other hand perhaps extreme programming should involve relatively more QA people than 'regular' development methods.

    --
    giel.y contains 2 shift/reduce conflicts
  6. A developer perspective of the world. by municio · · Score: 5, Insightful

    I'm currently working as a developer, but I used to work as a development DBA. In my opinion this article shows the database and the DBA roles in a project from a developer perspective.

    As a general rule, the developers think that the database is there to support their application, which is really the piece that solves the problem. In the other hand DBAs think that the developers are there to support their data model, by supplying an interface with validation and some simple pieces of logic that their store procedures don't cover.

    I have worked much longer as a developer than as a DBA, but I still find it funny that the article assumes that the developer should be able to add a column to a table freely and the incorporate the changes to the main database. This is the equivalent of saying the DBA should be able to freely change a class or an interface and then add the changes to the source control repository.

    While not wrong in itself, it clearly shows that many developers consider the DBA role secondary to the developer. It goes something like this: I can somehow do some DBA tasks that impact the development like adding tables to the schema, I just don't want a get involved in the boring parts (backups, recovery or replicating schemas).

    I think that creating a good data model is as difficult as creating a good application design and doing a decent store procedure as hard as doing an efficient method. While some DBAs can write very good C++/Java code and some developers can design very good data models, no one should be doing each other job unless they really, really, really know what they are doing.

    As a general rule of thumb, if you consider that mySQL is a better database for large complex applications as PostgreSQL or Oracle, you should not be doing any database work.

    1. Re:A developer perspective of the world. by roundand · · Score: 3, Insightful

      In my opinion this article shows the database and the DBA roles in a project from a developer perspective.

      True but perhaps irrelevent. Sometimes the most important difference between DBAs and developers is not their technical skills but their attitude to change. The nature of the job is that a DBA tends to be a bit like a soccer goalkeeper - he's not rewarded for scoring goals (adding new features that responding to user requirements more rapidly than anyone else) - all he gets is the blame if allows goals in (lets someone break the database). The nature of the job tends to reward defensiveness.

      The result - semantic corruption, with any amount of database re-use, however dirty, prefered to re-factoring. Like my insurance client in 1999, who were using 2000-01-01 as the null value in some of their date columns...

      It's a really good article. We're doing a fair amount of the recommendations already, I can confirm the value of the tight DB layer, and having good test data packs from the start. In fact I'd go further - it doesn't matter whether you think you're doing waterfall or iterative, you will have to change the DB and you might as well work out how to do it efficiently.

  7. Works for OODB as well. by bokmann · · Score: 4, Interesting

    I work on a project of about a dozen developers, some os us geographically diverse. We use an Object-Oriented Database with Java (Database is from Versant).

    We don't worry about *any* kind of DB administrator. Each developer has their own instance of the database. We don't worry about schema changes that break the database, because we *also* have a way to import/export the database to an XML file. Thus, if the schema radically changes from the deployed version, we just export to XML and re-import, so there is no complex though about 'schema evolution' necessary,

    Of course, with everyone having such free ability to make changes that impacts the format of the data store, we need good unit tests to make sure things don't break unintentionally. This is actually one area we need to improve upon. People can make changes that affect the schema easily, and most times, its not an issue... but a lot of times people make changes that would impact the XML format, and they don't always handle it properly.

    Unit Tests are Key, but that's nothing new to the concept of refactoring.

  8. Done similar things by adamy · · Score: 3, Interesting

    In a small shop (4 people) we had a similar setup. We were doing J2EE/JBoss/Tomcat work and used PostgreSQL as the back end. We had no full time DBA or Sys Admin. We had to be flexible.

    The Good: Database changes were part of development. When our system worked right (58.3% of the time) All changes would go through QA, a small fix cycle, and we would push code and database changes during the evening (we were running a web site for people who used it during the business day only).

    The Bad: People tended to develop with live data. The main problem with this was that if something changed, It might break a unit test.This could be a real problem if Sensitive Data was involved as well.

    The Mechanism: We had a script (PERL) that executed a list of sql statements embedded. For a revision, we changed the name of the script (the first script ended in 0001, which for the next revision would be 0002 etc) and checked if that reision had been executed, and if all the previous revisions had been executed; script 0008 could only be run after script 0007 had been run etc. We had a single table that kept track of the current revision.

    The nightly backup from live was dumped into the Integration database. The current update script would be run against it, then all functional tests. Every few days we would push code live. Yes, every few days. This was an organisational issue and yes, it casue a lot of headaches.

    To improve:
    Better QA. We should have had a batch of scripts that could be run agains the DB. Instead, our QA person had to run through them manually.

    We should have had test data for the tables that were primarily used for collecting and reporting, instead of runnning against live data. I

    Instead of a self executing perl script being the required mechanism, I would have an executable on a machine that tracked the schema name, schema version number ,and currently available script. Scripts would be primarily straight SQL, run through a single program, and targeted against multiple schemas. Hmm, maybe I should explain more.

    For a give application, we had 3 databases running. One was for collection, one was for reporting, and one was reference data. Only the collection database was backup up, as the other two could be regenerated from source. However, we still needed to revision conrtol the reference databases source, and it was faster to send patches than to update the whole thing each time.

    We started moving to an Application Service Provider (ASP) setup where we uysed the same schemas, but with different data in them, for different clients. Since our clients were Local governments (County) we wanted to A) Be able to get a new county up and runniong quickly if we got them as a client, B)Be able to dump whatever existing data they had into that counties db quickly, C) Keep the counties data separate from each other, D) Maintain our versioning. So certain scripts had to be run against County X data of the collection scheme. If I had to do it agains, with what I know now,I'd have had a revision control database that kept track of the other databases (and be self maintaining, why not) With Schema name and data set name, current revision for each.

    --
    Open Source Identity Management: FreeIPA.org
  9. The key point by PinglePongle · · Score: 5, Insightful

    is iterative design. Which is becoming fairly widely accepted in OO circles, and almost universally accepted in Agile circles.

    Databases, however, are a lot harder to iterate - the cost of change is higher than with any other code. Martin Fowler is laying down an approach to manage (not reduce - manage) that cost, and it all comes down to a guess we have to make - do we think the overall cost/benefit tradeoff of an iterative process is better than a Big Design Up Front process ?

    On the eXtreme Programming mailing list, there's been a lot of discussion about how to deal with databases - some deny the need for databases altogether, some advocate using Mock Objects for testing and even development etc. It all boils down to the cost of change - it's expensive to change a database design because it is very hard to identify the knock-on effects. Some changes are relatively easy to manage - adding a column is unlikely to actually break anything - but others can wreak havoc with existing applications - changing the type or size of a column for instance.

    I'd love to think that the next big improvement in software development tools is not going to be yet another language but a sensible way of tying objects to their persisted data. All the solutions I've seen so far are bolted-on - they either force the database into unnatural positions, or make the objects fit into a model that's not quite what they'd be otherwise.

    In the meantime, this article is well worth investigating - the idea of evolving the datamodel in tandem with the migration scripts is very powerful.

    --
    It's all very well in practice, but it will never work in theory.
  10. "scattered willy-nilly" by Tablizer · · Score: 3, Interesting

    To understand the consequences of database refactorings, it's important to be able to see how the database is used by the application. If SQL is scattered willy-nilly around the code base, this is very hard to do. As a result it's important to have a clear database access layer to show where the database is being used and how.....Having a clear database layer has a number of valuable side benefits. It minimizes the areas of the system where developers need SQL knowledge to manipulate the database, which makes life easier to developers who often are not particularly skilled with SQL. For the DBA it provides a clear section of the code that he can look at to see how the database is being used. This helps in preparing indexes, database optimization, and also looking at the SQL to see how it could be reformulated to perform better. This allows the DBA to get a better understanding of how the database is used.

    I disagree with this more or less. SQL is often too closely related to the application to put in a separate place. You have to go hunting back and forth to see and manage the relationship. If there is duplication, then I agree that it should be factored to a shared spot. However, beyond cleaning duplication, keep the SQL near where it is used.

    If you want to be able to track it, then put some kind of comment marker that a grep-like utility can use to find and gather the SQL if need be. Both approaches are conventions anyhow. My suggestion gives the best of both worlds.

    Also, OO proponents tend to use simpler SQL, and thus it might be easier to put wrappers around such trivial SQL. However, a more balanced approach is the use the full power the DBMS rather than re-invent it in your application. Hand-built indexing, joins, filtering, multi-user contention management etc. built into app code is a common sin of OO design IMO, including Fowler's designs, I am sorry to say.

    Sometimes I think many OO'ers are motivated by a desire for control, and that is why they would rather reinvent the DB rather than use an existing one. But that is anti-reuse IMO.

    Relational thinking and OO tend to be at odds, either way. It is not practical to have both manage the noun models and noun views IMO. Pick one or the other, I say. Thus, either get an OODBMS, or relax the OO design to not duplicate and fight with the RDB and query languages.

    1. Re:"scattered willy-nilly" by NineNine · · Score: 3, Interesting

      I've seen projects literally ruined by OO zealots who simply refused to use the database for what it was for. They'd do a "select *" and put a wrapper around it, and object(ify?) the whole damn chunk of data. I've seen this happen in several different projects, despite my protesting. Suffice to say each project done like that flopped due to 1. Very difficult to maintain code and 2. Serious performance problems. It actually caused the collapse of a 80+ person company in one instance. The project didn't make it to the key customers in time due to this shitty architecture.

  11. The problem is how developers design! by MattRog · · Score: 3, Interesting

    It seems like he's coming up for a solution to a developer-induced problem, not a problem with DBMS' or DBAs in general.

    If you normalize you don't have to worry about null-ability of columns. You don't add and drop columns (usually). Pack those tables behind views and your application doesn't need to change a thing:
    CREATE VIEW user_info AS
    SELECT *
    FROM user u
    INNER JOIN user_detail ud ON u.username = ud.username
    -- etc

    Then your app would simply select from that view.

    The problem is that developers don't take the time to properly learn relational theory, instead content to know the basic semantics of the SQL language and call themselves 'fluent in SQL'. They know how to create tables in their GUI of choice (or even code it by hand) and are 'database designers'.

    Another benefit is stored procedures. Although abused to include procedural logic in the database, they can help keep database logic out of your application and generally help much along the same lines as views:
    CREATE PROCEDURE get_user_info AS
    SELECT *
    FROM user u
    INNER JOIN user_detail ud ON u.username = ud.username
    -- etc

    In this case the stored procedure would be called in your application.

    --

    Thanks,
    --
    Matt
  12. Re:"Automatically Update all Database Developers" by dubl-u · · Score: 3, Insightful

    Whether it's a schema change or a data change, a change like that mid-way through development is a serious decision, and shouldn't be undertaken lightly.

    That's one way to develop.

    Or, instead, you could assume that change is a given and tune your development process for that.

    The DBA and developers need to work together to work out an ideal schema. Ideally, it's msotly worked out *before* any coding is done. DB objects should not be done on the fly in msot cases.

    Yes, ideally all requirements are identified, and then all the design is done, and then all of the code is written. Ideally. And ideally, you could remove the backspace key from they keyboard.

    Alas, most of us don't get to code in an ideal world. The premise behind the various Agile methods (Extreme Programming, Crystal, Scrum, FDD, etc.) is that since the world isn't ideal, we might as well pick development methods that are tuned for the world we live in.

    Interesting notion, eh?