Evolutionary Database Design
Andre Mermegas writes "Check out this article by everybody's favorite object mentor Martin Fowler on database design. Be sure to take a peek at his wonderful books as well."
← Back to Stories (view on slashdot.org)
FP
Database designs YOU!
Indexing of the database has been ruined by slow decimal to binary conversion routines. Decimal ruins yet another thing.
I actually get to read an article BEFORE it is knocked off the net by the slashdot effect!
X(7): A program for managing terminal windows. See also screen(1).
Instead of doing a brute-force crack of the private key, why not use an intelligent algorithm for cracking it? As I understand it, the other distributed.net projects used brute-force just to show that how much time a brute-force could take. If this project is really about discovering the key and not about seeing how long it takes to stumble upon it at random, then shouldn't they use a smart algorithm?
nt
Dealing with Change
One of the primary features of agile methods is their attitude towards change. Most of the thinking about software process is about understanding requirements early, signing off on these requirements, using the requirements as a basis for design, signing off on that, and then proceeding with construction. This is a plan-driven cycle, often referred to (usually with derision) as the waterfall approach
Such approaches look to minimize changes by doing extensive up-front work. Once the early work is done, changes cause significant problems. As a result such approaches run into trouble if requirements are changing, and requirements churn is a big problem for such processes.
Agile processes approach change differently. They seek to embrace change, allowing changes to occur even late in a development project. Changes are controlled, but the attitude of the process is to enable change as much as possible. Partly this is in response to the inherent instability of requirements in many projects, partly it is to better support dynamic business environments by helping them change with the competitive pressures.
In order to make this work, you need a different attitude to design. Instead of thinking of design as a phase, which is mostly completed before you begin construction, you look at design as an on-going process that is interleaved with construction, testing, and even delivery. This is the contrast between planned and evolutionary design. One of the vital contributions of agile methods is that they have come up with practices that allow evolutionary design to work in a controlled manner. So instead of the common chaos that often happens when design isn't planned up-front, these methods provide techniques to control evolutionary design and make them practical.
An important part of this approach is iterative development, where you run the entire software life-cycle many times during the life of a project. Agile processes run complete life cycles in each iteration, completing the iteration with working, tested, integrated code for a small subset of the requirements of the final product. These iterations are short, usually running between a week and a couple of months, with a preference towards shorter iterations.
While these techniques have grown in use and interest, one of the biggest questions is how to make evolutionary design work for databases. Most people consider that database design is something that absolutely needs up-front planning. Changing the database schema late in the development tends to cause wide-spread breakages in application software. Furthermore changing a schema after deployment result in painful data migration problems.
Over the course of the last three years we've been involved in a large project (called Atlas) that has used evolutionary database design and made it work. The project involved almost 100 people in multiple sites world-wide (US, Australia, and India). It is around half a million lines of code and has over 200 tables. The database evolved during a year and a half of initial development and continues to evolve even though it's in production for multiple customers. During this project we started with iterations of a month, but after a few months changed to two week iterations which worked better. The techniques we describe here are the ones that we (or more accurately Pramod) used to make this work.
Since that project got going we've spread these techniques over more of our projects, gaining more experience from more cases. We've also found inspiration, ideas, and experience from other agile projects.
Limitations
Before we dive into the techniques, it's important to state that we haven't solved all the problems of evolutionary database design. In particular:
* We developed an application database for a single application rather than an integration database that tries to integrate multiple databases.
* We don't have to keep the production databases up 24/7
We don't consider these problems to be inherently unsolvable, after all many people believed we couldn't solve this one. But until we do, we won't claim we can solve them either.
The Practices
Our approach to evolutionary database design depends on a handful of important practices.
DBAs collaborate closely with developers
One of the tenets of agile methods is that people with different skills and backgrounds need to collaborate very closely together. They can't communicate mainly through formal meetings and documents. Instead they need to be out talking with each other and working with each other all the time.Everybody is affected by this: analysts, PMs, domain experts, developers... and DBAs.
Every task that a developer works on potentially needs a DBA's help. Both the developers and the DBA need to consider whether a development task is going to make a significant change to the database schema. If so the developer needs to consult with the DBA to decide how to make the change. The developer knows what new functionality is needed, and the DBA has a global view of the data in the application.
To make this happen the DBA has to make himself approachable and available. Make it easy for a developer to just pop over for a few minutes and ask some questions. Make sure the DBAs and developers sit close to each other so they can easily get together. Ensure that application design sessions are known about so the DBA can pop in easily. In many environments we see people erecting barriers between the DBA and application development functions. These barriers must come down for an evolutionary database design process to work.
Everybody gets their own database instance
Evolutionary design recognizes that people learn by trying things out. In programming terms developers experiment with how to implement a certain feature and may make a few attempts before settling down to a preferred alternative. Database design can be like that too. As a result it's important for each developer to have their own sandbox where they can experiment, and not have their changes affect anyone else.
Many DBA experts see multiple databases as anathema, too difficult to work in practice, but we've found that you can easily manage a hundred or so database instances. The vital thing is to have to tools to allow you to manipulate databases much as you would manipulate files.
Developers frequently integrate into a shared master
Although developers can experiment frequently in their own area, it's important to bring the different approaches back together again frequently. An application needs a shared master database that all work flows from. When a developer begins a task they copy the master into their own workspace, manipulate, and then integrate their changes back into the master. As a rule of thumb each developer should integrate once a day.
Let's take an example where Mike starts a development task at 10am (assuming he actually comes in that early). As part of this task he needs to change the database schema. If the change is easy, like adding a column, he just decides how to make the change himself, Mike also makes sure the column he wants to add does not already exist in the database, with the help of the data dictionary (discussed later). If it's more complicated then he grabs the DBA and talks over the likely changes with him.
Once he's ready to begin he takes a copy of the database master and can modify both the database schema and code freely. As he's in a sandbox any changes he makes don't impact anyone else's. At some point, say around 3pm, he's pretty comfortable that he knows what the database change needs to be, even though he's not completely done with his programming task. At that point he grabs the DBA, and tells him about the change. At this point the DBA can raise any issues that Mike hasn't considered. Most of the time all is well and the DBA goes off and makes the change (by applying one or more database refactorings, which we'll come to below). The DBA makes the changes right away (unless they are destructive changes - again more on that below). Mike can continue to work on his task and commit his code any time he likes once the DBA has applied these changes to the master.
You may well recognize this principle as similar to the practice of Continuous Integration, which is applied to source code management. Indeed this is really about treating the database as another piece of source code. As such the master database is kept under configuration management in much the same way as the source code. Whenever we have a successful build, the database is checked into the configuration management system together with the code, so that we have a complete and synchronized version history of both.
With source code, much of the pain of integration is handled by source code control systems. For databases there's a bit more effort involved. Any changes to the database need to done properly, as automated database refactorings, which we'll discuss shortly. In addition the DBA needs to look at any database changes and ensure that it fits within the overall scheme of the database schema. For this to work smoothly, big changes shouldn't come as surprises at integration time - hence the need for the DBA to collaborate closely with the developers.
We emphasize integrating frequently because we've found that it's much easier to do frequent small integrations rather than infrequent large integrations. It seems that the pain of integration increases exponentially with the size of the integration. As such doing many small changes is much easier in practice, even though it often seems counter-intuitive to many. This same effect's been noticed by people in the Software Configuration Management community for source code.
A database consists of schema and test data
When we talk about a database here, we mean not just the schema of the database, but also a fair amount of data. This data consists of common standing data for the application, such as the inevitable list of all the states in the US, and also sample test data such as a few sample customers.
The data is there for a number of reasons. The main reason is to enable testing. We are great believers in using a large body of automated tests to help stabilize the development of an application. Such a body of tests is a common approach in agile methods. For these tests to work efficiently, it makes sense to work on a database that is seeded with some sample test data, which all tests can assume is in place before they run.
As well as helping test the code, this sample test data also allows to test our migrations as we alter the schema of the database. By having sample data, we are forced to ensure that any schema changes also handle sample data.
In most projects we've seen this sample data be fictional. However in a few projects we've seen people use real data for the samples. In these cases this data's been extracted from prior legacy systems with automated data migration scripts. Obviously you can't migrate all the data right away, as in early iterations only a small part of the database is actually built. But the idea is to iteratively develop the migration scripts just as the application and the database are developed iteratively. Not just does this help flush out migration problems early, it makes it much easier for domain experts to work with the growing system as they are familiar with the data they are looking at and can often help to identify problem cases that may cause problems for the database and application design. As a result we are now of the view that you should try to introduce real data from the very first iteration of your project.
All changes are database refactorings
The technique of refactoring is all about applying disciplined and controlled techniques to changing an existing code base. Similarly we've identified several database refactorings that provide similar control and discipline to changing a database.
One of the big differences about database refactorings is that they involve three different changes that have to be done together
* Changing the database schema
* Migrating the data in the database
* Changing the database access code
Thus whenever we describe a database refactoring, we have to describe all three aspects of the change and ensure that all three are applied before we apply any other refactorings.
We are still in the process of documenting the various database refactorings, so we aren't able to go into detail on them yet. However there are a few things we can point out. Like code refactorings, database refactorings are very small. The concept of chaining together a sequence of very small changes is much the same for databases as it is for code. The triple nature of the change makes it all the more important to keep to small changes.
Many database refactorings, such as adding a column, can be done without having to update all the code that accesses the system. If code uses the new schema without being aware of it, the column will just go unused. Many changes, however don't have this property. We call these destructive changes, an example of which is making an existing nullable column not null.
Destructive changes need a bit more care, the degree of which depends on the degree of destruction involved. An example of a minor destructive change is that of changing a column from nullable to not null. In this case you can probably just go ahead and do it. The refactoring will take care of any data in the database that's null. Usually the only developer who cares about this property is the one who requested the change, and that developer will update the database mapping code. As a result the update won't break anyone else's code and if by some strange chance it does, they find out as soon as they run a build and use their tests. (On our large project we gave ourselves some extra breathing space by waiting a week before making the database change.)
Splitting a heavily used table into two however is a rather more complicated case. In this case it's important to let everyone know that the change is coming up so they can prepare themselves for it. In addition it's worth waiting for a safer moment to make the change. (These kinds of changes we would defer until the start of a new iteration - we like to use iterations of two weeks or less).
The important thing here is to choose a procedure that's appropriate for the kind of change that you're making. If in doubt try to err on the side of making changes easier. Our experience is that we got burned much less frequently than many people would think, and with a strong configuration control of the entire system it's not difficult to revert should the worst happen.
Automate the refactorings
In the world of code we are seeing tools for some languages to automate many of the identified refactorings. Such automation is essential for databases; at least in the areas of schema changes and data migration.
As a result every database refactoring is automated by writing it in the form of SQL DDL (for the schema change) and DML (for the data migration). These changes are never applied manually, instead they are applied to the master by running a small SQL script to perform the changes.
Once done, we keep hold of these script files to produce a complete change log of all the alterations done to the database as a result of database refactorings. We can then update any database instance to the latest master by running the change log of all the changes since we copied the master to produce the older database instance.
This ability to sequence automated changes is an essential tool both for the continuous integration process in development, and for migrating production databases to a new release.
For production databases we don't make changes during the usual iteration cycles. Once we do a release, which may occur at the end of any iteration, we apply the full change log of database refactorings since the previous release. This is a big change, and one that so far we've only done by taking the application offline. (We have some ideas for doing this in a 24/7 environment, but we haven't actually had to do it yet.) It's also wise to test this migration schema before applying it to the live database. So far, we've found that this technique has worked remarkably well. By breaking down all the database changes into a sequence of small, simple changes; we've been able to make quite large changes to production data without getting ourselves in trouble.
As well as automating the forward changes, you can consider automating reverse changes for each refactoring. If you do this you'll be able to back out changes to a database in the same automated way. We haven't done this yet, as we've not had a much demand for it, but it's the same basic principle.
(A similar thing that we have done is to support an old version of an application with an updated version of the database. This involved writing a compatibility layer that allowed the application to think it was talking to the older version of the database even though it was actually talking to the newer one.)
Automatically Update all Database Developers
It's all very well for people to make changes and update the master, but how do they find out the master has changed? In a traditional continuous integration environment with source code, developers update to the master before doing a commit. That way they can resolve any build issues on their own machine before committing their changes to the shared master. There's no reason you can't do that with the database, but we found a better way.
We automatically update everyone on the project whenever a change is made to the database master. The same refactoring script that updates the master automatically updates everyone's databases. When we've described this, people are usually concerned that automatically updating developers databases underneath them will cause a problem, but we found it worked just fine.
This only worked when people were connected to the network. If they worked offline, such as on an airplane, then they had to resync with the master manually once they got back to the office.
Clearly separate all database access code
To understand the consequences of database refactorings, it's important to be able to see how the database is used by the application. If SQL is scattered willy-nilly around the code base, this is very hard to do. As a result it's important to have a clear database access layer to show where the database is being used and how. To do this we suggest following one of the data source architectural patterns from P of EAA.
Having a clear database layer has a number of valuable side benefits. It minimizes the areas of the system where developers need SQL knowledge to manipulate the database, which makes life easier to developers who often are not particularly skilled with SQL. For the DBA it provides a clear section of the code that he can look at to see how the database is being used. This helps in preparing indexes, database optimization, and also looking at the SQL to see how it could be reformulated to perform better. This allows the DBA to get a better understanding of how the database is used.
Variations
Like any set of practices, these should be varied depending on your specific circumstances. These practices are still pretty new, so we haven't come across that many variations, but here are some we have.
Keeping multiple database lineages
A simple project can survive with just a single database master in the repository. With more complex projects there's a need to support multiple varieties of the project database, which we refer to as database linages. We may create a new lineage if we have to branch an application that's put into production. In essence creating a new database lineage is much the same as branching the source code on the application, with the added twist that you also make a lineage when you need a different set of sample data, such as if you need a lot of data for performance testing.
When a developer takes a copy of a master they need to register which lineage they are modifying. As the DBA applies updates to a master for a particular lineage the updates propagate to all the developers who are registered for that lineage.
You don't need a DBA
All of this sounds like it would be a lot of work, but in fact it doesn't require a huge amount of manpower. On the Atlas project we had thirty-odd developers and a team size (including, QA, analysts and management) of close to a hundred. On any given day we would have a hundred or so copies of various lineages out on people's workstations. Yet all this activity needed only one full time DBA (Pramod) with a couple of developers doing some part-time assistance and cover.
On smaller projects even that isn't needed. We've been using these techniques on a number of smaller projects (about a dozen people) and we find these projects don't need a full time DBA. Instead we rely on a couple of developers with an interest in DB issues who handle the DBA tasks part-time.
The reason for this is automation. If you are determined to automate every task, you can handle a lot work with much less people.
Tools to Help
Doing this kind of thing requires a lot of repetitive tasks. The good news is that whenever you run into run into repetitive tasks in software development you are ideally placed to automate them. As a result we've developed a fair amount of often simple tools to help us.
One of the most valuable pieces of automation is a simple set of scripts for common database tasks.
* Bring a user up to date with the current master.
* Create a new user
* Copy a database schema, for example Sue finds a bug with her database, now Mike can copy Sue's database and try to debug the application
* Move a database, for example from a workstation to a different workstation, this is essentially Copy database and Delete database combined as one
* Drop a user
* Export a user so team members can make offine backups of the database that they are working with.
* Import a user, so if the team members have a backup copy of the database, they can import the backup and create a new schema.
* Export a baseline - make a backup copy of the master database. This is a specialized case of Export a User
* Create a difference report of any number of schemas, so that Mike can find out what is different structurally between his database and Sue's.
* Diff a schema against the master, so that developers can compare their local copy against the master.
* List all the users
Analysts and QA folks often need to look at the test data in the database and to be able to easily change it. For that we created an Excel application with VBA scripts to pull data down from the database into an excel file, allow people to edit the file, and send the data back up to the database. Although other tools exist for viewing and editing the contents of a database, excel works well because so many people are familiar with it.
Everybody on the project needs to be able to explore the database design easily, that way they can find out what tables are available and how they are used. We built an HTML based toos to do this that used servlets to query database metadata. We did the data modeling using ERwin and pulled data from ERwin into our own metadata tables.
Further Steps and Further Information
This is by no means the last word on the subject of evolutionary database design. We certainly want to see if and how we can extend these techniques to integration databases, 24/7 operation, and other problem areas that we haven't run into yet.
If you'd like to find out more about this, or talk about your own experiences, Pramod has started a yahoo egroup for agile databases. Pramod is also starting to talk about these techniques at various conferences, so you may get a chance to talk to him directly. Naturally we also do consulting on this stuff too.
I think it's nice that people are starting to get interested in relational databases again. They really are the backbone of information systems in business, despite what the industry rags will have you believe.
The "hype" of object-oriented and XML-driven "databases", although aesthetically prettier, have adverse effects on performance and design. Programmers get lazy, applications become sloppy and performance goes into the toilet.
In the projects I've worked, I often find that the DBAs are older men or women and the developers are young. So the friction lies in the fact that the young-guns are doing .NET or Java or XML queing and so the DBA is really at a loss to help "the developer think of things he may have not thought about". Of course, on the table-design side, this maybe true. Secondly, due to the age-difference, "popping over the cube" is also difficult as the DBAs (being more mature shall I say) are less likely to be excited about a new paradigm.
Case in point, when I read in an Oracle PL/SQL book about Nested Tables, the light bulb in my head went off (or lit up, or whatever). Basically, these nested tables were objects with methods (code behind them), however, could be queried like tables. So, instead of selecting say a person's name, birthdate, and calculating an age, I could select name, birthdate, and age (the age column had code behind it automatically calculating the age). Now the beauty of this is for derived quantities that are only used once, but would be burdensome to store, this was a godsend. However, my DBA completely rejected this idea as too untried and new-fangeled.
This may sound very arrogant, but I think the developer should manage the DBA, often the DBA is a lone-wolf with too much power. Often the poor programmer has to submit changes with about as much hope they'll get done as one might have submitting universe changes to God Almighty.
"This isn't a study in computer science, its a study in human behavior"
You know, by the RIAA/MPAA's logic, if we wanted to stop spam we should just make all database programs illegal...
Repeal the DMCA!
As it's turned out these have been very valuable as we have started to use .NET in 2002.
Such a person should not be referred to on Slashdot.
A big issue with iterative development is that the QA folks will quickly fall behind and become very anxious. What's the solution to that? Either embrace the QA person to get closer to the real development environment, or if that is impossible, get a new QA person. That's the only way to succeed.
Sex - Find It
While I believe that 'Agile processes' are the best way to develop software, he appears to be advocating anarchy.
When you have a lot of people all making little changed everyone starts to loose sight of the Big picture and you run into a Too many cooks spoil the broth.
I'm sure a lot of people who read this site have seen a lot of code and design, and probably a lot of horrific code and design, well enough said.
thank God the internet isn't a human right.
Just like the actual users the QA folks should be heavily involved in the actual development cycles. Extreme programming states that every development cycles starts with a functional design and the development of tests for each deliverable. Having these available means that QA is able to keep track of the quality of the deliverables very well.
IMHO if QA cannot keep track of the big picture they fail as QA, because that is just an important part of their job. On the other hand perhaps extreme programming should involve relatively more QA people than 'regular' development methods.
giel.y contains 2 shift/reduce conflicts
Bad day at the office?
Stoptional
I'm currently working as a developer, but I used to work as a development DBA. In my opinion this article shows the database and the DBA roles in a project from a developer perspective.
As a general rule, the developers think that the database is there to support their application, which is really the piece that solves the problem. In the other hand DBAs think that the developers are there to support their data model, by supplying an interface with validation and some simple pieces of logic that their store procedures don't cover.
I have worked much longer as a developer than as a DBA, but I still find it funny that the article assumes that the developer should be able to add a column to a table freely and the incorporate the changes to the main database. This is the equivalent of saying the DBA should be able to freely change a class or an interface and then add the changes to the source control repository.
While not wrong in itself, it clearly shows that many developers consider the DBA role secondary to the developer. It goes something like this: I can somehow do some DBA tasks that impact the development like adding tables to the schema, I just don't want a get involved in the boring parts (backups, recovery or replicating schemas).
I think that creating a good data model is as difficult as creating a good application design and doing a decent store procedure as hard as doing an efficient method. While some DBAs can write very good C++/Java code and some developers can design very good data models, no one should be doing each other job unless they really, really, really know what they are doing.
As a general rule of thumb, if you consider that mySQL is a better database for large complex applications as PostgreSQL or Oracle, you should not be doing any database work.
warning hidden goatse link above
I work on a project of about a dozen developers, some os us geographically diverse. We use an Object-Oriented Database with Java (Database is from Versant).
We don't worry about *any* kind of DB administrator. Each developer has their own instance of the database. We don't worry about schema changes that break the database, because we *also* have a way to import/export the database to an XML file. Thus, if the schema radically changes from the deployed version, we just export to XML and re-import, so there is no complex though about 'schema evolution' necessary,
Of course, with everyone having such free ability to make changes that impacts the format of the data store, we need good unit tests to make sure things don't break unintentionally. This is actually one area we need to improve upon. People can make changes that affect the schema easily, and most times, its not an issue... but a lot of times people make changes that would impact the XML format, and they don't always handle it properly.
Unit Tests are Key, but that's nothing new to the concept of refactoring.
In a small shop (4 people) we had a similar setup. We were doing J2EE/JBoss/Tomcat work and used PostgreSQL as the back end. We had no full time DBA or Sys Admin. We had to be flexible.
,and currently available script. Scripts would be primarily straight SQL, run through a single program, and targeted against multiple schemas. Hmm, maybe I should explain more.
The Good: Database changes were part of development. When our system worked right (58.3% of the time) All changes would go through QA, a small fix cycle, and we would push code and database changes during the evening (we were running a web site for people who used it during the business day only).
The Bad: People tended to develop with live data. The main problem with this was that if something changed, It might break a unit test.This could be a real problem if Sensitive Data was involved as well.
The Mechanism: We had a script (PERL) that executed a list of sql statements embedded. For a revision, we changed the name of the script (the first script ended in 0001, which for the next revision would be 0002 etc) and checked if that reision had been executed, and if all the previous revisions had been executed; script 0008 could only be run after script 0007 had been run etc. We had a single table that kept track of the current revision.
The nightly backup from live was dumped into the Integration database. The current update script would be run against it, then all functional tests. Every few days we would push code live. Yes, every few days. This was an organisational issue and yes, it casue a lot of headaches.
To improve:
Better QA. We should have had a batch of scripts that could be run agains the DB. Instead, our QA person had to run through them manually.
We should have had test data for the tables that were primarily used for collecting and reporting, instead of runnning against live data. I
Instead of a self executing perl script being the required mechanism, I would have an executable on a machine that tracked the schema name, schema version number
For a give application, we had 3 databases running. One was for collection, one was for reporting, and one was reference data. Only the collection database was backup up, as the other two could be regenerated from source. However, we still needed to revision conrtol the reference databases source, and it was faster to send patches than to update the whole thing each time.
We started moving to an Application Service Provider (ASP) setup where we uysed the same schemas, but with different data in them, for different clients. Since our clients were Local governments (County) we wanted to A) Be able to get a new county up and runniong quickly if we got them as a client, B)Be able to dump whatever existing data they had into that counties db quickly, C) Keep the counties data separate from each other, D) Maintain our versioning. So certain scripts had to be run against County X data of the collection scheme. If I had to do it agains, with what I know now,I'd have had a revision control database that kept track of the other databases (and be self maintaining, why not) With Schema name and data set name, current revision for each.
Open Source Identity Management: FreeIPA.org
is iterative design. Which is becoming fairly widely accepted in OO circles, and almost universally accepted in Agile circles.
Databases, however, are a lot harder to iterate - the cost of change is higher than with any other code. Martin Fowler is laying down an approach to manage (not reduce - manage) that cost, and it all comes down to a guess we have to make - do we think the overall cost/benefit tradeoff of an iterative process is better than a Big Design Up Front process ?
On the eXtreme Programming mailing list, there's been a lot of discussion about how to deal with databases - some deny the need for databases altogether, some advocate using Mock Objects for testing and even development etc. It all boils down to the cost of change - it's expensive to change a database design because it is very hard to identify the knock-on effects. Some changes are relatively easy to manage - adding a column is unlikely to actually break anything - but others can wreak havoc with existing applications - changing the type or size of a column for instance.
I'd love to think that the next big improvement in software development tools is not going to be yet another language but a sensible way of tying objects to their persisted data. All the solutions I've seen so far are bolted-on - they either force the database into unnatural positions, or make the objects fit into a model that's not quite what they'd be otherwise.
In the meantime, this article is well worth investigating - the idea of evolving the datamodel in tandem with the migration scripts is very powerful.
It's all very well in practice, but it will never work in theory.
To understand the consequences of database refactorings, it's important to be able to see how the database is used by the application. If SQL is scattered willy-nilly around the code base, this is very hard to do. As a result it's important to have a clear database access layer to show where the database is being used and how.....Having a clear database layer has a number of valuable side benefits. It minimizes the areas of the system where developers need SQL knowledge to manipulate the database, which makes life easier to developers who often are not particularly skilled with SQL. For the DBA it provides a clear section of the code that he can look at to see how the database is being used. This helps in preparing indexes, database optimization, and also looking at the SQL to see how it could be reformulated to perform better. This allows the DBA to get a better understanding of how the database is used.
I disagree with this more or less. SQL is often too closely related to the application to put in a separate place. You have to go hunting back and forth to see and manage the relationship. If there is duplication, then I agree that it should be factored to a shared spot. However, beyond cleaning duplication, keep the SQL near where it is used.
If you want to be able to track it, then put some kind of comment marker that a grep-like utility can use to find and gather the SQL if need be. Both approaches are conventions anyhow. My suggestion gives the best of both worlds.
Also, OO proponents tend to use simpler SQL, and thus it might be easier to put wrappers around such trivial SQL. However, a more balanced approach is the use the full power the DBMS rather than re-invent it in your application. Hand-built indexing, joins, filtering, multi-user contention management etc. built into app code is a common sin of OO design IMO, including Fowler's designs, I am sorry to say.
Sometimes I think many OO'ers are motivated by a desire for control, and that is why they would rather reinvent the DB rather than use an existing one. But that is anti-reuse IMO.
Relational thinking and OO tend to be at odds, either way. It is not practical to have both manage the noun models and noun views IMO. Pick one or the other, I say. Thus, either get an OODBMS, or relax the OO design to not duplicate and fight with the RDB and query languages.
Table-ized A.I.
Good comment.
What I really am not crazy about in the article is that it fails to realize that databases live on. Applications come and go. Databases just keep on living. Hence doing a bad database design comes back and haunts you forever....
The project Atlas that has worked for the past two years is a CPU cycle in terms of databases. I know at large corporations that they have databases that have been running for thirty years non stop. It is called a mainframe!
"You can't make a race horse of a pig"
"No," said Samuel, "but you can make very fast pig"
No, and I do mean no Perl programmer ever calls it PERL. It is not an acronym, even though people has been to retrofit one to it at lots of occassions. Unless you are the only person that actually uses the cheap copy with that name, that someone tried to promote a long time ago.
Your credibility goes right out the window if you don't even know that. Reading your post it does seem that you are just throwing out some buzzwords and acronyms, hoping for some cheap karma.
If I wanted to be real nitpicky, there are dozens of other such mistakes - things that are called things they are not and misspelled. Sad, sad.
It seems like he's coming up for a solution to a developer-induced problem, not a problem with DBMS' or DBAs in general.
If you normalize you don't have to worry about null-ability of columns. You don't add and drop columns (usually). Pack those tables behind views and your application doesn't need to change a thing:
CREATE VIEW user_info AS
SELECT *
FROM user u
INNER JOIN user_detail ud ON u.username = ud.username
-- etc
Then your app would simply select from that view.
The problem is that developers don't take the time to properly learn relational theory, instead content to know the basic semantics of the SQL language and call themselves 'fluent in SQL'. They know how to create tables in their GUI of choice (or even code it by hand) and are 'database designers'.
Another benefit is stored procedures. Although abused to include procedural logic in the database, they can help keep database logic out of your application and generally help much along the same lines as views:
CREATE PROCEDURE get_user_info AS
SELECT *
FROM user u
INNER JOIN user_detail ud ON u.username = ud.username
-- etc
In this case the stored procedure would be called in your application.
Thanks,
--
Matt
Right on brow. I always recommend to all of my deralict friends to become dbas. The field has not progressed beyond the formalized theroy in the 70's. Any idiot can read the Oracle manuals and be billed out at over 100 an hour to gate keep a database from developers.
What really gets me are thoese "database designers". These are the dbas who are too embaressed and arrogant to even do the dba tasks. (similar to software architects)...
There is never any reason for developers to have to go through a dba to make a database change. There is no reason ALL database scripts are not managed in source controll systems. There is no reason A DBA needs any sort of interaction with developers during the design and implementation of a database. ALL dba should be considered system administrators that have more knowledge of certain dbms products then the os.
They are important (I am not saying otherwise) but you don't want to be paying one to review schema changes or come up with a box and line diagram for you... that is the work of a software devloper, if they are unable to do this... fire the stupid software engineer, pay more for someone who can tie their own shoe.
As a former DB developer, I can say that if my DBA did this, he's be eating through a straw. Before my sandbox is updated, I *need* to know what is going to be updated and when. Whether it's a schema change or a data change, a change like that mid-way through development is a serious decision, and shouldn't be undertaken lightly.
Also, I should have some input as to what is updated. A DBA shouldn't have complete control over the schema. The DBA and developers need to work together to work out an ideal schema. Ideally, it's msotly worked out *before* any coding is done. DB objects should not be done on the fly in msot cases.
Flamebait time!
Summarising several other replies and adding my own biasses...
Summary: All data storage solutions suck. For your specific application, there will be one or two which suck the least. That's why they pay you.
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
One of the best things that you can get out of Fowler and Sadalage's article is some idea of the benefits that an object persistence/database abstraction layer can give you. I work with WebObjects, which comes with the Enterprise Objects Framework, a powerful, mature object persistence layer. Think of it as entity Enterprise Java Beans, except that it's a lot lighter in weight at runtime since everything happens in the same address space, along with about 5 years maturity over entity EJB's and very powerful relationship management.
This gives you the ability to let most of your developers works solely with the object model, and encapsulates things like calculating an age from a birth date and today's date or a full name from first name and last name without polluting the database with miscellanea. Only one or two or a specified small team of people need to be concerned with the design of the database and writing SQL. All of the other developers can just use the Enterprise Objects and not worry about it.
However, this brings me to another point -- all of the successful WebObjects projects that I have seen share one thing in common: a highly skilled architect who has a solid understanding of not only the development process but also how databases work. A good developer teamed with a good DBA is no substitute for one master artiste in this case, as the master can know how to create an object design that meshes precisely with the database to optimize performance and ease of development, as well as making it relatively easy to create the UI.
In fact, a sign of a well-run WebObjects project is that for the first week or two, most of the developers are sitting on their hands or taking long lunches or reading Slashdot. During this time, however, the senior person or two is taking the time to get the underlying data object model right. Only when that is solid does the manager unleash the rest of the troops.
--Paul
man, no wonder every enterprise are full of unintegrated kludgy silos. man no wonder why the universal data model is getting popular. cause they came up stuff to make sure you morons wouldnt fuck things up.
There is no reason A DBA needs any sort of interaction with developers during the design and implementation of a database youre right thats the data architects job. developers cant model crap. all U get is unitgrated kludgy silos. A data model represents youre business. would trust a dba more than a developer, but its not even the dbas job. No wonder big compnaies are beginnig to use len silverstons universal data model. everybody knows you morons cant build a good model, so they built one that anyone can use for any business. and no wonder they pay data warehouse guys a ton o money cleaning out youre shit.
A database is just storage for applications written by programmers. Don't make database administration out to be something difficult. Most programmers know at least a half dozen computer languages with SQL being just one of them. It is really easy for any programmer to optimize database queries and transactions. It's just common sense.
I find that testing "code with side-effects" (ie: database inserts) is the hardest type of code to test, and I haven't yet found a solution that satisfies me.
On the eXtreme Programming mailing list, there's been a lot of discussion about how to deal with databases - some deny the need for databases altogether, some advocate using Mock Objects for testing and even development etc.
As the XP mailing list talks about, you can always create mock objects and test their state, etc, but it's still not quite legitimate. You end up building these massive meta-models that themselves might have issues. Perhaps the best solution I found was to have a "test" instance of your database that would always contain an appropriate seed of test data. If you keep your "side-effect code" and "test the side-effect code" inside of a transaction that you roll-back, you're pretty well off. Unit tests can start taking a long time, though.
Things get worse when you look at Ron Jeffries' Adventures in C# where he starts on the slippery slope of re-implementing the textbox class as a mock object.
Not much of a point here, more of a "me, too" response.
It all goes downhill from first post
The "Silver Bullet" Kitty is referring to here is the seminal work No Silver Bullet - Essence and Accidents of Software Engineering written by Frederick P. Brooks. If you are a "young" developer then reading and understanding this paper will start you on the road to being an "old" developer.
.NET will probably not reduce your "coding" time anywhere near 50%.
One of the main points is that "coding" the application is actually a small percentage of the overall time and cost of delivering non-trivial applications. Because of this, improvements in the software development process that are directed towards the "coding" aspect will necessarily be somewhat inconsequential - much to the chagrin of tool vendors and the IS managers that willingly believe their vendorspeak.
The much much larger part of your budget and flow-time will be consumed by requirements analysis and system design (leaving out maintenance for now.) So even if some new tool reduces your "coding" time by 50% (fat chance), 50% of - say - 20% is still not THAT big of a deal. And lets face it... switching from ASP/VB6 to
To have a dramatic improvement, you need to go after the "big rocks" first....requirements analysis and system architecture/design.
BTW, Extreme Programming is not about skipping requirements definition. It's about doing meaningful and efficient requirements definition which they believe can best be accomplished by delivering production quality iterations on development projects using evolutionary rapid prototyping (see Structured Rapid Prototyping by John L. Connell and Linda Brice Shafer.)
I find it interesting that with all the work that has been published on XP, that they only NOW are addressing the issues of incorporating database issues into the methodology. And make no mistake about it, if they can not incorporate databases in the way that the article describes - well that pretty much hurts it as a viable methodology for most data intensive business applications. Now, assuming that the inventors of XP intended their methodology to include this most important class of application, it seems almost foolhardy to have gone this far without addressing this database issue much sooner. Hey - but that's what they get for not doing their requirements first!
I want to be alone with the sandwich
I wonder what it will look like in 100 million years.
Patent: from Latin patere, to be open
A different solution that I am experimenting with is the notion of treating the database as another class.
Using this paradigm the database tables are the equivalent of private fields/methods and a combination of views and stored procedures are equivalent to public fields and methods. In this way the DBA is able to update his implementation of the class i.e. the table structure, while maintaining a consistent interface for users of his class through the views and stored procedures.
The problems experienced with typical applications where developers interact directly with the database tables are exactly the type of problems the OO paradigm is designed to prevent.
And a fifth is that once the database exists, no matter how much the original designers warn against it, people start using the database as an integration layer. Suddenly 14 different apps are munging the same data, making it impossible to change the schema, and nearly as hard to track down a bug. The whole point of OO programming is that data should always be wrapped by the code that goes with it.
I am curious how OO philosophy is supposed to handle this. I am not criticizing OO here (I'll save that for elsewhere), I just want to make sense of OO doctrine. The idea of a database is that *multiple* applications, languages, and paradigms can share the same data without mass copying. (This is a good thing, no?) But OO wants to wrap the data or state within a single language interface. If that is done, then how is the data shared with multiple applications, languages, and paradigms?
Also, what kind of 'bug' are you referring to? A bug in the RDBMS engine, or a shop app? Setting up app-level logins and/or transaction logging can help trace who the writer is. Although it might slow things down while on.
Table-ized A.I.