Strategies for Test Databases?

← Back to Stories (view on slashdot.org)

Strategies for Test Databases?

Posted by ryuzaki0 on Tuesday September 19, 2006 @11:10PM from the cats-and-dogs-working-together dept.

youngcfan asks: "I've been tasked with finding strategies for a test database that can be used effectively by both software developers and the QA team. We're a J2EE shop with most of the interesting pieces of the application interacting heavily with the database -- so we need to test it. We're ramping up on JUnit, but are looking for ways to test the database-driven pieces of code. Since QA needs the same database for functional testing separate from developers' unit tests, DBUnit doesn't seem to suffice. We also have the challenge of working on multiple releases at the same time, which only complicates how and when to add new data to the test database in a way that's useful and valid for everyone. We're looking for strategies for using a test database in a way that meets both the QA's and the developers' needs, works for multiple releases, and isn't a heavy burden to maintain given that the schema and code can change anytime before any of the multiple upcoming releases. Any suggestions?"

13 of 66 comments (clear)

Min score:

Reason:

Sort:

Why use only one DB? by Dr.+Hok · 2006-09-19 23:26 · Score: 4, Insightful

Why do you insist on using one DB for both developers and QA? They have different test scopes, so they should use different DBs. It's like using an axe to both chop wood and cut fingernails.
You'll find it much easier to create dedicated DBs for each test scope.

--
Say out loud: I'm an Aspie and I'm somewhat proud, I guess. Uh. Can I write an email in all caps instead? Hm...
1. Re:Why use only one DB? by djbckr · 2006-09-20 01:31 · Score: 5, Insightful
  
  As the parent eludes to, the only way to do it The Right Way (tm) is to have a Development environment, a QA environment, and a Production system.
  
  Each of these systems should be using the same architecture when it comes to hardware and configuration.
  
  The Development system is always in a state of flux, as its name implies.
  
  The QA system should *at least* approximate (if not be identical to) the data and load of the production system, and it should be treated like a production system that QA tries to break.
  
  It is only in this fashion that you will be able to test and make sure your system will work as expected. Leave nothing to chance. Expensive, yes. But it's less expensive than a downed production system, and definitely less expensive than building a complete system and realising it doesn't perform as expected.
Perfect world enviroment by techpawn · 2006-09-20 00:02 · Score: 2, Insightful

You'd have 3 servers as close in configuration as possible. One houses your production enviroment DB and the other houses your test and one for the QA enviroment. You can get away with QA and TEST in the same server but you REALLY don't want a devoloper to crash the test box or bog it down with a bad query when they're doing QA.

--
Ask not what you can do for your country. Ask what your country did to you
Re:Use real data, not test data by theonetruekeebler · 2006-09-20 00:09 · Score: 2, Insightful

In some cases a developer can't or shouldn't have access to production data. Our production data contains confidential client information -- including information about our own employees. There are federal laws in place regarding access to it, and our developers and QA people must not have unfettered access to it, and it should never be placed on a system that is not access-restricted with the utmost diligence and paranoia.
We do take a QA snapshot of the production server about once a week. Its confidential information gets stripped and obfuscated in a hundred different ways before it's brought online. It's good for testing new code, and for some debugging, but often it's useless for reproducing a specific client's problem. If a developer or QA needs to look at a particular client's data, he first gets the customer's permission. Then he submits a logged request for it (CC'd to the customer). Then he gets a tiny instance of his own, which will be taken down in 8 business hours unless he re-subscribes to it. We have tools and scripts that automate a lot of the process -- test instances usually come up within ten minutes for smaller clients.

--
This is not my sandwich.
Doesn't Suffice? by Aladrin · 2006-09-20 00:09 · Score: 3, Insightful

DBUnit doesn't suffice? What's it missing? It's only function is to place the database into a known state before the test, to make sure the data is correct before you test with it. How can that not do what you want?

It also occurs to me that if you can't even decide what data is 'useful and valid to everyone' then your test data is nothing like the live data you will have. Here's my suggestion: If it seems like it'll be even slightly relevant to anyone, use it. Otherwise you aren't testing everything.

The constantly changing schema is puzzling also. Did you not plan your database beforehand? I'm guessing this is an XP shop then, eh? XP doesn't stand for 'no planning'. I can understand changes to the schema in the early stages of programming, but if you're getting close to 'multiple releases' then the schema should be pretty solid by now, and the little changes needed to make to DBUnit shouldn't be a big bother.

--
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
1. Re:Doesn't Suffice? by Mongoose+Disciple · 2006-09-20 00:46 · Score: 2, Insightful
  
  The constantly changing schema is puzzling also. Did you not plan your database beforehand? I'm guessing this is an XP shop then, eh? XP doesn't stand for 'no planning'. I can understand changes to the schema in the early stages of programming, but if you're getting close to 'multiple releases' then the schema should be pretty solid by now, and the little changes needed to make to DBUnit shouldn't be a big bother.
  
  In theory I'd agree with you, but in practice I've rarely worked on a project of significant size that didn't see DB changes (if small ones) damn near right up until release.
  
  Maybe one of the other developers didn't code or design his part of the database perfectly. Maybe the first few times you run against production-quality data, you discover a few special cases you missed that require an additional piece of data to be tracked. Maybe the DBAs introduce constraints late in the game that force you to add a field or refactor a table or two. Maybe your first real stress test shows you that, while your code is logically correct, it takes 4 minutes to execute a common operation which needs to happen in under 4 seconds, and fixing that requires getting at the data a different way. Maybe requirements change in a significant way two weeks from release, or a new business rule is introduced.
  
  It's never everything, but it's always something. We work in a world where deadlines are often set more by clients needs than the amount of time it would take to do something right. As long as that's true, there are going to be surprises that no design planned for. You can do a lot of things right to minimize it, but it never really goes away.
2. Re:Doesn't Suffice? by Aladrin · 2006-09-20 02:15 · Score: 2, Insightful
  
  On the other hand, if you've got to make changes to the schema, you really should not be upset about having to make changes to the tests that go with it... It's all part and parcel. I don't foresee a magic version of DBUnit that handles all that for you.
  
  --
  "If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
Do you have a DBA? by duffbeer703 · 2006-09-20 00:43 · Score: 2, Insightful

It sounds like you need someone intimately familiar with the database who is not a developer, but can do things like create scripts to build your schema and populate it with useful test data... this person is usually called a DBA.

DBAs are usually viewed by devs as complete assholes, because they scream and holler at devs who make gratuitous changes to schemas and stored procedures. But a good DBA will make your database issues go away.

--
Conformity is the jailer of freedom and enemy of growth. -JFK
proper design and planning? by Stigu · 2006-09-20 00:57 · Score: 2, Insightful

Ok, with a structured approach you can make testing walk in the park. First, listen to your costumer, what ar his needs? What does he wnt to do? Define input and output, and of course wht information needs to be stored, and what information can be tabulated. It's no use storing for example ge when you have a birthdate registered. Remove ALL information that can be derived. Make a paper drawing of the structure of your database. plan out the relations. Make sure to obey the CODD rules for design of a relational database. Don't go ovrboard with it though. Just remove the repeating groups (the reoccuring fields inside a table, if there are any) and draw up the general layout. Then proceed to define the content type (text field, integer, float, boolean,...) of each field and set up the test database. You allready have a paper blueprint, liturally so you can easily use an erasor and pencil to record any changes you make. If you've done your planning well, there won't be a need for changes. Now that you have a digital empty construct of the database, defined with the content types. you can then make masks on the input so that only the correct type of date, properly formatted (think of dates etc...) is sent to the database. Having done this you have now successfully created a proper test database. As mentioned earlier as a reply to this post, I agree, you should NEVER alow multiple user groups to access the database during testing time. Give every group it's own test database so the problems both in structure and usability can be recorded seperately from each user group. You get a much clearer view of the problems you might have this way. Better information makes for better sollutions. after the first round of testing compare notes of the different user groups problems, fix them, and go to the second round. Repeat as often as is nessecary. A good database is able to keep preforming forever. If the needs don't change a properly designed and tested databse will work till all our bones are in some history museum. And, provided it'sa relational database, changes in needs can usaully be complied with without having to redesign the entire database.
Re:Use real data, not test data by CastrTroy · 2006-09-20 01:14 · Score: 2, Insightful

I agree with this completely. For any sufficiently sized application, there's too many permutations of data for the developers to think up and make on their own. The only thing you're missing out on, which you probably do, is to create a set of scripts to clear or change any data that the devs or QA team shouldn't see. Confidentiality is an issue, but you should be able to identify the data and delete or change it accordingly. Also, devs probably have access to production data in some form or another anyway, they could even put back doors into the code so that they could access it later. So if you don't trust them to be working on production data, then you'd better have lots of checks and balanaces to make sure that they can't access it. Because a determined developer will be able to access the data, regardless of whether or not they will be working with it on a daily basis.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
For test databases... by Ramses0 · 2006-09-20 01:34 · Score: 3, Insightful

For (QA) test databases, it's generally not enough to just have a separate instance, you also need to support the following capabilities:

1- "Clone" whatever is most recent on production

2- Revert to "known good QA state" (ie: big red reset button)

3- Dump current state for later use.

You need to be able to clone so that ad-hoc testing can be run against production data w/o making production impact. This doesn't have to be live, but can be like a once-a-week/once-a-month activity, or rotate out a slave DB every once in a while, or have your DB people test your backups / etc.

You need the ability to revert to a known good state so that specific tests can be run and those can be more easily automated. Like: search "foo", 7 results found (not 6, not 8, not "it was 8 a few seconds ago but now it's 9 because there's a new result that was just added) ... the more confident QA is in the data, the more confident (and/or prone-to-automation) their can be.

The ability to dump out DB state is a very distant third, but can be helpful for post-testing analysis or being able to modify a particular DB snapshot to fit some particular testing needs and then dump that out to the file-system for later use.

QA is hard, thank you for trying to make it easier.

--Robert
DB for Unit Testing? by Slashdot+Parent · 2006-09-20 05:08 · Score: 3, Insightful
A couple of points.
1. Typically, the term Unit Testing refers to the testing of a single, fine-grained unit of code. In other words, to do your true Unit Tests, you should not be accessing any database.
2. The question that I think you are asking, is "How do I get databases initialized with the correct schema and correct data for integration testing?" The answer is, as always, "It depends."
The two biggest factors for creating useful test environments are: "How often does your schema change?", and "How much data do you need in your database for meaningful test cases?"
Schema Changes: As a J2EE architect, the first time I saw Ruby on Rails' database migrations my first impulse was to wonder, "Why the !@#$ is this not in Hibernate?" I am not aware of any slick framework for J2EE apps to manage DB migrations, so you may have to use your own migration scripts. Hopefully, your schema is not changing much.
Getting Data In There: This totally depends on how much data you need. My "favorite" reply to you was to have one snapshot of your production data per developer. That works great, as long as you don't have much data. My last project had I don't even remember how many terrabytes of data in prod. Do you really think the client was going to spring for that much storage and that many Oracle licenses to get one instance per developer? Yeah right. We had a full snapshot for performance testing, but regular integration testing was done on a representative subset of data.
DBUnit is a great way to initialize a small amount of data. For larger datasets, you cannot get away with things like DBUnit, as it would take hours, if not days, to get the data in there. For our performance testing databases, we had the prod data snapshot stored on a RAID-1. Before testing started, we broke the mirror and did testing against the degraded array. When it came time to reset the data, we shut down Oracle and rebuilt the array to the good snapshot. That wound up being very fast for us. For medium amounts of data, you could probably get away with using SQL*Loader.
--
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock
Functional DB testing versus unit-testing... by phamlen · 2006-09-20 05:48 · Score: 2, Insightful

One approach that has worked for me in the past is the "backup and recover" approach. Basically, it works like this:
1) You maintain a canonical "test" database (or multiple ones). This database has the same functionality as the production database but generally contains much less data. No one touches this database unless they need to permanently modify the test data. After each release, you make a backup of the database and release that backup to everyone who needs a test database. They restore it to their own environment.
2) You always write changes to the database as scripts so that you can run them against your test database and your production database. Your release process has to change to include running any database modification scripts on the canonical test database as well as the production database. This ensures that your new test database matches the production database for that release.
3) You need to modify your test process so that it runs a database restore at the appropriate points. In our case, we always restore before QA functional tests (because they leave the database in an altered state) but we don't restore for unit-tests (because we insist they leave the database in the same state they started.)

The advantages to this approach is that everyone has a copy of an actual database and you get to see all the funkiness of your real environment. The downside is that you have to be very disciplined in keeping the backups for all releases, and for running modification scripts against both the test and production databases appropriately.

-Peter