Strategies for Test Databases?
youngcfan asks: "I've been tasked with finding strategies for a test database that can be used effectively by both software developers and the QA team. We're a J2EE shop with most of the interesting pieces of the application interacting heavily with the database -- so we need to test it. We're ramping up on JUnit, but are looking for ways to test the database-driven pieces of code. Since QA needs the same database for functional testing separate from developers' unit tests, DBUnit doesn't seem to suffice. We also have the challenge of working on multiple releases at the same time, which only complicates how and when to add new data to the test database in a way that's useful and valid for everyone. We're looking for strategies for using a test database in a way that meets both the QA's and the developers' needs, works for multiple releases, and isn't a heavy burden to maintain given that the schema and code can change anytime before any of the multiple upcoming releases. Any suggestions?"
You'll find it much easier to create dedicated DBs for each test scope.
Say out loud: I'm an Aspie and I'm somewhat proud, I guess. Uh. Can I write an email in all caps instead? Hm...
DBUnit doesn't suffice? What's it missing? It's only function is to place the database into a known state before the test, to make sure the data is correct before you test with it. How can that not do what you want?
It also occurs to me that if you can't even decide what data is 'useful and valid to everyone' then your test data is nothing like the live data you will have. Here's my suggestion: If it seems like it'll be even slightly relevant to anyone, use it. Otherwise you aren't testing everything.
The constantly changing schema is puzzling also. Did you not plan your database beforehand? I'm guessing this is an XP shop then, eh? XP doesn't stand for 'no planning'. I can understand changes to the schema in the early stages of programming, but if you're getting close to 'multiple releases' then the schema should be pretty solid by now, and the little changes needed to make to DBUnit shouldn't be a big bother.
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
For (QA) test databases, it's generally not enough to just have a separate instance, you also need to support the following capabilities:
... the more confident QA is in the data, the more confident (and/or prone-to-automation) their can be.
1- "Clone" whatever is most recent on production
2- Revert to "known good QA state" (ie: big red reset button)
3- Dump current state for later use.
You need to be able to clone so that ad-hoc testing can be run against production data w/o making production impact. This doesn't have to be live, but can be like a once-a-week/once-a-month activity, or rotate out a slave DB every once in a while, or have your DB people test your backups / etc.
You need the ability to revert to a known good state so that specific tests can be run and those can be more easily automated. Like: search "foo", 7 results found (not 6, not 8, not "it was 8 a few seconds ago but now it's 9 because there's a new result that was just added)
The ability to dump out DB state is a very distant third, but can be helpful for post-testing analysis or being able to modify a particular DB snapshot to fit some particular testing needs and then dump that out to the file-system for later use.
QA is hard, thank you for trying to make it easier.
--Robert
- Typically, the term Unit Testing refers to the testing of a single, fine-grained unit of code. In other words, to do your true Unit Tests, you should not be accessing any database.
- The question that I think you are asking, is "How do I get databases initialized with the correct schema and correct data for integration testing?" The answer is, as always, "It depends."
The two biggest factors for creating useful test environments are: "How often does your schema change?", and "How much data do you need in your database for meaningful test cases?"Schema Changes: As a J2EE architect, the first time I saw Ruby on Rails' database migrations my first impulse was to wonder, "Why the !@#$ is this not in Hibernate?" I am not aware of any slick framework for J2EE apps to manage DB migrations, so you may have to use your own migration scripts. Hopefully, your schema is not changing much.
Getting Data In There: This totally depends on how much data you need. My "favorite" reply to you was to have one snapshot of your production data per developer. That works great, as long as you don't have much data. My last project had I don't even remember how many terrabytes of data in prod. Do you really think the client was going to spring for that much storage and that many Oracle licenses to get one instance per developer? Yeah right. We had a full snapshot for performance testing, but regular integration testing was done on a representative subset of data.
DBUnit is a great way to initialize a small amount of data. For larger datasets, you cannot get away with things like DBUnit, as it would take hours, if not days, to get the data in there. For our performance testing databases, we had the prod data snapshot stored on a RAID-1. Before testing started, we broke the mirror and did testing against the degraded array. When it came time to reset the data, we shut down Oracle and rebuilt the array to the good snapshot. That wound up being very fast for us. For medium amounts of data, you could probably get away with using SQL*Loader.
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock