Strategies for Test Databases?

Why use only one DB? by Dr.+Hok · 2006-09-19 23:26 · Score: 4, Insightful

Why do you insist on using one DB for both developers and QA? They have different test scopes, so they should use different DBs. It's like using an axe to both chop wood and cut fingernails.

You'll find it much easier to create dedicated DBs for each test scope.

--
Say out loud: I'm an Aspie and I'm somewhat proud, I guess. Uh. Can I write an email in all caps instead? Hm...

Re:Why use only one DB? by Dr.+Hok · 2006-09-19 23:37 · Score: 2, Funny

It's like using an axe to both chop wood and cut fingernails.
Simultaneously, I should add.

--
Say out loud: I'm an Aspie and I'm somewhat proud, I guess. Uh. Can I write an email in all caps instead? Hm...
Re:Why use only one DB? by kfg · 2006-09-19 23:47 · Score: 3, Funny

Ok, maybe it didn't turn out to be such a hot idea, but as a side effect it is easier for me to compute in base 8 now.

KFG
Re:Why use only one DB? by djbckr · 2006-09-20 01:31 · Score: 5, Insightful

As the parent alludes to, the only way to do it The Right Way (tm) is to have a Development environment, a QA environment, and a Production system.

Each of these systems should be using the same architecture when it comes to hardware and configuration.

The Development system is always in a state of flux, as its name implies.

The QA system should *at least* approximate (if not be identical to) the data and load of the production system, and it should be treated like a production system that QA tries to break.

It is only in this fashion that you will be able to test and make sure your system will work as expected. Leave nothing to chance. Expensive, yes. But it's less expensive than a downed production system, and definitely less expensive than building a complete system and realising it doesn't perform as expected.
Re:Why use only one DB? by Mycroft_514 · 2006-09-20 02:29 · Score: 1

You are right as far as you go. With larger companies, you need multiple copies of each. One shop I was in had 1 Prod environment, 5 QA environments, and a separate test environment on each developer's PC.

Another had 1 P, 2 QA and 2 Test.

My current company has 2 Prod (one is a daily clone for reporting), 1 QA and 2 test. And QA and test may at any time hold duplicates of the normal tables, due to special testing.

Now if you will excuse me, the space manager just mounted that new pack for me, and I have to go lay down a temp table in both test and QA (3.2 GB each).
Re:Why use only one DB? by gosand · 2006-09-20 09:03 · Score: 1

As the parent alludes to, the only way to do it The Right Way (tm) is to have a Development environment, a QA environment, and a Production system.

Each of these systems should be using the same architecture when it comes to hardware and configuration.

The Development system is always in a state of flux, as its name implies.

The QA system should *at least* approximate (if not be identical to) the data and load of the production system, and it should be treated like a production system that QA tries to break.

Well, depending on how your release schedule works, you may want a separate system that mirrors production. Using the QA system for that will only let them test against the production-like environment. If you need to test anything new, you run the risk of making your production-like system less like production. You may want to try to reproduce a production issue, and tainting your QA system may not allow you to do that. BTDT. Again, it depends on your release approach.

Full dumps and restores are your friend. You might be able to get away with one QA system, but if it were me, I wouldn't recommend it.

(And to be the anal-retentive person that I am, QA *technically* refers to quality process management, and not testing, which would be QC. I know, I know - everyone refers to 'testing' as 'QA'. Just don't say you are going to 'QA' something, that really makes no sense.)

--

My beliefs do not require that you agree with them.
Re:Why use only one DB? by angel'o'sphere · 2006-09-20 15:50 · Score: 2, Interesting

Oh my god ....

You are nearly as wrong as your parent!

1st: the QA system very likely won't be the production system, but the production system as it will run in the future.
2nd: DEFINITELY the development system is the same as the QA system. And no: it is not in flux!!! It is reset after each developer test, or developer access to it, either by erasing it and using a back up or by "roll back" of all transactions (that likely is not possible).

How the hell should a developer figure out whether his actual "attempt at a new working piece of code" fails because "his DB is in flux" or because he has a programming error? What if it does not fail because his work is in flux, but the QA system later says: you delivered defective code?

If developers have the feeling "they need to set up" something before they can use the "rolled back" DB, then the QA system very likely needs the same "set up". Note: the QA system wants to be as close as possible to the future deployment system, and so does the development system.

Rule of thumb:
a) production system - the system your old code runs on and the system your new code will be deployed on
b) QA system - the system that includes a "good" state of the production system and the anticipated meaningful defaults for a new system in future
c) the development system - the system that is similar to the QA system but has even more: the test environments for developers, probably via a mandator/client approach. This one should be able to fall back to an old revision very easily and quickly.

However, to answer the core question of the poster: your test cases should abstract away the data base and use mock ups for scenario tests.
The goal should be to write for every QA test a script/test like you would for unit tests. That test can run completely without a DB if your system architecture is sound, that means if you have a class/object responsible for accessing every external resource. So you can e.g. use a flat file (property file or XML or what ever) to mimic a DB, by simply having a production version of that class and a test version.
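
A minimal sketch of that idea, in Python rather than the Java the thread mostly talks about. All names here (CustomerStore, FlatFileCustomerStore, greeting, run_scenario) are made up for illustration, and the JSON file is just one possible flat-file format:

```python
import json
import os
import tempfile
from typing import Optional, Protocol


class CustomerStore(Protocol):
    """The one place the application goes to for customer data."""

    def find_name(self, customer_id: int) -> Optional[str]: ...


class FlatFileCustomerStore:
    """Test version: reads customers from a JSON flat file instead of a DB."""

    def __init__(self, path: str) -> None:
        with open(path, encoding="utf-8") as handle:
            self._rows = json.load(handle)

    def find_name(self, customer_id: int) -> Optional[str]:
        return self._rows.get(str(customer_id))


def greeting(store: CustomerStore, customer_id: int) -> str:
    """Business logic under test; it never learns where the data lives."""
    name = store.find_name(customer_id)
    return f"Hello, {name}!" if name is not None else "Unknown customer"


def run_scenario() -> tuple:
    """Write the fixture for this one scenario, run the logic, clean up."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "customers.json")
        with open(path, "w", encoding="utf-8") as handle:
            json.dump({"1": "Ada"}, handle)
        store = FlatFileCustomerStore(path)
        return greeting(store, 1), greeting(store, 2)
```

A production version of the store would implement the same find_name method on top of the real database, so greeting would not change at all.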

angel'o'sphere

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Re:Why use only one DB? by DuckDodgers · 2006-09-21 02:10 · Score: 1

One step further:
2 QA Systems. One for testing the next release, one set up identically to the production server so QA can reproduce problems found with the production software.

The company I'm at now is extremely small. I constitute 50% of the software engineers and 33% of the entire IT department. We have a production setup, 2 developer setups, and a QA server that we've configured to rapidly switch between our production software and release software. The transition takes about 15 minutes.
Re:Why use only one DB? by Anonymous Coward · 2006-09-21 02:49 · Score: 0

The goal should be to write for every QA test a script/test like you would for unit tests. That test can run completely without a DB if your system architecture is sound, that means if you have a class/object responsible for accessing every external resource. So you can e.g. use a flat file (property file or XML or what ever) to mimic a DB, by simply having a production version of that class and a test version.

When/how do you prove that the flat file class and the database class are equivalent? When/how do you integrate the fact that flat files and databases have different failure modes, and thus different end user error messages?
Re:Why use only one DB? by Anonymous Coward · 2006-09-21 04:03 · Score: 0

The QA system should *at least* approximate (if not be identical to) the data and load of the production system, and it should be treated like a production system that QA tries to break.
If, as you suggest, you only have one QA system, how in the world could it always approximate the production system? As soon as you have a significant new release to test that includes data changes, you will load it in QA and then QA probably will differ greatly from production.
Re:Why use only one DB? by angel'o'sphere · 2006-09-21 14:17 · Score: 1

When/how do you prove that the flat file class and the database class are equivalent? When/how do you integrate the fact that flat files and databases have different failure modes, and thus different end user error messages?

When the flat file gives a "file not found error" the test environment is not set up correctly.
The flat file class and the data base class don't need to be equivalent. They are customized for every test case, so they are only equivalent in regard to that special case.
End user messages are something that should come from application logic, not from the technology used. E.g. if you work with C++/C#/Java most technology is shielded from the application by an API, e.g. JDBC in the case of Java DB access. That API throws technology-specific exceptions, like java.sql.SQLException ... "SQL Error" (oh, that is a stupid one ;D ). The DB access logic has to transform all errors that make sense into business logic errors. (That's not always possible: e.g. if a file is not found because you had a crash and played back some backups, but one file is missing, you still get only a "file not found" error.) You can wrap all technical errors (like DB access, network or file system) into a specific exception class for every module. E.g. you have a "Customers module" and a "Billing module" (well, both would be components in newspeak, or SOA services, probably).

So the Customer component/module has a base exception class "CustomerException"; from that class you derive all exceptions for that module, like "CustomerNotFound", "IllegalZipCode" and "CustomerTechnicalError". You do the same for the Billing component: the base class is "BillingException", then e.g. "NegativeBillException" and finally "BillingTechnicalError".

All technical errors/exceptions/failures (if you don't find a modified but similar approach that suits you better) are wrapped in CustomerTechnicalError and BillingTechnicalError exceptions, depending on the source. Probably both are derived from a common base class or implement a common interface, TechnicalException, so you can write your exception handlers accordingly.

Now your unit test code and your integration test code only handle TechnicalExceptions; after all, your test code also has to take into account that those exceptions will occur!!! So you will have test cases that simply have one line: throw new CustomerTechnicalError("File not found: this is a test");
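
The hierarchy described above could look roughly like this. It is a sketch in Python, not the Java the post has in mind: the class names follow the post, while FailingStore, load_customer and describe_failure are invented for the example, and Python's multiple inheritance stands in for "implements a common interface":

```python
class TechnicalException(Exception):
    """Common base for infrastructure failures (DB, network, file system)."""


class CustomerException(Exception):
    """Base class for everything the Customer module raises."""


class CustomerNotFound(CustomerException):
    """A business error: the customer simply does not exist."""


class CustomerTechnicalError(CustomerException, TechnicalException):
    """A technical failure, wrapped so callers never see the raw cause type."""


class FailingStore:
    """Test double that simulates a missing data file."""

    def load(self, customer_id):
        raise FileNotFoundError("customers.dat")


def load_customer(store, customer_id):
    """Data access code: translate low-level errors into module errors."""
    try:
        return store.load(customer_id)
    except OSError as exc:
        raise CustomerTechnicalError(
            f"could not load customer {customer_id}"
        ) from exc


def describe_failure(store, customer_id):
    """Caller that only knows about the two families of exceptions."""
    try:
        load_customer(store, customer_id)
    except TechnicalException as exc:
        return f"technical: {exc}"
    except CustomerException as exc:
        return f"business: {exc}"
    return "ok"
```

A test that wants to exercise the technical-failure path then only needs a store that raises, whatever technology the real store uses.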

angel'o'sphere

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Re:Why use only one DB? by Fulcrum+of+Evil · 2006-09-22 20:57 · Score: 1

And no: it is not in flux!!! It is reset after each developer test, or developer access to it, either by erasing it and using a back up or by "roll back" of all transactions (that likely is not possible).

How the hell do you get anything done? That would spell disaster where I work - 1000 developers split into about 80 teams, more or less, communicating via service interfaces, all using separate databases. Devo is Devo so you can trash it and nobody cares - this makes you design stuff that is resilient in the face of cruft lying around, which helps prod too, since it's not always perfect. Just design your use cases such that the data from one run to the next is mostly independent.

That test can run completely without a DB if your system architecture is sound, that means if you have a class/object responsible for accessing every external resource.

Nope, you'll never find scaling problems that way. You can verify correct behavior, but good luck bolting sql onto a flat file. Better to use a blessed schema + dataset for DB testing.

--
"We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"

Test databases by Anonymous Coward · 2006-09-19 23:38 · Score: 2, Informative

Oracle, Sybase and MySQL can all be used as test databases.

Perhaps you really want to know how to test code that uses databases, which is a different question.

There are many refactorings that can be done to reduce your dependency on a particular database install...but that's a rather large topic. I'm available for consultancy, post here and I can get in touch...

Some things you might like to consider

Per-developer databases (obviously using automated schema building/destruction)
Dependency-injection of non-database-using data-access-layers for testing
Mocks

Use real data, not test data by SMQ · 2006-09-19 23:42 · Score: 3, Informative

Test data sucks: there are too many real-world situations the developers fail to think of.

We're a pretty small shop, but here's what we do: The production server backup is loaded to the test server daily. Every developer maintains a set of scripts which make any needed database structure modifications after the backup has loaded. All development and QA testing is done against this test database. Where the production data isn't stable enough for unit testing we force-feed a few specific rows (as few as possible). This gives us fresh, real-world data for development and testing, and when an application rolls out, the exact same set of modification scripts are usually run on the production server (i.e. the modification scripts have been indirectly but repeatedly tested themselves).
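
A rough sketch of that daily cycle, assuming the modification scripts are plain SQL with sortable names. SQLite in memory stands in for the real server here, restore_backup fakes the backup load, and the script names and contents are invented:

```python
import sqlite3

# Hypothetical modification scripts; the numeric prefix fixes the order.
MODIFICATION_SCRIPTS = {
    "001_add_email.sql": "ALTER TABLE customer ADD COLUMN email TEXT;",
    "002_index_name.sql": "CREATE INDEX idx_customer_name ON customer(name);",
}


def restore_backup(conn: sqlite3.Connection) -> None:
    """Stand-in for loading last night's production backup."""
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        INSERT INTO customer (name) VALUES ('Ada'), ('Grace');
        """
    )


def apply_modifications(conn: sqlite3.Connection, scripts: dict) -> list:
    """Run every script in name order and report which ones ran."""
    applied = []
    for name in sorted(scripts):
        conn.executescript(scripts[name])
        applied.append(name)
    return applied


def refresh_test_database() -> sqlite3.Connection:
    """Fresh copy of production plus the pending structure changes."""
    conn = sqlite3.connect(":memory:")
    restore_backup(conn)
    apply_modifications(conn, MODIFICATION_SCRIPTS)
    return conn
```

Because the same scripts run against a fresh copy every day, they get exercised many times before they are ever pointed at production.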

--
SMQ 90AE4B2BC4F6BEAF7340F0B40BA2DEF7340F6BC2D0392

Re:Use real data, not test data by Anonymous Coward · 2006-09-19 23:50 · Score: 1, Interesting

A problem we had in a shop I used to work for was that the production dataset was huge. There were some plans to try and take subsets of data... but the schema was quite large and complex - making it a pain to keep integrity (which is crucial for performing tests against). In the end, we ended up doing a big refresh of test every few months. This was for user acceptance testing. The developers' box got updated even less often - and as you can imagine - this caused huge problems (developers were expected to update the data and stored procedures needed for their portion of testing). Some of us just snuck our tests onto the acceptance box to get work done.
Re:Use real data, not test data by theonetruekeebler · 2006-09-20 00:09 · Score: 2, Insightful

In some cases a developer can't or shouldn't have access to production data. Our production data contains confidential client information -- including information about our own employees. There are federal laws in place regarding access to it, and our developers and QA people must not have unfettered access to it, and it should never be placed on a system that is not access-restricted with the utmost diligence and paranoia.
We do take a QA snapshot of the production server about once a week. Its confidential information gets stripped and obfuscated in a hundred different ways before it's brought online. It's good for testing new code, and for some debugging, but often it's useless for reproducing a specific client's problem. If a developer or QA needs to look at a particular client's data, he first gets the customer's permission. Then he submits a logged request for it (CC'd to the customer). Then he gets a tiny instance of his own, which will be taken down in 8 business hours unless he re-subscribes to it. We have tools and scripts that automate a lot of the process -- test instances usually come up within ten minutes for smaller clients.
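
One of those "hundred different ways" might look like the sketch below. The table and column names are invented, SQLite stands in for the real database, and a salted hash is only an illustration: whether it is strong enough for a given law or contract is a separate question.

```python
import hashlib
import sqlite3


def pseudonym(value: str, salt: str) -> str:
    """Map a real name to a stable, meaningless token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "client_" + digest[:12]


def scrub_snapshot(conn: sqlite3.Connection, salt: str) -> None:
    """Obfuscate names and blank out tax ids before the copy goes online."""
    rows = conn.execute("SELECT id, name FROM client").fetchall()
    with conn:  # commit all updates together, or roll back on error
        for client_id, name in rows:
            conn.execute(
                "UPDATE client SET name = ?, tax_id = NULL WHERE id = ?",
                (pseudonym(name, salt), client_id),
            )


def make_snapshot() -> sqlite3.Connection:
    """Stand-in for the weekly copy of production."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT, tax_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO client (name, tax_id) VALUES (?, ?)",
        [("Ada Lovelace", "111-11-1111"), ("Grace Hopper", "222-22-2222")],
    )
    conn.commit()
    return conn
```

Keeping the token stable for a given name means joins and duplicate checks still behave the way they do in production.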

--
This is not my sandwich.
Re:Use real data, not test data by budgenator · 2006-09-20 00:28 · Score: 1

Additionally, live production data had better be good data, but for testing and QA you'll want some bad data; how well the bad data is handled is important for a robust system.

--
Apocalypse Cancelled, Sorry, No Ticket Refunds
Re:Use real data, not test data by theonetruekeebler · 2006-09-20 00:47 · Score: 1

Good point. We do have test clients and such for regression testing that get merged into the QA database during the weekly munge.

--
This is not my sandwich.
Re:Use real data, not test data by CastrTroy · 2006-09-20 01:14 · Score: 2, Insightful

I agree with this completely. For any sufficiently sized application, there are too many permutations of data for the developers to think up and make on their own. The only thing you're missing out on, which you probably do, is to create a set of scripts to clear or change any data that the devs or QA team shouldn't see. Confidentiality is an issue, but you should be able to identify the data and delete or change it accordingly. Also, devs probably have access to production data in some form or another anyway, they could even put back doors into the code so that they could access it later. So if you don't trust them to be working on production data, then you'd better have lots of checks and balances to make sure that they can't access it. Because a determined developer will be able to access the data, regardless of whether or not they will be working with it on a daily basis.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Use real data, not test data by SpaceLifeForm · 2006-09-20 05:51 · Score: 1

No. You should never have bad data in the database to start with. If you manually put bad data into a DB, you are of course going to be running into problems that should never exist.
If you have to test code for handling bad data in the DB, then you are not testing the code that should be properly validating the data *before* it is inserted into the DB.

--
You are being MICROattacked, from various angles, in a SOFT manner.
Re:Use real data, not test data by budgenator · 2006-09-20 10:37 · Score: 1

Good point SpaceLifeForm, does your planet accept Earthling immigrants? On my world, just before I send in the bug report to the language developers about the bug I've been ripping my hair out over for two days, the thought occurs to me to double check the test data, and sure enough the program is doing exactly what it should with the data it's getting.

--
Apocalypse Cancelled, Sorry, No Ticket Refunds
Re:Use real data, not test data by angel'o'sphere · 2006-09-20 15:56 · Score: 1

Test data sucks: there are too many real-world situations the developers fail to think of.

It's not the developer's business to define test data, but the business of either the business analyst, or the test engineer in cooperation with the business analyst.

Sure, lots of business cases are so simple a developer could define the test case. But, if you have a contract with a customer to develop something ... who should define whether you get paid, whether you did right? You or the customer?

angel'o'sphere

--
Cost free eBook I read (by iBook/Kobo/Amazon/ObookO/Gutenberg etc.): "The Green Odyssey" by Philip Jose Farmer.
Re:Use real data, not test data by Fulcrum+of+Evil · 2006-09-22 20:59 · Score: 1

No. You should never have bad data in the database to start with. If you manually put bad data into a DB, you are of course going to be running into problems that should never exist.

Yes, because that never happens.
/furiously rolls eyes

--
"We returned the General to El Salvador, or maybe Guatemala, it's difficult to tell from 10,000 feet"
Re:Use real data, not test data by cavac · 2006-09-23 08:25 · Score: 1

Actually, for developing a reliable system, at least one developer MUST use real data. Unless you're ready to send him to the production database after releasing the software to fix some unexpected problems.

But if you can't trust your techs, devs and sysadmins to handle sensitive data, then how are you expecting them to fix a problem on a production system?

While I do most developing and testing on test data (to simplify backup, restore and bugtracking), I *always* use a backup of the real database for final testing. This is especially important if you made changes in the database layout and you need to upgrade the production database during the next release.

BTW: While you're testing your upgrade procedure, take a note of the time required to do it (hence the real data). The company's planners might need this info for scheduling the upgrade; especially on systems that normally have a 24/7 workload to handle...!

--
Look, this thing is totally safe! Built it myself, you know. You just press that button like this and then turn that lev

One Commercial Solution by Anonymous Coward · 2006-09-19 23:52 · Score: 0

http://www.quest.com/benchmark_factory/

It will help create, manage and then run tests against the Databases...

which goals ? by rgucci · 2006-09-19 23:59 · Score: 1

If you only need UNIT tests (or single-developer tests) there are a lot of tools, but if you are talking about SIZING, TUNING and so on, you cannot reach your goals without complex tools and without working with more than one RDBMS. In the last 4 years I have worked on and designed testing processes for J2EE, and without "high level" tools we could not tell whether the problems were in the Java code, in the I/O software and hardware subsystems, in the RDBMS, or in concurrency on classes or table rows. You have to develop testing code for specific goals. We spend less time when using commercial tools (like BMC, Quest and Mercury for J2EE and web user simulation). We are on the 4th generation of our test processes, and the last 2 times, using the commercial tools, update processes and testing took half the time.

Perfect world environment by techpawn · 2006-09-20 00:02 · Score: 2, Insightful

You'd have 3 servers as close in configuration as possible. One houses your production environment DB, another houses your test environment, and one is for the QA environment. You can get away with QA and TEST on the same server, but you REALLY don't want a developer to crash the test box or bog it down with a bad query when they're doing QA.

--
Ask not what you can do for your country. Ask what your country did to you

Re:Perfect world environment by beacher · 2006-09-20 00:49 · Score: 1

That's what we do in our shop although we do a few tweaks as well. On the weekends, we nuke our test and dev environments and then copy production back to test and dev. We then apply all outstanding data & ddl logs to test and dev in order to get the database back to where it should be.

Developers have DBA rights on Dev and are locked out of our Prod instances. Developers script all changes so that their work can be reapplied with the same results on every instance. We also log object changes so we can quickly identify which version of DDL was applied to each object. It's a lot of overhead, but we're a large shop and have dedicated DBAs.

It's a really good way to keep the instances managed. The developers know that they can do their dangerous work on Fridays because if they make a serious mistake, it'll be clean on Monday.
Re:Perfect world environment by LarsWestergren · 2006-09-20 00:57 · Score: 2, Interesting

You'd have 3 servers as close in configuration as possible. One houses your production environment DB, another houses your test environment, and one is for the QA environment. You can get away with QA and TEST on the same server, but you REALLY don't want a developer to crash the test box or bog it down with a bad query when they're doing QA.

Seconded. I'm on a project right now where we (the programmers) have finally gotten management to allocate time for us so we can get going on doing more unit testing, integration testing and generally cleaning up the code.

We have had a few incidents where a bug caused bad data to be inserted into the database. The bug was solved, but the data remained and caused strange behaviour. I am currently (reluctantly :-) learning Ruby so I can write a script that empties database tables and inserts fresh test data. This script is started by cron every night and the JUnit tests and integration tests are then run automatically. The QA team can then do their manual test during the day on that server.
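
For illustration, a nightly reset of that kind could be as small as the sketch below. It is written in Python with SQLite rather than the Ruby script mentioned above, and the tables and fixture rows are invented:

```python
import sqlite3

# Hypothetical fixture rows, parents listed first so inserts respect references.
FIXTURES = {
    "customer": [(1, "Ada"), (2, "Grace")],
    "invoice": [(10, 1, 99.5)],
}


def create_schema(conn: sqlite3.Connection) -> None:
    """Only needed for this demo; the real server already has its schema."""
    conn.executescript(
        """
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE invoice (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(id),
            amount REAL NOT NULL
        );
        """
    )


def reset_test_data(conn: sqlite3.Connection, fixtures: dict) -> None:
    """Empty every fixture table, then load the known-good rows."""
    with conn:  # one transaction: either the whole reset happens or none of it
        # Delete children before parents, i.e. in reverse fixture order.
        for table in reversed(list(fixtures)):
            conn.execute(f"DELETE FROM {table}")
        for table, rows in fixtures.items():
            placeholders = ", ".join("?" for _ in rows[0])
            conn.executemany(
                f"INSERT INTO {table} VALUES ({placeholders})", rows
            )
```

The table names come from a constant in the script itself, which is why they are formatted into the SQL directly; the row values still go through placeholders.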

We also have one server that all developers are running their daily code against, and one intermediate server where we do test deploys before stuff is delivered for QA testing.

I can also recommend you to take a look at Apache Derby which is included in JDK6. It is small, fast, and you don't have to do a lot of setting up. A single line of code to open a jdbc connection and you are ready to go, perfect for testing.

--
Being bitter is drinking poison and hoping someone else will die

Doesn't Suffice? by Aladrin · 2006-09-20 00:09 · Score: 3, Insightful

DBUnit doesn't suffice? What's it missing? Its only function is to place the database into a known state before the test, to make sure the data is correct before you test with it. How can that not do what you want?

It also occurs to me that if you can't even decide what data is 'useful and valid to everyone' then your test data is nothing like the live data you will have. Here's my suggestion: If it seems like it'll be even slightly relevant to anyone, use it. Otherwise you aren't testing everything.

The constantly changing schema is puzzling also. Did you not plan your database beforehand? I'm guessing this is an XP shop then, eh? XP doesn't stand for 'no planning'. I can understand changes to the schema in the early stages of programming, but if you're getting close to 'multiple releases' then the schema should be pretty solid by now, and the little changes needed to make to DBUnit shouldn't be a big bother.

--
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM

Re:Doesn't Suffice? by Mongoose+Disciple · 2006-09-20 00:46 · Score: 2, Insightful

The constantly changing schema is puzzling also. Did you not plan your database beforehand? I'm guessing this is an XP shop then, eh? XP doesn't stand for 'no planning'. I can understand changes to the schema in the early stages of programming, but if you're getting close to 'multiple releases' then the schema should be pretty solid by now, and the little changes needed to make to DBUnit shouldn't be a big bother.

In theory I'd agree with you, but in practice I've rarely worked on a project of significant size that didn't see DB changes (if small ones) damn near right up until release.

Maybe one of the other developers didn't code or design his part of the database perfectly. Maybe the first few times you run against production-quality data, you discover a few special cases you missed that require an additional piece of data to be tracked. Maybe the DBAs introduce constraints late in the game that force you to add a field or refactor a table or two. Maybe your first real stress test shows you that, while your code is logically correct, it takes 4 minutes to execute a common operation which needs to happen in under 4 seconds, and fixing that requires getting at the data a different way. Maybe requirements change in a significant way two weeks from release, or a new business rule is introduced.

It's never everything, but it's always something. We work in a world where deadlines are often set more by clients' needs than the amount of time it would take to do something right. As long as that's true, there are going to be surprises that no design planned for. You can do a lot of things right to minimize it, but it never really goes away.
Re:Doesn't Suffice? by Aladrin · 2006-09-20 02:15 · Score: 2, Insightful

On the other hand, if you've got to make changes to the schema, you really should not be upset about having to make changes to the tests that go with it... It's all part and parcel. I don't foresee a magic version of DBUnit that handles all that for you.

--
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
Re:Doesn't Suffice? by YomikoReadman · 2006-10-03 08:26 · Score: 1

As a DBA in a similar situation to what the OP seems to be in, I can sympathize with him. The problem on my project, however, isn't a lack of planning. The problem is that the customer can request requirements changes, and in order to ensure the software can do what the customer needs it to do, schema changes can be necessary.

As to the question of a way to test the DB, the use of a test system, or possibly even multiple test schemas is the correct way to accomplish this. If it's an issue with constructed test data not being inclusive enough, then use an export of your production system data in test. That way you can test for any and all instances without having to create data.

Overall, I agree with the other child to your post, but you make an excellent point concerning DBUnit. I don't know exactly what the OP was expecting from DBUnit, but reading the description of it, it seems like an excellent tool, when used correctly. I can think of a few projects I've been assigned to that would have seen great benefit from it.

--
I have no regrets, this is the only path.
My whole life has been "UNLIMITED BLADE WORKS"

here's a couple of things to consider by Anonymous Coward · 2006-09-20 00:32 · Score: 0

Ideally, when unit testing your code, you should touch the database as little as possible. One way to do this is by coding to interfaces, especially for your data access objects, and allowing your JUnit tests to override the implementations at runtime with mocked DAOs that don't touch the database. You could also try using an Inversion of Control framework (such as Spring, PicoContainer, etc.) to help decouple your DAO implementations from your business logic.
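
The same pattern, sketched in Python with the standard unittest.mock module instead of JUnit and a Java mocking library. InvoiceService and the DAO method open_amounts are hypothetical names:

```python
from unittest.mock import Mock


class InvoiceService:
    """Business logic that receives its DAO from outside (constructor injection)."""

    def __init__(self, dao) -> None:
        self._dao = dao

    def total_due(self, customer_id: int) -> float:
        return sum(self._dao.open_amounts(customer_id))


def test_total_due_uses_dao_only() -> None:
    # The mock replaces the real DAO, so no database is touched.
    dao = Mock()
    dao.open_amounts.return_value = [10.0, 5.5]

    service = InvoiceService(dao)

    assert service.total_due(42) == 15.5
    dao.open_amounts.assert_called_once_with(42)
```

Because the service only ever sees whatever object it was handed, swapping the real DAO back in for production is a wiring decision, not a code change.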

The downside to this is that, if your DAO code is not generated, you'll still need to unit test it. For this we use an in-memory Java database such as HSQL or Derby. We build the database from scratch for each test run, using SQL scripts to create the tables, views, PKs & FKs, populate the data, etc. This ensures that we're always testing against the same data, and because it is in-memory, it is VERY fast. In development, we try to rely less on JWebUnit because it is not fast (relative to JUnit), but when we do use it we prefer to run our server against a local database rather than a shared database. We usually use HSQL for this as well.
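
A small sketch of the build-from-scratch approach, with Python's unittest and an in-memory SQLite database standing in for JUnit and HSQL/Derby. CustomerDao and the schema are invented for the example, and this version rebuilds the database for every test method rather than once per run:

```python
import sqlite3
import unittest

# Hypothetical schema and seed data, rebuilt before every single test.
SCHEMA_AND_DATA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO customer (id, name) VALUES (1, 'Ada');
"""


class CustomerDao:
    """The hand-written data access code that still needs real SQL tests."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def find_name(self, customer_id: int):
        row = self._conn.execute(
            "SELECT name FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None

    def rename(self, customer_id: int, new_name: str) -> None:
        with self._conn:
            self._conn.execute(
                "UPDATE customer SET name = ? WHERE id = ?",
                (new_name, customer_id),
            )


class CustomerDaoTest(unittest.TestCase):
    def setUp(self) -> None:
        # A brand-new in-memory database per test: always the same data.
        self.conn = sqlite3.connect(":memory:")
        self.conn.executescript(SCHEMA_AND_DATA)
        self.dao = CustomerDao(self.conn)

    def tearDown(self) -> None:
        self.conn.close()

    def test_rename_changes_the_row(self) -> None:
        self.dao.rename(1, "Augusta")
        self.assertEqual(self.dao.find_name(1), "Augusta")

    def test_each_test_sees_the_original_data(self) -> None:
        self.assertEqual(self.dao.find_name(1), "Ada")
```

Saved as a module, this can be run with python -m unittest; neither test can see what the other did to the data.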

Worth mentioning ... It is not desirable that our development database be a copy of production. Whenever we find a new bug that is data-state related, we re-create that data-state in our SQL and write a new test for it. For Test and QA, however, we do tend to use copies of production data.

"This word does not mean what you think it means" by coyote-san · 2006-09-20 00:39 · Score: 1

I'm confused by your statement. A single database server (Oracle, PostgreSQL, whatever) can hold many databases. You should definitely have two separate databases for each release (for developers and testing), and arguably a database for each developer for unit tests. It's a one-line change in your config files to switch from one database to another, hardly an onerous burden.

I guess some toys would only be able to handle a single database, but I can't imagine why anyone would use one when there are so many excellent free database servers.

(This is ignoring tools like Sleepycat DB since it's not something you would use in a J2EE context.)

--
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken

Do you have a DBA? by duffbeer703 · 2006-09-20 00:43 · Score: 2, Insightful

It sounds like you need someone intimately familiar with the database who is not a developer, but can do things like create scripts to build your schema and populate it with useful test data... this person is usually called a DBA.

DBAs are usually viewed by devs as complete assholes, because they scream and holler at devs who make gratuitous changes to schemas and stored procedures. But a good DBA will make your database issues go away.

--
Conformity is the jailer of freedom and enemy of growth. -JFK

Not for unit tests by coyote-san · 2006-09-20 00:50 · Score: 1

Unit tests should be as minimal as possible. E.g., you might have a single record loaded to test the basic CRUD operations for a class.

Why? You can set up your JUnit failure method so it takes a snapshot of the database at the point of failure and mails it to you (as an XML attachment). This means you can run smoke tests nightly -- try doing that with a "complete" database that's been scribbled on by other tests and developers since the problem occurred.

--
For every complex problem there is an answer that is clear, simple, and wrong. -- H L Mencken

Re:Not for unit tests by Lordrashmi · 2006-09-24 06:48 · Score: 1

Sweet, 10GB XML file attached to my email. WOOO HOOO! :)

Snapshots by Ropati · 2006-09-20 00:51 · Score: 1

Consider getting storage that can provide point-in-time copies of data (Snapshots). Use Snapshots of your production database for development. Use different Snapshots for different releases. If you don't like the changes, make a new Snapshot and rework the tables. You can also use Snapshots for upgrade testing.

You should use caution here. Moving your production data is never trivial. Snapshots are not free. Development machines can load the point-in-time copy to the point where it could impact the production system. If your production load can't handle providing Snapshots, you should consider clones (Snapshots/Copies) of your database for development.

If your development application is on a virtual platform (VMware in particular), you can do a Snapshot of your application (in VMware), test an upgrade and then roll back to the pre-upgrade OS. In this environment you might be able to script nightly baseline testing of builds.

--
machinator omnis sine licentia

My Test Dev by gregor_jk · 2006-09-20 00:53 · Score: 1

I make a copy of the production database with *real data*. I augment the copy of the production db with the new schema. I then merge the schema back into the production database when I am happy with the testing.

one word: virtualization! by StankDawg · 2006-09-20 00:57 · Score: 1

In your case, it sounds like a traditional test environment of separate machines and multiple instances is not the way to go. I would suggest using a virtualization server like VMware or MS Virtual Server or other related software. What this allows you to do is get one environment set up and established, and then make an image of it. Then you can mount this image into a virtual environment where everyone can bang away at it and no matter how badly they destroy the database, all you have to do is mount the image again and you are back where you started from. You could also mount multiple images that each group (developers and QA testers) could have access to, if you do eventually go that route (and you will).

If you do make any changes that need to be kept, you will need to re-image the environment with those changes. You might want to do that every few months and keep several images available in several different states. However, now the developer team needs to establish a policy for storing their code. If they hose the virtual environment, it is easy for YOU to mount a new image, but what about their code changes? They will need to have a policy on code storage and maintenance, but that is a different question altogether.

--
--- The revolution will be digitized! - http://www.binrev.com/ ---

proper design and planning? by Stigu · 2006-09-20 00:57 · Score: 2, Insightful

Ok, with a structured approach you can make testing a walk in the park. First, listen to your costumer, what are his needs? What does he want to do? Define input and output, and of course what information needs to be stored, and what information can be tabulated. It's no use storing, for example, age when you have a birthdate registered. Remove ALL information that can be derived. Make a paper drawing of the structure of your database. Plan out the relations. Make sure to obey Codd's rules for the design of a relational database. Don't go overboard with it though. Just remove the repeating groups (the recurring fields inside a table, if there are any) and draw up the general layout. Then proceed to define the content type (text field, integer, float, boolean,...) of each field and set up the test database. You already have a paper blueprint, literally, so you can easily use an eraser and pencil to record any changes you make. If you've done your planning well, there won't be a need for changes. Now that you have a digital empty construct of the database, defined with the content types, you can then make masks on the input so that only the correct type of data, properly formatted (think of dates etc...), is sent to the database. Having done this you have now successfully created a proper test database. As mentioned earlier as a reply to this post, I agree, you should NEVER allow multiple user groups to access the database during testing time. Give every group its own test database so the problems both in structure and usability can be recorded separately for each user group. You get a much clearer view of the problems you might have this way. Better information makes for better solutions. After the first round of testing compare notes of the different user groups' problems, fix them, and go to the second round. Repeat as often as is necessary. A good database is able to keep performing forever. 
If the needs don't change, a properly designed and tested database will work till all our bones are in some history museum. And, provided it's a relational database, changes in needs can usually be complied with without having to redesign the entire database.

Re:proper design and planning? by Anonymous Coward · 2006-09-20 08:31 · Score: 0

Here's a carriage return:

They're cheap. Here's another:

Cheers.

Automated database-building scripts... by pdc · 2006-09-20 01:12 · Score: 1

We maintain an SQL script that creates the database or, when run in an existing database, upgrades any stored procedures that are out of date, alters tables, etc. This script (actually the smaller scripts it is assembled from) is checked in to Subversion like any other source code.
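
One way such a create-or-upgrade script can be made safe to re-run is to record a schema version in the database itself. The sketch below is an illustration in Python with SQLite, not the script described above; the `MIGRATIONS` list and the `account` table are hypothetical, and SQLite's `PRAGMA user_version` is used as the version counter.

```python
import sqlite3

# Hypothetical ordered steps; step N upgrades the schema from version N to N+1.
MIGRATIONS = [
    "CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE account ADD COLUMN email TEXT",
    "CREATE INDEX idx_account_email ON account (email)",
]


def create_or_upgrade(conn):
    """Bring a database up to the latest schema version; safe to re-run."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in range(current, len(MIGRATIONS)):
        conn.execute(MIGRATIONS[version])
        # PRAGMA does not accept bound parameters; the value is an int we control.
        conn.execute(f"PRAGMA user_version = {version + 1}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Run against an empty database this creates everything; run against an older database it applies only the missing steps, so the same code can serve tests, developers, and the installer.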

Our unit tests work at the C# level, not SQL (they test the objects implemented using the database, rather than the database itself). Most tests start by running the creation script to create a fresh database, do things to it, and then throw it away when the test is done. This way tests are isolated from each other and all the usual unit-testy requirements are met.

The down side is that tests that involve the database are a lot slower than unit tests that don't.

The installer (.msi file) uses the same script to install or upgrade the database on the production server. Testers work from their own copies of the database, completely isolated from whatever freakish mess we developers have perpetrated in our development databases.

For test databases... by Ramses0 · 2006-09-20 01:34 · Score: 3, Insightful

For (QA) test databases, it's generally not enough to just have a separate instance, you also need to support the following capabilities:

1- "Clone" whatever is most recent on production

2- Revert to "known good QA state" (ie: big red reset button)

3- Dump current state for later use.

You need to be able to clone so that ad-hoc testing can be run against production data w/o making production impact. This doesn't have to be live, but can be like a once-a-week/once-a-month activity, or rotate out a slave DB every once in a while, or have your DB people test your backups / etc.

You need the ability to revert to a known good state so that specific tests can be run and those can be more easily automated. Like: search "foo", 7 results found (not 6, not 8, not "it was 8 a few seconds ago but now it's 9 because there's a new result that was just added") ... the more confident QA is in the data, the more confident (and/or prone-to-automation) their testing can be.

The ability to dump out DB state is a very distant third, but can be helpful for post-testing analysis or being able to modify a particular DB snapshot to fit some particular testing needs and then dump that out to the file-system for later use.
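
The three capabilities can be sketched on a small scale with Python's sqlite3 module, which offers `Connection.backup` and `Connection.iterdump`. This is only an illustration under the assumption of SQLite; a server database would use its own backup, restore, and export tools, and the function names here are invented.

```python
import sqlite3


def clone(source):
    """Capability 1: copy a database into a new in-memory database."""
    target = sqlite3.connect(":memory:")
    source.backup(target)
    return target


def revert(working, baseline):
    """Capability 2: overwrite the working copy with the known-good baseline."""
    # Discard uncommitted work first; the backup target must not be mid-transaction.
    working.rollback()
    baseline.backup(working)


def dump(conn):
    """Capability 3: return the current state as a SQL script for later use."""
    return "\n".join(conn.iterdump())
```

After `revert`, a search for "foo" returns the same number of rows every time, which is what makes the automated checks described above possible.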

QA is hard, thank you for trying to make it easier.

--Robert

HSQLDB for Unit Tests by Palshife · 2006-09-20 01:59 · Score: 1

For the scope of unit testing, try out HSQLDB. It's an in-memory database that you can connect to over JDBC, so even if you're using Hibernate or some other layered persistence engine you can simply switch your DataSource. If you're writing Java that follows the tenets of dependency injection, this is really straightforward.

Now, this can only really effectively test a few things, and generally, I find that it can only really be useful for exercising small operations, like individual DAO methods. This is actually where I'll shut up, since I've not yet found an effective method for testing inter-method and transaction based operations, but for unit testing HSQLDB has saved my life more times than I'd like to count.

--
Attention deficit disorder is a complicated issue, spanning several major... HEY LET'S GO RIDE BIKES!

Re:HSQLDB for Unit Tests by curunir · 2006-09-20 06:38 · Score: 1

I think the only time HSQL would make sense is if you are using a persistence layer like Hibernate (where you can just change the dialect during the test). Otherwise, the differences in SQL parsing mean that queries that run fine against Oracle, PostgreSQL or MySQL will either cause an error or just not work properly under HSQL. MySQL is particularly bad about relying on MySQLisms to get things done, but the other ones have their quirks too. So there really isn't a substitute for running the actual database queries against a test database running the software you'll be using in production. The closest you can get to an in-memory approximation of the production behavior is to test that the DAO is generating an expected SQL statement.

That's why we use a MockDataSource (most of the mock solutions provide one). Most of our DAO tests use a mock data source to validate that the DAO is generating the correct SQL. The nice part about this is that it's all done with Spring's IoC container. So while the configuration that developers run the test with most often can be a mock DataSource, the continuous integration server (we use Hudson) uses an actual DataSource pointing to one of a few test databases we have (one is basically empty for testing CRUD operations, one has a lot of test data and another runs on a server with the same hardware setup as production that has a nightly production dump loaded). All this is determined on a per-test basis in the Spring config file.

So, as a developer, when I run our DAO TestSuite, the whole thing takes less than a minute. After that passes, I can check in and let that trigger the test using a real database and get an email if something fails. This also has the added benefit that we can run tests against a production snapshot without ever giving developers unfettered access to that data (for security/privacy purposes, developers aren't allowed to see customer data).
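
The mock-data-source idea translates to other stacks as well. Below is a minimal sketch in Python using `unittest.mock` in place of Spring and a MockDataSource; `UserDao`, its query, and `check_find_by_email_sql` are hypothetical names chosen for the example.

```python
from unittest import mock


class UserDao:
    """Hypothetical DAO that receives its connection via dependency injection."""

    def __init__(self, connection):
        self._connection = connection

    def find_by_email(self, email):
        cursor = self._connection.execute(
            "SELECT id, name FROM users WHERE email = ?", (email,)
        )
        return cursor.fetchone()


def check_find_by_email_sql():
    """Verify the SQL the DAO emits without touching a real database."""
    fake_connection = mock.Mock()
    fake_connection.execute.return_value.fetchone.return_value = (7, "Ann")

    result = UserDao(fake_connection).find_by_email("ann@example.com")

    # Fails with AssertionError if the DAO sent different SQL or parameters.
    fake_connection.execute.assert_called_once_with(
        "SELECT id, name FROM users WHERE email = ?", ("ann@example.com",)
    )
    return result
```

Because the DAO only sees whatever connection it is handed, the same class can be given a real connection on the integration server without any change to its code.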

--
"Don't blame me, I voted for Kodos!"

Embedded DB by Hard_Code · 2006-09-20 03:59 · Score: 1

Use an embedded (or at least small) database like McKoi or Apache Derby, and have a script that defines the tables and some test data (which you can grab from a real test system). Then simply create the db once, and use the embedded JDBC URL with your unit tests. Clear the database out, or destroy it, before or after each unit test (you probably want to do it before each test, because there's no guarantee the last test exited cleanly). Ta da.

--

It's 10 PM. Do you know if you're un-American?

Point-in-time raw backups by toybuilder · 2006-09-20 04:41 · Score: 2, Informative

I also second the idea that developers and QA should normally each have their own database running on separate servers.
Ideally, the developers and QA run against a smaller database that is populated from scratch with a small dataset to speed development; and then for release testing use a much larger populated database or (if that's too difficult) a copy of the production database that has been appropriately scrubbed to get rid of confidential data.

The database offerings from the various major vendors allow you to "quiesce" the database, which suspends new transactions, completes all pending transactions, and then ensures that all data and logs are flushed to disk. Then, with the production system paused, take a hot point-in-time snapshot of the filesystem, effectively giving you a complete database dump in a few seconds. (This requires a storage system that allows you to make snapshots -- NetApp's do this, for example.) Resume the database to let the production system continue, and then copy the snapshot of database files to another server and reconstruct a clone of the database.

Run the appropriate trimming/cleansing/schema update on the clone database, and then make a snapshot of THAT. You can then revert the database to a known starting point as you like. If your development requires schema changes, don't let developers make the schema changes directly -- instead, insist on schema change DDLs being scripted, and reapply the script to the snapshot at each refresh.

When doing the final release testing, get the latest snapshot of the production database, run the update scripts, and run the tests. If everything looks good, make another snapshot of the production database, and apply the updates to the production database.

Done right, you can always roll back the test

DB for Unit Testing? by Slashdot+Parent · 2006-09-20 05:08 · Score: 3, Insightful

A couple of points.

Typically, the term Unit Testing refers to the testing of a single, fine-grained unit of code. In other words, to do your true Unit Tests, you should not be accessing any database.
The question that I think you are asking, is "How do I get databases initialized with the correct schema and correct data for integration testing?" The answer is, as always, "It depends."

The two biggest factors for creating useful test environments are: "How often does your schema change?", and "How much data do you need in your database for meaningful test cases?"

Schema Changes: As a J2EE architect, the first time I saw Ruby on Rails' database migrations my first impulse was to wonder, "Why the !@#$ is this not in Hibernate?" I am not aware of any slick framework for J2EE apps to manage DB migrations, so you may have to use your own migration scripts. Hopefully, your schema is not changing much.

Getting Data In There: This totally depends on how much data you need. My "favorite" reply to you was to have one snapshot of your production data per developer. That works great, as long as you don't have much data. My last project had I don't even remember how many terrabytes of data in prod. Do you really think the client was going to spring for that much storage and that many Oracle licenses to get one instance per developer? Yeah right. We had a full snapshot for performance testing, but regular integration testing was done on a representative subset of data.

DBUnit is a great way to initialize a small amount of data. For larger datasets, you cannot get away with things like DBUnit, as it would take hours, if not days, to get the data in there. For our performance testing databases, we had the prod data snapshot stored on a RAID-1. Before testing started, we broke the mirror and did testing against the degraded array. When it came time to reset the data, we shut down Oracle and rebuilt the array to the good snapshot. That wound up being very fast for us. For medium amounts of data, you could probably get away with using SQL*Loader.

--
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock

Re:DB for Unit Testing? by Anonymous Coward · 2006-09-22 00:37 · Score: 0

My last project had I don't even remember how many terrabytes of data in prod.
What's a terrabytes?

Functional DB testing versus unit-testing... by phamlen · 2006-09-20 05:48 · Score: 2, Insightful

One approach that has worked for me in the past is the "backup and recover" approach. Basically, it works like this:
1) You maintain a canonical "test" database (or multiple ones). This database has the same functionality as the production database but generally contains much less data. No one touches this database unless they need to permanently modify the test data. After each release, you make a backup of the database and release that backup to everyone who needs a test database. They restore it to their own environment.
2) You always write changes to the database as scripts so that you can run them against your test database and your production database. Your release process has to change to include running any database modification scripts on the canonical test database as well as the production database. This ensures that your new test database matches the production database for that release.
3) You need to modify your test process so that it runs a database restore at the appropriate points. In our case, we always restore before QA functional tests (because they leave the database in an altered state) but we don't restore for unit-tests (because we insist they leave the database in the same state they started.)

The advantage of this approach is that everyone has a copy of an actual database and you get to see all the funkiness of your real environment. The downside is that you have to be very disciplined in keeping the backups for all releases, and in running modification scripts against both the test and production databases appropriately.

-Peter

Like a Forest Fire by Flwyd · 2006-09-20 08:05 · Score: 2, Interesting

We define our schema in an XML format. We have a class that builds a DB from that format, subclassed by database type, making skeletal DB install an automated process. This also means it's the same process to install a client site using Oracle as it is to install a test database on a developer machine using Postgres.

When our master build runs test cases, it drops all tables and creates them all fresh using the XML definitions. Each JUnit test case is responsible for ensuring it has the data it needs. In some cases, this is done by setting up a facade on the regular service so that the test can worry about semantics and not data storage. In other cases, the test (or a utility) creates test data. You could presumably also copy part of your live data, though that makes it much more difficult to know what the correct answer is in advance.

If you follow this structure, handling multiple releases with different schemas is trivial. Just have a parameter for the DB URL in your test suite and let it build the correct database version for you when it checks your schema out of your source repository.
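
A small sketch of a schema-from-XML builder with one subclass per database type. The XML format, the class names, and the type mappings are all assumptions made for this example (the actual format is not shown in the post), and only the SQLite variant is executed here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema format, kept under version control with the code.
SCHEMA_XML = """
<schema>
  <table name="author">
    <column name="id" type="integer" primary="true"/>
    <column name="name" type="text" required="true"/>
  </table>
  <table name="book">
    <column name="id" type="integer" primary="true"/>
    <column name="title" type="text" required="true"/>
    <column name="price" type="decimal"/>
  </table>
</schema>
"""


class SchemaBuilder:
    """Turns the XML definition into DDL; subclasses supply per-database types."""

    type_map = {}

    def create_statements(self, schema_xml):
        root = ET.fromstring(schema_xml)
        for table in root.findall("table"):
            columns = [self._column_sql(col) for col in table.findall("column")]
            yield f"CREATE TABLE {table.get('name')} ({', '.join(columns)})"

    def _column_sql(self, column):
        parts = [column.get("name"), self.type_map[column.get("type")]]
        if column.get("primary") == "true":
            parts.append("PRIMARY KEY")
        elif column.get("required") == "true":
            parts.append("NOT NULL")
        return " ".join(parts)


class SqliteSchemaBuilder(SchemaBuilder):
    type_map = {"integer": "INTEGER", "text": "TEXT", "decimal": "NUMERIC"}


class PostgresSchemaBuilder(SchemaBuilder):
    type_map = {"integer": "INTEGER", "text": "VARCHAR(255)", "decimal": "NUMERIC(12,2)"}


def build_database(conn, builder, schema_xml=SCHEMA_XML):
    """Create every table from the XML definition on the given connection."""
    for statement in builder.create_statements(schema_xml):
        conn.execute(statement)
```

Dropping all tables and calling `build_database` before a test run gives each run the schema that matches the checked-out code.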

(Incidentally, keeping your database schema in your source repository also allows easy comparison of database structure between code versions, making it easier to figure out what must happen when you upgrade.)

--
Ceci n'est pas une signature.

DBUnit can be very useful by mikeburke · 2006-09-20 12:38 · Score: 3, Interesting

I work with a large, legacy codebase - about 2 million lines of code, 600 tables. Some bits are nicely written, some aren't. Concepts such as dependency injection, seperation via interfaces etc are not prevasive, so traditional unit testing approaches of mocks or HSQL are not useful (in fact I find they do not scale for 'meaningful' tests anyway).

So you have this legacy code base - you want to make changes, but how can you validate the result? One approach is to compare database states - one from a known good codebase, one from a modified codebase. DBUnit can be tremendously useful here - this is what I've done (perhaps too complex for explaining on Slashdot):

Create a common Unit Test base class that extends DBUnit's DatabaseTestCase. It will:

a) receive a list of modified table names from the concrete test class
b) if a system property is set, export a pristine copy of these tables prior to running the test - 'reference data'.
c) execute the use case (register a user, perform a transaction, whatever) - this just makes a 'blind' call into the code proper.
d) if a system property is set, export the modified table data ('known good results')

The idea is you run this test twice:

1) With the original codebase, with result exporting enabled to generate known good results.

2) With the codebase under test - the results generated will be compared against known good results and DBUnit will flag any differences. You can get it to ignore stuff like sequences and dates that will differ between runs.

The reference data generated in (b) is reloaded prior to running the second test, so you start from the same point. Each concrete test class just has to:

* figure out what tables change within the test
* provide the test code itself

Everything else is managed by DBUnit - exporting/importing datasets, comparing datasets, etc.
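
The compare-two-database-states idea can be sketched without DBUnit. The following is an illustration in Python with SQLite, not DBUnit's API; `export_tables`, `diff_snapshots`, and the `ignore_columns` parameter are names invented for the example.

```python
import sqlite3


def export_tables(conn, tables, ignore_columns=()):
    """Snapshot the named tables as {table: sorted rows}, skipping volatile columns."""
    snapshot = {}
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        keep = [i for i, col in enumerate(cursor.description)
                if col[0] not in ignore_columns]
        rows = [tuple(row[i] for i in keep) for row in cursor]
        # Sort by repr so rows containing NULLs still have a stable order.
        snapshot[table] = sorted(rows, key=repr)
    return snapshot


def diff_snapshots(expected, actual):
    """Return the names of tables whose contents differ between two snapshots."""
    return sorted(t for t in expected if expected[t] != actual.get(t))
```

Run the use case against the original codebase and keep the exported snapshot as the known good result; then reload the reference data, run the modified codebase, export again, and pass both snapshots to `diff_snapshots`. Columns such as timestamps go in `ignore_columns`.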

Re:DBUnit can be very useful by Anonymous Coward · 2006-09-22 01:01 · Score: 0

seperation... prevasive... those sound like perfectly cromulent words. You should really learn how to spell "big words" before touching any "legacy-600-million-lines-code-base". Really.

What about SQLUnit? by Abobo · 2006-09-20 14:16 · Score: 2, Informative

http://sqlunit.sourceforge.net/ is based on JUnit and is specifically designed to test databases and result sets. It is what I use when building automated test streams. It supports many databases on fresh download and can be extended easily if required.

Make Sure Test Data Isn't Sensitive by queenb**ch · 2006-09-21 08:05 · Score: 1

I had one client who had a bunch of customer records compromised when they sent out some data to a development firm for "testing purposes". There are several products out there that will take actual records, scramble them and spit out a "test" database. I'd highly recommend doing that, no matter what other methodology you use.
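
As a rough illustration of what such scrambling involves, here is a sketch in Python that replaces identifying columns with salted hashes. It is not one of the products mentioned above; the `customer` table and its columns are hypothetical, and a salted hash is only a simple form of pseudonymization that may not satisfy real privacy requirements.

```python
import hashlib
import sqlite3


def pseudonym(value, salt):
    """Map a real value to a stable, meaningless token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def scrub_customers(conn, salt):
    """Overwrite identifying columns of a hypothetical customer table in place.

    Assumes name and email are NOT NULL text columns.
    """
    rows = conn.execute("SELECT id, name, email FROM customer").fetchall()
    for customer_id, name, email in rows:
        conn.execute(
            "UPDATE customer SET name = ?, email = ? WHERE id = ?",
            (
                "user_" + pseudonym(name, salt),
                pseudonym(email, salt) + "@example.invalid",
                customer_id,
            ),
        )
    conn.commit()
```

The same input always maps to the same token, so rows that referred to the same person before scrubbing still match afterwards. Keep the salt out of the test copy.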

2 cents,

QueenB

--
HDGary secures my bank :/

Re:TROLLKORE STRIKES AGAIN by Anonymous Coward · 2006-09-21 22:55 · Score: 0

This is not a bad thing.

Listen to your costumer?? by DaveInAZ · 2006-09-22 09:13 · Score: 1

First, listen to your costumer... Costumer? LOL! I'm pretty sure most costumers wouldn't have the faintest clue about how to set up a database testing environment. They might know about floppy hats and masks, but not floppy disks and markup. Sorry, dude. That was just the funniest typo I've seen in a long time. :-) I keep picturing some dude dressed like Will Shakespeare hunkering over a server, muttering "Verily, thou are a varlet!", or some such silliness.

Spring by pamdirac · 2006-09-22 16:55 · Score: 1

Spring provides TestCase subclasses that provide a Spring ApplicationContext and a TransactionManager. Spring automagically starts a transaction in setUp() and rolls it back in tearDown(). They provide hooks to execute setUp() and tearDown() code both inside and outside the transaction. You can force the transaction to commit if you want, but that's not really what you want to do. I've found that this works really well for a number of reasons: 1) you initialize the database once, 2) unit tests are independent because they do not alter the database state permanently, 3) transaction management is independent of the production code and unit tests and 4) it performs pretty well - most RDBMSes will keep uncommitted transactions in memory. There really shouldn't be a need to commit anything because _unit_ tests should not need to cross transaction boundaries.
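
The same begin-in-setUp, roll-back-in-tearDown pattern can be sketched outside Spring. The example below assumes Python's unittest and SQLite; `TransactionalTestCase`, the shared connection, and the `account` table are hypothetical and are not Spring's API.

```python
import sqlite3
import unittest

# One shared database, initialized once (hypothetical schema).
# isolation_level=None means transactions are opened only by explicit BEGIN.
SHARED = sqlite3.connect(":memory:", isolation_level=None)
SHARED.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
SHARED.execute("INSERT INTO account VALUES (1, 100)")


class TransactionalTestCase(unittest.TestCase):
    """Wraps each test in a transaction that is always rolled back."""

    def setUp(self):
        self.conn = SHARED
        self.conn.execute("BEGIN")

    def tearDown(self):
        self.conn.execute("ROLLBACK")


class AccountTest(TransactionalTestCase):
    def _balance(self):
        return self.conn.execute(
            "SELECT balance FROM account WHERE id = 1").fetchall()[0][0]

    def test_withdraw(self):
        self.conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
        self.assertEqual(self._balance(), 70)

    def test_deposit(self):
        # Sees the original balance because earlier changes were rolled back.
        self.conn.execute("UPDATE account SET balance = balance + 50 WHERE id = 1")
        self.assertEqual(self._balance(), 150)
```

Each test sees the balance of 100 set at initialization, regardless of the order in which the tests run.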

I used Hibernate with Spring, and I think that helps a lot. I don't know if you have any control over that. Hibernate makes database manipulations much more terse, and it is actually pretty easy to write a little code to create a small bit of database state just for a particular unit test. Writing single use code to just toss in 15 rows using straight JDBC is, by contrast, more err, interesting.

The other thing Spring does for you is that it obviates the need for any hokey J2EE specific harnesses. One of the most sucky things about EJBs is that you have to have a running app server for them to be usable. Spring frees you from this burden. Your ApplicationContext(s) are valid inside and outside the container.

The other thing I would point out is that database schemas should not be such a precious resource. I develop on my laptop, so I don't want to be tethered to my corporate LAN (or any other LAN) in order to develop. Oracle XE is free, easy and quite reasonable resource-wise. MySQL, Postgres and other OSS databases are easy to run locally. Let everyone create their own local database installations. This is perfect for unit/integration/acceptance testing, anything functional in nature. If you deploy with the database on a dedicated server, you will still need to test that scenario, but that only impacts things like performance, high availability, etc. These are certainly important issues, so do test them, but for unit tests, just run a local database.

--
John McNair

Why not try SQLite? by m2pc · 2006-09-23 18:12 · Score: 1

I used this recently for running test cases against Python code, and it worked great! I placed some DB population code in my setUp() method so you can run the test from any dir and it works -- no DB server needed!

It works like MS Access (file-based) but supports most of the SQL92 standard.

http://sqlite.org/

Slashdot Mirror

Strategies for Test Databases?

66 comments