Unit testing database code is a bit of a funny problem. Most developers can pretty easily get their heads around unit testing a piece of Java code using interfaces and mock objects. When it comes to database code or DAOs, it suddenly becomes particularly difficult. But why, what is so difficult about testing stuff against the database? Surprisingly enough, the answer is that it has nothing to do with coding or a particular framework, although these do play their parts. It comes down to a complex web of human interaction, version control and managing environments. Let me explain.
The standard unit test has three basic phases:
- Setup (@Before)
- Test (@Test)
- Tear down (@After)
The first sets the test environment into an expected state, the second runs the test and checks that the outcome is as expected, while the final one clears up any test resources.
How does this relate to database testing? Let’s say that we have a DAO that performs a particular select statement. Our test should be to retrieve a particular number of records from a known set. Easy enough. The precondition of course, is that you have a known set to begin with.
It’s ALL about the environment.
Most large development projects go like this: The database guys update the schema. The developers write the code. The developers need a particular data set to exercise the various use cases so they add it to the schema. It all becomes a bit messy.
Eventually, very complex data sets are set up by everyone concerned in a primary schema that keeps getting updated. The database schema generally is not version controlled, as it is constantly being redefined using DDL statements run by the DBAs. Most of the time you will be lucky to get a backup of a schema, with all of the data truncated, as the schema and supporting code (i.e. the application) moves between environments.
Getting back to the test. You set up your data by hand in the master schema so that there were three items in the widgets table where some condition was true. You write your test, it runs against the schema, pulls out the expected three widgets and everything is great. You check in the tests. A week later your colleague, Bob, adds another widget to satisfy his test condition. Your test all of a sudden returns 4 items and the test breaks.
Of course, Bob didn’t actually run your test because he was too busy with his own and the test suite isn’t clean anyway because everyone is falling over each other.
Sound familiar?
What about inserts? The precondition: no sprockets were in the table, the test: insert a sprocket, the postcondition: a sprocket is in your table. Kind of hard to test under the above conditions isn’t it? For one thing, the exact data of the test sprocket may be in the table, so checking by value may give you false positives, while deleting it may get rid of more records than you wanted. What about concurrent tests? With a group of developers running the same tests, they start tripping over each other very quickly and the whole effort becomes an exercise in frustration. At this point the development manager throws up his hands, says that this automated testing thing is a load of bollocks and to get back to your work because they didn’t deal with all this when he was doing VB. Somewhere else, Kent Beck sheds a tear…
Let’s examine what goes on in Ruby on Rails. One of the best ideas that was popularized by this framework was its method for database unit testing. A developer’s workspace has multiple environments by default – development, test and production. You develop against the development schema, designing table structures, and playing with the user interface to your heart’s content. When you run unit tests, the following happens – the schema from development is copied into the test database with no data in it. The framework imports version controlled sets of test data (saved as YAML files) into this new schema. Whenever a test is run, it is guaranteed that the database will be in this state. Any changes a test makes are visible only within the scope of this one test. The tear down step cleans out your changes. This makes life so much simpler, especially if you have been working in the nightmare scenario above.
So how do we get the same sort of effect in a corporate development environment?
You need multiple database schemas in order to unit test your db code.
Pause and re-read that line. It’s not negotiable. Probably two per developer. One with sample data to use while you work on the user interface. The other, a temporary one for unit testing. A whole development team using the one schema does not work. Most projects do it, but that doesn’t mean that it’s a good idea.
Some suggestions for how to manage this. The DBAs have their own schemas. The full DDL for the database is kept in version control. After each change, the full database DDL is dumped and checked in. No UPDATE TABLE statements. Ever. This way you are guaranteed that if you ever want to get a baseline of your system, you can also rebuild the database as it existed at this time. I worked on a very large telecoms project with a huge development team, and this worked. Well.
The test data for your environments is stored in version control – at the very least, as dumps of insert statements. For unit testing purposes, a dedicated unit test framework is beneficial. DBUnit performs the same task in Java as described above for Rails – it loads test data from dumps (a number of formats are supported), and guarantees that the test database exists in the expected state when each test is run.
To test your database code, refresh your test schema with the one from version control – typically using your chosen build system. Ant tasks are generally pretty good for this. Now run your test cases. Gorgeous! No tripping over other people, and your tests are guaranteed to work the same each time. No excuses for a red bar.
So why is unit testing databases so difficult if it doesn’t have to be? Most of the time it involves process change and getting out of bad habits, not just a tool. And change means convincing people. Generally, managers do not understand what benefit there is in multiple database schemas, as it is seen to increase complexity and therefore risk, and DBAs like to have full control over what is going on on their servers. The topic of databases and processes is also a great one for religious zeal.
The process outlined above should explain the hows and whys to the individuals involved. The changes above mean a little bit more setup initially, but a saner development process.
A nice side effect is how easy upgrading databases through your environments can become. Run the latest DDL against a fresh schema, get the differences between it and your target environment using a database compare tool, and fire it off. Beautiful.
Comments
2 responses to “Unit Testing the Database Tier”
Hey!…Thanks for the nice read, keep up the interesting posts..what a nice Sunday . Sasha Cohen
I am glad to share one database unit testing tool. It is named as AnyDbTest (http://www.anydbtest.com).
I know some guys are using DbUnit or other xUnit test framework to perform DB unit testing. I also tried to use them in my projects, but at last I had to give up these tools because I must keep focus on the database rather than switch to be as application developer.
AnyDbTest is declarative style testing tool. We will not need to program at all. What we do is to express what we want to test, rather than how to test. We only need to configure an Xml test file to tell AnyDbTest what we want to test. Rather than painstakingly writing test code for xUnit test framework. So AnyDbTest is the right choice for DBA or DB developers.
Features specific to AnyDbTest:
*Writing test case with Xml, rather than Java/C++/C#/VB test case code.
*Many kinds of assertion supported, such as StrictEqual, SetEqual, IsSupersetOf, Overlaps, and RecordCountEqual etc.
*Allows using Excel spreadsheet/Xml as the source of the data for the tests.
*Supports Sandbox test model, if test will be done in sandbox, all database operations will be rolled back meaning any changes will be undone.
*Unique cross-different-type-database testing, which means target and reference result set can come from two databases, even one is SQL Server, another is Oracle.