SQL Server Unit Testing with tSQLt

When one considers the amount of time and effort that Unit Testing consumes for the Database Developer, is surprising how few good SQL Server Test frameworks are around. tSQLt , which is open source and free to use, is one of the frameworks that provide a simple way to populate a table with test data as part of the unit test, and check the results with what should be expected. Sebastian and Dennis, who created tSQLt, explain.

Consider the last few database bugs you had to fix: Were they easy to find? When you fixed them, how sure were you that something else didn’t break? Did it result in an incorrect value being recorded? Was the user presented with erroneous data? If so, were they aware of it at the time, or were they unaware and it resulted in a potentially poor (or even dangerous) decision?

SQL code controls how data is retrieved, stored and maintained in many applications. It includes views, stored procedures, functions, triggers, the creation of tables and the relationships between them, and query statements embedded in other programming languages. Writing this code often involves decisions about the nature of data being processed, complicated joining and filtering to retrieve or modify the correct data, performance tuning, data cleansing, replication and data maintenance. In far too many cases, the testing performed on this code is insufficient to prevent serious defects.

Much of the testing done on SQL code today involves manually executing the code on a copy of production data and hand checking the results. Manual testing results in a laborious process and some test cases being forgotten as the code changes. Using existing data means that the code is not tested against cases that are not currently in the system, but may be tomorrow. Hand checking the results introduces subjectivity and human error to the process.

Sophisticated database developers have tried their hand with the unit testing tools available for SQL testing. However, many abandon their efforts as test case maintenance using these tools becomes prohibitively expensive. There are two primary reasons for this expense: making changes to the single set of data used to populate the test database causes test cases to break; or the test cases break as the structure of the database changes.

Many attempts at database unit testing prescribe a pre-populated test database which includes the record required by the tests. If a new test case is written that requires data that is not in the test database the existing data must be augmented. It does not take long before augmenting data for a new test case causes several other test cases relying on the same data set to break. Furthermore, some test cases become impossible to write with the existing test data. For example, a test case which tests the result of a query when data does not exist with certain criteria.

The solution to the problem of data sharing is to have each test case create its own set of test data. Each test case assumes an empty or nearly empty database. At the beginning of the test, the test case inserts all of the data required for the test case to execute. In order to satisfy constraints, data will often need to be inserted into columns and tables which are unrelated to the code being tested. As the number of test cases grows, there will be more places that must be updated if a column is modified or a table with a foreign key constraint is added.

The tSQLt unit testing framework for SQL Server was created to address these problems. It allows each test case to create the necessary data, while eliminating the pain of test case maintenance due to unrelated schema changes. It is a freely available, open-source project which can be downloaded from: http://www.tsqlt.org..

A Simple Test Case

Let’s walk through some examples of unit test cases written in tSQLt to look at how it solves these problems. Imagine we have an application that accepted user-entered input for daily temperature data, and recorded it in a table. The application started off rather informally so no restrictions were put on the data the user could enter, but now we need to make serious use of the data so we need to interpret consistently. Here’s an example of some of the data in the DailyWeatherInfo table:

1212-UnitTest1.jpg

There may be other data in the table, and by the time we deploy our solution, there might be new records which we haven’t seen before. One of the things we’d like to do is standardize all the values to the same system. We’ll choose Celsius for this example and, for simplicity, work with output values rounded off to the nearest integer value only. We’ll need a Fahrenheit to Celsius conversion function. Let’s write our first test case:

First (line 1), creates a new test class. A test class is a grouping of related test cases. In reality, a test class is a schema in the database. When we create the test case procedure on line 4, we create it on the test class schema. Notice how on line 14 we can refer to the test class name when we write run the test.

tSQLt allows test case names to have numbers, spaces and most other symbols in the name as long as it conforms to the naming standards of SQL Server stored procedures and the name starts with the word “test”. tSQLt identifies stored procedures on a test class schema whose name starts with the word “test” as test cases.

Lines 7 and 8 execute the function that we want to test and store the result in the @celsiusValue variable. Line 10 compares the expected result (0) with the actual result (@celsiusValue). If the two values are not equal, then the test case fails.

Let’s look at the result if the ConvertFahrenheitToCelsius function has a bug and returns 5 instead of 0:

The first line tells us which test case failed and why. The table from lines 7-13 provides a summary of all test cases which were executed and their results. Lines 11-12 are raised as an error so that other programs working with tSQLt can easily detect test case failure and quickly brings your attention to the fact that a failure happened when running in SQL Server Management Studio.

We would create several more test cases including tests for negative and positive values, null, and decimal values. Each time we add a test or change the code, we can re-run all of the tests and make sure that everything is still working.

Independence from Constraints

Now, let’s try a more complicated test case. Before cleansing the data, the manager wants to know how many Fahrenheit values will be converted to Celsius. Hence, we will want to create a view that returns the count. For the view to work, we’ll have to put data into the DailyWeatherInfo table. However, the other columns such as id, recordingUserId and recordedDate are not relevant. Nor is it important to this test case for there to actually be data in the User table foreign keyed to with the recordingUserId column. Finally, the locationId column that someone is thinking about adding next week also should not break this test case. We want our test case to be independent of these unrelated concerns.

To achieve this independence, tSQLt provides the FakeTable procedure as seen in this test case:

Notice how the test case only deals with the specific table and the columns that it needs. It is not complicated by the other details about the database. This keeps test cases easy to maintain, easy to read, faster to write and faster to execute.

FakeTable works by creating an empty copy of the original table without the constraints or relationships. Only the column names and data types are preserved. Each tSQLt test cases is executed within a transaction which is rolled back when the case completes. Therefore, the original table is put back in place when the test case finishes.

Operating inside of a transaction also means that any work done by the code you are testing is undone. This is important, to keep all test cases independent of one another. In SQL test frameworks without transactions, one test case can unwittingly modify data needed by another test case. If the test cases are executed in random order, some test cases will appear to fail randomly. Nobody wants to debug those situations, so tSQLt was written to automatically place each test case in a transaction to avoid those problems.

Comparing Sets

So far, we’ve only been looking at single valued results. Much of the time we really work with sets of data and want to validate the results of a query or a data modification statement. Just checking the number of rows returned or updated is not sufficient! Many defects are missed when test cases only check the amount of data returned but not the contents. However, writing code to loop through and check each result of a query or data modification is tedious.

The AssertEqualsTable procedure compares the data in two tables. Let’s take a look at a test case that uses AssertEqualsTable to check the result of a data modification.

As in the previous example we’re using FakeTable and INSERT statements to create the test data (lines 4-8). Next we execute the procedure we are testing, CleanseTemperatureData (line 10). After the procedure executes, we want to capture the changes it made to the DailyWeatherInfo into a working table which we’ll call “#Actual” (lines 12-14). Then we want to setup the expected results in a table of their own (lines 16-22). Finally we compare the results in the “#Expected” and “#Actual” tables (line 24).

This establishes a common pattern of writing unit tests which need to compare set based results: we perform the action, store the results in an “#Actual” table, create the expected results in an “#Expected” table and then compare the two.

Let’s take a look at the result of this test case if the CleanseTemperatureData procedure had a defect where it did not actually perform the calculation of Fahrenheit to Celsius, but only chopped off the “F”:

The column, _m_ indicates the result of the comparison for each row. The less than sign (<), indicates that a result was found in the Expected table but was not found Actual table (lines 5-7). Similarly a greater than sign (>) indicates that a result was found in the Actual table but was not found in the Expected table (lines 8-10). If a row was exactly matched, an equals sign (=) would be present in the _m_ column. For two rows to match, all values in every column must be equal. For a more in-depth look at the reasoning behind tSQLt’s table comparison algorithm, please click here.

Your Next Steps

You’ve now seen an approach to overcoming the greatest technical obstacles to database unit testing. The tSQLt framework supports your unit testing efforts by isolating your test cases from constraints and schema changes, handling the transaction management required to execute independent test cases and providing the framework basics such as assertions, grouping test cases into test classes and executing test cases. tSQLt can also produce output conforming to the JUnit XML output format so that it can be integrated with popular continuous integration servers. We encourage you to try it for yourself and write your own robust unit test cases.

Tags: , , , , , ,

  • 27416 views

  • Rate
    [Total: 3    Average: 4.3/5]
  • AlexK

    I cannot agree with the approach
    After unit testing my T-SQL for three years, I completely agree that “test case maintenance using these tools becomes prohibitively expensive”. I also agree that “making changes to the single set of data used to populate the test database causes test cases to break”.

    However, I don’t think that the suggested solution is efficient for large projects, and here are my reasons:

    1. Building test data for each test separately leads to huge duplication of code. You will have to add, for example, a country, a currency, and a customer thousands of times. Besides, when we have a large test suite, it simply takes too much time and slows us down.

    2. Populating test data via inserts is very brittle – it breaks when table structure changes.

    3. Ignoring constraints may lead to testing against impossible test data, which can render some tests useless.

    4. Last, but not least, T-SQL is simply not a good tool for this functionality. C# and NUnit, for example, already have the functionality we need, and it is very robust and mature because lots of people already use it.

    In my experience it is much more efficient to run all tests that do not modify data off one and the same data. Naturally, when we change this data, some tests break, as you have pointed out. To simplify maintenance, we refactor expected results out of our tests into separate files, regenerate the expected results, and verify that the changes in expected results are consistent with the changes in test data.

    This approach required dramatically smaller code base, requires less maintenance, and is less prone to errors than building test data for each test separately.

    What do you think?

  • DennisLloyd

    RE: I cannot agree with the approach
    Hi Alex,
    Thank you for your thoughtful reply and for testing your SQL. Let me take a moment to address each of the reasons you provide:

    1. My goal with writing unit tests is to keep duplication superficial at worst. A well written test case which inserts data will insert only the values needed into the necessary tables for the specific unit being tested. If the test is related to a country, a currency and a customer all at the same time, then it is not a bad thing to make that clear in the test case. When data is created up-front, that detail about the test’s dependency on that data is obscured; so when the test fails, you need to go research which data was relevant to that particular test case.

    When I have lots of very similar test cases, there are few strategies I can use to reduce duplication. Here are a few of them:
    a. Refactor the duplication into new stored procedures, passing in parameters for the values which change in each test case. A note of caution: you do not want to do this so much that your test case becomes obscure. It should be obvious from the test case which data is relevant and how the test operates.
    b. tSQLt supports Setup methods (similar to other unit test frameworks). Commonly created test data for a single test class could be created here.
    c. Tests could be written in a parameterized way. I’m not 100% happy with how this is currently done in tSQLt, but improving that is high on the backlog.

    I generally find it a code smell if I’m tediously repeating the same or very similar data all over the place. When I’m doing this, it normally tells me it’s time to refactor somewhere – either in my test cases or the code which I’m testing.

    You mention that when you have a large test suite, it slows you down. I am not sure if you mean it slows you down because running the test cases are slow, or if it slows you down because it takes longer to maintain. For the actual test case execution time it is admittedly, generally slower to create test data for each test case than to create static data up-front. The actual impact this has to your day-to-day development depends on a number of factors. I would like to see the overall performance improve though because I understand there are situations where the difference in speed is a critical factor. As for maintenance time, I actually think that having the test data in the test procedure reduces the amount of maintenance work as the developer does not need to jump between the test data set and the test case and the developer can freely modify the test data for a single test case without worrying about the impact to all other tests.

    2. Yes, populating test data via inserts is very brittle. This is why we created the FakeTable procedure. It allows you to write insert statements which are focused on the exact data which the test case needs – no more and no less. Therefore, schema changes affect the exact set of test cases which you would expect (i.e. those directly related to the schema change). Test cases unrelated to the schema change should not be affected. If they are, then you are inserting data in the test case which you do not need.

    On that note, I’ve seen test cases where “each test really needed lots of inserts”. This generally tells me that the unit under test is too large and is “hiding” other views, functions or stored procedures inside of it.

    3. We didn’t mean to imply in the article that constraints should be completely ignored. Constraints can be an impedance to creating targeted test cases and we found it helpful to put the constraints to the side when they were in the way. However, developers should not feel the need to write tests for situations that cannot happen. Moreover, tests should be written to exercise the constraints. It was not discussed in this article, but tSQLt has an ApplyConstraint stored procedure which, in conjunction with FakeTable, is very useful for isolating a particular constraint to test.

    4. As for T-SQL not being a good tool for this, I think this depends greatly on your situation. There are cases where I would not use tSQLt. For example, I would not use it in an application that had extremely minimal amounts of SQL code or where the SQL code is generated completely in memory before being executed.

    However, there are cases where I think it makes a lot of sense to write your test cases in T-SQL. Generally, I feel more productive when I am writing my test cases in the same language as the code I am writing. It reduces the number of context switches that my brain has to perform. It also allows me to keep focused in the same set of IDE windows, write with a similar style and execute my test cases faster.

    Other projects have very minimal amounts of C# (or whatever other language you like). The project that inspired the core ideas of tSQLt was an ETL project where more than 90% of the code was in T-SQL. It would have made little sense there to write the unit tests in NUnit. I would not say that a project needs to have 90% or more of the code written in SQL before I would start using a specialized SQL unit testing framework. It just happened that the framework was born out of this extreme. I find a specialized SQL unit testing framework useful when the SQL code is non-trivial and abundant enough that I’m spending more than a few minutes at a time reading or writing SQL code.

    As for robustness, tSQLt is newer than NUnit -that’s for sure. If you’re concerned about adopting something too new, that’s okay, I understand. There are times and places to try out new methods or tools. I would encourage you though, even if you are skeptical to give it a try on the side and see what you get out of it. Personally, in seven years of doing test driven development in SQL, I found that the dependency isolation techniques practiced in tSQLt opened me up to new solutions outside of SQL and the OO world from which the originated.

    Cheers,
    Dennis

  • AlexK

    I have concerns
    Hey Dennis,

    I have quite a few questions about your approach.

    1. If you refactor DailyWeatherInfo into two tables, what do you need to do to fix your tests?

    2. If you add a trigger on DailyWeatherInfo which rounds values to integers, like 11.3 to 11, and do not change your test, what happens? Could you still be getting actual results like 11.3?

    3. If you add a constraint on DailyWeatherInfo that makes some values impossible, and do not change your test, what happens? Could you still be getting actual results that are impossible with respect to that constraint?

    4. Have you considered generating expected results and storing them in separate files, rather than hard-coding them directly in your tests? That would speed up your development, and make maintenance easier.

    The very motivation of your approach, “making changes to the single set of data used to populate the test database causes test cases to break”, would become a no big deal: you could just change your “single set of data”, regenerate expected results for broken tests, make sure that the changes in expected results are consistent with the changes in the test data, and you are all set.

    That’s IMO much easier that isolating tests from each other.

    Also I have my doubts regarding this: “Constraints can be an impedance to creating targeted test cases and we found it helpful to put the constraints to the side when they were in the way”.

    Surely wihtout constraints you can come up with tests quicker, but you are more likely to make errors, and end up testing against impossible data and having loopholes in your tests.

    What do you think?

  • DennisLloyd

    RE: I have concerns
    Hi Alex,

    I just wanted to post a quick note and let you know we’re not ignoring your questions 🙂 I would like to put together a detailed answer with examples this week.

    In the meantime, I am curious about your approach in #4 above. Can you provide more detail about your development process and how it fits with generating your expected results? My guess is we may have quite different approaches here, but I want to understand your approach as well.

    Thanks again for your comments and I’ll have something together soon.

    -Dennis

  • eliassal

    Demo source code
    Sebastian, can you please provide source code for this demo
    Thanks in advance

  • khan04

    Hello from Khan
    Hello Sebastian,

    Thank you for nice informative article.
    Are there any similar commercial tool available in the market?

    Also is there any concept like code coverage implemented in tSQLt?

    Thanks,
    Ahmed

  • khan04

    Hello from Khan
    Hello Sebastian,

    Thank you for nice informative article.
    Are there any similar commercial tool available in the market?

    Also is there any concept like code coverage implemented in tSQLt?

    Thanks,
    Ahmed

  • khan04

    Hello from Khan
    Hello Sebastian,

    Thank you for nice informative article.
    Are there any similar commercial tool available in the market?

    Also is there any concept like code coverage implemented in tSQLt?

    Thanks,
    Ahmed

  • khan04

    Hello from Khan
    Hello Sebastian,

    Thank you for nice informative article.
    Are there any similar commercial tool available in the market?

    Also is there any concept like code coverage implemented in tSQLt?

    Thanks,
    Ahmed

  • Mark Anderson

    alexK would you mind elaborating on how you test tsql without tsql test