Are Unit Tests Overused?

Unit testing has come to dominate the many types of testing used in developing applications. This has inevitably been at the expense of other types, such as integration testing. Does a successful unit test regime ensure quality, or should we see unit testing as just one of a range of tests that can together give us confidence in an application?

Unit tests, and Test Driven Development in particular, are very fashionable right now, to the point where I feel we overuse them in places where they are not especially helpful.

More damaging, though, in my opinion, is that unit tests force us to expose implementation details of our code. We often need to expose the dependencies in our code so the tests can eliminate them, when all we really want to expose is the higher-level encapsulation. Sometimes it does make sense to try to test an implementation in isolation, but it’s important not to put the cart before the horse. If the needs of the test reduce the quality of the architecture, or if the test is a tautology, then we should consider other techniques.

Rather than more and more unit testing, I’d prefer people channel extra energy into better ways of integration testing.

The Good and Bad of Unit Tests

Over the past decade, development processes such as Test Driven Development (TDD) have gained prominence, with the laudable goal of simplifying designs, and making them more flexible and easy to change. TDD encourages small, incremental improvements to a working system, writing a failing test (such as a unit test) that defines some part of the behavior of each unit (typically a method, or a single path through that method). We then strive to write the simplest code that will make our failing test pass, leading us to write very loosely coupled, well-encapsulated components, and so on. At least…that’s the beautiful theory.

The problem I find with unit tests is the same problem I find with design patterns, and other examples of very good ideas that find themselves tagged with the ‘best practice’ label: they get applied everywhere, often to the detriment of other forms of testing.

When we have succeeded in writing simple methods with very few dependencies or where it is relatively easy to refactor that code so we can test it in isolation, unit tests are very effective.

However, real code has dependencies we can’t remove easily. In his recent (quite heavily criticized) piece on Unit Testing Myths and Practices, Tom Fischer observed that…

“…dynamically cobbling together different data stores, libraries, web services via dependency injection, configuration files, and plug-ins spawned an entire new breed of defects for which unit testing proved marginally effective.”

I have some sympathy with this point of view. We’re supposed to remove all these dependencies in order to write our unit tests. This means that we can write very specific tests and, as long as we’ve applied the DRY (Don’t Repeat Yourself) principle to our tests as well as our code, then when we make a breaking change our failing test will point us at the exact piece of code that has the issue. These tests will also help us catch edge case bugs. However, removing dependencies in this way also starts to limit the usefulness of what we can test.

Not every issue arises solely from a small piece of incorrect code. Many bugs only appear when the dependencies are in place, for example, when we’re using the real data rather than a test database. When we’re writing code that uses a third-party library (or even using code written by more than one developer), the hard bugs will often come from misunderstandings about how they are supposed to interact, so it’s common to have two pieces of code that work in isolation but don’t actually work together. Unit testing eliminates the dependencies and can’t detect this.
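As a rough illustration of two pieces of code that work in isolation but not together, here is a minimal Python sketch. The functions export_date and import_date are hypothetical, invented for this example; each passes its own unit test, yet they disagree about the date format they exchange.

```python
from datetime import date, datetime


def export_date(d: date) -> str:
    """Hypothetical component A: writes dates day-first."""
    return d.strftime("%d/%m/%Y")


def import_date(text: str) -> date:
    """Hypothetical component B: assumes dates arrive month-first."""
    return datetime.strptime(text, "%m/%d/%Y").date()


# Unit tests in isolation: each test encodes its own author's idea of
# the format, so both pass.
assert export_date(date(2024, 3, 4)) == "04/03/2024"
assert import_date("03/04/2024") == date(2024, 3, 4)


def round_trip(d: date) -> date:
    """Integration path: the real output of A is fed into the real B."""
    return import_date(export_date(d))
```

Only a test that runs round_trip with both real functions in place can notice that 4 March comes back as 3 April, or that a date such as 31 March raises a ValueError.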

Of course, no one ever argued that unit tests on their own are sufficient to catch all bugs, or that they are a replacement for system and integration testing. However, my experience is that, in practice, unit tests quickly become the proverbial hammer that makes everything look like a nail. Many developers spend their time writing more and more unit tests, often only marginally useful ones, rather than spending that time on the other forms of testing that will catch the bugs that only appear when we’ve wired everything together.

There is, however, a more serious problem with unit testing than “overuse” and it’s that they can destroy encapsulation.

The Importance of Encapsulation

There is a theory of the Universe, the Holographic Universe Theory, which, if you’ll allow me to simplify a little, says that we can derive an entire description of the Universe from knowledge of just the state on the outside. In other words, we don’t care what’s happening inside because encoded on the sphere that surrounds it is a full description of the universe.

A similar theory underpins good software design, and is manifest in principles such as component-oriented design and encapsulation. The internal representation of a component is, or should be, hidden; we don’t care about its inner workings as long as we fully understand its surface, the interface. The interface describes everything we need in order to understand how the component should behave and to test whether or not it is behaving correctly, and we program only to the interface.

For me, this remains one of the deepest insights into software engineering. It is a founding principle of UNIX, and differentiated UNIX from other operating systems at the time. A UNIX system is essentially a set of ‘black box’ programs connected by pipes, in the form of very narrow, very tightly specified interfaces. Each program “does one thing and does it well” and programs communicate via a simple text stream. This makes it easy to stick many programs together, just by having them send text to each other, and to check that the outputs are correct for the given inputs.

UNIX offers a very persuasive example of how encapsulation enables change. Look at any standard UNIX tool, and you’ll find that none of the documentation focuses on its inner workings. It just says, “This is what happens when you run ls or you run cat”. It means that anyone can come along and write a new version of cat, as long as its interface works exactly as described in the documentation.

It also means that anyone can check that a program does what it’s supposed to do just by reading the documentation, so tests are independent of the implementation.

By Eliminating Dependencies, we can Destroy Encapsulation

The problem I see with overuse of unit tests is this: in order to isolate a piece of code, we often need to expose the dependencies so the test can eliminate them.

When designing code for release, we often want to keep the tests in a separate library (so that we don’t ship them to the customer). With unit testing, we also want to test individual implementations, which are often the things that are marked as private or internal. In order to do this, we have to make public the lower-level implementation details; it’s necessary to write InternalsVisibleTo so that the testing framework has a privileged view of the code and can explicitly test this stuff. This means that implementation details wind up having dependencies elsewhere, which is a broken architecture.

Ultimately, if tests contain assumptions about, or depend on, a particular implementation, then not only are they testing the behavior, but also testing that those assumptions and dependencies are still present. If we were to replace an implementation with an entirely new one where the outward results are equivalent, but which had removed a dependency, then the tests would fail, not because the code is now incorrect but because the tests are incorrect.
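A minimal Python sketch of this failure mode follows. Everything in it is hypothetical: total_price_v1 and total_price_v2 stand in for an original and a replacement implementation, and the tax service is just a Mock from the standard library. Amounts are integer cents to keep the arithmetic exact.

```python
from unittest.mock import Mock


def total_price_v1(items, tax_service):
    """Hypothetical original: asks a collaborator for the tax rate."""
    subtotal = sum(items)
    return subtotal * (100 + tax_service.rate_percent()) // 100


def total_price_v2(items, tax_service=None):
    """Hypothetical replacement: same outward result for these inputs,
    but the rate is now a constant and the collaborator is never called."""
    subtotal = sum(items)
    return subtotal * 120 // 100


def make_tax_service():
    """Stand-in dependency that reports a 20% rate."""
    service = Mock()
    service.rate_percent.return_value = 20
    return service


def implementation_coupled_test(total_price):
    """Checks the result AND an assumption about the internals."""
    service = make_tax_service()
    assert total_price([1000, 500], service) == 1800
    service.rate_percent.assert_called_once()


def behavior_only_test(total_price):
    """Checks only what a caller of the function can observe."""
    assert total_price([1000, 500], make_tax_service()) == 1800
```

Under these assumptions, behavior_only_test passes for both versions, whereas implementation_coupled_test raises an AssertionError for total_price_v2 even though the returned value is unchanged.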

Unit tests are supposed to prevent this by removing all of the dependencies, but run into the issue that in order to remove a dependency, we have to first know what the dependency is, and expose the implementation details where that dependency is made.

TDD’s “write the test first” philosophy can help here: if we don’t know the implementation in advance, we can’t make assumptions about it. However, in my experience, it’s common that tests change after we’ve written the implementation because of some newly discovered requirement, or we introduce new tests for bugs not covered by the original suite of tests. This is when the problem arises.

Ultimately, code is more maintainable the fewer implementation details we reveal through the API, because it allows us to change the implementation without having to change other pieces of code. The more we “poke holes” in the interface the more it reduces the quality of the API and reduces the maintainability of the code.

Once we write poorly encapsulated code, other developers will be more tempted to peek inside and write dependent code that “secretly” uses knowledge of the other code’s inner workings, such as the real type of the returned item. Unit tests won’t tell you about this hidden dependency; the original code still does what it always did. However, at this point, it’s no longer “safe” to change the original code, as the other, new code (which could be anywhere) is dependent on the specific implementation of the method.
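To make the “real type of the returned item” case concrete, here is a small Python sketch. The function names are invented for illustration; the documented contract is assumed to promise only “an iterable of names”.

```python
from typing import Dict, Iterable


def active_users_v1(users: Dict[str, bool]) -> Iterable[str]:
    """Hypothetical original: contract says 'iterable', but it happens
    to build a list."""
    return [name for name, active in users.items() if active]


def active_users_v2(users: Dict[str, bool]) -> Iterable[str]:
    """Hypothetical rewrite: same contract, now a lazy generator."""
    return (name for name, active in users.items() if active)


def first_active_peeking(active_users, users):
    """Dependent code that 'secretly' relies on getting a list back."""
    return active_users(users)[0]


def first_active_contract(active_users, users):
    """Dependent code that uses only what the contract promises."""
    return next(iter(active_users(users)))


USERS = {"ann": True, "bob": False, "cy": True}
```

A unit test of active_users_v2 that iterates over the result still passes, yet first_active_peeking, which may live anywhere in the codebase, now fails with a TypeError because a generator cannot be indexed.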

Tests can only detect breakage; encapsulation can actively prevent it from ever happening in the first place, so when the two concepts are at odds, it’s better to sacrifice “unit testability” for encapsulation.

Exploratory Integration Testing

To make reliable software, unit tests play an important role, but I’ve come to regard them as a rather formal type of testing. A good analogy is the “bed of nails” tester in electronics where by rigorous design of the tests we can make sure our implementation conforms to certain behaviors.

Unit tests have become the hammer that makes everything look like a nail. What we really need, I think, is not more and more unit tests, but better encapsulation, very tightly defined interfaces, and a better way of writing and automating integration tests that allow us to do a more ‘exploratory’ form of testing, with dependencies in place.

If we have a set of ‘black box’ programs using a simple form of communication across very narrow, very tightly specified interfaces then it’s harder to test those interfaces because we can’t test the individual functions of the things we’re plugging together.

What we need to be able to do, in electronic engineering terms, is get out the oscilloscope and a battery. Place our electrodes, measure the behavior; change an input, measure it again and see how it changed. In this way, we detect bugs by finding the places where the behavior of the code changes unexpectedly. We need better tools to support this sort of exploratory testing. As we explore the behavior of our code, the idea is that such a tool would record as much of this behavior as possible (which means function return values, and so on). We then detect bugs by asking the tool to do the same things again and seeing what has changed.
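I don’t have such a tool to point to, but a very rough Python sketch of the record-and-compare idea might look like the following. The record and diff helpers and the two parse_quantity functions are all hypothetical, and a real tool would need to capture far more than return values and exception types.

```python
import json


def record(func, inputs):
    """Run func on each tuple of arguments and note what happened."""
    observations = {}
    for args in inputs:
        try:
            outcome = {"returned": func(*args)}
        except Exception as exc:  # record the failure instead of stopping
            outcome = {"raised": type(exc).__name__}
        observations[json.dumps(args)] = outcome
    return observations


def diff(baseline, current):
    """Return only the inputs whose recorded behavior differs."""
    return {
        key: (baseline.get(key), current.get(key))
        for key in baseline.keys() | current.keys()
        if baseline.get(key) != current.get(key)
    }


def parse_quantity_old(text):
    """Hypothetical code under exploration."""
    return int(text.strip())


def parse_quantity_new(text):
    """A 'tidy-up' that quietly changes behavior for blank input."""
    text = text.strip()
    return int(text) if text else 0


INPUTS = [("3",), (" 42 ",), ("",), ("abc",)]
baseline = record(parse_quantity_old, INPUTS)
changes = diff(baseline, record(parse_quantity_new, INPUTS))
```

Here changes contains a single entry, for the blank string: the old code raised a ValueError and the new code returns 0. Whether that change is a bug or a fix is for a person to decide; the sketch only surfaces it.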

Consider, as an analogy, the humble spreadsheet. We can plug all sorts of complex calculations into a financial spreadsheet. Once they are all in there, we can start running ‘what if’ scenarios, where we modify our inputs regarding various revenue and expenditure streams and see what comes out of our financial model. This in essence is what exploratory testing is all about, and I’d like to be able to test software in a similar fashion. If I use this input for this bit of code, what is the behavior? In this form of testing, we don’t care about implementation, and the library of tests develops as we’re developing and understanding our code. We write some code, we test it straightaway; we can try it out and see what it does very quickly. Once we understand how the code is supposed to perform, by exploring it, we can write assertions for it and turn these into formal automated tests.

An additional advantage of this sort of “top-down” exploratory testing is that if we’re working from the public API, the tests are more in the form of “does the program do its primary task” rather than “is this specific implementation correct”, while still being able to detect and narrow down the same kinds of errors. If we replace implementation A with implementation B, and our tests only look at the interface, then they will provide some verification that the replacement was a success. In other words, tests that work at a more abstract level will start passing once we pick the right implementation.
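As a small sketch of tests that only look at the interface, assume a hypothetical dedupe function whose contract is “remove duplicates, keeping first-seen order”. The same contract cases are run, unchanged, against each candidate implementation.

```python
def dedupe_a(items):
    """Implementation A: simple, quadratic."""
    result = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def dedupe_b(items):
    """Implementation B: linear, tracks seen items in a set."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def dedupe_wrong(items):
    """A tempting replacement that loses first-seen order."""
    return sorted(set(items))


# (input, expected output) pairs derived from the contract alone.
CONTRACT_CASES = [
    ([], []),
    ([1, 1, 1], [1]),
    ([3, 1, 3, 2, 1], [3, 1, 2]),
]


def satisfies_contract(dedupe):
    """True if the implementation meets every contract case."""
    return all(
        dedupe(list(given)) == expected for given, expected in CONTRACT_CASES
    )
```

Both A and B satisfy the contract, so swapping one for the other needs no test changes, while the order-losing variant is rejected by the third case.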

By testing at a higher level, with tests that are unaware of any dependency not exposed via the public API, we remove the problem of tests that test functionality as well as whether or not any underlying assumptions/dependencies are still valid. We also make it much less likely that we’ll introduce this problem later, by accident.

Of course, this form of “top down” testing means that a test might need to start up a significant chunk of the program we wish to test and this might make it harder to pin down the specific part of the program that has failed. However, the important point is that the failure exists and we’ve detected it, so preventing us from releasing bad code.


In my experience, developers channel the vast majority of their testing energies into unit tests, even though they know that the smallest components of a program all operating correctly in isolation doesn’t mean the program as a whole is correct.

I don’t suggest that anybody believes that unit tests are a substitute for integration and system testing, but I do suggest that there’s a tendency for developers to get too involved with the mechanism rather than the intent, especially when looking at the smallest parts of a piece of software, and that this tendency is what leads to the imbalance.

Writing good software requires balancing many different interests, which are sometimes at odds. If you emphasize any one of them (say, tests) above all the others, you start to make trade-offs in other areas. One of the claims of TDD is that code that is unit testable is also automatically code that is well-architected (loosely coupled components, simple well-defined interfaces, and so on), but this isn’t necessarily true, for instance because unit tests often need to break encapsulation, in order to be actual unit tests rather than integration tests.

The sort of exploratory testing I suggest is necessary, and for which good tools don’t currently exist, is top-down by nature, so we’d start at a higher level. The usual way to do this, now, is to run the program and play around with the UI, which is usually the most abstract interface in the entire application. I envisage the equivalent but for playing around with the code.

  • Alex Kuznetsov

    what a coincidence
    I could not agree more. In fact, I am finishing an article emphasizing pretty much the same thing, only specific to database development.

    Surely, we need comprehensive integration tests. Otherwise our users are doomed to discover untested permutations and waste precious time troubleshooting.

    Just as surely, when we do have integration tests executing all possible combinations, and skip on unit tests, the end result is frequently the same – the quality of software does not suffer much, or not at all.

    My coworker Jay Fields has written up a nice summary:

  • Anonymous

    It’s TDD, not UTDD
    I think TDD is excellent, but I don’t see it as limited to unit tests. You should be doing TDD for integrations as well (why wait for a SOAP service implementation if you have the WSDL?).
    We have both integration tests (black box SoapUI suites) and unit tests (single classes as well as all-layers API tests) in our product and they help us immensely with quickly isolating the probable location of an issue. You need many levels of granularity to faster get to the relevant code and the design-by-contract of DI enables us to establish code "trust" so we can mock that code in other tests (for better focus and performance). But this never stops us from adding high level unit tests on our "real" external APIs where we use the full stack, down to database.

  • David

    Open Closed
    I believe unit tests (with regards to TDD) actually reduce the quality of code, not improve it because ultimately the developer ends up fighting the wrong battle.

    We had a situation recently where we hired a contractor who followed TDD to the letter and had a test suite consisting of hundreds and hundreds of unit tests.

    When we confronted him regarding this and the time taken to write all these tests his justification was that if he didn’t have all these tests then he’d waste more time trying to find the source of bugs and cited numerous occasions when these unit tests had helped him do exactly that.

    What I couldn’t fathom though is why did his unit tests fail on such a regular basis? After looking at his code it became clear that, by following TDD so stringently, his code had become brittle and more prone to breaking.

    In over 10 years of development I can count the number of unit tests I’ve written on one hand. I however focus on the Open/Closed principle that my code is open for extension but closed for modification. In this way I write layers of abstraction that allow each layer to be developer tested and made solid before moving onto the next layer. It is extremely rare that I make (or need to make) a change in one area of code that spiders through the rest of the codebase causing unforeseen bugs.

    Ultimately TDD is more a crutch than it is an essential methodology.

  • Anonymous

    A completely different ball-game

    Your design is sound; however, you seem to want to throw the baby out with the bathwater.

    At the end of the day, if your code is written in a way that seems as though it were driven by tests (your description alludes to this being the truth), then that is good. I would wager that you still test your code to make sure it works. Further, when you change something, you likely re-test the parts that are likely to be affected by this change. If you’ve written your code well, this surface area is minimal. You test manually…so can you honestly say that automating these tests would be a bad thing?

    If you never refactor, fine, you win; however, if this is the case, you are playing a completely different ball-game.

  • Theodore R. Smith

    Archived for posterity!
    I archived this insightful article over at

    Now, even if you lose the article or the site goes down, your words will live on to benefit people in the future.

  • Tom Fischer

    Going Beyond Problem Recognition
    I applaud the author’s recommendation. While Exploratory Integration Testing may not solve all of our problems with writing better code, it explicitly recognizes the shortcomings of prevailing practices. I particularly found the electrical engineering analogy quite appropriate. That’s a field where the need to measure system inputs and outputs goes without question despite the industry’s deep understanding of capacitors, inductors, resistors, etc.

  • Anonymous

    In my own development
    I find unit tests to be very useful when dealing with calculations and rules based complexity.

    Where the code is simple, like basic data transfer and CRUD operations, integration testing pretty much eliminates the benefit from unit testing.

  • Josh

    Re: Open Closed
    I agree substantially with what David says.

    First, it’s been my experience that unit tests have taken on a life of their own at the expense not just of better code, but of code that contributes to a better product.

    Second, some managers – in their eagerness to get something out the door – have taken hearing "unit testing is complete" to mean "all testing is complete".

    Third, undue focus on unit testing maintains a very high proportion of test-writing in the hands of the same programmers who wrote the code. That basically defeats the purpose of testing.

  • Copenhas

    Some thoughts
    Pretty interesting article but why does unit testing have to break encapsulation? I suppose since you have to pull out dependencies and mock/stub out the interactions…. of course if those are defined then I would think you could mock/stub them out without exposing too much about the unit under test. It’ll have made the same assumptions.

    Also TDD is an iterative design process from the top down. Seems like that fits well with the exploratory integration testing idea. TDD can certainly produce tests that lose value over time or are too brittle for long term use, but test code is still code. It’ll need to be maintained and better written test code will be easier to do that with.

    Also who says unit testing has to be down to a single class. Maybe a unit is a single cohesive unit which could be a single class or function at the top level but internally is broken down into several that don’t need any outside dependencies besides their cousins (so to speak). Or that cohesive unit does have a few external dependencies and you mock out those.

    The idea of dependency injection is really for that top level entry point. You give it the dependencies so you can always shift how that code is used and fits in, including to be able to test. It doesn’t have to be at every single level of code. Perhaps this is one of the things that dirties up OOP code, the ability to do DI so easily at every level. Makes it harder to understand all the dependencies for the whole piece or what really should always go together.

    Usually if something is hard to test it’s hard to maintain. Too many dependencies, too many preconditions, too complex of logic. Breaking things up to be loosely coupled and more cohesive lends itself to more easily understood pieces and more easily tested ones. I completely agree that different granularities of testing should be used but I don’t agree with some of these arguments that bad experiences with unit tests are due to the unit tests themselves.

    I think overall (if I got the author’s point) I agree with what’s written in the article. A bad unit test isn’t going to solve your problems nor is a good unit test going to prove the system works.

  • Anonymous

    Ok, but…
    Whenever you need to break encapsulation in order to unit-test a class, chances are that you have already violated the Single Responsibility Principle.

    Before contemplating whether "select is broken", I would ask myself "is this a code smell?"

  • Juozas

    Not a fan of Unit tests
    I even tried to learn how TDD could help me. I watched how one of TDD proponents was writing a project "the TDD way" –

    What I saw is how the dogmatic approach of doing the minimum to fix the newest failing test actually prevented programmers from arriving at good design. They couldn’t choose a good design for representing their domain for their first test, because that would not be simple. Likewise after having dozens of tests they still can’t refactor to a good design because that would address issues not yet covered by tests. So they are stuck with writing horrible code to make incomplete test suite pass.

  • AndyDent

    How Unit Tests cramp design
    "Whenever you need to break encapsulation in order to unit-test a class, chances are that you have already violated the Single Responsibility Principle."

    That seems plausible but my experience makes me lean more towards the article author’s premise.

    The big problem with unit tests is not that they exercise your code but that they need to know the results of exercising the code.

    It’s the inspection to verify that the correct state change occurred which causes the encapsulation breakages to be necessary.

    I’ve just been doing some TDD for my database book sample code, where I’m putting a little framework on top of leveldb to make it easier to generate complex keys.

    I have had to refactor some headers to expose things to be unit testable.

    The worst thing is reconciling testing with the modern use of blocks (lambdas) which is making me rethink the SRP a bit.

    To satisfy SRP and testability, what might be a highly-readable inline block with a couple of lines gets pulled out into a separate, testable class. In isolation yes this class looks "safer" with its test wrapper for its logic but the cost is making the code consuming it less readable and (sometimes) harder to parallelize.

    I have to check, but I think there’s a related argument in Coplien’s DCI about how some architectures end up making things harder to understand.