Thoughts on automated testing

Published on 2015-05-07
Tagged: object-oriented testing

I've been thinking about testing a lot lately. I recently tried (and failed) to write a test for a simple new piece of functionality at work. I just needed to verify that an Android activity called a method in a new class I'd introduced. Should be simple, right? Unfortunately, our build system and our multiple competing dependency injection systems got in the way, and I couldn't figure out a way to write a test that would work for all the apps built from our code base.

These obstacles are a real problem. They're basically technical debt: they made the simple task of writing a test more difficult than it should be. When it takes literally ten times longer to write a working test than it does to write the code being tested, people aren't going to write tests. We have relatively poor test coverage, which comes as no surprise.

I'm writing this article to organize my thoughts on testing so that I can better understand the obstacles that make it difficult and, with luck, eliminate them in the future. Hopefully someone else will find this useful, too.

Purpose of testing

The purpose of testing is to make it easier to change a program without unintentionally breaking anything. Tests verify that a program works the way it's supposed to work. A good set of tests can tell you this almost immediately. If you don't have a good set of tests, you'll find yourself accidentally introducing subtle problems, and you'll spend lots of time debugging them later. It's almost always better to invest time writing tests with new code than it is to debug all the issues that come up later. Writing tests is usually less time-consuming than debugging, and it also requires a relatively fixed amount of time, while debugging can consume an unbounded amount of time as bugs are introduced and reintroduced down the road.

Regrettably, what I said above is not true in all code bases. If your code base is set up in a way that makes it very difficult to write tests, you may not save any time. In this situation, understand what's blocking you and eliminate it. You will be more productive in the long run.

A weighted dependency graph of modules

Most programs can be thought of as a collection of modules. A module is a piece of code dedicated to some particular task. What a module actually is depends on the language. In Java, a module would be a top-level class and maybe a corresponding interface or abstract base class. In C++, it would usually be a pair of .h/.cpp files.

Modules help us understand programs by providing abstraction through encapsulation. A module should provide some minimal interface so that other modules can use it. The implementation of that interface should be hidden from the outside world. When we think about a module and its dependencies, we should only need to consider the interfaces of the modules it depends on; we don't need to worry about their implementations. Breaking a complicated program into discrete chunks lets us understand one piece at a time.

For the purposes of testing, it's helpful to think of a program as a weighted dependency graph. Modules are vertices. Dependencies are edges. The weight of an edge is how heavily a dependency is used. An edge with high weight indicates tight coupling. Depending on what kind of test we're writing, we'll want to test some subgraph of modules. We may need to isolate the modules being tested from the rest of the graph by mocking out their dependencies (cutting the edges). This is especially necessary for modules that have side effects on the outside world, like deleting a row from a database or connecting to an outside server. The difficulty of isolation depends on how many dependencies need to be mocked and how tightly coupled they are (the total weight of the cut edges).
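
To make the graph picture concrete, here is a minimal sketch in Python. The module names and edge weights are invented for illustration; the only idea taken from the text is that the cost of isolating a set of modules is the total weight of the outgoing edges that have to be cut.

```python
# Hypothetical module graph: edges[(a, b)] is the weight of a's dependency on b.
edges = {
    ("activity", "presenter"): 3,
    ("presenter", "repository"): 5,
    ("presenter", "logger"): 1,
    ("repository", "database"): 8,
    ("repository", "logger"): 1,
}


def isolation_cost(edges, under_test):
    """Return (cut_edges, total_weight) for dependencies leaving the tested subgraph."""
    cut = {
        (src, dst): weight
        for (src, dst), weight in edges.items()
        if src in under_test and dst not in under_test
    }
    return cut, sum(cut.values())


# A unit test of the presenter cuts two edges.
unit_cut, unit_cost = isolation_cost(edges, {"presenter"})

# Testing presenter and repository together cuts three edges, including the heavy database edge.
wider_cut, wider_cost = isolation_cost(edges, {"presenter", "repository"})
```

Incoming edges (such as activity to presenter) are not counted, on the assumption that the test itself plays the role of the caller.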

Types of tests

There are many different types of tests. They can be defined in terms of the module dependency graph mentioned above. Smaller tests test a smaller subgraph and usually take less time to write. Tests will usually need to isolate the modules being tested from the rest of the graph using mocking or some other mechanism.

Unit tests are the simplest kind of test. They try to test one module at a time. Usually, in order to write a good unit test, you need to isolate the module being tested by mocking out all of its dependencies. If the module's dependencies are pure, they don't necessarily need to be mocked out though. I'm using the word pure in the same sense as it is used for functions: a pure module doesn't have any side effects on the outside world, and the results its methods compute depend only on the inputs. This is pretty common in compilers, and I take full advantage of this in Gypsum. For example, the lexer takes a string as input and generates a list of tokens. The parser consumes a list of tokens and generates a syntax tree. When writing unit tests for the parser, it's easier to leave the lexer in place and pass in strings as input than it is to pass in a list of tokens as input. The only drawback is that when the lexer breaks, the tests for the parser fail. However, I can still be fairly confident that the lexer tests will fail, too, so if I fix those first, the parser tests will pass again.
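
As a rough illustration of testing a parser through an unmocked, pure lexer, consider the toy sketch below. It is not Gypsum's actual code: lex, parse, and the tuple-based tree format are all made up for this example and only handle integers with + and -.

```python
import re


def lex(source):
    """Pure: turn a source string into a list of token strings."""
    return re.findall(r"\d+|[-+]", source)


def parse(tokens):
    """Pure: build a nested-tuple syntax tree for left-associative + and -."""
    tree = ("num", int(tokens[0]))
    for i in range(1, len(tokens), 2):
        tree = (tokens[i], tree, ("num", int(tokens[i + 1])))
    return tree


def test_subtraction_is_left_associative():
    # The real lexer stays in place, so the test input is a readable string
    # rather than a hand-built list of tokens.
    tree = parse(lex("1 - 2 - 3"))
    assert tree == ("-", ("-", ("num", 1), ("num", 2)), ("num", 3))
```

If lex were broken, this parser test would fail as well, which is the drawback described above.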

Functional tests check interactions between two or more modules, still at a fairly low level. A functional test might be of the form "ensure a function from module A gets called when an object from module B is initialized". In a functional test, several modules are being tested, but their collective dependencies still need to be mocked out.
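
A functional test in this sense might look like the following sketch. Tracker, SettingsScreen, and FakeTransport are hypothetical names. Both real modules take part in the test, and only their shared outside-world dependency is replaced with a hand-written fake.

```python
class FakeTransport:
    """Stands in for the shared outside-world dependency (say, a network client)."""

    def __init__(self):
        self.sent = []

    def send(self, payload):
        self.sent.append(payload)


class Tracker:
    """Module A: reports events through whatever transport it is given."""

    def __init__(self, transport):
        self._transport = transport

    def record_startup(self, screen_name):
        self._transport.send({"event": "startup", "screen": screen_name})


class SettingsScreen:
    """Module B: is expected to notify the tracker when it is initialized."""

    def __init__(self, tracker):
        tracker.record_startup("settings")


def test_screen_reports_startup_when_initialized():
    transport = FakeTransport()
    SettingsScreen(Tracker(transport))
    # The interaction between A and B is observed through the fake.
    assert transport.sent == [{"event": "startup", "screen": "settings"}]
```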

Integration tests also test interactions between modules, but at a higher level. For example, a test might cover the interaction between a GUI program and a database, or a client and a server. In terms of the dependency graph, there's not much difference between functional tests and integration tests, except that the modules in an integration test may live in different programs and run on different machines. However, the more modules that are part of the test, and the more complex their interaction is with the rest of the system, the harder it is to isolate them. Because of this, integration tests are relatively difficult and time-consuming to write, compared with unit and functional tests.

End-to-end tests cover a whole system, with as little isolation as possible. For example, if we were testing a web browser, we could navigate to various sites, check that clicked links are followed, and verify that pages are rendered correctly by comparing against screenshots. These tests can give very broad coverage, but they are difficult to set up, and they take a long time to run. Rather than isolating modules in a system from each other, we are isolating the whole system from the outside world. With the web browser, we don't want our tests to break when one of the sites being tested removes a link or changes its layout. So we might install our web browser on a virtual machine and set up a "mock" web server on another machine to serve static copies of the test pages.

Tools for isolation

Dependency injection is a great tool for isolation and for reconfiguration in object-oriented programs. Rather than requiring each object to instantiate its dependencies when it's created, all dependencies are passed into the constructor. The dependency injection framework is responsible for creating objects and their dependencies on request. This is usually done using reflection or in auto-generated code, so there's relatively little boilerplate that needs to be written.

The great thing about dependency injection is that it's fairly easy to reconfigure what class is instantiated when an object of some interface is needed. For example, if we have some shared code that uses SQLite on the client and MySQL on the server, we can have that code depend on a common interface, and have the dependency injection system provide the correct implementation based on whether we're running in a client or server configuration.

This reconfiguration is also useful for testing. You can inject a fake implementation of a dependency in order to isolate the module that depends on it.
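
The sketch below shows the idea without any framework, using a plain dictionary of bindings in place of a real dependency injection system. All of the names (NoteStore, NoteService, BINDINGS, build_service) are assumptions made for this example. The MySQL server configuration is left out because it would need a third-party driver.

```python
import sqlite3
from abc import ABC, abstractmethod


class NoteStore(ABC):
    """Common interface that the shared code depends on."""

    @abstractmethod
    def save(self, note): ...

    @abstractmethod
    def count(self): ...


class SqliteNoteStore(NoteStore):
    """Client implementation; uses an in-memory SQLite database by default."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")

    def save(self, note):
        self._db.execute("INSERT INTO notes (body) VALUES (?)", (note,))

    def count(self):
        return self._db.execute("SELECT COUNT(*) FROM notes").fetchone()[0]


class InMemoryNoteStore(NoteStore):
    """Fake used to isolate NoteService in tests."""

    def __init__(self):
        self._notes = []

    def save(self, note):
        self._notes.append(note)

    def count(self):
        return len(self._notes)


class NoteService:
    """Shared code: receives its store through the constructor."""

    def __init__(self, store):
        self._store = store

    def add(self, text):
        if not text.strip():
            raise ValueError("empty note")
        self._store.save(text.strip())
        return self._store.count()


# A toy stand-in for a DI container: which class satisfies NoteStore
# depends on the configuration we are running in.
BINDINGS = {
    "client": {NoteStore: SqliteNoteStore},
    "test": {NoteStore: InMemoryNoteStore},
}


def build_service(configuration):
    store = BINDINGS[configuration][NoteStore]()
    return NoteService(store)
```

NoteService never names a concrete store, so build_service("test") yields the same service logic backed by the fake.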

This brings me to mocking. Mocking is the construction of fake objects for testing. The sole purpose of mocking is isolation. There are a lot of mocking tools like Mockito that can do this dynamically with minimal boilerplate. Basically, you say what class you want to mock, and the tool will dynamically create a subclass with stub methods overriding all the real methods. You can instruct mock methods to return certain values when given certain inputs or after being called a certain number of times. You can also check that methods were called, possibly in a particular order.
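
Here is a small sketch of those capabilities. It uses Python's standard unittest.mock module as a stand-in rather than Mockito, and PriceService, total, and lookup_with_retry are hypothetical names invented for the example.

```python
from unittest import mock


class PriceService:
    """Hypothetical dependency that would normally hit the network."""

    def lookup(self, sku):
        raise RuntimeError("network access is not allowed in tests")


def total(prices, skus):
    return sum(prices.lookup(sku) for sku in skus)


def lookup_with_retry(prices, sku):
    try:
        return prices.lookup(sku)
    except TimeoutError:
        return prices.lookup(sku)


def test_total_uses_stubbed_prices():
    prices = mock.create_autospec(PriceService, instance=True)
    # Return certain values when given certain inputs.
    prices.lookup.side_effect = lambda sku: {"apple": 3, "pear": 5}[sku]

    assert total(prices, ["apple", "pear", "apple"]) == 11
    # Check that the method was called, and in which order.
    assert prices.lookup.call_args_list == [
        mock.call("apple"),
        mock.call("pear"),
        mock.call("apple"),
    ]


def test_retry_after_one_timeout():
    prices = mock.create_autospec(PriceService, instance=True)
    # Behave differently depending on how many times the method has been called.
    prices.lookup.side_effect = [TimeoutError("flaky"), 3]

    assert lookup_with_retry(prices, "apple") == 3
    assert prices.lookup.call_count == 2
```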

Mocking frameworks can cover most of the common cases, but if you have a class which requires highly structured input (i.e., difficult to construct in a test), returns highly structured output, or has a lot of state, mocking may be a lot of work. If the module being tested is tightly coupled with a class like this, it may not be practical to isolate them, since any little change will break the test. Consider a redesign in cases like this. It may save you trouble later on.

One related problem you may encounter is tight coupling with the framework your program is built with. For example, if you're writing an Android app, you may have a lot of activities and fragments, and it's hard to test related modules without building the whole app and installing it on a device or emulator. In this case, consider a framework mocking tool like Robolectric, which mocks the entire Android API. It delivers lifecycle callbacks like onCreate and onResume in a consistent way, but it still allows you to run your tests quickly on your development machine without a huge amount of setup.

If for whatever reason you can't use a mocking framework for your test, you will likely need to run in a virtual machine or emulator, which effectively mocks an entire computer. If your test is running in an emulator, you're free to cause a lot more side effects (displaying UI on the screen, writing to the file system, etc.) because you can clean it all up later just by deleting the instance of the emulator. This can be the only practical way to write large-scale integration and end-to-end tests. However, there is a major drawback: emulators take a long time to set up and tear down, which means your tests will take a long time to run. It may not be practical to run tests on every commit if testing takes hours, so consider a continuous integration system, which runs these tests on dedicated testing hardware (hopefully lots of it) on the latest commits as often as it can.

Tips to make testing easier

In order to do Test Driven Development right, you need to organize code in a way that makes it easy to test. Code that is easy to test also tends to be easy to maintain. Here are some tips on how to keep a code base testable.

Reduce dependencies when possible. When adding a dependency to a module, think about whether that dependency is really needed. Do these modules really need to interact? Is there a way for them to interact indirectly through some decentralized mechanism?

Try to keep modules loosely coupled. Make it easy to replace a dependency with an alternate implementation. Don't depend on implementation details. Avoid leaky abstractions in module interfaces. Keep interfaces simple; don't have ten methods with a bunch of parameters that need to be called in a particular order.

Reduce statefulness and side effects. Write pure methods (that have no side effects and return a result that depends only on arguments) when possible, and try to keep objects immutable. Modules with no side effects don't need to be isolated when testing other modules that depend on them (but you may still want to). Concurrency is also a lot easier to deal with when there is little shared mutable state.

Consider using a dependency injection system. This makes it somewhat easier to manage dependencies. Be careful though: because it's so easy to add new dependencies, you will tend to add a lot of extra dependencies when you don't really need to. Also, consider the different options for dependency injection. Systems implemented using reflection tend to be very flexible but can have a significant run-time overhead. Dagger generates code ahead of time, so it has less overhead, but it's also less flexible.

Keep code deterministic. Tests which usually pass but occasionally fail can be very difficult to debug. Keep your program as deterministic as possible: don't rely on timing, and make sure threads synchronize in a predictable way. Seed any random number generators while testing. Disable animations if you're writing GUI tests.
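
Two of these ideas can be sketched briefly: seeding a random number generator, and passing in a clock instead of relying on real time. The names shuffle_deck and Session are invented for illustration. In production, one would presumably pass something like time.monotonic as the clock.

```python
import random


def shuffle_deck(rng):
    """Shuffle using the generator that is passed in, not the global one."""
    deck = list(range(52))
    rng.shuffle(deck)
    return deck


class Session:
    """Expires after ttl_seconds according to an injected clock function."""

    def __init__(self, clock, ttl_seconds):
        self._clock = clock
        self._expires_at = clock() + ttl_seconds

    def expired(self):
        return self._clock() >= self._expires_at


def test_shuffle_is_repeatable_with_a_seed():
    assert shuffle_deck(random.Random(1234)) == shuffle_deck(random.Random(1234))


def test_session_expiry_without_sleeping():
    now = [1000.0]  # fake time, advanced by hand
    session = Session(clock=lambda: now[0], ttl_seconds=30)
    assert not session.expired()
    now[0] += 30
    assert session.expired()
```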

Have some random tests though. Program code should be as deterministic as possible, but tests don't have to be. It's a good idea to have a set of tests that generate random input. This is called fuzz testing (or monkey testing in the case of random clicks and key presses). It can identify situations that you didn't consider when writing the normal set of tests. Make sure it's possible to save the random input after a crash so it can be replayed and analyzed.
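
A minimal sketch of a replayable fuzz test follows. The run-length encoder is only a stand-in for the code under test, and fuzz_round_trip is a hypothetical helper. The point is that a failure returns both the seed and the offending input, so it can be saved and replayed.

```python
import random


def run_length_encode(text):
    pairs = []
    for ch in text:
        if pairs and pairs[-1][0] == ch:
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            pairs.append((ch, 1))
    return pairs


def run_length_decode(pairs):
    return "".join(ch * n for ch, n in pairs)


def fuzz_round_trip(seed, iterations=200):
    """Return None on success, or a record describing the first failure."""
    rng = random.Random(seed)
    for _ in range(iterations):
        text = "".join(rng.choice("ab") for _ in range(rng.randint(0, 20)))
        try:
            ok = run_length_decode(run_length_encode(text)) == text
        except Exception:
            ok = False  # a crash counts as a failure too
        if not ok:
            return {"seed": seed, "input": text}
    return None


# Pick a fresh seed for each run, but keep it so the run can be reproduced.
seed = random.randrange(2**32)
failure = fuzz_round_trip(seed)
```

If failure is not None, writing it to a log or file is enough to replay the same case later with fuzz_round_trip(failure["seed"]).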

Conclusion

Tests save us all a lot of time. Although they can be somewhat tedious to write, they catch bugs at the time when they are easiest to fix: as soon as they are introduced. This frees up time that would have been spent debugging and gives us confidence to know we can make large changes without breaking everything. Ultimately, testing makes us a lot more productive.