Integration tests are the best tests

Published on 2026-01-15
Tagged: testing


I'd like to share an unpopular opinion about testing: integration tests are the most important kind of test. You should strive for excellent integration test coverage and invest relatively little time in unit tests. I've believed this since I worked on cmd/go, Go's build tool. We had a really excellent set of integration tests that were easy to write, quick to run, and reliable at finding regressions.

This goes against the conventional wisdom that you should have thorough unit test coverage, moderate integration test coverage, and a little end-to-end coverage. This is usually visualized as The Testing Pyramid:

Traditional testing pyramid diagram with a large bottom layer of unit tests, a medium middle layer of integration tests, and a small top layer of end-to-end tests

But I think it should look more like this:

Modified testing pyramid diagram with a medium bottom layer of unit tests, a large middle layer of integration tests, and a small top layer of end-to-end tests

Why write tests anyway?

I've actually never liked writing tests. I want to spend as little time as possible writing them. (That's probably a popular opinion, though not often voiced.) But automated tests have an enormous benefit: they tell me whether my code actually works. If it's not tested, it doesn't work. More importantly, tests give me the confidence to change things. I can fix a bug, add a feature, or optimize something without fear, because the tests tell me if I've broken something. So I do write tests, and I often spend more effort on the tests than on the code being tested, because I want that confidence.

Tests may have significant drawbacks, depending on how you write them. If a test depends heavily on the implementation of the code being tested, then you need to fix the test whenever you change that implementation. This is toil. Tightly coupled tests can be such a drag on productivity that they're worse than having no tests at all: they make change harder, not easier.

Why not unit tests?

For the sake of definition, a unit test covers a single component of a system, usually a package, a class, or a single function. To remain focused, a unit test often relies on fakes, mocks, and stubs for the component's dependencies.

Unit tests are the easiest type of test to write, but they give relatively little confidence in the code being tested, especially when they rely heavily on fakes. Fake dependencies behave differently than real implementations, so a thorough unit test can tell us everything is fine when there's actually a serious bug.

Unit tests pass, integration tests fail: two drawers that can't open because handles are in the way

Unit tests pass, integration tests fail.

Even though they're easy to write, unit tests can be the most toilsome tests to maintain. By their nature, they are close to the implementation, so any change to the implementation often requires a larger corresponding change to its test. A small refactoring, like adding a function parameter, can require updates to dozens of call sites in tests. This tight coupling is especially problematic when mocks are used. A test might verify that a mocked dependency's method was called with specific arguments, a specific number of times, in a specific order. A test like this breaks when practically anything changes.
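To make that concrete, here's a contrived Go sketch (all names are hypothetical, not from any real codebase) of the kind of over-specified mock test I mean. Any refactoring of Notify that preserves its behavior, like batching the sends or changing the subject line, still breaks the test:

package notify

import (
	"fmt"
	"reflect"
	"testing"
)

// Mailer is the dependency being mocked.
type Mailer interface {
	Send(to, subject string) error
}

// Notify is the code under test.
func Notify(m Mailer, recipients []string) {
	for _, r := range recipients {
		m.Send(r, "reminder")
	}
}

// mockMailer records every call so the test can assert on all of them.
type mockMailer struct{ calls []string }

func (m *mockMailer) Send(to, subject string) error {
	m.calls = append(m.calls, fmt.Sprintf("Send(%q, %q)", to, subject))
	return nil
}

func TestNotify(t *testing.T) {
	m := &mockMailer{}
	Notify(m, []string{"a@example.com", "b@example.com"})

	// Over-specified: exact arguments, exact count, exact order.
	want := []string{
		`Send("a@example.com", "reminder")`,
		`Send("b@example.com", "reminder")`,
	}
	if !reflect.DeepEqual(m.calls, want) {
		t.Errorf("calls = %v, want %v", m.calls, want)
	}
}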

This being said, there are some areas where I like to have thorough unit tests: parsers, serializers, and tricky self-contained algorithms (in cmd/go, think of the go.mod parser or the minimal version selection algorithm). These kinds of components have well-defined inputs and outputs, few dependencies, and little need to change, so their unit tests give solid benefit and are easy to maintain.
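For components like these, unit tests stay cheap. Here's a hypothetical Go sketch (the function is invented for illustration) testing a pure function in the table-driven style:

package semver

import (
	"strings"
	"testing"
)

// majorVersion returns the major component of a version string like
// "v1.2.3". Well-defined inputs and outputs, no dependencies.
func majorVersion(v string) string {
	v = strings.TrimPrefix(v, "v")
	if i := strings.Index(v, "."); i >= 0 {
		return v[:i]
	}
	return v
}

func TestMajorVersion(t *testing.T) {
	// Each new case is one line; the test never needs to change
	// unless the function's contract changes.
	for _, tt := range []struct{ in, want string }{
		{"v1.2.3", "1"},
		{"v10.0.0", "10"},
		{"2.0", "2"},
		{"v3", "3"},
	} {
		if got := majorVersion(tt.in); got != tt.want {
			t.Errorf("majorVersion(%q) = %q, want %q", tt.in, got, tt.want)
		}
	}
}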

Why not end-to-end tests?

An end-to-end test covers an entire system together with its dependencies. For example, an end-to-end test of a web app checks a real deployment in a test environment: the test drives an actual web browser and exercises real cloud services like databases and load balancers.

End-to-end tests give you the best indication of whether a system really works, since they test in the most realistic environment. But they're the most difficult, expensive tests to set up and run. They're often run on a nightly build or as part of release qualification, not as presubmit tests on every change, so they may not give you quick feedback on whether something is broken.

I don't have much to say about end-to-end tests outside the conventional wisdom: you should have some end-to-end coverage, but you can get better bang for your buck elsewhere.

Why integration tests?

An integration test covers several components of a system but may still use fake implementations of external services like databases or LLMs. An integration test interacts with a system through its public interfaces and verifies behavior observed through those interfaces. An integration test is not concerned with internal implementation details of the system under test.

Integration tests are the sweet spot between unit tests and end-to-end tests.

  1. They give confidence about an entire system, not just one component under test. They give the most benefit for the least amount of effort.
  2. Since they can only interact with a system's public interfaces, they're much less tightly coupled with the implementation, so there's less need to adjust tests after changing the implementation.
  3. Although they're inevitably slower than unit tests, they're much faster than end-to-end tests and can usually still be run on a developer's laptop or as presubmit tests, so they still give fast feedback.
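Here's what that sweet spot can look like for a web service: a hypothetical Go sketch (the handler, route, and fake store are all invented) that runs the real HTTP stack with net/http/httptest, substitutes a fake for the external dependency, and asserts only on what's observable through the public interface.

package app

import (
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
)

// GreetingStore is the app's interface to an external service; the
// test substitutes a fake implementation for it.
type GreetingStore interface {
	Greeting(lang string) (string, error)
}

type fakeStore map[string]string

func (f fakeStore) Greeting(lang string) (string, error) { return f[lang], nil }

// newHandler wires the real routing and handler logic to a store.
func newHandler(s GreetingStore) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/greet", func(w http.ResponseWriter, r *http.Request) {
		g, err := s.Greeting(r.URL.Query().Get("lang"))
		if err != nil || g == "" {
			http.Error(w, "unknown language", http.StatusNotFound)
			return
		}
		io.WriteString(w, g)
	})
	return mux
}

func TestGreet(t *testing.T) {
	// Real server, real HTTP client, fake external dependency.
	srv := httptest.NewServer(newHandler(fakeStore{"en": "hello"}))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/greet?lang=en")
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	if got := strings.TrimSpace(string(body)); got != "hello" {
		t.Errorf("body = %q, want %q", got, "hello")
	}
}

Nothing in this test names an internal function other than the constructor, so the handler's internals can be rewritten freely without touching it.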

Why not integration tests?

So if integration tests are so great, why doesn't everyone write them all the time? Unfortunately, in practice they tend to be more difficult to write than unit tests, and developers write fewer integration tests when most of each test is complicated boilerplate.

To get the most benefit, you need to invest in a test framework that handles the complications of setting up the test environment, wiring components together, building inputs, and verifying outputs. Building such a framework can take substantial time: you may need an elaborate testing library or even a mini scripting language. But once it's in place, the marginal cost of writing a new integration test is very low, and developers can write lots of tests.
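As a sketch of what that investment buys (a hypothetical helper, not cmd/go's actual framework), a single Go function can absorb the boilerplate so that each new test shrinks to its interesting parts:

package clitest

import (
	"os"
	"os/exec"
	"path/filepath"
	"strings"
	"testing"
)

// run creates a temporary working directory, writes the given files
// into it, runs the tool under test, and returns the combined output.
// All of the setup boilerplate lives here, once.
func run(t *testing.T, files map[string]string, args ...string) string {
	t.Helper()
	dir := t.TempDir()
	for name, contents := range files {
		path := filepath.Join(dir, name)
		if err := os.MkdirAll(filepath.Dir(path), 0o777); err != nil {
			t.Fatal(err)
		}
		if err := os.WriteFile(path, []byte(contents), 0o666); err != nil {
			t.Fatal(err)
		}
	}
	cmd := exec.Command(args[0], args[1:]...)
	cmd.Dir = dir
	out, err := cmd.CombinedOutput()
	if err != nil {
		t.Fatalf("%s: %v\n%s", strings.Join(args, " "), err, out)
	}
	return string(out)
}

// With the helper in place, a new integration test is a few lines.
func TestRunHello(t *testing.T) {
	out := run(t,
		map[string]string{"hello.go": "package main\nfunc main() { println(\"hello world\") }\n"},
		"go", "run", "hello.go")
	if !strings.Contains(out, "hello world") {
		t.Errorf("output = %q, want it to contain %q", out, "hello world")
	}
}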

Case study: Go's script tests

At my previous job, I worked on cmd/go, Go's build and dependency management tool (go build, go get, and so on). When we added modules in 2018, they required a substantial rewrite with a lot of new tests, so Russ Cox introduced a new test framework in script_test.go. The team expanded this framework over time after it became evident how useful it was.

Each test is a file in the testdata/script directory. As of this writing, there are nearly 900 tests. Let's look at a small one, run_hello.txt.

env GO111MODULE=off

# hello world
go run hello.go
stderr 'hello world'

-- hello.go --
package main
func main() { println("hello world") }

This test creates a temporary directory containing a file named hello.go. It then executes go run hello.go and verifies that the output written to stderr matches the regular expression 'hello world'.

This is a basic test, but it would still be tedious to write it without the custom scripting language provided by this framework.

There's more, but you get the idea. It's a minimal language, perfect for this purpose. We didn't want to rely on a full shell scripting language like Bash, since that would pull in dependencies outside the Go project, so we wrote our own. For example, we didn't want to care whether grep is installed or whether it's the GNU or BSD variant.
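For a taste of the rest of the language, a hypothetical script in the same style might combine negated commands, output assertions, and file checks (command names as I remember them from script_test.go; the test content is invented):

# a failing build should report the bad identifier
! go build ./broken
stderr 'undefined: undefinedThing'

# a successful build should produce a binary
go build -o hello$GOEXE ./hello
exists hello$GOEXE

-- go.mod --
module example.com/demo

-- hello/hello.go --
package main

func main() { println("hi") }

-- broken/broken.go --
package broken

var x = undefinedThing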

Go is fairly self-contained, but it does rely on module proxies, version control services, and the checksum database, so we faked all of those for the integration tests. The fake module proxy's files are in testdata/mod. A module proxy can be a simple file server, but modules are distributed as .zip archives. We found it easier to write test modules as txtar archives instead, so the fake proxy translates them into .zip files on request. It's always a good idea to spend a little more effort on your test framework if it means you can spend a little less effort on each test.
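From memory (the details here are invented), a module in testdata/mod is a txtar archive shaped roughly like this; the fake proxy serves the .info and .mod entries directly and zips the remaining files into a module archive on request:

example.com/hello v1.0.0
written by hand

-- .mod --
module example.com/hello
-- .info --
{"Version":"v1.0.0","Time":"2018-08-01T00:00:00Z"}
-- hello.go --
package hello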

We liked this integration testing framework so much that we used it to write most new tests for cmd/go after it was introduced. We wrote only a few unit tests for specific components like the go.mod parser and the minimal version selection algorithm, but the majority of test coverage comes from integration tests. Our integration tests were really good at verifying behavior and catching regressions. And they were easy to write.

Beyond command line tools

Integration tests are especially easy to write for command-line tools, so the advice here doesn't transfer equally well to every domain. Still, if you invest in a framework that makes integration tests easy to write, they tend to be more effective and cheaper to maintain than other kinds of tests.

I worked on Android apps for a couple of years, and we used Robolectric to write integration tests. In a Robolectric test, you can look up views in the view hierarchy by ID and simulate clicks and other interactions. The test framework mocks UI rendering, so tests run quickly and reliably with low resource requirements. We preferred these over end-to-end UI tests, where we installed apps on emulators, took screenshots, and replayed recorded clicks. Emulator tests were hard to write, slow to run, and incredibly flaky.

These days, I work on gRPC backend services. An integration test usually involves setting up a cluster of services on a single machine, sending requests, and verifying responses. Sometimes a test needs to "pierce the veil" and verify some internal state, like whether something got cached in an S3 bucket, but most tests depend very little on the actual implementation. These tests are still a bit hard to write, but improvements to the test library that sets up services and constructs requests have helped a lot.
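As a minimal sketch of that setup, here's a Go test using grpc-go's in-memory bufconn listener and the standard health service (a single server rather than a cluster, and much simplified from what a real service test needs):

package server

import (
	"context"
	"net"
	"testing"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/health"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
	"google.golang.org/grpc/test/bufconn"
)

func TestHealthCheck(t *testing.T) {
	// Serve a real gRPC server over an in-memory listener:
	// no ports, no network, no flakes.
	lis := bufconn.Listen(1 << 20)
	srv := grpc.NewServer()
	healthpb.RegisterHealthServer(srv, health.NewServer())
	go srv.Serve(lis)
	defer srv.Stop()

	// Dial through the in-memory listener instead of the network.
	conn, err := grpc.DialContext(context.Background(), "bufnet",
		grpc.WithContextDialer(func(ctx context.Context, _ string) (net.Conn, error) {
			return lis.DialContext(ctx)
		}),
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		t.Fatal(err)
	}
	defer conn.Close()

	// Verify behavior through the public interface only.
	resp, err := healthpb.NewHealthClient(conn).Check(
		context.Background(), &healthpb.HealthCheckRequest{})
	if err != nil {
		t.Fatal(err)
	}
	if got := resp.Status; got != healthpb.HealthCheckResponse_SERVING {
		t.Errorf("status = %v, want SERVING", got)
	}
}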