Organizing Bazel WORKSPACE files
Every Bazel project has a WORKSPACE file in its root directory. WORKSPACE has several functions, but its main purpose is to declare external dependencies using repository rules. WORKSPACE files are syntactically similar to the BUILD files used to define targets in the rest of the repository, but they're evaluated very differently.
Declaring a list of dependencies seems straightforward, but when a huge number of direct and indirect dependencies are declared in various functions, it gets to be a lot to manage. WORKSPACE also has some surprising behavior when the same repository is declared more than once, which doesn't help either.
In this article, I'll explain how WORKSPACE is evaluated, then I'll give some guidelines for organizing WORKSPACE files to avoid confusion and ambiguity.
Design issues
In my opinion, managing the WORKSPACE file is one of the most difficult parts of using Bazel. This stems from three main design issues:
- WORKSPACE files have very little structure. They're essentially Starlark scripts. If they were more strictly declarative, tools could manage them more easily. However, evaluating a WORKSPACE file executes repository rules, which can run arbitrary commands on the host system. Tools can't do that easily or safely.
- WORKSPACE files in external repositories are not evaluated recursively, so you're responsible for declaring not only your direct dependencies but also your indirect dependencies. Bazel doesn't give you tools for listing indirect dependencies or resolving conflicts between multiple declarations.
- WORKSPACE files have a surprisingly complicated evaluation model. It's difficult for users to read WORKSPACE and predict what version of each dependency will actually be used.
These design issues date back to when Bazel was open sourced. WORKSPACE hasn't changed much since then. There have been a number of attempts to improve Bazel's external dependency management, but it's an inherently difficult problem, and people tend to underestimate how much effort it will take to fix it.
How Bazel evaluates a WORKSPACE file
I first tried to answer this on StackOverflow. The official documentation explains the semantics to a degree, but it's light on details. So below is my understanding of how this works.
A WORKSPACE file is essentially a list of load statements, repository declarations, and function calls. Bazel evaluates the file line-by-line.
A repository declaration is a call to a repository rule like http_archive or go_repository. Each repository has a name and some information on how to fetch it, like URLs and SHA-256 sums. Repository rules are evaluated lazily: at the point where a repository is declared, the repository rule's code isn't actually executed.
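As a rough illustration, a declaration in WORKSPACE might look like the sketch below. The URLs and checksum are the bazel_skylib values from the protobuf_deps.bzl excerpt quoted later in this article. Under the model described here, evaluating this call only records the declaration; nothing should be downloaded at this point.

```starlark
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Records a repository named "bazel_skylib". The rule implementation
# (download, verify checksum, extract) does not run until something
# needs a file from @bazel_skylib.
http_archive(
    name = "bazel_skylib",
    sha256 = "97e70364e9249702246c0e9444bccdc4b847bed1eb03c5a3ece4f83dfe6abc44",
    urls = [
        "https://mirror.bazel.build/github.com/bazelbuild/bazel-skylib/releases/download/1.0.2/bazel-skylib-1.0.2.tar.gz",
        "https://github.com/bazelbuild/bazel-skylib/releases/download/1.0.2/bazel-skylib-1.0.2.tar.gz",
    ],
)
```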
A repository is fetched (meaning its repository rule is executed) the first time a file is loaded from it. Several things can cause this while WORKSPACE is being evaluated:
- A load statement that mentions a .bzl file in the repository is evaluated. The load statement might appear in WORKSPACE or in another .bzl file loaded from WORKSPACE.
- A different repository is fetched, and that repository's declaration has an attribute that refers to a file. When a repository is fetched, the labels in its attributes are resolved to files, which may cause other repositories to be fetched. Labels may be part of explicit arguments, or they may be default values for attributes.
- A different repository rule could use ctx.path to dynamically resolve a label.
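The second and third cases can be sketched with a hypothetical repository rule. Everything named here (my_config_repo, @some_dep, @other_repo, and the file names) is made up for illustration; only repository_rule, attr.label, Label, and the repository_ctx methods are Bazel built-ins. The context argument is called repository_ctx here, which is the same object the text above refers to as ctx.

```starlark
# my_config_repo.bzl (hypothetical file in the main workspace)

def _my_config_repo_impl(repository_ctx):
    # Case 2: "template" is a label attribute. Turning it into a file
    # requires the repository that owns the label, so @some_dep may be
    # fetched even if the caller never passed this attribute explicitly.
    template = repository_ctx.path(repository_ctx.attr.template)

    # Case 3: a label that appears only inside the implementation.
    # Resolving it with path() may cause @other_repo to be fetched too.
    extra = repository_ctx.path(Label("@other_repo//:extra.txt"))

    repository_ctx.symlink(template, "template.txt")
    repository_ctx.symlink(extra, "extra.txt")
    repository_ctx.file(
        "BUILD.bazel",
        'exports_files(["template.txt", "extra.txt"])\n',
    )

my_config_repo = repository_rule(
    implementation = _my_config_repo_impl,
    attrs = {
        # A default value is enough to create an implicit dependency.
        "template": attr.label(
            default = "@some_dep//:template.txt",
            allow_single_file = True,
        ),
    },
)
```

Neither @some_dep nor @other_repo shows up at the call site of my_config_repo, which is why these fetches are easy to miss when reading WORKSPACE.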
The important thing to understand is that a repository isn't fetched until a label mentioning that repository is resolved to a file. It's difficult to be sure about when that happens because there are several cases where it happens implicitly within repository rule implementations.
This leads to the most confusing aspect of WORKSPACE evaluation:
A repository may be declared with the same name multiple times without error. This does not create multiple instances of the repository. When a repository is fetched, the latest declaration wins. After a repository is fetched, all following declarations are silently ignored.
It's difficult to determine when a repository is fetched, so to avoid ambiguity, you should ensure each repository is declared only once.
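To make the rule concrete, here is a sketch of a WORKSPACE fragment that declares bazel_skylib three times, annotated according to the model described above. The example.com URLs are placeholders, not real archives. The 1.0.2 URLs and checksum are copied from the protobuf_deps.bzl excerpt later in this article.

```starlark
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Declaration A: not fetched yet, so it can still be replaced.
http_archive(
    name = "bazel_skylib",
    urls = ["https://example.com/bazel-skylib-old.tar.gz"],  # placeholder
)

# Declaration B: same name, declared later. It replaces A.
http_archive(
    name = "bazel_skylib",
    sha256 = "97e70364e9249702246c0e9444bccdc4b847bed1eb03c5a3ece4f83dfe6abc44",
    urls = [
        "https://mirror.bazel.build/github.com/bazelbuild/bazel-skylib/releases/download/1.0.2/bazel-skylib-1.0.2.tar.gz",
        "https://github.com/bazelbuild/bazel-skylib/releases/download/1.0.2/bazel-skylib-1.0.2.tar.gz",
    ],
)

# Loading a .bzl file from @bazel_skylib fetches the repository.
# The latest declaration at this point (B) is the one that is used.
load("@bazel_skylib//lib:versions.bzl", "versions")

# Declaration C: the repository has already been fetched, so this
# declaration is silently ignored.
http_archive(
    name = "bazel_skylib",
    urls = ["https://example.com/bazel-skylib-new.tar.gz"],  # placeholder
)
```

If the load statement were moved below declaration C, C would win instead. In a real project the three declarations usually live in different files and functions, which is what makes the outcome hard to predict.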
How to organize a WORKSPACE file
Now we come to the practical advice. I recommend organizing statements in WORKSPACE files in the following order:
1. workspace declaration. This must appear before all other calls.
2. load statements for http_archive, git_repository, and repository rules defined in the main workspace. These symbols are needed in the rest of the file, so they must be loaded near the top.
3. Declarations for dependencies that provide repository rules needed later. For example, bazel_gazelle is needed for go_repository.
4. Declarations for direct dependencies. These may appear in the WORKSPACE file itself, or you might load and call a function from a .bzl file somewhere in your workspace.
5. Declarations for indirect dependencies. To declare these, you'll usually load and call functions from your direct dependencies. Check that these functions won't override your direct dependencies (see below).
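The ordering above might translate into a skeleton like the following. This is a sketch, not a buildable file: the workspace name, the //:deps.bzl file, and the my_project_deps function are hypothetical, and the example.com URL is a placeholder that would need to be replaced with a real release URL and checksum.

```starlark
# 1. workspace declaration.
workspace(name = "my_project")  # hypothetical name

# 2. Repository rules used in the rest of the file.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# 3. Dependencies that provide repository rules needed later.
http_archive(
    name = "bazel_gazelle",
    urls = ["https://example.com/bazel-gazelle.tar.gz"],  # placeholder
)

load("@bazel_gazelle//:deps.bzl", "gazelle_dependencies", "go_repository")

# 4. Direct dependencies, declared here or in a function defined in
#    the main workspace (hypothetical file and function shown).
load("//:deps.bzl", "my_project_deps")

my_project_deps()

# 5. Indirect dependencies, declared by functions that the direct
#    dependencies provide. These come last so they can't take
#    precedence over the declarations in step 4.
gazelle_dependencies()
```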
Many projects declare indirect dependencies before direct dependencies (reversing 4 and 5 above). This causes problems because it limits your ability to depend on a specific version of a direct dependency. If a repository is declared by a function provided by one of your dependencies, that declaration may or may not override a later (direct) declaration. Your direct declaration will be silently ignored if the repository is fetched first.
Providing dependencies for other projects
If your project can be built with Bazel, and other projects can depend on it, you should provide a function that declares your direct and indirect dependencies so that other projects can declare them without knowing the details. Let's look at the function from @com_google_protobuf//:protobuf_deps.bzl as an example:
    load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

    def protobuf_deps():
        """Loads common dependencies needed to compile the protobuf library."""

        if not native.existing_rule("bazel_skylib"):
            http_archive(
                name = "bazel_skylib",
                sha256 = "97e70364e9249702246c0e9444bccdc4b847bed1eb03c5a3ece4f83dfe6abc44",
                urls = [
                    "https://mirror.bazel.build/github.com/bazelbuild/bazel-skylib/releases/download/1.0.2/bazel-skylib-1.0.2.tar.gz",
                    "https://github.com/bazelbuild/bazel-skylib/releases/download/1.0.2/bazel-skylib-1.0.2.tar.gz",
                ],
            )

        if not native.existing_rule("zlib"):
            http_archive(
                name = "zlib",
                build_file = "@com_google_protobuf//:third_party/zlib.BUILD",
                sha256 = "629380c90a77b964d896ed37163f5c3a34f6e6d897311f1df2a7016355c45eff",
                strip_prefix = "zlib-1.2.11",
                urls = ["https://github.com/madler/zlib/archive/v1.2.11.tar.gz"],
            )

        # Many more dependencies after this
There are several good lessons to learn from this file:
- Name the file deps.bzl or something similar (protobuf_deps.bzl in this case), and put it in the root directory of the repository so it's easy to find.
- Keep the file simple. Avoid loading other .bzl files, since that forces those repositories to be declared earlier.
- Don't override earlier declarations of the same repositories. You can check whether a dependency has been declared by calling native.existing_rule with its name, as above.
You may want to define a small function like this:
    def _maybe(rule, name, **kwargs):
        if not native.existing_rule(name):
            rule(name = name, **kwargs)
Then you can declare dependencies like this:
    _maybe(
        http_archive,
        name = "zlib",
        build_file = "@com_google_protobuf//:third_party/zlib.BUILD",
        sha256 = "629380c90a77b964d896ed37163f5c3a34f6e6d897311f1df2a7016355c45eff",
        strip_prefix = "zlib-1.2.11",
        urls = ["https://github.com/madler/zlib/archive/v1.2.11.tar.gz"],
    )
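From the consumer's side, the existing_rule check is what lets a project pin its own version first. The sketch below assumes a hypothetical consuming workspace that wants its own zlib. The example.com URLs are placeholders and the checksums are omitted, so this is an outline rather than a buildable file.

```starlark
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "com_google_protobuf",
    urls = ["https://example.com/protobuf.tar.gz"],  # placeholder
)

# Direct dependency: pin zlib before calling protobuf_deps.
http_archive(
    name = "zlib",
    build_file = "@com_google_protobuf//:third_party/zlib.BUILD",
    urls = ["https://example.com/zlib.tar.gz"],  # placeholder
)

# Indirect dependencies. protobuf_deps finds that "zlib" is already
# declared and skips its own declaration. Repositories not declared
# above, such as bazel_skylib, are added by the function.
load("@com_google_protobuf//:protobuf_deps.bzl", "protobuf_deps")

protobuf_deps()
```

Without the existing_rule check, the call to protobuf_deps would redeclare zlib, and which version ended up in use would depend on whether the repository had already been fetched.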
Conclusion
I'll wrap this up by saying dependency management is inherently complicated. When you depend on another project, you're trusting its authors to deliver something that performs well and is free of bugs and security vulnerabilities. Dependencies are often necessary; after all, we don't want to write our own crypto libraries. But taking on a dependency is dangerous, and it should be done consciously and carefully. For further reading, I strongly recommend Russ Cox's Our Software Dependency Problem.