Writing Bazel rules: simple binary rule
Bazel is an open source build system created by Google. It has a number of strengths that make it a good fit for large projects: distributed build, test, and cache; integrated code generation; support for multiple languages. It also scales extremely well. Bazel is used to build targets in Google's internal monorepo, which contains billions of lines of code. Large targets may include hundreds of thousands of actions, but incremental builds can still complete in seconds.
In this series of articles, I want to focus on one of Bazel's key strengths: the ability to extend the build system to support new languages with extensions written in Starlark. This time, we'll cover writing a simple rule that compiles and links a Go binary from sources. I'll cover libraries, tests, toolchains, and more in future articles.
The learning curve for extending Bazel is steeper than for simpler build systems like Make or SCons. Bazel rules are highly structured, and learning this structure takes time. However, this structure helps you avoid introducing unnecessary complication and unexpected dependencies in large, complex builds.
How Bazel works
In each of these articles, I'll cover some of the theory of how Bazel works. Since this is the first article in the series, we'll start with the basics.
Starlark
Starlark is Bazel's configuration and extension language. It's essentially Python without some of the advanced features: Starlark has no classes, exceptions, or generators, and the module system is different. Starlark avoids being Turing complete by forbidding recursion dynamically and only allowing loops over data structures with fixed size. You can find a full list of differences in Bazel's documentation and in the language spec. These limitations prevent the build system from getting too complicated; most of the complexity should be pushed out into tools.
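To make those restrictions concrete, here is a small sketch that stays inside the subset shared by Python and Starlark, so it should behave the same way in either one. The function name and the output naming scheme are made up for illustration. It uses only a def, a for loop over a finite list, and str.format; there are no classes, no recursion, and no while loops.

```python
# Illustrative sketch only: object_names and its naming scheme are hypothetical.

def object_names(srcs, prefix):
    """Maps each .go source name to a hypothetical prefixed output name."""
    names = []
    for src in srcs:  # iteration is always over a finite collection
        if src.endswith(".go"):
            stem = src[:-len(".go")]
            names.append("{}/{}.o".format(prefix, stem))
    return names

print(object_names(["hello.go", "message.go", "README.md"], "out"))
```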
Aside: Starlark can be used on its own outside of Bazel. Facebook's Buck build system also uses Starlark. Alan Donovan gave a talk on Starlark at GothamGo 2017 with an example of using Starlark to configure a web server. He's also published an embeddable Starlark interpreter written in Go.
Repositories, packages, rules, labels
To build things in Bazel, you need to write build files (named BUILD or BUILD.bazel). They look like this:
load("@io_bazel_rules_go//go:def.bzl", "go_binary", "go_library", "go_test")

go_library(
    name = "fetch_repo_lib",
    srcs = [
        "fetch_repo.go",
        "module.go",
        "vcs.go",
    ],
    importpath = "github.com/bazelbuild/bazel-gazelle/cmd/fetch_repo",
    visibility = ["//visibility:private"],
    deps = ["@org_golang_x_tools_go_vcs//:vcs"],
)

go_binary(
    name = "fetch_repo",
    embed = [":fetch_repo_lib"],
    visibility = ["//visibility:public"],
)

go_test(
    name = "fetch_repo_test",
    srcs = ["fetch_repo_test.go"],
    embed = [":fetch_repo_lib"],
    deps = ["@org_golang_x_tools_go_vcs//:vcs"],
)
Build files contain a number of targets, written as Starlark function calls. The syntax is declarative: you say what you want to build, not how to build it. In this example, we're defining a Go library ("fetch_repo_lib") with a handful of source files. A binary ("fetch_repo") is built from that library. We also have a test ("fetch_repo_test") built from that library and an additional source file ("fetch_repo_test.go").
Each build file implicitly defines a Bazel package. A package consists of the targets declared in the build file and all of the files in the package's directory and subdirectories, excluding targets and files defined in other packages' subdirectories. Visibility restrictions are usually applied at the package level, and globs (wildcard patterns used to match source files) end at package boundaries. Frequently (not always), you'll have one package per directory.
Targets and files are named using labels, which are strings that look like "@io_bazel_rules_go//go:def.bzl". Labels have three parts: a repository name (io_bazel_rules_go), a package name (go), and a file or target name (def.bzl). The repository name and the package name may be omitted when a label refers to something in the same repository or package.
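The three-part structure of a label can be sketched with a few lines of plain Python. This is a simplified model, not Bazel's parser: parse_label is a hypothetical helper, and it only handles the "@repo//pkg:name", "//pkg:name", ":name", and "//pkg" forms (where the name defaults to the last package component).

```python
def parse_label(label, current_repo="", current_package=""):
    """Splits a label into (repository, package, name).

    Omitted parts fall back to the repository and package of the file
    that mentions the label. Real Bazel applies more rules than this.
    """
    repo = current_repo
    package = current_package
    rest = label
    if rest.startswith("@"):
        # "@repo//pkg:name" -> repo, then continue with "//pkg:name".
        repo, _, rest = rest[1:].partition("//")
        rest = "//" + rest
    if rest.startswith("//"):
        package, sep, name = rest[2:].partition(":")
        if not sep:
            # "//foo/bar" is shorthand for "//foo/bar:bar".
            name = package.split("/")[-1]
    else:
        # ":name" or "name" refers to something in the current package.
        name = rest.lstrip(":")
    return (repo, package, name)

print(parse_label("@io_bazel_rules_go//go:def.bzl"))
print(parse_label(":fetch_repo_lib", "bazel_gazelle", "cmd/fetch_repo"))
```

The second call shows the omission rule: with no repository or package written in the label, both come from the context passed in by the caller.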
Repositories are defined in a file called WORKSPACE, which lives in the root directory of a project. I'll get into repository rules more in a future article. For now, just think of them as git repositories with names.
Loading, analysis, and execution
Bazel builds targets in three phases: loading, analysis, and execution (actually there are more, but these are the phases you need to understand when writing rules).
In the loading phase, Bazel reads and evaluates build files. It builds a graph of targets and dependencies. For example, if you ask to build fetch_repo_test above, Bazel will build a graph with a fetch_repo_test node that depends on fetch_repo_test.go, :fetch_repo_lib, and @org_golang_x_tools_go_vcs//:vcs via srcs, embed, and deps edges, respectively.
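One way to picture the result of the loading phase is as a dictionary of nodes with labeled edges. The sketch below is a toy model in plain Python; the dict layout and the transitive_closure helper are assumptions for illustration and say nothing about Bazel's internal data structures.

```python
# Each node maps an edge kind (attribute name) to the nodes it points at.
GRAPH = {
    ":fetch_repo_test": {
        "srcs": ["fetch_repo_test.go"],
        "embed": [":fetch_repo_lib"],
        "deps": ["@org_golang_x_tools_go_vcs//:vcs"],
    },
    ":fetch_repo_lib": {
        "srcs": ["fetch_repo.go", "module.go", "vcs.go"],
        "deps": ["@org_golang_x_tools_go_vcs//:vcs"],
    },
}

def transitive_closure(graph, root):
    """Returns every node reachable from root, each listed once."""
    seen = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.append(node)
        # Source files and external targets have no entry, hence no edges.
        for targets in graph.get(node, {}).values():
            stack.extend(targets)
    return seen

print(transitive_closure(GRAPH, ":fetch_repo_test"))
```

Asking for fetch_repo_test pulls in the library, its sources, and the external vcs target, even though the test only names three direct dependencies.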
In the analysis phase, Bazel evaluates rules in the target graph. Rules declare files and actions that will produce those files. The output of analysis is the file-action graph. Bazel has built-in rules for Java, C++, Python, and a few other things. Other rules are implemented in Starlark. It's important to note that rules cannot directly perform any I/O; they merely tell Bazel how it should execute programs to build targets. This means rules can't make any decisions based on the contents of source files (so no automatic dependency discovery).
In the execution phase, Bazel runs actions in the file-action graph needed to produce files that are out of date. Bazel has several strategies for running actions. Locally, it runs actions within a sandbox that only exposes declared inputs. This makes builds more hermetic, since it's harder to accidentally depend on system files that vary from machine to machine. Bazel may also run actions on remote build servers where this isolation happens automatically.
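As a rough mental model of "run only the actions whose outputs are out of date", here is a toy sketch in plain Python. It is not how Bazel is implemented: run_stale_actions, the cache layout, and the fake compile and link functions are all assumptions made for illustration. Files are modeled as strings in a dict, and an action is considered stale when the digest of its inputs differs from the one recorded on its last run.

```python
import hashlib

def digest(data):
    return hashlib.sha256(data.encode()).hexdigest()

def run_stale_actions(actions, files, cache):
    """Runs actions whose inputs changed since the last run.

    actions: list of (mnemonic, input paths, output path, function),
        already sorted so producers come before consumers.
    files: dict mapping path -> contents.
    cache: dict mapping output path -> input digest from the last run.
    Returns the mnemonics of the actions that actually ran.
    """
    ran = []
    for mnemonic, inputs, output, fn in actions:
        key = digest("".join([digest(files[path]) for path in inputs]))
        if cache.get(output) != key:
            files[output] = fn([files[path] for path in inputs])
            cache[output] = key
            ran.append(mnemonic)
    return ran

# A two-action graph shaped like the one go_binary will declare later.
ACTIONS = [
    ("GoCompile", ["hello.go", "message.go"], "main.a",
     lambda srcs: "archive(" + "+".join(srcs) + ")"),
    ("GoLink", ["main.a"], "hello",
     lambda archives: "exe(" + archives[0] + ")"),
]
```

Running it twice over unchanged files does nothing the second time; editing one source re-runs the compile, and the link follows because its input changed.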
Setting up the repository
Okay, we've gotten all the theory out of the way for today. Let's dive into the code. We're going to write "rules_go_simple", a simplified version of github.com/bazelbuild/rules_go. Don't worry if you don't know Go; there's not any Go code in here today, and the implementation for other languages will be mostly the same.
I've created an example repository at github.com/jayconrod/rules_go_simple. For this article, we'll be looking at the v1 branch. In later articles, we'll add features to branches with higher version numbers.
The first thing we need is a WORKSPACE file. Every Bazel project should have one of these in the repository root directory. WORKSPACE configures external dependencies that we need to build and test our project. In our case, we have one dependency, @bazel_skylib, which we use to quote strings in shell commands.
Bazel only evaluates the WORKSPACE file for the current project; WORKSPACE files of dependencies are ignored. We declare all our dependencies inside a function in deps.bzl so that projects that depend on rules_go_simple can share our dependencies.
Here's our WORKSPACE file.
workspace(name = "rules_go_simple")

load("@rules_go_simple//:deps.bzl", "go_rules_dependencies")

go_rules_dependencies()
Here's deps.bzl. Note that the _maybe function is private (since it starts with _) and cannot be loaded from other files.
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

def go_rules_dependencies():
    """Declares external repositories that rules_go_simple depends on.

    This function should be loaded and called from WORKSPACE files."""

    # bazel_skylib is a set of libraries that are useful for writing
    # Bazel rules. We use it to handle quoting arguments in shell commands.
    _maybe(
        git_repository,
        name = "bazel_skylib",
        remote = "https://github.com/bazelbuild/bazel-skylib",
        commit = "3fea8cb680f4a53a129f7ebace1a5a4d1e035914",
    )

def _maybe(rule, name, **kwargs):
    """Declares an external repository if it hasn't been declared already."""
    if name not in native.existing_rules():
        rule(name = name, **kwargs)
Note that declaring a repository doesn't automatically download it. Bazel will only download a repository if it needs something inside.
Declaring the go_binary rule
To define our binary rule, we'll create a new file, internal/rules.bzl. We'll start with a declaration like this:
go_binary = rule(
    implementation = _go_binary_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile for the main package of this binary",
        ),
        "_stdlib": attr.label(
            default = "//internal:stdlib",
        ),
    },
    doc = "Builds an executable program from Go source code",
    executable = True,
)
You may want to refer to the Bazel documentation for rule and attr here. There's a lot here, so let's break it down.
- We are defining a new rule named go_binary by assigning the result of the rule function to a variable with that name.
- go_binary is implemented in the _go_binary_impl function (passed as the first argument here), which Bazel will call during the analysis phase for each go_binary target that's part of a build. The implementation function will declare output files and actions.
- go_binary has an attribute named srcs, which is a label_list. srcs may be a list of files with names ending in ".go".
- Edit: There's also an attribute named _stdlib. This is a hidden attribute (its name starts with _) that points to a target that builds the Go standard library, //internal:stdlib. This was a late addition to this series due to a change in Go 1.20. Don't worry too much about it unless you want to understand how Go is built.
- go_binary must produce an executable file.
Note that all rules support a set of common attributes like name, visibility, and tags. These don't need to be declared explicitly.
Implementing go_binary
Let's look at our implementation function next.
def _go_binary_impl(ctx):
    # Declare an output file for the main package and compile it from srcs. All
    # our output files will start with a prefix to avoid conflicting with
    # other rules.
    main_archive = ctx.actions.declare_file("{name}_/main.a".format(name = ctx.label.name))
    go_compile(
        ctx,
        srcs = ctx.files.srcs,
        stdlib = ctx.files._stdlib,
        out = main_archive,
    )

    # Declare an output file for the executable and link it. Note that output
    # files may not have the same name as the rule, so we still need to use the
    # prefix here.
    executable_path = "{name}_/{name}".format(name = ctx.label.name)
    executable = ctx.actions.declare_file(executable_path)
    go_link(
        ctx,
        main = main_archive,
        stdlib = ctx.files._stdlib,
        out = executable,
    )

    # Return the DefaultInfo provider. This tells Bazel what files should be
    # built when someone asks to build a go_binary rule. It also says which
    # file is executable (in this case, there's only one).
    return [DefaultInfo(
        files = depset([executable]),
        executable = executable,
    )]
Implementation functions take a single argument, a ctx object. This provides an API used to access rule attributes and to declare files and actions. It also exposes lots of useful metadata.
The first thing we do here is compile the main package. (For readers unfamiliar with Go, packages are the compilation unit; multiple .go source files may be compiled into a single .a package file.) We declare a main.a output file using ctx.actions.declare_file, which returns a File object. We then call go_compile to declare the compile action (which we'll get to in just a minute).
Next, we'll link our main.a into a standalone executable. We declare our executable file, then call go_link (which we'll also define in just a minute).
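To see what the "{name}_" prefix does to the declared file names, here is a small plain-Python sketch. It assumes that declare_file places outputs under the target's package directory in the output tree; output_paths is a hypothetical helper, and the returned paths are meant to be read as relative to the output root.

```python
def output_paths(package, name):
    """Returns the paths of main.a and the executable for a target."""
    prefix = "{name}_".format(name = name)
    main_archive = "/".join([package, prefix, "main.a"])
    executable = "/".join([package, prefix, name])
    return main_archive, executable

# For a target like //tests:hello:
print(output_paths("tests", "hello"))
```

Because every output sits in a directory derived from the target name, two go_binary targets in the same package can each have their own main.a without colliding.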
Finally, we need to tell Bazel what we've done by returning a list of providers. A provider is a struct returned by a rule that contains information needed by other rules and by Bazel itself. DefaultInfo is a special provider that all rules should return. Here, we store two useful pieces of information. files is a depset (more on depsets another time) that lists the files that should be built when another rule depends on our rule or when someone runs bazel build on our rule. No one cares about the main.a file, so we just return the binary file here. And executable points to our executable file. If someone runs bazel run on our rule, this is the file that gets run.
go_compile and go_link actions
I chose to define the go_compile and go_link actions in separate functions. They could easily have been inlined in the rule above. However, actions are frequently shared by multiple rules. In future articles, when we define go_library and go_test rules, we'll need to compile more packages, and we'll need to link a new kind of binary. We can't call go_binary from those rules, so it makes sense to pull these actions out into functions in actions.bzl.
Here's go_compile:
# shell.quote comes from bazel_skylib; this load goes at the top of actions.bzl.
load("@bazel_skylib//lib:shell.bzl", "shell")

def go_compile(ctx, *, srcs, stdlib, out):
    """Compiles a single Go package from sources.

    Args:
        ctx: analysis context.
        srcs: list of source Files to be compiled.
        stdlib: list containing an importcfg file and a package directory
            for the standard library.
        out: output .a file. Should have the importpath as a suffix,
            for example, library "example.com/foo" should have the path
            "somedir/example.com/foo.a".
    """
    stdlib_importcfg = stdlib[0]
    cmd = "go tool compile -o {out} -importcfg {importcfg} -- {srcs}".format(
        out = shell.quote(out.path),
        importcfg = shell.quote(stdlib_importcfg.path),
        srcs = " ".join([shell.quote(src.path) for src in srcs]),
    )
    ctx.actions.run_shell(
        outputs = [out],
        inputs = srcs + stdlib,
        command = cmd,
        env = {"GOPATH": "/dev/null"},  # suppress warning
        mnemonic = "GoCompile",
        use_default_shell_env = True,
    )
This function builds a Bash command to invoke the compiler, then calls run_shell to declare an action that runs that command. run_shell takes our command, a list of input files that will be made available in the sandbox, and a list of output files that Bazel will expect.
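To see what the finished command string looks like, the formatting step can be reproduced in plain Python. The quote function below is my own approximation of how shell.quote from bazel_skylib is understood to behave (wrap in single quotes, escape embedded single quotes); treat that equivalence as an assumption. compile_command and the example paths are hypothetical as well.

```python
def quote(s):
    """Wraps s in single quotes so Bash treats it as one literal word."""
    return "'" + s.replace("'", "'\\''") + "'"

def compile_command(out, importcfg, srcs):
    """Builds the same command string that go_compile passes to run_shell."""
    return "go tool compile -o {out} -importcfg {importcfg} -- {srcs}".format(
        out = quote(out),
        importcfg = quote(importcfg),
        srcs = " ".join([quote(src) for src in srcs]),
    )

# Paths are made up; note the source file name containing a space.
print(compile_command(
    "tests/hello_/main.a",
    "internal/stdlib/importcfg",
    ["tests/hello.go", "tests/my file.go"],
))
```

Without the quoting, the file name with a space would be split into two arguments by the shell, which is exactly the kind of bug that is easy to miss until someone has an unusual path.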
Our go_link function is similar.
def go_link(ctx, *, out, stdlib, main):
    """Links a Go executable.

    Args:
        ctx: analysis context.
        out: output executable file.
        stdlib: list containing an importcfg file and a package directory
            for the standard library.
        main: archive file for the main package.
    """
    stdlib_importcfg = stdlib[0]
    cmd = "go tool link -o {out} -importcfg {importcfg} -- {main}".format(
        out = shell.quote(out.path),
        importcfg = shell.quote(stdlib_importcfg.path),
        main = shell.quote(main.path),
    )
    ctx.actions.run_shell(
        outputs = [out],
        inputs = [main] + stdlib,
        command = cmd,
        env = {"GOPATH": "/dev/null"},  # suppress warning
        mnemonic = "GoLink",
        use_default_shell_env = True,
    )
I wanted to keep this article from getting too absurdly long, so I chose to keep things simple instead of doing it the Right Way. In general, I'd caution against using any Bash commands in Bazel actions for several reasons. It's hard to write portable commands (macOS ships different versions of most shell commands than Linux, with different flags, and on Windows you'll probably need to rewrite everything in PowerShell). It's hard to get quoting and escaping right (definitely use shell.quote from @bazel_skylib). It's hard to avoid including some implicit dependency. Bazel tries to isolate you from this a bit with the sandbox; I had to use use_default_shell_env = True to be able to find go on PATH. We should generally avoid using tools installed on the user's system since they may differ across systems, but again, we're keeping it simple this time.
Instead of writing Bash commands, it's better to compile tools with Bazel and use those. That lets you write more sophisticated (and reproducible) build logic in your language of choice.
Exposing a public interface
It's useful to have declarations for all public symbols in one file. This way, you can refactor your rules without requiring users to update load statements in their projects. load statements import a public symbol from another .bzl file into the current file. They also expose that symbol for other files loading the current file. So all we have to do is create one file that loads our public symbols. That's def.bzl.
load("//internal:rules.bzl", _go_binary = "go_binary")

go_binary = _go_binary
Edit: In very old versions of Bazel, simply loading a symbol in a .bzl file would make it available for loading in other files. In newer versions, a symbol must be defined in order for it to be loadable. It's still a good practice to put your public definitions in one file, but it takes a little more work. Above, we load the internal go_binary as _go_binary, then redefine that as go_binary.
Testing the go_binary rule
To test go_binary, we can define a sh_test rule that runs a go_binary rule and checks its output. Here's our build file, tests/BUILD.bazel:
load("//:def.bzl", "go_binary")

sh_test(
    name = "hello_test",
    srcs = ["hello_test.sh"],
    args = ["$(location :hello)"],
    data = [":hello"],
)

go_binary(
    name = "hello",
    srcs = [
        "hello.go",
        "message.go",
    ],
)
Our go_binary rule has two sources, hello.go and message.go. It just prints "Hello, world!".
Our test has a data dependency on the hello binary. This means that when the test is run, Bazel will build hello and make it available. To avoid hardcoding the location of the binary in the test, we pass it in as an argument. See "$(location)" substitution for how this works.
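Conceptually, the substitution replaces the "$(location :hello)" text with the path of the file that the label produces. The plain-Python sketch below imitates that with a lookup table; expand_location is a hypothetical helper, the path in the table is an assumed example, and real Bazel resolves the label against the target's declared dependencies rather than a hand-written dict.

```python
import re

def expand_location(arg, locations):
    """Replaces each $(location LABEL) in arg with the path for LABEL."""
    return re.sub(
        r"\$\(location ([^)]+)\)",
        lambda match: locations[match.group(1)],
        arg,
    )

# Assumed output path for the hello binary.
locations = {":hello": "tests/hello_/hello"}
print(expand_location("$(location :hello)", locations))
```

The test script then receives an ordinary path as its first argument and never needs to know how the binary's output directory is named.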
Here's our test script:
#!/bin/bash

set -euo pipefail

program="$1"
got=$("$program")
want="Hello, world!"

if [ "$got" != "$want" ]; then
  cat >&2 <<EOF
got: $got
want: $want
EOF
  exit 1
fi
You can test this out with bazel test //tests:hello_test.