Writing Bazel rules: moving logic to execution
In this article, we're going to expand the capabilities of our Bazel rule set by adding new logic to the execution phase. For those following along at home, the new code will be on the v4 branch of rules_go_simple.
You may recall that Bazel operates in three phases:
- Loading phase: Bazel reads and evaluates the build files that define the targets the user asked it to build. It recursively loads the build files for the targets' dependencies. This happens on the machine where Bazel is invoked. The target graph is cached in memory.
- Analysis phase: Bazel evaluates rule implementation functions for the targets and their dependencies. Rule implementation functions declare files to generate, actions to generate them, and inputs for those actions. As with the loading phase, this work happens on the machine where Bazel is invoked, and the action graph is cached in memory.
- Execution phase: Bazel determines which files are out of date and executes actions needed to produce them. Actions may be executed locally in a sandbox or remotely using an execution service. The output files are cached, either locally or on a remote caching service.
Rule set authors can write code for any of these phases, and it's often possible to solve a problem multiple ways. However, the execution phase has many advantages. Execution code has full I/O access to source files, so it can be smarter. It can be written in any language, so it can be faster and more flexible than Starlark. Work can be distributed across many machines, and the results can be cached persistently, so it can be faster for a large team of developers.
Generally, you should keep your Starlark code (loading and analysis phase) as simple as you can and push complexity to the execution phase as much as possible.
The plan
In this article, we'll add several new features to rules_go_simple.
- A go_test rule will build Go tests. This requires parsing the test sources and generating code for the main package. There's no general-purpose tool that does that, so we'll write our own.
- We'll support Go build constraints, which let us filter platform-specific source files using comments like //go:build linux.
- We'll rewrite our importcfg generating code in Go. The compiler and linker use these files to map import strings to compiled package files. Generating them in Bash is annoying, and we prefer to use a more powerful language.
To implement these features, we need to be able to read source files, so we'll do all the work in a new "builder" binary written in Go that runs in the execution phase. All our actions will be executed through the builder binary. We could implement each action with a separate binary, but it's often better to have one. From the user's perspective, it's faster to build one binary, and it doesn't change often enough to make partial cache invalidation a serious consideration.
To compile and link the builder, we'll define a new internal rule, go_tool_binary. This is necessary since go_binary will depend on the builder; we can't use the builder to build itself.
Finally, we'll update go_binary and go_library and introduce go_test, all of which will depend on the builder.
The builder binary
The builder binary contains quite a bit of logic, most of which is specific to the way Go programs are built, so I won't go into much detail here. If you're curious, you can find the source code in //internal/builder.
builder.go is where the main function is defined. main checks the first command line argument (a "verb") and calls a function based on that. The verb may be compile, link, or test.
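To make that dispatch concrete, here's a minimal sketch of what such a main function might look like. The compileCmd, linkCmd, and testCmd helpers are stubs standing in for the real builder's functions, whose names and signatures differ.

// A minimal sketch of verb dispatch in a builder-style tool. The command
// functions are stubs; the real builder's helpers are more involved.
package main

import (
	"fmt"
	"log"
	"os"
)

func compileCmd(args []string) error { return nil } // stub
func linkCmd(args []string) error    { return nil } // stub
func testCmd(args []string) error    { return nil } // stub

func main() {
	log.SetFlags(0)
	log.SetPrefix("builder: ")
	if len(os.Args) < 2 {
		log.Fatal("usage: builder (compile|link|test) [args]")
	}
	verb, rest := os.Args[1], os.Args[2:]

	var err error
	switch verb {
	case "compile":
		err = compileCmd(rest)
	case "link":
		err = linkCmd(rest)
	case "test":
		err = testCmd(rest)
	default:
		err = fmt.Errorf("unknown verb: %q", verb)
	}
	if err != nil {
		log.Fatal(err)
	}
}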
The compile command compiles a list of source files into a package file. First, we filter out source files intended to be compiled for different platforms using Go build constraints. Go has a standard package for this, so we don't have to write a parser or anything like that. Second, we write an importcfg file by listing the packages in the standard library and adding the direct dependencies of the package being compiled (from the deps attribute). Finally, we invoke the Go compiler.
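As a rough illustration of the filtering step, the sketch below uses the standard go/build package to decide which files match the target platform. The filterSources function and its command-line usage are illustrative, not the builder's actual interface.

// Sketch: filter Go source files by build constraints for a target platform
// using the standard go/build package. Illustrative only; the real builder's
// flag handling differs.
package main

import (
	"fmt"
	"go/build"
	"log"
	"os"
	"path/filepath"
	"runtime"
)

// filterSources returns the subset of srcs that should be compiled for the
// given GOOS/GOARCH, honoring //go:build comments and _GOOS/_GOARCH file
// name suffixes.
func filterSources(srcs []string, goos, goarch string) ([]string, error) {
	bctx := build.Default
	bctx.GOOS = goos
	bctx.GOARCH = goarch
	var keep []string
	for _, src := range srcs {
		dir, name := filepath.Split(src)
		match, err := bctx.MatchFile(dir, name)
		if err != nil {
			return nil, err
		}
		if match {
			keep = append(keep, src)
		}
	}
	return keep, nil
}

func main() {
	// Pass source file paths on the command line, for example:
	//   go run filter.go foo.go foo_linux.go foo_windows.go
	keep, err := filterSources(os.Args[1:], runtime.GOOS, runtime.GOARCH)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(keep)
}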
The link command links an executable from a set of compiled archive files. This is pretty simple: we write an importcfg file with information about every archive file transitively imported from the main package, then we invoke the Go linker on the main package file.
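Here's a rough sketch of that importcfg step: write "packagefile importpath=file" lines (the format the compiler and linker read via their -importcfg flag), then run the linker. The function name writeImportcfg and the file paths are illustrative, not the builder's actual API, and the real builder also lists the standard library packages.

// Sketch: write an importcfg file and invoke the Go linker on a main package
// archive. Names and paths here are hypothetical placeholders.
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
	"sort"
	"strings"
)

// writeImportcfg writes "packagefile <importpath>=<file>" lines, the format
// the Go compiler and linker accept with their -importcfg flag.
func writeImportcfg(path string, packageFiles map[string]string) error {
	importPaths := make([]string, 0, len(packageFiles))
	for importPath := range packageFiles {
		importPaths = append(importPaths, importPath)
	}
	sort.Strings(importPaths) // deterministic output helps caching
	var sb strings.Builder
	for _, importPath := range importPaths {
		fmt.Fprintf(&sb, "packagefile %s=%s\n", importPath, packageFiles[importPath])
	}
	return os.WriteFile(path, []byte(sb.String()), 0o666)
}

func main() {
	// Hypothetical inputs: archives for every package transitively imported
	// from the main package (the real builder adds standard library packages too).
	packageFiles := map[string]string{
		"example.com/hello": "bazel-out/hello.a",
	}
	if err := writeImportcfg("importcfg", packageFiles); err != nil {
		log.Fatal(err)
	}
	cmd := exec.Command("go", "tool", "link", "-importcfg", "importcfg", "-o", "hello", "bazel-out/main.a")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}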
The test command is more complicated. Go tests are expected to appear in files with the suffix _test.go. These files may be compiled together with the library being tested (giving them access to private symbols), or they may be compiled separately. A "test main" source file is generated, which is responsible for initializing the test framework and calling each test function. The test main file is compiled into a third archive. The test action sorts all this out, writing importcfg files and invoking the compiler and linker as needed.
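To give a flavor of what a generated test main does, here's a simplified, self-contained sketch. It uses testing.Main, an exported but bare-bones entry point; the file the builder actually generates looks different, and it registers the test functions it discovered while parsing the _test.go sources rather than defining one inline.

// Sketch of a "test main" file. Simplified so it runs standalone; the file
// the builder generates is more elaborate.
package main

import (
	"regexp"
	"testing"
)

// TestFoo stands in for a test function that would normally live in the
// compiled test archive.
func TestFoo(t *testing.T) {
	if 1+1 != 2 {
		t.Fatal("arithmetic is broken")
	}
}

func main() {
	// Register each discovered Test* function with the testing framework.
	tests := []testing.InternalTest{
		{Name: "TestFoo", F: TestFoo},
	}
	matchString := func(pat, str string) (bool, error) {
		re, err := regexp.Compile(pat)
		if err != nil {
			return false, err
		}
		return re.MatchString(str), nil
	}
	testing.Main(matchString, tests, nil, nil)
}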
Internal rules
In order to compile and link the builder binary, we add our new rule, go_tool_binary, in rules.bzl. This rule compiles and links a small binary. It doesn't support dependencies outside of the standard library or build constraints, but that's fine for small tools.
def _go_tool_binary_impl(ctx):
    # Declare the output executable file.
    executable = ctx.actions.declare_file(ctx.label.name)

    # List other input files needed.
    stdlib_dir = ctx.file._stdlib
    inputs = [stdlib_dir] + ctx.files.srcs

    # Run the script to compile and link the binary. The order of arguments
    # is important!
    arguments = ([executable.path, "go", stdlib_dir.path] +
                 [src.path for src in ctx.files.srcs])
    ctx.actions.run(
        mnemonic = "GoToolBinary",
        executable = ctx.executable._script,
        arguments = arguments,
        inputs = inputs,
        outputs = [executable],
        use_default_shell_env = True,
    )

    return [DefaultInfo(
        files = depset([executable]),
        executable = executable,
    )]
go_tool_binary = rule(
    implementation = _go_tool_binary_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            mandatory = True,
            doc = "Source files to compile for the main package of this binary",
        ),
        "_stdlib": attr.label(
            default = "//internal:stdlib",
            allow_single_file = True,
            doc = "Hidden dependency on the Go standard library",
        ),
        "_script": attr.label(
            allow_single_file = True,
            executable = True,
            cfg = "exec",
            default = ":tool_binary.sh",
            doc = "Script that compiles and links a builder binary",
        ),
    },
    doc = """Builds an executable program for the Go toolchain.

go_tool_binary is a simple version of go_binary. It is separate from go_binary
because go_binary depends on the Go toolchain, and the toolchain uses a binary
built with this rule to do most of its work.

This rule does not support dependencies or build constraints. All source files
will be compiled, and they may only depend on the standard library.
""",
    executable = True,
)
As we did with go_binary, we've set executable = True here. Bazel will require go_tool_binary to produce an executable.
You might notice the _stdlib attribute. This also appears on go_binary and go_library. An attribute whose name starts with _ is hidden. It must have a default value and cannot be set explicitly in a build file. In this case, _stdlib points to //internal:stdlib, a target that uses the go_stdlib rule to compile the Go standard library. This rule wasn't originally part of this tutorial series, but it became necessary after Go 1.20 because the precompiled standard library is no longer included in distribution archives, so Bazel rules can't assume the location of compiled files. go_stdlib is implemented using a Bash script and is an implicit dependency of all other rules that compile and link Go code.
go_tool_binary is also implemented using a Bash script, invoked with ctx.actions.run instead of ctx.actions.run_shell. For brevity, I won't show the full script here, but note that since go_tool_binary is used outside the //internal package, we needed to add this snippet to internal/BUILD.bazel:
# An exports_files declaration makes this source file available in other
# packages. It's an implicit dependency of go_tool_binary.
exports_files(
    ["tool_binary.sh"],
    visibility = ["//visibility:public"],
)
Finally, we'll write //internal/builder:BUILD.bazel, which contains the single instance of our new rule.
load("//internal:rules.bzl", "go_tool_binary")
# builder is a tool used to perform various tasks related to building Go code,
# such as compiling packages, linking executables, and generating
# test sources.
go_tool_binary(
    name = "builder",
    srcs = [
        "builder.go",
        "compile.go",
        "flags.go",
        "importcfg.go",
        "link.go",
        "sourceinfo.go",
        "test.go",
    ],
    visibility = ["//visibility:public"],
)
Using the builder
Since we've done all the complicated stuff in the builder, our go_binary, go_library, and go_test rules can be simplified. They just declare actions that run builder commands, passing in the necessary command-line arguments.
Here's the definition for go_test in rules.bzl. The definitions of go_binary and go_library are similar, so I won't show them here.
go_test = rule(
    implementation = _go_test_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = ("Source files to compile for this test. " +
                   "May be a mix of internal and external tests."),
        ),
        "deps": attr.label_list(
            providers = [GoLibraryInfo],
            doc = "Direct dependencies of the test",
        ),
        "data": attr.label_list(
            allow_files = True,
            doc = "Data files available to this test",
        ),
        "importpath": attr.string(
            default = "",
            doc = "Name by which test archives may be imported (optional)",
        ),
        "_builder": attr.label(
            default = "//internal/builder",
            executable = True,
            cfg = "exec",
        ),
        "_stdlib": attr.label(
            default = "//internal:stdlib",
            allow_single_file = True,
            doc = "Hidden dependency on the Go standard library",
        ),
    },
    doc = """Compiles and links a Go test executable. Functions with names
starting with "Test" in files with names ending in "_test.go" will be called
using the go "testing" framework.""",
    test = True,
)
We need implicit dependencies on _builder and _stdlib. The _builder dependency has cfg = "exec" set, which means Bazel will build it for the execution platform, which might be different than the target platform if we were cross-compiling. Also, since go_test builds tests that Bazel can execute, we need to set test = True. This implies executable = True, so we don't need to set that separately.
In the implementation, we just call the go_build_test function, which does the heavy lifting:
def _go_test_impl(ctx):
    executable = ctx.actions.declare_file(ctx.label.name)
    go_build_test(
        ctx,
        importpath = ctx.attr.importpath,
        srcs = ctx.files.srcs,
        stdlib = ctx.file._stdlib,
        builder = ctx.executable._builder,
        deps = [dep[GoLibraryInfo] for dep in ctx.attr.deps],
        out = executable,
        rundir = ctx.label.package,
    )

    runfiles = _collect_runfiles(
        ctx,
        direct_files = ctx.files.data,
        indirect_targets = ctx.attr.data + ctx.attr.deps,
    )
    return [DefaultInfo(
        files = depset([executable]),
        runfiles = runfiles,
        executable = executable,
    )]
We define go_build_test in actions.bzl alongside go_compile and go_link.
def go_build_test(ctx, *, srcs, stdlib, builder, deps, rundir, importpath, out):
    """Compiles and links a Go test executable.

    Args:
        ctx: analysis context.
        srcs: list of source Files to be compiled.
        stdlib: a File for the compiled standard library directory.
        builder: an executable File for the builder tool.
        deps: list of GoLibraryInfo objects for direct dependencies.
        importpath: import path of the internal test archive.
        rundir: directory the test should change to before executing.
        out: output executable file.
    """
    direct_dep_infos = [d.info for d in deps]
    transitive_dep_infos = depset(transitive = [d.deps for d in deps]).to_list()
    inputs = (srcs +
              [stdlib] +
              [d.archive for d in direct_dep_infos] +
              [d.archive for d in transitive_dep_infos])

    args = ctx.actions.args()
    args.add("test")
    args.add("-stdlib", stdlib.path)
    args.add_all(direct_dep_infos, before_each = "-direct", map_each = _format_arc)
    args.add_all(transitive_dep_infos, before_each = "-transitive", map_each = _format_arc)
    if rundir != "":
        args.add("-dir", rundir)
    if importpath != "":
        args.add("-p", importpath)
    args.add("-o", out)
    args.add_all(srcs)

    ctx.actions.run(
        outputs = [out],
        inputs = inputs,
        executable = builder,
        arguments = [args],
        use_default_shell_env = True,
        mnemonic = "GoTest",
    )
As with go_tool_binary, we're using ctx.actions.run instead of ctx.actions.run_shell. These functions take many of the same arguments. run tells Bazel to invoke a command directly, while run_shell tells Bazel to invoke a shell command with sh -c. run is more efficient, less error-prone (no need for quoting), and less OS-specific, so you should prefer run over run_shell whenever possible.
We are using Args (obtained from ctx.actions.args) to build our argument list. There are a few advantages to using Args instead of constructing a list of strings. First, it's more convenient: you don't have to convert Files to strings, and there are useful facilities for formatting lists of options. Second, if you pass a directory (created with ctx.actions.declare_directory) to Args.add_all, the files within the directory are expanded as individual arguments (assuming expand_directories = True, which it is by default). Third, if you pass a depset to Args.add_all, Bazel won't iterate the depset unless the action is actually executed. This can improve Bazel's performance for builds with very large dependency graphs: if an action is cached, there's no need to construct its command line.
Conclusion
That pretty much wraps it up. Again, I tried to focus on the Bazel rule infrastructure here, so if you're looking for Go-specific details, check out the code in the builder directory. You can see some examples of these Go rules being used in //tests.
To summarize, Bazel rules should do as much work in the execution phase as possible. The purpose of the analysis phase is to declare files and actions. Rules written in Starlark should do just that with minimal logic. If you notice rules declaring unnecessary files (like internal temporary files) or unnecessary actions (like multiple actions that always execute together), try to consolidate. Simplifying Starlark rules will speed up analysis and will better leverage parallelism with remote caching and execution.