Writing Bazel rules: moving logic to execution
In this article, we're going to expand the capabilities of our Bazel rule set
by adding new logic to the execution phase. For those following along at home,
the new code will be on the v4 branch of github.com/jayconrod/rules_go_simple.
You may recall that Bazel operates in three phases:
- Loading phase: Bazel reads and evaluates the build files that define the targets the user asked it to build. It recursively loads the build files for the targets' dependencies. This happens on the machine where Bazel is invoked. The target graph is cached in memory.
- Analysis phase: Bazel evaluates rule implementation functions for the targets and their dependencies. Rule implementation functions declare files to generate, actions to generate them, and inputs for those actions. As with the loading phase, this work happens on the machine where Bazel is invoked, and the action graph is cached in memory.
- Execution phase: Bazel determines which files are out of date and executes actions needed to produce them. Actions may be executed locally in a sandbox or remotely using an execution service. The output files are cached, either locally or on a remote caching service.
Rule set authors can write code for any of these phases, and it's often possible to solve a problem multiple ways. However, the execution phase has many advantages, and you should prefer to implement rule logic there if at all possible. Execution code has I/O access to source files, so it can be smarter. It can be written in any language, so it can be faster and more flexible than Starlark. Work can be distributed across many machines, and the results can be cached persistently, so it can be faster for everyone.
The plan
In this article, we'll add several new features to rules_go_simple.
- A go_test rule will build Go tests. This requires parsing the test sources and generating code for the main package, and there's no general-purpose tool that does that. We'll write our own.
- We'll support Go build constraints, which let us filter platform-specific source files using comments like //go:build linux.
- We'll rewrite our importcfg generating code in Go. [EDIT 2023-10-14: in earlier versions of this series, this article was the first that used importcfg files. Go stopped shipping pre-compiled standard library files, so we needed to use them earlier.]
In order to implement these features, we need to be able to read source files, so we'll do all the work in a new "builder" binary written in Go that runs in the execution phase. All our actions will be executed through the builder binary.
To compile and link the builder, we'll define a new internal rule, go_tool_binary. This is necessary since go_binary will depend on the builder; we can't use the builder to build itself.
Finally, we'll update go_binary and go_library and introduce go_test, all of which will depend on the builder.
The builder binary
The builder binary contains quite a bit of logic, most of which is specific to the way Go programs are built, so I won't go into much detail here. If you're curious, you can find the source code in //internal/builder.
builder.go is where the main function is defined. main checks the first command line argument (a "verb") and calls a function based on that. The verb may be compile, link, or test.
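As a rough illustration of the verb dispatch described above, here is a minimal sketch. It is not the code from builder.go: the run function and the empty handlers are placeholders invented for this example, and a real tool would pass os.Args[1:] rather than fixed arguments.

```go
package main

import "fmt"

// run dispatches on the verb (the first argument) and hands the remaining
// arguments to the matching handler. The name "run" is hypothetical.
func run(args []string) error {
	if len(args) == 0 {
		return fmt.Errorf("usage: builder compile|link|test options")
	}
	verb, rest := args[0], args[1:]
	switch verb {
	case "compile":
		return compile(rest)
	case "link":
		return link(rest)
	case "test":
		return test(rest)
	default:
		return fmt.Errorf("unknown verb: %s", verb)
	}
}

// Placeholder handlers; the real ones parse flags and invoke the Go toolchain.
func compile(args []string) error { return nil }
func link(args []string) error    { return nil }
func test(args []string) error    { return nil }

func main() {
	// Fixed arguments keep the sketch runnable on its own.
	for _, argv := range [][]string{{"compile", "-o", "a.a"}, {"bogus"}} {
		if err := run(argv); err != nil {
			fmt.Println("error:", err)
		}
	}
}
```

Keeping the dispatch in a function that returns an error, rather than exiting from inside each handler, makes each verb easy to test in isolation.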
The compile action compiles a list of source files into a package file. First, we filter out source files intended to be compiled for different platforms using Go build constraints. Go has a standard package for this, so we don't have to write a parser or anything like that. Second, we build an importcfg file by combining information from the standard importcfg file and the direct dependencies of the package being compiled (from the deps attribute). Finally, we invoke the Go compiler.
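To make the filtering step concrete, here is a small sketch of evaluating a //go:build line against a set of tags. It assumes the standard go/build/constraint package and works on in-memory source text; the builder itself may well use a different standard package (go/build can match files on disk), and the matches helper is a name invented for this example. File name suffixes such as _linux.go are not handled here.

```go
package main

import (
	"bufio"
	"fmt"
	"go/build/constraint"
	"strings"
)

// matches reports whether the source text satisfies its //go:build
// constraint for the given tags. Files without a constraint always match.
func matches(src string, tags map[string]bool) (bool, error) {
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if constraint.IsGoBuild(line) {
			expr, err := constraint.Parse(line)
			if err != nil {
				return false, err
			}
			return expr.Eval(func(tag string) bool { return tags[tag] }), nil
		}
		// Constraints must appear before the package clause, so stop there.
		if strings.HasPrefix(line, "package ") {
			break
		}
	}
	return true, sc.Err()
}

func main() {
	tags := map[string]bool{"linux": true, "amd64": true}
	src := "//go:build linux && !cgo\n\npackage foo\n"
	ok, err := matches(src, tags)
	fmt.Println(ok, err)
}
```

Because this needs to read file contents, it can only run in the execution phase; Starlark rules have no way to open a source file during analysis.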
The link action links an executable from a set of compiled archive files. This is pretty simple: we build an importcfg file with information about every archive file that may be needed, then we invoke the Go linker on the main package file.
The test action is more complicated. Go tests are expected to appear in files with the suffix _test.go. These files may be compiled together with the library being tested (giving them access to private symbols) or they may be compiled separately. A "test main" source file is generated, which is responsible for initializing the test framework and calling each test function. The test main file is compiled into a third archive. The test action sorts all this out, building importcfg files and invoking the compiler and linker as needed.
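Generating the test main file starts with finding the test functions. The sketch below shows one way to do that with the standard go/parser and go/ast packages. The testNames helper is hypothetical, and the check is deliberately simplified: it only looks at the name prefix and the absence of a receiver, whereas a complete implementation would also check the function signature.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// testNames parses Go source text and returns the names of top-level
// functions that look like tests: no receiver, name starting with "Test".
func testNames(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Recv != nil {
			continue // skip non-functions and methods
		}
		if strings.HasPrefix(fn.Name.Name, "Test") {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

func main() {
	src := `package hello

import "testing"

func TestHello(t *testing.T) {}

func helper() {}
`
	names, err := testNames("hello_test.go", src)
	fmt.Println(names, err)
}
```

The resulting list of names is what a generated test main would reference when it registers tests with the testing framework.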
What are all these importcfg files, you ask? The Go compiler and linker accept a file that maps Go import paths to package files. We need to build an importcfg file every time we run one of those commands. We can generate it pretty easily inside our builder binary, but we actually need an importcfg for the entire standard library in order to compile and link the builder binary. We generate that with a fancy go list command inside go_stdlib, which go_tool_binary depends on. This is all pretty Go-specific, so don't worry about it too much.
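For a sense of what such a file contains, here is a minimal sketch that renders a map of import paths to archive files as packagefile directives, one per line. The formatImportcfg helper and the file paths are made up for illustration; this is not the code in importcfg.go.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatImportcfg renders a map from import path to archive file path as
// importcfg text: one "packagefile <importpath>=<file>" directive per line.
// Entries are sorted so the output is deterministic, which matters for caching.
func formatImportcfg(archives map[string]string) string {
	paths := make([]string, 0, len(archives))
	for p := range archives {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	var b strings.Builder
	for _, p := range paths {
		fmt.Fprintf(&b, "packagefile %s=%s\n", p, archives[p])
	}
	return b.String()
}

func main() {
	// Hypothetical archive locations.
	cfg := formatImportcfg(map[string]string{
		"fmt":               "stdlib/pkg/fmt.a",
		"example.com/hello": "bazel-out/hello.a",
	})
	fmt.Print(cfg)
}
```

A builder would write this text to a temporary file and pass its path to the compiler or linker with -importcfg, as the go tool compile and go tool link commands later in this article do.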
Internal rules
In order to compile and link the builder binary, we'll define a new rule, go_tool_binary, in rules.bzl. This rule compiles and links a small binary. It doesn't support dependencies outside of the standard library or build constraints, but that's fine for small tools.

    def _go_tool_binary_impl(ctx):
        # Declare the output executable file.
        executable_path = "{name}_/{name}".format(name = ctx.label.name)
        executable = ctx.actions.declare_file(executable_path)
        go_build_tool(
            ctx,
            srcs = ctx.files.srcs,
            stdlib = ctx.attr._stdlib[GoStdLibInfo],
            out = executable,
        )
        return [DefaultInfo(
            files = depset([executable]),
            executable = executable,
        )]

    go_tool_binary = rule(
        implementation = _go_tool_binary_impl,
        attrs = {
            "srcs": attr.label_list(
                allow_files = [".go"],
                mandatory = True,
                doc = "Source files to compile for the main package of this binary",
            ),
            "_stdlib": attr.label(
                default = "//internal:stdlib",
                providers = [GoStdLibInfo],
                doc = "Hidden dependency on the Go standard library",
            ),
        },
        doc = "...",
        executable = True,
    )

As we did with go_binary, we've set executable = True here. Bazel will require go_tool_binary to produce an executable.
You might notice the _stdlib attribute. This is a hidden dependency (an attribute is hidden when its name starts with _ and it has a default value) on a rule that compiles the Go standard library. This actually shows up in earlier articles, but we haven't really talked about it because it was a late addition: a change to the code necessitated by the change in Go, long after these articles were first published. I won't say much more at the risk of going off into Go-specific weeds, but suffice it to say, this is another rule that compiles the standard library and produces an importcfg file for all packages. It returns some metadata through the GoStdLibInfo provider. go_tool_binary depends on //internal:stdlib because the builder needs to import packages in the standard library and needs them to already be compiled.
Also as with earlier rules, we define the function that creates actions separately. This is good practice: it allows groups of actions to be composable in a way that rules are not.

    def go_build_tool(ctx, *, srcs, stdlib, out):
        command = """
    set -o errexit
    export GOPATH=/dev/null  # suppress warning
    go tool compile -o {out}.a -p main -importcfg {stdlib_importcfg} -- {srcs}
    go tool link -o {out} -importcfg {stdlib_importcfg} -- {out}.a
    """.format(
            out = shell.quote(out.path),
            stdlib_importcfg = shell.quote(stdlib.importcfg.path),
            srcs = " ".join([shell.quote(src.path) for src in srcs]),
        )
        inputs = depset(
            direct = srcs,
            transitive = [stdlib.files],
        )
        ctx.actions.run_shell(
            outputs = [out],
            inputs = inputs,
            command = command,
            mnemonic = "GoToolBuild",
            use_default_shell_env = True,
        )

Finally, we'll write //internal/builder:BUILD.bazel, which contains the single instance of our new rule.

    load("//internal:rules.bzl", "go_tool_binary")

    go_tool_binary(
        name = "builder",
        srcs = [
            "builder.go",
            "compile.go",
            "flags.go",
            "importcfg.go",
            "link.go",
            "sourceinfo.go",
            "test.go",
        ],
        visibility = ["//visibility:public"],
    )

Using the builder
Since we've done all the complicated stuff in the builder, our go_binary, go_library, and go_test rules should be relatively simple. They just declare actions that execute the builder and pass in the necessary command-line arguments.
Here's the definition for go_test in rules.bzl. The definitions of go_binary and go_library are similar, so I won't show them here.

    go_test = rule(
        implementation = _go_test_impl,
        attrs = {
            "srcs": attr.label_list(
                allow_files = [".go"],
                doc = ("Source files to compile for this test. " +
                       "May be a mix of internal and external tests."),
            ),
            "deps": attr.label_list(
                providers = [GoLibraryInfo],
                doc = "Direct dependencies of the test",
            ),
            "data": attr.label_list(
                allow_files = True,
                doc = "Data files available to this test",
            ),
            "importpath": attr.string(
                default = "",
                doc = "Name by which test archives may be imported (optional)",
            ),
            "_builder": attr.label(
                default = "//internal/builder",
                executable = True,
                cfg = "exec",
            ),
            "_stdlib": attr.label(
                default = "//internal:stdlib",
                providers = [GoStdLibInfo],
                doc = "Hidden dependency on the Go standard library",
            ),
        },
        doc = """Compiles and links a Go test executable. Functions with names
    starting with "Test" in files with names ending in "_test.go" will be called
    using the go "testing" framework.""",
        test = True,
    )

Note that we need implicit dependencies on _builder and _stdlib. The _builder dependency has cfg = "exec" set, which means Bazel will build it for the execution platform, which might be different from the target platform if we were cross-compiling. Also, since go_test builds tests that Bazel can execute, we need to set test = True.
In the implementation, we just call go_build_test, which does the heavy lifting:

    def _go_test_impl(ctx):
        executable_path = "{name}_/{name}".format(name = ctx.label.name)
        executable = ctx.actions.declare_file(executable_path)
        go_build_test(
            ctx,
            importpath = ctx.attr.importpath,
            srcs = ctx.files.srcs,
            stdlib = ctx.attr._stdlib[GoStdLibInfo],
            deps = [dep[GoLibraryInfo] for dep in ctx.attr.deps],
            out = executable,
            rundir = ctx.label.package,
        )

        runfiles = _collect_runfiles(
            ctx,
            direct_files = ctx.files.data,
            indirect_targets = ctx.attr.data + ctx.attr.deps,
        )

        return [DefaultInfo(
            files = depset([executable]),
            runfiles = runfiles,
            executable = executable,
        )]

We define go_build_test in actions.bzl.

    def go_build_test(ctx, *, importpath, srcs, stdlib, deps, out, rundir):
        """Compiles and links a Go test executable.

        Args:
            ctx: analysis context.
            importpath: import path of the internal test archive.
            srcs: list of source Files to be compiled.
            stdlib: a GoStdLibInfo provider for the standard library.
            deps: list of GoLibraryInfo objects for direct dependencies.
            out: output executable file.
            rundir: directory the test should change to before executing.
        """
        direct_dep_infos = [d.info for d in deps]
        transitive_dep_infos = depset(transitive = [d.deps for d in deps]).to_list()
        inputs = (srcs +
                  stdlib.files.to_list() +
                  [d.archive for d in direct_dep_infos] +
                  [d.archive for d in transitive_dep_infos])

        args = ctx.actions.args()
        args.add("test")
        args.add("-stdimportcfg", stdlib.importcfg)
        args.add_all(direct_dep_infos, before_each = "-direct", map_each = _format_arc)
        args.add_all(transitive_dep_infos, before_each = "-transitive", map_each = _format_arc)
        if rundir != "":
            args.add("-dir", rundir)
        if importpath != "":
            args.add("-p", importpath)
        args.add("-o", out)
        args.add_all(srcs)

        ctx.actions.run(
            outputs = [out],
            inputs = inputs,
            executable = ctx.executable._builder,
            arguments = [args],
            mnemonic = "GoTest",
            use_default_shell_env = True,
        )

This is the first time we've declared an action with ctx.actions.run instead of ctx.actions.run_shell. The usage of these two functions is quite different, though they take many of the same arguments. run tells Bazel to invoke a command directly without interpreting it through the shell. This is more efficient, less error-prone (no need for quoting), and less OS-specific, so you should prefer run over run_shell whenever possible. In our case, we still need run_shell to build the standard library and bootstrap our builder binary, but we'll avoid it for everything else.
We are using Args (obtained from ctx.actions.args) to build our argument list. There are three advantages to using Args instead of building a list of strings. First, there are several conveniences: you don't have to convert Files to strings, and there are useful facilities for formatting lists of options. Second, if you pass a directory (created with ctx.actions.declare_directory) to Args.add_all, the files within the directory will be expanded as individual arguments (assuming expand_directories = True, which it is by default). Third, if you pass a depset to Args.add_all, Bazel won't iterate the depset unless the action is actually executed. This can improve Bazel's performance for builds with very large dependency graphs. If an action is cached, there's no need to construct its command line.
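On the receiving end, the builder has to parse the repeated -direct and -transitive flags that before_each produces. Here is a minimal sketch of how that might look with the standard flag package. The flag names -direct, -transitive, and -o come from go_build_test above; everything else is an assumption: _format_arc isn't shown in this article, so the importpath=file value format, the archive type, and the parseTestArgs helper are invented for illustration, and the other flags (-stdimportcfg, -dir, -p) are left out for brevity.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// archive is a hypothetical record pairing an import path with an archive file.
type archive struct{ importPath, file string }

// archiveList implements flag.Value so that a flag may be repeated;
// each occurrence appends one entry.
type archiveList []archive

func (l *archiveList) String() string {
	if l == nil {
		return ""
	}
	return fmt.Sprint(*l)
}

func (l *archiveList) Set(v string) error {
	// Assumed value format: importpath=file.
	importPath, file, ok := strings.Cut(v, "=")
	if !ok {
		return fmt.Errorf("bad archive %q: want importpath=file", v)
	}
	*l = append(*l, archive{importPath, file})
	return nil
}

// parseTestArgs parses a subset of the arguments passed after the "test" verb.
func parseTestArgs(args []string) (direct, transitive archiveList, out string, srcs []string, err error) {
	fs := flag.NewFlagSet("test", flag.ContinueOnError)
	fs.Var(&direct, "direct", "direct dependency archive (repeatable)")
	fs.Var(&transitive, "transitive", "transitive dependency archive (repeatable)")
	fs.StringVar(&out, "o", "", "output executable")
	err = fs.Parse(args)
	return direct, transitive, out, fs.Args(), err
}

func main() {
	direct, transitive, out, srcs, err := parseTestArgs([]string{
		"-direct", "example.com/a=a.a",
		"-transitive", "example.com/b=b.a",
		"-o", "hello_test",
		"hello_test.go",
	})
	fmt.Println(direct, transitive, out, srcs, err)
}
```

Because run passes each argument to the builder as-is, none of these values need shell quoting on the Starlark side.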
Conclusion
That pretty much wraps it up. Again, I tried to focus on the Bazel rule infrastructure here, so if you're looking for Go-specific details, check out the code in the builder directory. You can see some examples of these Go rules being used in //tests.
To summarize, a good set of Bazel rules should do as much work in the execution phase as possible. The purpose of the analysis phase is to declare files and actions. Rules written in Starlark should do just that with minimal logic. If you notice rules declaring unnecessary files (e.g., internal temporary files) or unnecessary actions (e.g., multiple actions that always execute together), try to consolidate. Simplifying Starlark rules will speed up analysis and will leverage remote caching and execution.
A note on rules_go
rules_go does not currently follow the advice above. It has very complicated logic in some places, especially around cgo (handling C and Go code mixed together, compiled separately). This is mostly a consequence of us trying to implement things that Bazel didn't quite support yet (e.g., C compilation from Starlark rules). Keep this in mind if you're using rules_go as a template for more advanced rule sets.
I'm hoping to get it into better shape in the near future. rules_go_simple is not only a useful example for this blog; it's also a prototype for changes I want to make in rules_go.