Writing Bazel rules: platforms and toolchains
One of Bazel's biggest strengths is its ability to isolate a build from the host system. This enables reproducible builds and remote execution, which lets Bazel scale to huge projects. This isolation isn't completely automatic though, especially for the toolchains used to build these projects.
In the previous article, we walked through defining a repository rule which let us download and verify a Go toolchain. This time, we'll walk through the process of configuring our simple set of rules to use that toolchain. After this, our rules will be almost completely independent from the host system. Our users will be able to build Go projects without installing a Go toolchain.
Concepts
Before we get to the actual code, let's go over some platform and toolchain jargon. You may also want to read through the official documentation on Platforms and Toolchains.
A platform is a description of where software can run, defined with the platform rule. The host platform is where Bazel itself runs. The execution platform is where Bazel actions run. Normally, this is the same as the host platform, but if you're using remote execution, the execution platform may be different. The target platform is where the software you're building should run. By default, this is also the same as the host platform, but if you're cross-compiling, it will be different.
A platform is described by a list of constraint values, defined with the constraint_value rule. A constraint value is a fact about a platform, for example, that the CPU is x86_64, or the operating system is Linux. There are a number of constraint values defined in the github.com/bazelbuild/platforms repository, which is automatically declared with the workspace name platforms. You can list them with bazel query @platforms//..., and you can also define your own.
A constraint setting is a category of constraint values, at most one of which may be true for any platform. A constraint setting may be defined with the constraint_setting rule. @platforms//os:os and @platforms//cpu:cpu are the two main settings to worry about, but again, you can define your own.
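To make this concrete, here's a hypothetical example (not part of rules_go_simple) that defines a custom setting for the Go compiler implementation, two values for it, and a platform combining them with the standard constraints:

# Hypothetical custom constraint_setting with two possible values.
constraint_setting(name = "go_compiler")

constraint_value(
    name = "gc",  # the standard Go compiler
    constraint_setting = ":go_compiler",
)

constraint_value(
    name = "tinygo",
    constraint_setting = ":go_compiler",
)

# A platform is a bundle of constraint values, at most one per setting.
platform(
    name = "linux_amd64_tinygo",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
        ":tinygo",
    ],
)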
A toolchain is a target defined with the toolchain rule that associates a toolchain implementation with a toolchain type. A toolchain type is a target defined with the toolchain_type rule; it's a name that identifies a kind of toolchain. A toolchain implementation is a target that represents the actual toolchain: it lists the files that are part of the toolchain (for example, the compiler and standard library) and carries the code needed to use the toolchain. A toolchain implementation must return a ToolchainInfo provider.
So that's a lot to take in. How does it all fit together?
Anyone who's defining a toolchain needs to declare a toolchain_type target. This is just a name.
The actual toolchains are defined with toolchain targets that point to implementations. We'll define a go_toolchain rule for our implementation, but you can use any rule that returns ToolchainInfo.
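As a sketch (names are hypothetical; toolchain_impl stands in for any target that returns ToolchainInfo):

# Declared once by the rule author. Just a name.
toolchain_type(name = "my_toolchain_type")

# Declared for each concrete toolchain, associating an implementation
# with the type under a set of platform constraints.
toolchain(
    name = "my_linux_toolchain",
    exec_compatible_with = ["@platforms//os:linux"],
    target_compatible_with = ["@platforms//os:linux"],
    toolchain = ":toolchain_impl",
    toolchain_type = ":my_toolchain_type",
)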
A rule can request a toolchain using its type by setting the toolchains parameter of its rule declaration. The rule implementation can then access the toolchain through ctx.toolchains.
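Continuing the sketch above with a hypothetical rule, the request and the lookup both use the toolchain type's label:

def _my_rule_impl(ctx):
    # Bazel has already selected a compatible toolchain for this target;
    # look up its ToolchainInfo object by type.
    toolchain = ctx.toolchains["//:my_toolchain_type"]
    # ... use the toolchain's fields to create actions ...
    return []

my_rule = rule(
    implementation = _my_rule_impl,
    toolchains = ["//:my_toolchain_type"],
)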
Users register toolchains they'd like to use by calling the register_toolchains function in their WORKSPACE file or by passing the --extra_toolchains flag on the command line.
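Continuing the sketch, in a WORKSPACE file:

register_toolchains("//:my_linux_toolchain")

or, equivalently, on the command line:

bazel build //some:target --extra_toolchains=//:my_linux_toolchain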
Finally, when Bazel begins a build, it checks the constraints for the execution and target platforms. It then selects a suitable set of toolchains that are compatible with those constraints. Bazel will provide the ToolchainInfo objects of those toolchains to the rules that request them.
Got all that? Actually, I'm not sure I do either. It's an elegant system, but it's difficult to grasp. If you want to see how Bazel selects or rejects registered toolchains, use the --toolchain_resolution_debug flag.
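For example:

bazel build //:hello --toolchain_resolution_debug

(In newer versions of Bazel, the flag takes a regular expression matching the toolchain types you're interested in, such as --toolchain_resolution_debug=.*)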
If you've ever used a dependency injection system like Dagger or Guice, Bazel's toolchain system is conceptually similar. A toolchain type is like an interface. A toolchain is like a static method with a @Provides annotation. A rule that requires a toolchain is like a constructor with an @Inject annotation. The system automatically finds a suitable implementation for every injected interface.
Migrating rules to toolchains
Let's start using toolchains in rules_go_simple.
First, we'll declare a toolchain_type. Rules can request this with the label @rules_go_simple//:toolchain_type.
toolchain_type(
    name = "toolchain_type",
    visibility = ["//visibility:public"],
)
Since a toolchain_type is basically an interface, we should document what can be done with that interface. Starlark is a dynamically typed language, and there's no place to write down required method or field names. I declared a dummy provider in providers.bzl with some documentation, but you could write this in a README or wherever makes sense for your project.
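The dummy provider might look something like the following sketch (field names match the toolchain we're about to build; the provider itself is never instantiated):

# Documentation only. Describes the fields that a ToolchainInfo object
# returned by go_toolchain is expected to carry. Never instantiated.
GoToolchain = provider(
    doc = "The public interface of a Go toolchain",
    fields = {
        "compile": "Function that registers an action to compile Go sources",
        "link": "Function that registers an action to link a Go binary",
        "build_test": "Function that registers actions to build a test binary",
        "internal": "Struct of private files and metadata; not part of the interface",
    },
)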
Next, we'll create our toolchain implementation rule, go_toolchain.
load("@bazel_skylib//lib:paths.bzl", "paths")

def _go_toolchain_impl(ctx):
    # Find important files and paths.
    go_cmd = None
    for f in ctx.files.tools:
        if f.path.endswith("/bin/go") or f.path.endswith("/bin/go.exe"):
            go_cmd = f
            break
    if not go_cmd:
        fail("could not locate go command")
    env = {"GOROOT": paths.dirname(paths.dirname(go_cmd.path))}

    # Generate the package list from the standard library.
    stdimportcfg = ctx.actions.declare_file(ctx.label.name + ".importcfg")
    ctx.actions.run(
        outputs = [stdimportcfg],
        inputs = ctx.files.tools + ctx.files.std_pkgs,
        arguments = ["stdimportcfg", "-o", stdimportcfg.path],
        env = env,
        executable = ctx.executable.builder,
        mnemonic = "GoStdImportcfg",
    )

    # Return a ToolchainInfo provider. This is the object that rules get
    # when they ask for the toolchain.
    return [platform_common.ToolchainInfo(
        # Functions that generate actions. Rules may call these.
        # This is the public interface of the toolchain.
        compile = go_compile,
        link = go_link,
        build_test = go_build_test,

        # Internal data. Contents may change without notice.
        # Think of these like private fields in a class. Actions may use these
        # (they are methods of the class) but rules may not (they are clients).
        internal = struct(
            go_cmd = go_cmd,
            env = env,
            stdimportcfg = stdimportcfg,
            builder = ctx.executable.builder,
            tools = ctx.files.tools,
            std_pkgs = ctx.files.std_pkgs,
        ),
    )]

go_toolchain = rule(
    implementation = _go_toolchain_impl,
    attrs = {
        "builder": attr.label(
            mandatory = True,
            executable = True,
            cfg = "host",
            doc = "Executable that performs most actions",
        ),
        "tools": attr.label_list(
            mandatory = True,
            doc = "Compiler, linker, and other executables from the Go distribution",
        ),
        "std_pkgs": attr.label_list(
            mandatory = True,
            doc = "Standard library packages from the Go distribution",
        ),
    },
    doc = "Gathers functions and file lists needed for a Go toolchain",
)
go_toolchain is a normal rule that returns a ToolchainInfo provider. When rules request the toolchain, they will get one of these objects. There are no mandatory fields, so you can put anything in here. I included three "methods" (which are actually just functions): compile, link, and build_test. These correspond to the actions our rules need to create, so rules will call these instead of creating actions directly. I also included an internal struct field, which holds private files and metadata. Our methods may access this struct, but clients of the toolchain should not, since these values can change without notice.
Next, we'll declare a go_toolchain and a toolchain in BUILD.dist.bazel.tpl. This file is a template that gets expanded into a build file for the go_download repository rule. See the previous article for details.
# toolchain_impl gathers information about the Go toolchain.
# See the GoToolchain provider.
go_toolchain(
    name = "toolchain_impl",
    builder = ":builder",
    std_pkgs = [":std_pkgs"],
    tools = [":tools"],
)

# toolchain is a Bazel toolchain that expresses execution and target
# constraints for toolchain_impl. This target should be registered by
# calling register_toolchains in a WORKSPACE file.
toolchain(
    name = "toolchain",
    exec_compatible_with = [
        {exec_constraints},
    ],
    target_compatible_with = [
        {target_constraints},
    ],
    toolchain = ":toolchain_impl",
    toolchain_type = "@rules_go_simple//:toolchain_type",
)
We need to define the {exec_constraints} and {target_constraints} template parameters in the go_download rule. See repo.bzl.
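The details are in repo.bzl, but the shape of the logic is roughly this (an illustrative sketch, not the actual implementation; the dictionaries and attribute names here are assumptions):

# Map Go's GOOS/GOARCH names to Bazel constraint labels.
_GOOS_CONSTRAINTS = {
    "darwin": "@platforms//os:osx",
    "linux": "@platforms//os:linux",
    "windows": "@platforms//os:windows",
}

_GOARCH_CONSTRAINTS = {
    "amd64": "@platforms//cpu:x86_64",
    "arm64": "@platforms//cpu:aarch64",
}

def _go_download_impl(ctx):
    # ... download and extract the Go distribution first ...
    constraints = [
        _GOOS_CONSTRAINTS[ctx.attr.goos],
        _GOARCH_CONSTRAINTS[ctx.attr.goarch],
    ]
    constraint_str = ", ".join(['"%s"' % c for c in constraints])
    ctx.template(
        "BUILD.bazel",
        ctx.attr._build_tpl,  # hypothetical label pointing at BUILD.dist.bazel.tpl
        substitutions = {
            "{exec_constraints}": constraint_str,
            "{target_constraints}": constraint_str,
        },
    )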
To complete the toolchain implementation, we'll modify our go_compile, go_link, and go_build_test functions. They can obtain the toolchain using ctx.toolchains. Here's go_compile after this change:
def go_compile(ctx, srcs, out, importpath = "", deps = []):
    """Compiles a single Go package from sources.

    Args:
        ctx: analysis context.
        srcs: list of source Files to be compiled.
        out: output .a File.
        importpath: the path other libraries may use to import this package.
        deps: list of GoLibraryInfo objects for direct dependencies.
    """
    toolchain = ctx.toolchains["@rules_go_simple//:toolchain_type"]

    args = ctx.actions.args()
    args.add("compile")
    args.add("-stdimportcfg", toolchain.internal.stdimportcfg)
    dep_infos = [d.info for d in deps]
    args.add_all(dep_infos, before_each = "-arc", map_each = _format_arc)
    if importpath:
        args.add("-p", importpath)
    args.add("-o", out)
    args.add_all(srcs)

    inputs = (srcs +
              [dep.info.archive for dep in deps] +
              [toolchain.internal.stdimportcfg] +
              toolchain.internal.tools +
              toolchain.internal.std_pkgs)
    ctx.actions.run(
        outputs = [out],
        inputs = inputs,
        executable = toolchain.internal.builder,
        arguments = [args],
        env = toolchain.internal.env,
        mnemonic = "GoCompile",
    )
Finally, we'll update our rules to request the toolchain and call these functions. Here's go_library after this change:
def _go_library_impl(ctx):
    # Load the toolchain.
    toolchain = ctx.toolchains["@rules_go_simple//:toolchain_type"]

    # Declare an output file for the library package and compile it from srcs.
    archive = ctx.actions.declare_file("{name}_/pkg.a".format(name = ctx.label.name))
    toolchain.compile(
        ctx,
        srcs = ctx.files.srcs,
        importpath = ctx.attr.importpath,
        deps = [dep[GoLibraryInfo] for dep in ctx.attr.deps],
        out = archive,
    )

    # Return the output file and metadata about the library.
    return [
        DefaultInfo(
            files = depset([archive]),
            runfiles = ctx.runfiles(collect_data = True),
        ),
        GoLibraryInfo(
            info = struct(
                importpath = ctx.attr.importpath,
                archive = archive,
            ),
            deps = depset(
                direct = [dep[GoLibraryInfo].info for dep in ctx.attr.deps],
                transitive = [dep[GoLibraryInfo].deps for dep in ctx.attr.deps],
            ),
        ),
    ]

go_library = rule(
    _go_library_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile",
        ),
        "deps": attr.label_list(
            providers = [GoLibraryInfo],
            doc = "Direct dependencies of the library",
        ),
        "data": attr.label_list(
            allow_files = True,
            doc = "Data files available to binaries using this library",
        ),
        "importpath": attr.string(
            mandatory = True,
            doc = "Name by which the library may be imported",
        ),
    },
    doc = "Compiles a Go archive from Go sources and dependencies",
    toolchains = ["@rules_go_simple//:toolchain_type"],
)
Using toolchains
Let's check whether this works with a minimal go_binary rule. Here's our BUILD.bazel file:
load("@rules_go_simple//:def.bzl", "go_binary") go_binary( name = "hello", srcs = ["hello.go"], )
And here's our WORKSPACE file. It downloads and registers toolchains for Linux, macOS, and Windows.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive") http_archive( name = "rules_go_simple", sha256 = "8389d6c120d19a6d23de59dfb8299c7a860dbb24078408ed9b98f726c29a4fc9", urls = ["https://github.com/jayconrod/rules_go_simple/releases/download/5.1.0/rules_go_simple-5.1.0.tar.gz"], ) load("@rules_go_simple//:deps.bzl", "go_download", "go_rules_dependencies") go_rules_dependencies() go_download( name = "go_darwin", goarch = "amd64", goos = "darwin", sha256 = "a9088c44a984c4ba64179619606cc65d9d0cb92988012cfc94fbb29ca09edac7", urls = ["https://dl.google.com/go/go1.13.4.darwin-amd64.tar.gz"], ) go_download( name = "go_linux", goarch = "amd64", goos = "linux", sha256 = "692d17071736f74be04a72a06dab9cac1cd759377bd85316e52b2227604c004c", urls = ["https://dl.google.com/go/go1.13.4.linux-amd64.tar.gz"], ) go_download( name = "go_windows", goarch = "amd64", goos = "windows", sha256 = "ab8b7f7a2a4f7b58720fb2128b32c7471092961ff46a01d9384fb489d8212a0b", urls = ["https://dl.google.com/go/go1.13.4.windows-amd64.zip"], ) register_toolchains( "@go_darwin//:toolchain", "@go_linux//:toolchain", "@go_windows//:toolchain", )
Not to be forgotten, here's our hello.go source file:
package main

import (
    "fmt"
    "runtime"
)

func main() {
    fmt.Printf("Hello from %s %s %s\n", runtime.Version(), runtime.GOOS, runtime.GOARCH)
}
We can build with bazel build //:hello. You can add the -s flag to print commands and verify that the downloaded toolchain is used.
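On a Linux amd64 host, a session looks something like this (the exact output path may differ):

$ bazel build //:hello
$ bazel-bin/hello
Hello from go1.13.4 linux amd64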
Conclusion
Platforms and toolchains are a mechanism for decoupling a set of rules from the tools they depend on. This is most immediately useful for isolating the build from the machine it runs on. It also provides flexibility for users: it lets developers (not necessarily rule authors) write their own toolchains. In our case, someone could create a toolchain for gccgo or TinyGo, and it would work with rules_go_simple as long as it satisfies the interface we documented for our toolchain_type.
Ultimately, the toolchain system separates what is being built (rules) from how to build it (toolchain). This means when you change one component, you don't need to rewrite all the build files in your repository. Change is isolated, which is important in any system that needs to scale.