Writing Bazel rules: platforms and toolchains

Published on 2019-12-07
Edited on 2020-02-01
Tagged: bazel go

This article is part of the series "Writing Bazel rules".

One of Bazel's biggest strengths is its ability to isolate a build from the host system. This enables reproducible builds and remote execution, which lets Bazel scale to huge projects. This isolation isn't completely automatic though, especially when considering toolchains used to build these projects.

In the previous article, we walked through defining a repository rule which let us download and verify a Go toolchain. This time, we'll walk through the process of configuring our simple set of rules to use that toolchain. After this, our rules will be almost completely independent from the host system. Our users will be able to build Go projects without installing a Go toolchain.

Concepts

Before we get to the actual code, let's go over some platform and toolchain jargon. You may also want to read through the official documentation on Platforms and Toolchains.

A platform is a description of where software can run, defined with the platform rule. The host platform is where Bazel itself runs. The execution platform is where Bazel actions run. Normally, this is the same as the host platform, but if you're using remote execution, the execution platform may be different. The target platform is where the software you're building should run. By default, this is also the same as the host platform, but if you're cross-compiling, it will be different.

A platform is described by a list of constraint values, defined with the constraint_value rule. A constraint value is a fact about a platform, for example, that the CPU is x86_64, or the operating system is Linux. There are a number of constraint values defined in the github.com/bazelbuild/platforms repository, which is automatically declared with the workspace name platforms. You can list them with bazel query @platforms//.... You can also define your own.

A constraint setting is a category of constraint values, at most one of which may be true for any platform. A constraint setting may be defined with the constraint_setting rule. @platforms//os:os and @platforms//cpu:cpu are the two main settings to worry about, but again, you can define your own.
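
To make these terms concrete, here is a small sketch of a BUILD file that defines a custom constraint setting, two values for it, and a platform that combines one of those values with constraints from @platforms. The libc, glibc, musl, and linux_amd64_musl names are made up for illustration; only the @platforms labels refer to real targets.

```python
# Hypothetical BUILD.bazel file; target names are illustrative only.

# A constraint setting is a category. A platform may have at most one
# value from each category.
constraint_setting(name = "libc")

constraint_value(
    name = "glibc",
    constraint_setting = ":libc",
)

constraint_value(
    name = "musl",
    constraint_setting = ":libc",
)

# A platform is a named list of constraint values.
platform(
    name = "linux_amd64_musl",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
        ":musl",
    ],
)
```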

A toolchain is a target defined with the toolchain rule that associates a toolchain implementation with a toolchain type. A toolchain type is a target defined with the toolchain_type rule; it is just a name that identifies a kind of toolchain. A toolchain implementation is a target that represents the actual toolchain by listing the files that are part of the toolchain (for example, the compiler and standard library) and the code needed to use the toolchain. A toolchain implementation must return a ToolchainInfo provider.

So that's a lot to take in. How does it all fit together?

Anyone who's defining a toolchain needs to declare a toolchain_type target. This is just a name.

The actual toolchains are defined with toolchain targets that point to implementations. We'll define a go_toolchain rule for our implementation, but you can use any rule that returns ToolchainInfo.

A rule can request a toolchain using its type by setting the toolchains parameter of its rule declaration. The rule implementation can then access the toolchain through ctx.toolchains.

Users register toolchains they'd like to use by calling the register_toolchains function in their WORKSPACE file or by passing the --extra_toolchains flag on the command line.

Finally, when Bazel begins a build, it checks the constraints for the execution and target platforms. It then selects a suitable set of toolchains that are compatible with those constraints. Bazel will provide the ToolchainInfo objects of those toolchains to the rules that request them.
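
As a rough mental model, the selection step can be sketched as a function that walks the registered toolchains in order and returns the first one whose constraints are all satisfied. This is a simplified illustration, not Bazel's implementation: real resolution also considers multiple execution platforms and other settings. The function names and the constraint lists below are assumptions made for the example, written in the subset of Starlark that is also valid Python.

```python
def is_compatible(required, platform_constraints):
    """Returns True if every required constraint value is on the platform."""
    for value in required:
        if value not in platform_constraints:
            return False
    return True

def resolve_toolchain(toolchains, exec_platform, target_platform):
    """Returns the name of the first compatible toolchain, or None."""
    for tc in toolchains:
        if (is_compatible(tc["exec_compatible_with"], exec_platform) and
            is_compatible(tc["target_compatible_with"], target_platform)):
            return tc["name"]
    return None

# Platforms modeled as plain lists of constraint value labels (assumed).
LINUX = ["@platforms//os:linux", "@platforms//cpu:x86_64"]
DARWIN = ["@platforms//os:osx", "@platforms//cpu:x86_64"]

# Registration order matters: earlier entries win when several match.
REGISTERED = [
    {
        "name": "@go_darwin//:toolchain",
        "exec_compatible_with": DARWIN,
        "target_compatible_with": DARWIN,
    },
    {
        "name": "@go_linux//:toolchain",
        "exec_compatible_with": LINUX,
        "target_compatible_with": LINUX,
    },
]
```

With these inputs, resolving for a Linux execution and target platform picks @go_linux//:toolchain, while a macOS execution platform paired with a Linux target matches nothing, because neither toolchain declares that combination.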

Got all that? Actually I'm not sure I do either. It's an elegant system, but it's difficult to grasp. If you want to see how Bazel selects or rejects registered toolchains, use the --toolchain_resolution_debug flag.

If you've ever used a dependency injection system like Dagger or Guice, Bazel's toolchain system is conceptually similar. A toolchain type is like an interface. A toolchain is like a static method with a @Provides annotation. A rule that requires a toolchain is like a constructor with an @Inject annotation. The system automatically finds a suitable implementation for every injected interface.

Migrating rules to toolchains

Let's start using toolchains in rules_go_simple.

First, we'll declare a toolchain_type. Rules can request this with the label @rules_go_simple//:toolchain_type.

toolchain_type(
    name = "toolchain_type",
    visibility = ["//visibility:public"],
)

Since a toolchain_type is basically an interface, we should document what can be done with that interface. Starlark is a dynamically typed language, and there's no place to write down required method or field names. I declared a dummy provider in providers.bzl with some documentation, but you could write this in a README or wherever makes sense for your project.
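
A documentation-only provider along these lines might look like the sketch below. The field names match the ToolchainInfo fields used later in this article, but the GoToolchain name and the doc strings are assumptions, not necessarily what providers.bzl contains.

```python
# providers.bzl (sketch). This provider is never instantiated. It exists
# only to document the fields of the ToolchainInfo that rules receive.
GoToolchain = provider(
    doc = "Documents the interface of @rules_go_simple//:toolchain_type.",
    fields = {
        "compile": "Function that compiles a Go package from sources.",
        "link": "Function that links a Go executable.",
        "build_test": "Function that compiles and links a Go test executable.",
        "internal": "Private data. Rules must not read this; it may change without notice.",
    },
)
```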

Next, we'll create our toolchain implementation rule, go_toolchain.

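# The paths module comes from bazel_skylib, which is assumed to be
# declared as a dependency of the workspace.
load("@bazel_skylib//lib:paths.bzl", "paths")

def _go_toolchain_impl(ctx):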
    # Find important files and paths.
    go_cmd = None
    for f in ctx.files.tools:
        if f.path.endswith("/bin/go") or f.path.endswith("/bin/go.exe"):
            go_cmd = f
            break
    if not go_cmd:
        fail("could not locate go command")
    env = {"GOROOT": paths.dirname(paths.dirname(go_cmd.path))}

    # Generate the package list from the standard library.
    stdimportcfg = ctx.actions.declare_file(ctx.label.name + ".importcfg")
    ctx.actions.run(
        outputs = [stdimportcfg],
        inputs = ctx.files.tools + ctx.files.std_pkgs,
        arguments = ["stdimportcfg", "-o", stdimportcfg.path],
        env = env,
        executable = ctx.executable.builder,
        mnemonic = "GoStdImportcfg",
    )

    # Return a ToolchainInfo provider. This is the object that rules get
    # when they ask for the toolchain.
    return [platform_common.ToolchainInfo(
        # Functions that generate actions. Rules may call these.
        # This is the public interface of the toolchain.
        compile = go_compile,
        link = go_link,
        build_test = go_build_test,

        # Internal data. Contents may change without notice.
        # Think of these like private fields in a class. Actions may use these
        # (they are methods of the class) but rules may not (they are clients).
        internal = struct(
            go_cmd = go_cmd,
            env = env,
            stdimportcfg = stdimportcfg,
            builder = ctx.executable.builder,
            tools = ctx.files.tools,
            std_pkgs = ctx.files.std_pkgs,
        ),
    )]

go_toolchain = rule(
    implementation = _go_toolchain_impl,
    attrs = {
        "builder": attr.label(
            mandatory = True,
            executable = True,
            cfg = "exec",  # older Bazel releases used "host" here
            doc = "Executable that performs most actions",
        ),
        "tools": attr.label_list(
            mandatory = True,
            doc = "Compiler, linker, and other executables from the Go distribution",
        ),
        "std_pkgs": attr.label_list(
            mandatory = True,
            doc = "Standard library packages from the Go distribution",
        ),
    },
    doc = "Gathers functions and file lists needed for a Go toolchain",
)

go_toolchain is a normal rule that returns a ToolchainInfo provider. When rules request the toolchain, they will get one of these objects. There are no mandatory fields, so you can put anything in here. I included three "methods" (which are actually just functions): compile, link, and build_test. These correspond to the actions our rules need to create, so rules will call these instead of creating actions directly. I also included an internal struct field, which holds private files and metadata. Our methods may access this struct, but clients of the toolchain should not, since these values can change without notice.

Next, we'll declare a go_toolchain and a toolchain in BUILD.dist.bazel.tpl. This file is a template that the go_download repository rule expands into a build file. See the previous article for details.

# toolchain_impl gathers information about the Go toolchain.
# See the GoToolchain provider.
go_toolchain(
    name = "toolchain_impl",
    builder = ":builder",
    std_pkgs = [":std_pkgs"],
    tools = [":tools"],
)

# toolchain is a Bazel toolchain that expresses execution and target
# constraints for toolchain_impl. This target should be registered by
# calling register_toolchains in a WORKSPACE file.
toolchain(
    name = "toolchain",
    exec_compatible_with = [
        {exec_constraints},
    ],
    target_compatible_with = [
        {target_constraints},
    ],
    toolchain = ":toolchain_impl",
    toolchain_type = "@rules_go_simple//:toolchain_type",
)

We need to define the {exec_constraints} and {target_constraints} template parameters in the go_download rule. See repo.bzl.
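
One way to produce those parameters is a small helper that maps goos and goarch to constraint labels and formats them as list elements. The sketch below is an assumption about how this could be done, not a copy of repo.bzl; the table names and constraints_for are hypothetical. It is written in the subset of Starlark that is also valid Python, so it can be checked outside Bazel.

```python
# Hypothetical lookup tables. Only the platforms used in this article
# are listed; the real repo.bzl may organize this differently.
_GOOS_TO_CONSTRAINT = {
    "darwin": "@platforms//os:osx",
    "linux": "@platforms//os:linux",
    "windows": "@platforms//os:windows",
}

_GOARCH_TO_CONSTRAINT = {
    "amd64": "@platforms//cpu:x86_64",
}

def constraints_for(goos, goarch):
    """Returns quoted constraint labels, ready to substitute into the template."""
    labels = [_GOOS_TO_CONSTRAINT[goos], _GOARCH_TO_CONSTRAINT[goarch]]

    # The template already supplies the trailing comma and the indentation
    # of the first element, so only the separators between elements are added.
    return ",\n        ".join(['"%s"' % label for label in labels])
```

The repository rule could then pass the result as the value for both {exec_constraints} and {target_constraints} when it expands the template, since each downloaded toolchain in this article runs on and targets the same platform.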

To complete the toolchain implementation, we'll modify our go_compile, go_link, and go_build_test functions. They can obtain the toolchain using ctx.toolchains. Here's go_compile after this change:

def go_compile(ctx, srcs, out, importpath = "", deps = []):
    """Compiles a single Go package from sources.

    Args:
        ctx: analysis context.
        srcs: list of source Files to be compiled.
        out: output .a File.
        importpath: the path other libraries may use to import this package.
        deps: list of GoLibraryInfo objects for direct dependencies.
    """
    toolchain = ctx.toolchains["@rules_go_simple//:toolchain_type"]

    args = ctx.actions.args()
    args.add("compile")
    args.add("-stdimportcfg", toolchain.internal.stdimportcfg)
    dep_infos = [d.info for d in deps]
    args.add_all(dep_infos, before_each = "-arc", map_each = _format_arc)
    if importpath:
        args.add("-p", importpath)
    args.add("-o", out)
    args.add_all(srcs)

    inputs = (srcs +
              [dep.info.archive for dep in deps] +
              [toolchain.internal.stdimportcfg] +
              toolchain.internal.tools +
              toolchain.internal.std_pkgs)
    ctx.actions.run(
        outputs = [out],
        inputs = inputs,
        executable = toolchain.internal.builder,
        arguments = [args],
        env = toolchain.internal.env,
        mnemonic = "GoCompile",
    )

Finally, we'll update our rules to request the toolchain and call these functions. Here's go_library after this change.

def _go_library_impl(ctx):
    # Load the toolchain.
    toolchain = ctx.toolchains["@rules_go_simple//:toolchain_type"]

    # Declare an output file for the library package and compile it from srcs.
    archive = ctx.actions.declare_file("{name}_/pkg.a".format(name = ctx.label.name))
    toolchain.compile(
        ctx,
        srcs = ctx.files.srcs,
        importpath = ctx.attr.importpath,
        deps = [dep[GoLibraryInfo] for dep in ctx.attr.deps],
        out = archive,
    )

    # Return the output file and metadata about the library.
    return [
        DefaultInfo(
            files = depset([archive]),
            runfiles = ctx.runfiles(collect_data = True),
        ),
        GoLibraryInfo(
            info = struct(
                importpath = ctx.attr.importpath,
                archive = archive,
            ),
            deps = depset(
                direct = [dep[GoLibraryInfo].info for dep in ctx.attr.deps],
                transitive = [dep[GoLibraryInfo].deps for dep in ctx.attr.deps],
            ),
        ),
    ]

go_library = rule(
    _go_library_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile",
        ),
        "deps": attr.label_list(
            providers = [GoLibraryInfo],
            doc = "Direct dependencies of the library",
        ),
        "data": attr.label_list(
            allow_files = True,
            doc = "Data files available to binaries using this library",
        ),
        "importpath": attr.string(
            mandatory = True,
            doc = "Name by which the library may be imported",
        ),
    },
    doc = "Compiles a Go archive from Go sources and dependencies",
    toolchains = ["@rules_go_simple//:toolchain_type"],
)

Using toolchains

Let's check whether this works with a minimal go_binary target. Here's our BUILD.bazel file:

load("@rules_go_simple//:def.bzl", "go_binary")

go_binary(
    name = "hello",
    srcs = ["hello.go"],
)

And here's our WORKSPACE file. It downloads and registers toolchains for Linux, macOS, and Windows.

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_go_simple",
    sha256 = "8389d6c120d19a6d23de59dfb8299c7a860dbb24078408ed9b98f726c29a4fc9",
    urls = ["https://github.com/jayconrod/rules_go_simple/releases/download/5.1.0/rules_go_simple-5.1.0.tar.gz"],
)

load("@rules_go_simple//:deps.bzl", "go_download", "go_rules_dependencies")

go_rules_dependencies()

go_download(
    name = "go_darwin",
    goarch = "amd64",
    goos = "darwin",
    sha256 = "a9088c44a984c4ba64179619606cc65d9d0cb92988012cfc94fbb29ca09edac7",
    urls = ["https://dl.google.com/go/go1.13.4.darwin-amd64.tar.gz"],
)

go_download(
    name = "go_linux",
    goarch = "amd64",
    goos = "linux",
    sha256 = "692d17071736f74be04a72a06dab9cac1cd759377bd85316e52b2227604c004c",
    urls = ["https://dl.google.com/go/go1.13.4.linux-amd64.tar.gz"],
)

go_download(
    name = "go_windows",
    goarch = "amd64",
    goos = "windows",
    sha256 = "ab8b7f7a2a4f7b58720fb2128b32c7471092961ff46a01d9384fb489d8212a0b",
    urls = ["https://dl.google.com/go/go1.13.4.windows-amd64.zip"],
)

register_toolchains(
    "@go_darwin//:toolchain",
    "@go_linux//:toolchain",
    "@go_windows//:toolchain",
)

Not to be forgotten, here's our hello.go source file:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Printf("Hello from %s %s %s\n", runtime.Version(), runtime.GOOS, runtime.GOARCH)
}

We can build with bazel build //:hello. You can add the -s flag (short for --subcommands) to print the commands Bazel runs and verify that the downloaded toolchain is used.

Conclusion

Platforms and toolchains are a mechanism for decoupling a set of rules from the tools they depend on. This is most immediately useful for isolating the build from the machine it runs on. It also provides flexibility for users: it lets developers (not necessarily rule authors) write their own toolchains. In our case, someone could create a toolchain for gccgo or TinyGo, and it would work with rules_go_simple as long as it satisfies the interface we documented for our toolchain_type.
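
To illustrate, a third-party toolchain only needs a rule that returns a ToolchainInfo with the same field names. The sketch below is hypothetical: gccgo_toolchain and its helpers do not exist in rules_go_simple, and the action functions are placeholders that a real implementation would replace with code that invokes gccgo.

```python
# gccgo.bzl (hypothetical file). A sketch of an alternative toolchain
# implementation; none of these names exist in rules_go_simple.

def _unimplemented(*args, **kwargs):
    # Placeholder: a real toolchain would register gccgo actions here.
    fail("not implemented in this sketch")

def _gccgo_toolchain_impl(ctx):
    # Expose the same field names that rules_go_simple rules call.
    return [platform_common.ToolchainInfo(
        compile = _unimplemented,
        link = _unimplemented,
        build_test = _unimplemented,
        internal = struct(),
    )]

gccgo_toolchain = rule(
    implementation = _gccgo_toolchain_impl,
    doc = "Hypothetical gccgo-based implementation of the Go toolchain interface",
)
```

A gccgo_toolchain target would then be wrapped in a toolchain target that sets toolchain_type to @rules_go_simple//:toolchain_type, in the same way as the toolchain target in BUILD.dist.bazel.tpl, and registered with register_toolchains.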

Ultimately, the toolchain system separates what is being built (rules) from how to build it (toolchain). This means when you change one component, you don't need to rewrite all the build files in your repository. Change is isolated, which is important in any system that needs to scale.