Writing Bazel rules: moving logic to execution

In this article, we're going to expand the capabilities of our Bazel rule set by adding new logic to the execution phase. For those following along at home, the new code will be in the v4 directory, which is part of github.com/jayconrod/rules_go_simple.

You may recall that Bazel operates in three phases: loading, analysis, and execution.

Rule set authors can write code for any of these phases, and it's often possible to solve a problem multiple ways. However, the execution phase has many advantages, and you should prefer to implement rule logic there if at all possible. Execution code has I/O access to source files, so it can be smarter. It can be written in any language, so it can be faster and more flexible than Starlark. Work can be distributed across many machines, and the results can be cached persistently, so it can be faster for everyone.

The plan

In this article, we'll add several new features to rules_go_simple.

In order to implement these features, we need to be able to read source files, so we'll do all the work in a new "builder" binary written in Go that will run in the execution phase. All our actions will be executed through the builder binary.

To compile and link the builder, we'll define a new internal rule, go_tool_binary. This is necessary since go_binary will depend on the builder; we can't use the builder to build itself.

Finally, we'll update go_binary and go_library and introduce go_test, all of which will depend on the builder.

The builder binary

The builder binary contains quite a bit of logic, most of which is specific to the way Go programs are built, so I won't go into much detail here. If you're curious, you can find the source code in //v4/internal/builder.

builder.go is where the main function is defined. main checks the first command line argument (a "verb") and calls a function based on that. The verb may be stdimportcfg, compile, link, or test.
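To make the dispatch concrete, here's a minimal sketch of what a verb-based main could look like. This is not the code from builder.go: the function names (run, stdImportcfg, compile, link, test) and the error messages are made up for illustration, and the handlers are empty placeholders.

```go
package main

import (
	"errors"
	"fmt"
)

// Placeholder handlers with hypothetical names. In a real builder, each one
// would parse its own flags from args and do the actual work.
func stdImportcfg(args []string) error { return nil }
func compile(args []string) error { return nil }
func link(args []string) error { return nil }
func test(args []string) error { return nil }

// run treats the first argument as a verb and passes the remaining
// arguments to the matching handler.
func run(args []string) error {
	if len(args) == 0 {
		return errors.New("usage: builder stdimportcfg|compile|link|test [options]")
	}
	verb, rest := args[0], args[1:]
	switch verb {
	case "stdimportcfg":
		return stdImportcfg(rest)
	case "compile":
		return compile(rest)
	case "link":
		return link(rest)
	case "test":
		return test(rest)
	default:
		return fmt.Errorf("unknown action: %s", verb)
	}
}

func main() {
	// A real main would call run(os.Args[1:]) and exit non-zero on error.
	// Fixed sample command lines keep this sketch self-contained.
	samples := [][]string{
		{"compile", "-o", "lib.a", "lib.go"},
		{"bogus"},
	}
	for _, args := range samples {
		if err := run(args); err != nil {
			fmt.Println("error:", err)
		}
	}
}
```

Keeping the dispatch in a function that returns an error, rather than calling os.Exit from each handler, makes the verbs easy to exercise from tests.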

The stdimportcfg action creates an importcfg file for the standard library. In Go, importcfg files map import paths like "net/http" to compiled package files like /home/jay/go/pkg/linux_amd64/net/http.a. Until now, we've been using the -I and -L compiler and linker options, which let us specify directories to search. Using search paths requires extra I/O and is somewhat inflexible and error-prone, so it's better to use importcfg files. The importcfg file for the standard library will be built once and used as an input for other actions.
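An importcfg file is plain text, with one "packagefile importpath=file" line per package. As a rough sketch, the mapping could be rendered like this. The function name formatImportcfg is hypothetical, and the real builder discovers the standard library's package files itself rather than taking a hard-coded map.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatImportcfg renders a map from import path to compiled package file
// as importcfg text. Entries are sorted so the output is deterministic,
// which matters for caching.
func formatImportcfg(archives map[string]string) string {
	paths := make([]string, 0, len(archives))
	for p := range archives {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	var b strings.Builder
	for _, p := range paths {
		fmt.Fprintf(&b, "packagefile %s=%s\n", p, archives[p])
	}
	return b.String()
}

func main() {
	// Sample entries only; the paths follow the example in the text.
	fmt.Print(formatImportcfg(map[string]string{
		"net/http": "/home/jay/go/pkg/linux_amd64/net/http.a",
		"fmt":      "/home/jay/go/pkg/linux_amd64/fmt.a",
	}))
}
```

The resulting file is what gets passed to the compiler and linker through their -importcfg options.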

The compile action compiles a list of source files into a package file. First, we filter out source files intended to be compiled for different platforms using Go build constraints. Go has a standard package for this, so we don't have to write a parser or anything like that. Second, we build an importcfg file by combining information from the standard importcfg file and the direct dependencies of the package being compiled (from the deps attribute). Finally, we invoke the Go compiler.
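Here's a small sketch of evaluating a build constraint with the standard library. It uses the go/build/constraint package to check a single //go:build line against a target OS and architecture. This is an illustration only: the builder may well use a different API (go/build can also match whole files), and the helper name matches is made up. Real constraint evaluation also considers tags like cgo and release tags, which this sketch ignores.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// matches reports whether a "//go:build" (or "// +build") line is
// satisfied when only goos and goarch are treated as true tags.
func matches(line, goos, goarch string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	ok := expr.Eval(func(tag string) bool {
		return tag == goos || tag == goarch
	})
	return ok, nil
}

func main() {
	lines := []string{
		"//go:build linux && amd64",
		"//go:build windows",
		"//go:build !windows",
	}
	for _, line := range lines {
		ok, err := matches(line, "linux", "amd64")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-28s %v\n", line, ok)
	}
}
```

A source file whose constraint evaluates to false is simply left out of the list passed to the compiler.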

The link action links an executable from a set of compiled archive files. This is pretty simple: we build an importcfg file with information about every archive file that may be needed, then we invoke the Go linker on the main package file.

The test action is more complicated. Go tests are expected to appear in files with the suffix _test.go. These files may be compiled together with the library being tested (giving them access to private symbols) or they may be compiled separately. A "test main" source file is generated, which is responsible for initializing the test framework and calling each test function. The test main file is compiled into a third archive. The test action sorts all this out, building importcfg files and invoking the compiler and linker as needed.
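Generating the test main requires knowing which test functions exist. One way to find them, sketched below, is to parse each _test.go file with go/parser and collect the top-level functions whose names start with "Test". This is a simplification under stated assumptions: the helper name testNames is hypothetical, the sketch ignores benchmarks and examples, and it doesn't check function signatures the way the real tooling does.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// testNames parses Go source text and returns the names of top-level
// functions (not methods) that start with "Test", in source order.
func testNames(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, decl := range f.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Recv != nil {
			continue // skip non-functions and methods
		}
		if strings.HasPrefix(fn.Name.Name, "Test") {
			names = append(names, fn.Name.Name)
		}
	}
	return names, nil
}

const sample = `package foo

import "testing"

func TestAdd(t *testing.T) {}

func helper() {}

func TestSub(t *testing.T) {}
`

func main() {
	names, err := testNames("foo_test.go", sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// A generated test main would reference each of these functions.
	for _, name := range names {
		fmt.Println(name)
	}
}
```

Parsing only needs the source text, so this works without compiling the package or resolving its imports.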

Internal rules

In order to compile and link the builder binary, we'll define a new rule, go_tool_binary, in rules.bzl. This rule compiles and links a small binary. It doesn't support dependencies outside of the standard library or build constraints, but that's fine for small tools.

def _go_tool_binary_impl(ctx):
    executable_path = "{name}%/{name}".format(name = ctx.label.name)
    executable = ctx.actions.declare_file(executable_path)
    go_build_tool(
        ctx,
        srcs = ctx.files.srcs,
        out = executable,
    )
    return [DefaultInfo(
        files = depset([executable]),
        executable = executable,
    )]

go_tool_binary = rule(
    implementation = _go_tool_binary_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile for the main package of this binary",
        ),
    },
    doc = "...",
    executable = True,
)

As we did for other rules, we define the function that creates actions separately. This is good practice: functions that create groups of actions can be composed in ways that rules cannot.

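# shell.quote comes from Skylib. This load belongs at the top of the .bzl
# file that defines go_build_tool.
load("@bazel_skylib//lib:shell.bzl", "shell")

def go_build_tool(ctx, srcs, out):
    # Compile the sources into an archive, then link the archive into the
    # executable. Both steps run in a single shell action.
    cmd_tpl = ("go tool compile -o {out}.a {srcs} && " +
               "go tool link -o {out} {out}.a")
    cmd = cmd_tpl.format(
        out = shell.quote(out.path),
        srcs = " ".join([shell.quote(src.path) for src in srcs]),
    )
    ctx.actions.run_shell(
        outputs = [out],
        inputs = srcs,
        command = cmd,
        mnemonic = "GoToolBuild",
        # Lets the action find "go" on PATH.
        use_default_shell_env = True,
    )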

Next, we'll need a rule that produces the importcfg file for the standard library. All compile, link, and test actions will depend on this.

def _go_stdimportcfg_impl(ctx):
    f = ctx.actions.declare_file(ctx.label.name + ".txt")
    go_write_stdimportcfg(ctx, f)
    return [DefaultInfo(files = depset([f]))]

go_stdimportcfg = rule(
    implementation = _go_stdimportcfg_impl,
    attrs = {
        "_builder": attr.label(
            default = "//v4/internal/builder",
            executable = True,
            cfg = "host",
        ),
    },
    doc = "...",
)

This rule has a single attribute, "_builder". Attributes that start with "_" are implicit: they cannot be set by the user, and they must have default values. Implicit attributes are useful for specifying implicit dependencies.

Note also that we've set executable = True. Bazel will require //v4/internal/builder to produce an executable. The executable File will be available through ctx.executable._builder.

We've also set cfg = "host". This ensures the executable is built for the platform where Bazel runs actions (technically the execution platform, which may be different from the platform where Bazel is invoked). Newer Bazel releases deprecate cfg = "host" in favor of cfg = "exec".

Here's the function that creates the action.

def go_write_stdimportcfg(ctx, out):
    ctx.actions.run(
        outputs = [out],
        arguments = ["stdimportcfg", "-o", out.path],
        executable = ctx.executable._builder,
        mnemonic = "GoStdImportcfg",
        use_default_shell_env = True,
    )

This is the first time we've used ctx.actions.run to create an action instead of ctx.actions.run_shell. run invokes an executable directly, which is more efficient and less prone to quoting problems.

Finally, we'll write //v4/internal/builder:BUILD.bazel, which declares targets for our two new rules.

load(
    "//v4/internal:rules.bzl",
    "go_tool_binary",
    "go_stdimportcfg",
)

package(default_visibility = ["//visibility:public"])

go_tool_binary(
    name = "builder",
    srcs = [
        "builder.go",
        "compile.go",
        "flags.go",
        "importcfg.go",
        "link.go",
        "sourceinfo.go",
        "test.go",
    ],
)

go_stdimportcfg(name = "stdimportcfg")

Using the builder

Since we've done all the complicated stuff in the builder, our go_binary, go_library, and go_test rules should be relatively simple. They just declare actions that execute the builder and pass in the necessary command-line arguments.

Here's the definition for go_test in rules.bzl. The definitions of go_binary and go_library are similar, so I won't show them here.

go_test = rule(
    implementation = _go_test_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = ("Source files to compile for this test. " +
                   "May be a mix of internal and external tests."),
        ),
        "deps": attr.label_list(
            providers = [GoLibrary],
            doc = "Direct dependencies of the test",
        ),
        "data": attr.label_list(
            allow_files = True,
            doc = "Data files available to this test",
        ),
        "importpath": attr.string(
            default = "",
            doc = "Name by which test archives may be imported (optional)",
        ),
        "_builder": attr.label(
            default = "//v4/internal/builder",
            executable = True,
            cfg = "host",
        ),
        "_stdimportcfg": attr.label(
            default = "//v4/internal/builder:stdimportcfg",
            allow_single_file = True,
        ),
    },
    doc = "...",
    test = True,
)

Note that we need implicit dependencies on _builder and _stdimportcfg. Also, since go_test builds tests that Bazel can execute, we need to set test = True.

In the implementation, we just call go_build_test, which does the heavy lifting:

def _go_test_impl(ctx):
    executable_path = "{name}%/{name}".format(name = ctx.label.name)
    executable = ctx.actions.declare_file(executable_path)
    go_build_test(
        ctx,
        srcs = ctx.files.srcs,
        deps = [dep[GoLibrary] for dep in ctx.attr.deps],
        out = executable,
        importpath = ctx.attr.importpath,
        rundir = ctx.label.package,
    )

    return [DefaultInfo(
        files = depset([executable]),
        runfiles = ctx.runfiles(collect_data = True),
        executable = executable,
    )]

We define go_build_test in actions.bzl.

def go_build_test(ctx, srcs, deps, out, rundir = "", importpath = ""):
    direct_dep_infos = [d.info for d in deps]
    transitive_dep_infos = depset(transitive = [d.deps for d in deps]).to_list()
    inputs = (srcs +
              [ctx.file._stdimportcfg] +
              [d.archive for d in direct_dep_infos] +
              [d.archive for d in transitive_dep_infos])

    args = ctx.actions.args()
    args.add("test")
    args.add("-stdimportcfg", ctx.file._stdimportcfg)
    args.add_all(
        direct_dep_infos,
        before_each = "-direct",
        map_each = _format_arc,
    )
    args.add_all(
        transitive_dep_infos,
        before_each = "-transitive",
        map_each = _format_arc,
    )
    if rundir != "":
        args.add("-dir", rundir)
    if importpath != "":
        args.add("-p", importpath)
    args.add("-o", out)
    args.add_all(srcs)
    
    ctx.actions.run(
        outputs = [out],
        inputs = inputs,
        executable = ctx.executable._builder,
        arguments = [args],
        mnemonic = "GoTest",
        use_default_shell_env = True,
    )

We are using Args (obtained from ctx.actions.args) to build our argument list. There are a few advantages to using Args instead of building a list of strings. First, there are several conveniences: you don't have to convert files to strings, and there are useful facilities for formatting lists of options. Second, if you pass a directory (created with actions.declare_directory) to Args.add_all, the files within the directory will be expanded as individual arguments (assuming expand_directories = True is passed, which will be the default in the future). Third, if you pass a depset to Args.add_all, Bazel won't iterate the depset unless the action is actually executed. This can improve performance when working with very large dependency graphs.
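On the receiving end, the builder has to parse flags that may be repeated many times, like -direct and -transitive. Here's a sketch of how that might look with the standard flag package. The flag names come from the Starlark code above, but everything else is an assumption: the types archive, archiveFlag, and testOptions are hypothetical, and since _format_arc isn't shown, the "importpath=file" value format is a guess.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"strings"
)

// archive pairs an import path with a compiled package file.
type archive struct {
	importPath, file string
}

// archiveFlag implements flag.Value and collects every occurrence of a
// repeated flag. Values are assumed to look like "importpath=file".
type archiveFlag []archive

func (f *archiveFlag) String() string {
	if f == nil {
		return ""
	}
	return fmt.Sprint(*f)
}

func (f *archiveFlag) Set(v string) error {
	importPath, file, ok := strings.Cut(v, "=")
	if !ok || importPath == "" || file == "" {
		return fmt.Errorf("bad archive %q: want importpath=file", v)
	}
	*f = append(*f, archive{importPath, file})
	return nil
}

// testOptions holds the parsed command line for a hypothetical test verb.
type testOptions struct {
	direct, transitive archiveFlag
	out                string
	srcs               []string
}

func parseTestArgs(args []string) (*testOptions, error) {
	var opts testOptions
	fs := flag.NewFlagSet("test", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // errors are returned to the caller instead
	fs.Var(&opts.direct, "direct", "direct dependency (repeatable)")
	fs.Var(&opts.transitive, "transitive", "transitive dependency (repeatable)")
	fs.StringVar(&opts.out, "o", "", "output file")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	opts.srcs = fs.Args() // everything after the flags is a source file
	return &opts, nil
}

func main() {
	opts, err := parseTestArgs([]string{
		"-direct", "example.com/a=a.a",
		"-transitive", "example.com/b=b.a",
		"-o", "foo_test",
		"foo.go", "foo_test.go",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(opts.direct, opts.transitive, opts.out, opts.srcs)
}
```

This mirrors the before_each pattern on the Starlark side: each element of the list becomes its own flag occurrence, so the parser never has to split a joined string.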

Conclusion

That pretty much wraps it up. Again, I tried to focus on the Bazel rule infrastructure here, so if you're looking for Go-specific details, check out the code in the builder directory. You can see some examples of these Go rules being used in //v4/tests.

To summarize, a good set of Bazel rules should do as much work in the execution phase as possible. The purpose of the analysis phase is to declare files and actions. Rules written in Starlark should do just that with minimal logic. If you notice rules declaring unnecessary files (e.g., internal temporary files) or unnecessary actions (e.g., multiple actions that always execute together), try to consolidate. Simplifying Starlark rules speeds up analysis, and moving work into actions lets it benefit from remote caching and execution.

A note on rules_go

rules_go does not currently follow the advice above. It has very complicated logic in some places, especially around cgo (handling C and Go code mixed together, compiled separately). This is mostly a consequence of us trying to implement things that Bazel didn't quite support yet (e.g., C compilation from Starlark rules). Keep this in mind if you're using rules_go as a template for more advanced rule sets.

I'm hoping to get it into better shape in the near future. rules_go_simple is not only a useful example for this blog; it's also a prototype for changes I want to make in rules_go.