Writing Bazel rules: moving logic to execution
In this article, we're going to expand the capabilities of our Bazel rule set by adding new logic to the execution phase. For those following along at home, the new code will be on the v4 branch of rules_go_simple.
You may recall that Bazel operates in three phases:
- Loading phase: Bazel reads and evaluates the build files that define the targets the user asked it to build. It recursively loads the build files for the targets' dependencies. This happens on the machine where Bazel is invoked. The target graph is cached in memory.
- Analysis phase: Bazel evaluates rule implementation functions for the targets and their dependencies. Rule implementation functions declare files to generate, actions to generate them, and inputs for those actions. As with the loading phase, this work happens on the machine where Bazel is invoked, and the action graph is cached in memory.
- Execution phase: Bazel determines which files are out of date and executes actions needed to produce them. Actions may be executed locally in a sandbox or remotely using an execution service. The output files are cached, either locally or on a remote caching service.
Rule set authors can write code for any of these phases, and it's often possible to solve a problem multiple ways. However, the execution phase has many advantages. Execution code has full I/O access to source files, so it can be smarter. It can be written in any language, so it can be faster and more flexible than Starlark. Work can be distributed across many machines, and the results can be cached persistently, so it can be faster for a large team of developers.
Generally, you should keep your Starlark code (loading and analysis phase) as simple as you can and push complexity to the execution phase as much as possible.
The plan
In this article, we'll add several new features to rules_go_simple.
- A go_test rule will build Go tests. This requires parsing the test sources and generating code for the main package. There's no general-purpose tool that does that, so we'll write our own.
- We'll support Go build constraints, which let us filter platform-specific source files using comments like //go:build linux.
- We'll rewrite our importcfg generating code in Go. The compiler and linker use these files to map import strings to compiled package files. Generating them in Bash is annoying, and we prefer to use a more powerful language.
To implement these features, we need to be able to read source files, so we'll do all the work in a new "builder" binary written in Go that runs in the execution phase. All our actions will be executed through the builder binary. We could implement each action with a separate binary, but it's often better to have one. From the user's perspective, it's faster to build one binary, and it doesn't change often enough to make partial cache invalidation a serious consideration.
To compile and link the builder, we'll define a new internal rule, go_tool_binary. This is necessary since go_binary will depend on the builder; we can't use the builder to build itself.
Finally, we'll update go_binary and go_library and introduce go_test, all of which will depend on the builder.
The builder binary
The builder binary contains quite a bit of logic, most of which is specific to the way Go programs are built, so I won't go into much detail here. If you're curious, you can find the source code in //internal/builder.
builder.go is where the main function is defined. main checks the first command line argument (a "verb") and calls a function based on that. The verb may be compile, link, or test.
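To make that dispatch concrete, here's a minimal sketch of what such a main function might look like. The compileCmd, linkCmd, and testCmd helpers are stubs standing in for the real builder's functions, whose names and signatures differ.

// A minimal sketch of verb dispatch in a builder-style tool. The command
// functions are stubs; the real builder's helpers are more involved.
package main

import (
	"fmt"
	"log"
	"os"
)

func compileCmd(args []string) error { return nil } // stub
func linkCmd(args []string) error    { return nil } // stub
func testCmd(args []string) error    { return nil } // stub

func main() {
	log.SetFlags(0)
	log.SetPrefix("builder: ")
	if len(os.Args) < 2 {
		log.Fatal("usage: builder (compile|link|test) [args]")
	}
	verb, rest := os.Args[1], os.Args[2:]

	var err error
	switch verb {
	case "compile":
		err = compileCmd(rest)
	case "link":
		err = linkCmd(rest)
	case "test":
		err = testCmd(rest)
	default:
		err = fmt.Errorf("unknown verb: %q", verb)
	}
	if err != nil {
		log.Fatal(err)
	}
}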
The compile command compiles a list of source files into a package file. First, we filter out source files intended to be compiled for different platforms using Go build constraints. Go has a standard package for this, so we don't have to write a parser or anything like that. Second, we write an importcfg file by listing the packages in the standard library and adding the direct dependencies of the package being compiled (from the deps attribute). Finally, we invoke the Go compiler.
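As a rough illustration of the filtering step, the sketch below uses the standard go/build package to decide which files match the target platform. The filterSources function and its command-line usage are illustrative, not the builder's actual interface.

// Sketch: filter Go source files by build constraints for a target platform
// using the standard go/build package. Illustrative only; the real builder's
// flag handling differs.
package main

import (
	"fmt"
	"go/build"
	"log"
	"os"
	"path/filepath"
	"runtime"
)

// filterSources returns the subset of srcs that should be compiled for the
// given GOOS/GOARCH, honoring //go:build comments and _GOOS/_GOARCH file
// name suffixes.
func filterSources(srcs []string, goos, goarch string) ([]string, error) {
	bctx := build.Default
	bctx.GOOS = goos
	bctx.GOARCH = goarch
	var keep []string
	for _, src := range srcs {
		dir, name := filepath.Split(src)
		match, err := bctx.MatchFile(dir, name)
		if err != nil {
			return nil, err
		}
		if match {
			keep = append(keep, src)
		}
	}
	return keep, nil
}

func main() {
	// Pass source file paths on the command line, for example:
	//   go run filter.go foo.go foo_linux.go foo_windows.go
	keep, err := filterSources(os.Args[1:], runtime.GOOS, runtime.GOARCH)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(keep)
}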
The link command links an executable from a set of compiled archive files. This is pretty simple: we write an importcfg file with information about every archive file transitively imported from the main package, then we invoke the Go linker on the main package file.
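Here's a rough sketch of that importcfg step: write "packagefile importpath=file" lines (the format the compiler and linker read via their -importcfg flag), then run the linker. The function name writeImportcfg and the file paths are illustrative, not the builder's actual API, and the real builder also lists the standard library packages.

// Sketch: write an importcfg file and invoke the Go linker on a main package
// archive. Names and paths here are hypothetical placeholders.
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
	"sort"
	"strings"
)

// writeImportcfg writes "packagefile <importpath>=<file>" lines, the format
// the Go compiler and linker accept with their -importcfg flag.
func writeImportcfg(path string, packageFiles map[string]string) error {
	importPaths := make([]string, 0, len(packageFiles))
	for importPath := range packageFiles {
		importPaths = append(importPaths, importPath)
	}
	sort.Strings(importPaths) // deterministic output helps caching
	var sb strings.Builder
	for _, importPath := range importPaths {
		fmt.Fprintf(&sb, "packagefile %s=%s\n", importPath, packageFiles[importPath])
	}
	return os.WriteFile(path, []byte(sb.String()), 0o666)
}

func main() {
	// Hypothetical inputs: archives for every package transitively imported
	// from the main package (the real builder adds standard library packages too).
	packageFiles := map[string]string{
		"example.com/hello": "bazel-out/hello.a",
	}
	if err := writeImportcfg("importcfg", packageFiles); err != nil {
		log.Fatal(err)
	}
	cmd := exec.Command("go", "tool", "link", "-importcfg", "importcfg", "-o", "hello", "bazel-out/main.a")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}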
The test command is more complicated. Go tests are expected to appear in files with the suffix _test.go. These files may be compiled together with the library being tested (giving them access to private symbols), or they may be compiled separately. A "test main" source file is generated, which is responsible for initializing the test framework and calling each test function. The test main file is compiled into a third archive. The test action sorts all this out, writing importcfg files and invoking the compiler and linker as needed.
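To give a flavor of what a generated test main does, here's a simplified, self-contained sketch. It uses testing.Main, an exported but bare-bones entry point; the file the builder actually generates looks different, and it registers the test functions it discovered while parsing the _test.go sources rather than defining one inline.

// Sketch of a "test main" file. Simplified so it runs standalone; the file
// the builder generates is more elaborate.
package main

import (
	"regexp"
	"testing"
)

// TestFoo stands in for a test function that would normally live in the
// compiled test archive.
func TestFoo(t *testing.T) {
	if 1+1 != 2 {
		t.Fatal("arithmetic is broken")
	}
}

func main() {
	// Register each discovered Test* function with the testing framework.
	tests := []testing.InternalTest{
		{Name: "TestFoo", F: TestFoo},
	}
	matchString := func(pat, str string) (bool, error) {
		re, err := regexp.Compile(pat)
		if err != nil {
			return false, err
		}
		return re.MatchString(str), nil
	}
	testing.Main(matchString, tests, nil, nil)
}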
Internal rules
In order to compile and link the builder binary, we add our new rule, go_tool_binary, in rules.bzl. This rule compiles and links a small binary. It doesn't support dependencies outside of the standard library or build constraints, but that's fine for small tools.
def _go_tool_binary_impl(ctx):
    # Declare the output executable file.
    executable = ctx.actions.declare_file(ctx.label.name)

    # List other input files needed.
    stdlib_dir = ctx.file._stdlib
    inputs = [stdlib_dir] + ctx.files.srcs

    # Run the script to compile and link the binary. The order of arguments
    # is important!
    arguments = ([executable.path, "go", stdlib_dir.path] +
                 [src.path for src in ctx.files.srcs])
    ctx.actions.run(
        mnemonic = "GoToolBinary",
        executable = ctx.executable._script,
        arguments = arguments,
        inputs = inputs,
        outputs = [executable],
        use_default_shell_env = True,
    )

    return [DefaultInfo(
        files = depset([executable]),
        executable = executable,
    )]
go_tool_binary = rule(
    implementation = _go_tool_binary_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            mandatory = True,
            doc = "Source files to compile for the main package of this binary",
        ),
        "_stdlib": attr.label(
            default = "//internal:stdlib",
            allow_single_file = True,
            doc = "Hidden dependency on the Go standard library",
        ),
        "_script": attr.label(
            allow_single_file = True,
            executable = True,
            cfg = "exec",
            default = ":tool_binary.sh",
            doc = "Script that compiles and links a builder binary",
        ),
    },
    doc = """Builds an executable program for the Go toolchain.

go_tool_binary is a simple version of go_binary. It is separate from go_binary
because go_binary depends on the Go toolchain, and the toolchain uses a binary
built with this rule to do most of its work.

This rule does not support dependencies or build constraints. All source files
will be compiled, and they may only depend on the standard library.
""",
    executable = True,
)
As we did with go_binary, we've set executable = True here. Bazel will require go_tool_binary to produce an executable.
You might notice the _stdlib attribute. This also appears on go_binary and go_library. An attribute whose name starts with _ is hidden. It must have a default value and cannot be set explicitly in a build file. In this case, _stdlib points to //internal:stdlib, a target that uses the go_stdlib rule to compile the Go standard library. This rule wasn't originally part of this tutorial series, but it became necessary after Go 1.20 because the precompiled standard library is no longer included in distribution archives, so Bazel rules can't assume the location of compiled files. go_stdlib is implemented using a Bash script and is an implicit dependency of all other rules that compile and link Go code.
go_tool_binary is also implemented using a Bash script, invoked with ctx.actions.run instead of ctx.actions.run_shell. For brevity, I won't show the full script here, but note that since go_tool_binary is used outside the //internal package, we needed to add this snippet to internal/BUILD.bazel:
# An exports_files declaration makes this source file available in other
# packages. It's an implicit dependency of go_tool_binary.
exports_files(
    ["tool_binary.sh"],
    visibility = ["//visibility:public"],
)
Finally, we'll write //internal/builder:BUILD.bazel, which contains the single instance of our new rule.
load("//internal:rules.bzl", "go_tool_binary")
# builder is a tool used to perform various tasks related to building Go code,
# such as compiling packages, linking executables, and generating
# test sources.
go_tool_binary(
    name = "builder",
    srcs = [
        "builder.go",
        "compile.go",
        "flags.go",
        "importcfg.go",
        "link.go",
        "sourceinfo.go",
        "test.go",
    ],
    visibility = ["//visibility:public"],
)
Using the builder
Since we've done all the complicated stuff in the builder, our go_binary, go_library, and go_test rules can be simplified. They just declare actions that run builder commands, passing in the necessary command-line arguments.
Here's the definition for go_test in rules.bzl. The definitions of go_binary and go_library are similar, so I won't show them here.
go_test = rule(
    implementation = _go_test_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = ("Source files to compile for this test. " +
                   "May be a mix of internal and external tests."),
        ),
        "deps": attr.label_list(
            providers = [GoLibraryInfo],
            doc = "Direct dependencies of the test",
        ),
        "data": attr.label_list(
            allow_files = True,
            doc = "Data files available to this test",
        ),
        "importpath": attr.string(
            default = "",
            doc = "Name by which test archives may be imported (optional)",
        ),
        "_builder": attr.label(
            default = "//internal/builder",
            executable = True,
            cfg = "exec",
        ),
        "_stdlib": attr.label(
            default = "//internal:stdlib",
            allow_single_file = True,
            doc = "Hidden dependency on the Go standard library",
        ),
    },
    doc = """Compiles and links a Go test executable. Functions with names
starting with "Test" in files with names ending in "_test.go" will be called
using the go "testing" framework.""",
    test = True,
)
We need implicit dependencies on _builder and _stdlib. The _builder dependency has cfg = "exec" set, which means Bazel will build it for the execution platform, which might be different than the target platform if we were cross-compiling. Also, since go_test builds tests that Bazel can execute, we need to set test = True. This implies executable = True, so we don't need to set that separately.
In the implementation, we just call the go_build_test function, which does the heavy lifting:
def _go_test_impl(ctx):
    executable = ctx.actions.declare_file(ctx.label.name)
    go_build_test(
        ctx,
        importpath = ctx.attr.importpath,
        srcs = ctx.files.srcs,
        stdlib = ctx.file._stdlib,
        builder = ctx.executable._builder,
        deps = [dep[GoLibraryInfo] for dep in ctx.attr.deps],
        out = executable,
        rundir = ctx.label.package,
    )

    runfiles = _collect_runfiles(
        ctx,
        direct_files = ctx.files.data,
        indirect_targets = ctx.attr.data + ctx.attr.deps,
    )
    return [DefaultInfo(
        files = depset([executable]),
        runfiles = runfiles,
        executable = executable,
    )]
We define go_build_test in actions.bzl alongside go_compile and go_link.
def go_build_test(ctx, *, srcs, stdlib, builder, deps, rundir, importpath, out):
    """Compiles and links a Go test executable.

    Args:
        ctx: analysis context.
        srcs: list of source Files to be compiled.
        stdlib: a File for the compiled standard library directory.
        builder: an executable File for the builder tool.
        deps: list of GoLibraryInfo objects for direct dependencies.
        importpath: import path of the internal test archive.
        rundir: directory the test should change to before executing.
        out: output executable file.
    """
    direct_dep_infos = [d.info for d in deps]
    transitive_dep_infos = depset(transitive = [d.deps for d in deps]).to_list()
    inputs = (srcs +
              [stdlib] +
              [d.archive for d in direct_dep_infos] +
              [d.archive for d in transitive_dep_infos])

    args = ctx.actions.args()
    args.add("test")
    args.add("-stdlib", stdlib.path)
    args.add_all(direct_dep_infos, before_each = "-direct", map_each = _format_arc)
    args.add_all(transitive_dep_infos, before_each = "-transitive", map_each = _format_arc)
    if rundir != "":
        args.add("-dir", rundir)
    if importpath != "":
        args.add("-p", importpath)
    args.add("-o", out)
    args.add_all(srcs)

    ctx.actions.run(
        outputs = [out],
        inputs = inputs,
        executable = builder,
        arguments = [args],
        use_default_shell_env = True,
        mnemonic = "GoTest",
    )
As with go_tool_binary, we're using ctx.actions.run instead of ctx.actions.run_shell. These functions take many of the same arguments. run tells Bazel to invoke a command directly, while run_shell tells Bazel to invoke a shell command with sh -c. run is more efficient, less error-prone (no need for quoting), and less OS-specific, so you should prefer run over run_shell whenever possible.
We are using Args (obtained from ctx.actions.args) to build our argument list. There are a few advantages to using Args instead of constructing a list of strings. First, it's more convenient: you don't have to convert Files to strings, and there are useful facilities for formatting lists of options. Second, if you pass a directory (created with ctx.actions.declare_directory) to Args.add_all, the files within the directory are expanded as individual arguments (assuming expand_directories = True, which it is by default). Third, if you pass a depset to Args.add_all, Bazel won't iterate the depset unless the action is actually executed. This can improve Bazel's performance for builds with very large dependency graphs: if an action is cached, there's no need to construct its command line.
Conclusion
That pretty much wraps it up. Again, I tried to focus on the Bazel rule infrastructure here, so if you're looking for Go-specific details, check out the code in the builder directory. You can see some examples of these Go rules being used in //tests.
To summarize, Bazel rules should do as much work in the execution phase as possible. The purpose of the analysis phase is to declare files and actions. Rules written in Starlark should do just that with minimal logic. If you notice rules declaring unnecessary files (like internal temporary files) or unnecessary actions (like multiple actions that always execute together), try to consolidate. Simplifying Starlark rules will speed up analysis and will better leverage parallelism with remote caching and execution.