Writing Bazel rules: simple binary rule
Bazel is an open source build tool created by Google. It has a number of strengths that make it a good fit for large projects: highly parallel remote build and execution, remote caching, mature support for many languages, and the ability to scale to monorepos with billions of lines of code.
Bazel's learning curve can be steep compared with simple build tools like Make and language-specific tools like Maven, especially when you want to extend Bazel to do something new.
This tutorial is the first in a series on writing Bazel rules. If you want to extend Bazel to support a new language or tool, this is a great place to start. In this article, we'll write a Bazel rule that builds a simple binary written in Go. In later articles, we'll cover libraries, tests, toolchains, modules, and more.
I'll assume you're familiar with Bazel basics. If you're new to Bazel, head over to Getting Started and follow the First build guides in your preferred language.
How Bazel works
In each of these articles, I'll cover some of the theory of how Bazel works. Since this is the first article in the series, we'll start with the basics.
Starlark
Starlark is Bazel's configuration and extension language. It's essentially Python without some of the advanced features: Starlark has no classes, exceptions, or generators, and the module system is quite different. Starlark avoids being Turing complete by dynamically forbidding recursion and only allowing loops over data structures with fixed size. You can find a full list of differences in Bazel's documentation and in the language spec. These limitations prevent Starlark extensions from getting too complicated; most of the complexity should be pushed out into tools.
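To make the loop restriction concrete, here is a sketch of a graph walk that uses neither recursion nor `while` (Starlark has no `while` loop). It keeps an explicit stack and runs inside a `for` loop over a range with a fixed upper bound, a common workaround in Starlark code. It's written in plain Python restricted to features Starlark also has, so it should run in either. The `reachable` name and the `max_steps` bound are my own illustration, not part of any Bazel API.

```python
def reachable(graph, root, max_steps = 10000):
    """Returns the nodes reachable from root, in the order they were found.

    graph maps each node to a list of its direct dependencies. Instead of
    recursion or a while loop, this walks the graph with an explicit stack
    inside a bounded for loop. If max_steps runs out first, the result is
    incomplete (real Starlark code would usually call fail() here).
    """
    seen = {root: True}
    found = []
    stack = [root]
    for _ in range(max_steps):
        if not stack:
            break
        node = stack.pop()
        for dep in graph.get(node, []):
            if dep not in seen:
                seen[dep] = True
                found.append(dep)
                stack.append(dep)
    return found
```

The `seen` dictionary also makes cycles harmless: a node already visited is never pushed again.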
Aside: Starlark can be used on its own outside of Bazel. Facebook's Buck build tool also uses Starlark. Alan Donovan gave a talk on Starlark at GothamGo 2017 with an example of using Starlark to configure a web server. He's also published an embeddable Starlark interpreter written in Go.
Concepts: rules, packages, and more
To build things in Bazel, you need to write build files (named BUILD or BUILD.bazel). They look like this:
load("@rules_go//go:def.bzl", "go_binary", "go_library", "go_test")

go_library(
    name = "fetch_repo_lib",
    srcs = [
        "fetch_repo.go",
        "module.go",
        "vcs.go",
    ],
    importpath = "github.com/bazelbuild/bazel-gazelle/cmd/fetch_repo",
    visibility = ["//visibility:private"],
    deps = ["@org_golang_x_tools_go_vcs//:vcs"],
)

go_binary(
    name = "fetch_repo",
    embed = [":fetch_repo_lib"],
    visibility = ["//visibility:public"],
)

go_test(
    name = "fetch_repo_test",
    srcs = ["fetch_repo_test.go"],
    embed = [":fetch_repo_lib"],
    deps = ["@org_golang_x_tools_go_vcs//:vcs"],
)
A build file contains a number of targets, written as Starlark function calls. The syntax is declarative: you say what you want to build, not how to build it. In this example, we define a Go library ("fetch_repo_lib") with a handful of source files. A binary ("fetch_repo") is built from that library. We also have a test ("fetch_repo_test") built from that library and an additional source file ("fetch_repo_test.go").
The thing being called to define a target is a rule. go_binary is one of the rules in the example above. A rule has a number of attributes like srcs and deps used to describe a target. It also has an implementation function (not shown yet) that tells Bazel what commands to run and what output files to expect when building a target.
Each build file implicitly defines a Bazel package. A package consists of the targets declared in the build file and all of the files in the package's directory and its subdirectories, except for subdirectories that have their own build files (those belong to other packages). Visibility restrictions are usually applied at the package level, and globs (wildcard patterns used to match source files) only match files within a package. Frequently (but not always), you'll have one package per directory.
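The rule for which package owns a file amounts to this: walk up from the file's directory and stop at the first directory that has a build file. Here's a minimal Python sketch of that lookup. `owning_package` and its arguments are hypothetical names for illustration, and the sketch ignores details like repository boundaries.

```python
def owning_package(path, package_dirs):
    """Returns the package that owns a file, or None if no package does.

    path: a file path relative to the repository root, like
        "cmd/fetch_repo/vcs.go".
    package_dirs: a set of directories containing a BUILD or BUILD.bazel
        file, relative to the repository root ("" is the root itself).
    """
    dirs = path.split("/")[:-1]  # directory components, without the file name
    # Try the deepest directory first, then each parent, up to the root.
    for i in range(len(dirs), -1, -1):
        candidate = "/".join(dirs[:i])
        if candidate in package_dirs:
            return candidate
    return None
```

With packages at the root and at cmd/fetch_repo, a file like cmd/fetch_repo/testdata/x.txt belongs to cmd/fetch_repo. If testdata got its own build file, it would move to a new package, and a glob in cmd/fetch_repo would no longer see it.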
Targets and files are named using labels, which are strings that look like "@rules_go//go:def.bzl". Labels have three parts: a repository name (rules_go), a package name (go), and a file or target name (def.bzl). The repository name and the package name may be omitted when a label refers to something in the same repository or package.
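Here is a Python sketch of how the three parts are split out, including the defaults used when parts are omitted. `parse_label` is a hypothetical helper. It covers only the common forms shown in this article and skips validation and rarer label syntaxes.

```python
def parse_label(label, current_repo = "", current_package = ""):
    """Splits a label into (repository, package, name).

    Handles "@repo//pkg:name", "//pkg:name", "//pkg", ":name", and "name".
    Omitted parts default to current_repo and current_package.
    Returns None for forms this sketch does not handle (e.g. a bare "@repo").
    """
    repo = current_repo
    package = current_package
    rest = label
    if rest.startswith("@"):
        repo, sep, rest = rest[1:].partition("//")
        if not sep:
            return None
        rest = "//" + rest
    if rest.startswith("//"):
        package, _, name = rest[2:].partition(":")
        if not name:
            # "//go" is shorthand for "//go:go".
            name = package.split("/")[-1]
        return (repo, package, name)
    # Relative label: ":name" or "name" refers to the current package.
    name = rest[1:] if rest.startswith(":") else rest
    return (repo, package, name)
```

For example, `parse_label("@rules_go//go:def.bzl")` gives `("rules_go", "go", "def.bzl")`, while `":fetch_repo_lib"` inside the cmd/fetch_repo package keeps the current repository and package and only supplies the name.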
Load, analysis, and execution phases
Bazel builds targets in three phases: load, analysis, and execution. (Actually, it's more complicated, but for the purpose of writing simple rules, these three are all you need to know about.)
In the loading phase, Bazel reads and evaluates build files. It builds a graph of targets and dependencies. For example, if you build fetch_repo_test above, Bazel constructs a graph with a fetch_repo_test node that depends on fetch_repo_test.go, :fetch_repo_lib, and @org_golang_x_tools_go_vcs//:vcs via srcs, embed, and deps edges, respectively.
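The graph from that example can be written down as plain data. This Python sketch records each target's label-valued attributes from the build file at the top of this section and lists the outgoing edges the loading phase would find. The dictionary layout and the `edges_from` helper are my own illustration, not how Bazel stores its graph.

```python
# Label-valued attributes of the targets in the example build file.
# Each (attribute, dependency) pair is one edge in the target graph.
TARGETS = {
    ":fetch_repo_lib": {
        "srcs": ["fetch_repo.go", "module.go", "vcs.go"],
        "deps": ["@org_golang_x_tools_go_vcs//:vcs"],
    },
    ":fetch_repo": {
        "embed": [":fetch_repo_lib"],
    },
    ":fetch_repo_test": {
        "srcs": ["fetch_repo_test.go"],
        "embed": [":fetch_repo_lib"],
        "deps": ["@org_golang_x_tools_go_vcs//:vcs"],
    },
}

def edges_from(label):
    """Returns the (attribute, dependency) edges leaving a node.

    Only targets declared in this build file are recorded, so source files
    and targets from other packages have no outgoing edges in this model.
    """
    edges = []
    for attr, deps in TARGETS.get(label, {}).items():
        for dep in deps:
            edges.append((attr, dep))
    return edges
```

`edges_from(":fetch_repo_test")` returns the srcs, embed, and deps edges described above. Following the embed edge to :fetch_repo_lib adds three more source files and another edge to the vcs library. That is why building the test also loads everything the library needs.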
In the analysis phase, Bazel evaluates rules in the target graph. Each rule implementation function declares output files and actions to produce those files. The result of the analysis phase is the file-action graph.
In the execution phase, Bazel runs actions in the file-action graph needed to produce output files. Bazel has several strategies for running actions. By default, it runs actions within a sandbox that only exposes declared inputs and hides other files. This makes builds more hermetic, since it's harder to accidentally depend on system files that vary from machine to machine. Bazel may also run actions on remote build servers where this isolation happens automatically.
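Here's a rough Python model of how the execution phase can use the file-action graph: an action becomes runnable once every input that some other action produces already exists. The action ids, file names, and the `execution_order` helper are illustrative assumptions. Real Bazel also runs independent actions in parallel and skips actions whose outputs are cached or not needed for the requested targets.

```python
# Illustrative file-action graph for a go_binary named "hello". "stdlib"
# stands in for the compiled standard library and is treated as already
# built, since no action in this list produces it.
ACTIONS = [
    {"id": "GoLink", "inputs": ["hello.a", "stdlib"], "outputs": ["hello"]},
    {"id": "GoCompile", "inputs": ["hello.go", "message.go", "stdlib"], "outputs": ["hello.a"]},
]

def execution_order(actions):
    """Returns action ids in an order where producers run before consumers.

    Files that no action produces (sources here) count as available from
    the start. Assumes ids are unique and the graph has no cycles.
    """
    produced_by_some_action = {}
    for action in actions:
        for out in action["outputs"]:
            produced_by_some_action[out] = True
    available = {}
    order = []
    # In an acyclic graph each pass schedules at least one remaining
    # action, so len(actions) passes are enough.
    for _ in range(len(actions)):
        for action in actions:
            if action["id"] in order:
                continue
            ready = True
            for f in action["inputs"]:
                if f in produced_by_some_action and f not in available:
                    ready = False
            if ready:
                order.append(action["id"])
                for out in action["outputs"]:
                    available[out] = True
    return order
```

Even though GoLink is listed first, it can't run until GoCompile has produced hello.a, so `execution_order(ACTIONS)` puts GoCompile first.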
It's important to note that rules cannot directly perform any I/O during the loading or analysis phases. A rule merely tells Bazel how it should execute commands to build targets. Only the commands themselves can read and write files, during the execution phase. This means a rule can't make any decisions based on the contents of source files. Build files need to explicitly declare dependencies; it's not possible for rules to automatically discover dependencies during the build.
Setting up rules_go_simple
Okay, we've gotten all the theory out of the way for today. Let's dive into the code. We're going to write "rules_go_simple", a simplified version of rules_go. Don't worry if you don't know Go; it's just used as an example. We won't be writing any Go code today.
I've created an example repository at github.com/jayconrod/rules_go_simple. For this article, we'll be looking at the v1 branch. In later articles, we'll add features to branches with higher version numbers.
The first thing we need is a MODULE.bazel file. Every Bazel project should have one of these in the project root directory. MODULE.bazel declares external dependencies on other Bazel modules that we need to build and test our project. In our case, we have one dependency, bazel_skylib, a module with several Starlark libraries that are useful for writing and testing rules. We'll use it to quote strings in shell commands.
Here's our MODULE.bazel file (with most comments removed).
module(name = "rules_go_simple")
# bazel_skylib is a common library for writing and testing Bazel rules.
bazel_dep(name = "bazel_skylib", version = "1.7.1")
Declaring a dependency with bazel_dep causes Bazel to download the module's MODULE.bazel file but not its source code. Bazel will only download the module's repository if it needs something inside.
The first time you run a build command, Bazel will generate MODULE.bazel.lock, a JSON file that contains some metadata about dependencies. You should check this file into version control, but you don't need to edit it manually. For bazel_skylib, it contains SHA256 sums for the module's MODULE.bazel and source.json files from the registry. It's important to record these sums to detect if a module is tampered with later; a released version of a Bazel module cannot be modified once published. You can run bazel mod tidy to completely fill out MODULE.bazel.lock and to remove any metadata that's not needed anymore.
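The tamper check boils down to comparing a freshly computed hash against the recorded one. Here's a minimal Python sketch of that idea. It isn't Bazel's implementation: `matches_recorded_sum` is a hypothetical name, and the hex digest encoding is an assumption for illustration.

```python
import hashlib

def matches_recorded_sum(content, recorded_sha256):
    """Returns True if downloaded bytes hash to the SHA-256 recorded earlier.

    content: the bytes of a downloaded registry file, such as a module's
        MODULE.bazel.
    recorded_sha256: the hex digest saved when the file was first fetched.
    """
    return hashlib.sha256(content).hexdigest() == recorded_sha256

# Record the sum the first time the file is fetched (contents are made up).
original = b'module(name = "bazel_skylib", version = "1.7.1")\n'
recorded = hashlib.sha256(original).hexdigest()
```

On a later fetch, any change to the file's bytes produces a different digest, so the build fails instead of silently using different code.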
Declaring the go_binary rule
To define our binary rule, we'll create a new file, internal/rules.bzl. We'll start with a declaration like this:
go_binary = rule(
    implementation = _go_binary_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile for the main package of this binary",
        ),
        "_stdlib": attr.label(
            allow_single_file = True,
            default = "//internal:stdlib",
        ),
    },
    doc = "Builds an executable program from Go source code",
    executable = True,
)
You may want to refer to the Bazel documentation for rule and attr. There's a lot here, so let's break it down.
- We are defining a new rule named go_binary by assigning the result of the rule function to a variable with that name. A rule is a callable value, but it's not actually a function.
- go_binary is implemented in the _go_binary_impl function (passed as the first argument here), which Bazel will call during the analysis phase for each go_binary target that's part of a build. The implementation function will declare output files and actions.
- go_binary has an attribute named srcs, which is a label_list. srcs may be a list of files with names ending in ".go".
- go_binary must produce an executable file (that's what executable = True means).
- The rule constructor has a doc argument, as do attr.label_list and the other attribute constructors. This lets us write documentation in a structured location. There's a tool called Stardoc that can render documentation from these arguments.
- There's an attribute named _stdlib. This is a hidden attribute (its name starts with _) that points to a target, //internal:stdlib, that compiles the Go standard library. This was a late addition to this series; before Go 1.20, the standard library was precompiled, and we could assume it was installed. Don't worry too much about the details yet. We'll cover this more when we get to toolchains.
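To make the two attribute checks in that declaration concrete, here's a hypothetical Python model. It enforces allow_files = [".go"] on srcs, rejects any attempt to set a private (underscore-prefixed) attribute from a build file, and fills in the private attribute's default. The schema format, `resolve_attrs`, and the error messages are made up. Bazel performs these checks internally and reports errors differently, and real srcs entries are labels, which this sketch simplifies to file names.

```python
# Hypothetical schema mirroring the go_binary declaration: srcs accepts
# only .go files, and _stdlib is private with a fixed default.
GO_BINARY_ATTRS = {
    "srcs": {"allow_files": [".go"], "default": []},
    "_stdlib": {"default": "//internal:stdlib"},
}

def resolve_attrs(schema, given):
    """Returns (attrs, error); error is None when the target is valid.

    given: attribute values from a build file, excluding common attributes
        such as name and visibility, which Bazel handles separately.
    """
    for key in given:
        if key.startswith("_"):
            return None, "private attribute %s cannot be set" % key
        if key not in schema:
            return None, "no such attribute %s" % key
    attrs = {}
    for key, spec in schema.items():
        value = given.get(key, spec["default"])
        allowed = spec.get("allow_files")
        if allowed:
            for f in value:
                if not any([f.endswith(ext) for ext in allowed]):
                    return None, "%s does not match %s" % (f, allowed)
        attrs[key] = value
    return attrs, None
```

A target with srcs = ["hello.go", "message.go"] resolves with _stdlib set to //internal:stdlib. Passing a .c file or trying to set _stdlib yields an error instead.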
Note that all rules automatically support a set of common attributes like name, visibility, and tags. These don't need to be declared explicitly.
Implementing go_binary
Let's look at our implementation function next.
# At the top of internal/rules.bzl: go_compile and go_link live in actions.bzl.
load("//internal:actions.bzl", "go_compile", "go_link")

def _go_binary_impl(ctx):
    # Declare an output file for the main package and compile it from srcs.
    main_archive = ctx.actions.declare_file("{name}.a".format(name = ctx.label.name))
    go_compile(
        ctx,
        srcs = ctx.files.srcs,
        importpath = "main",
        stdlib = ctx.file._stdlib,
        out = main_archive,
    )

    # Declare an output file for the executable and link it.
    executable = ctx.actions.declare_file(ctx.label.name)
    go_link(
        ctx,
        main = main_archive,
        stdlib = ctx.file._stdlib,
        out = executable,
    )

    # Return the DefaultInfo provider. This tells Bazel what files should be
    # built when someone asks to build a go_binary target. It also says which
    # file is executable (in this case, there's only one).
    return [DefaultInfo(
        files = depset([executable]),
        executable = executable,
    )]
An implementation function takes a single argument, a ctx object. This provides an API used to access rule attributes and to declare files and actions. It also exposes lots of useful metadata.
The first thing we do here is compile the main package. (For readers unfamiliar with Go, the package is the compilation unit; multiple .go source files may be compiled into a single .a package file.) We declare an output file named after the target (hello.a for a target named hello) using ctx.actions.declare_file, which returns a File object. We then call go_compile to declare the compile action (which we'll get to in just a minute).
Next, we link that archive into a standalone executable. We declare our executable file, then call go_link (which we'll also define in just a minute).
Finally, we need to tell Bazel what we've done by returning a list of providers. A provider is a struct value returned by a rule that contains information needed by other rules and by Bazel itself. DefaultInfo is a special provider that all rules should return. Here, we store two useful pieces of information. files is a depset (more on depsets another time) that lists the files that should be built when another rule depends on our rule or when someone runs bazel build on our rule. No one cares about the .a archive, so we just return the binary file here. And executable points to our executable file. If someone runs bazel run on our rule, this is the file that Bazel should execute.
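To summarize what the implementation function hands back to Bazel, here is a hypothetical Python model of _go_binary_impl's result. Nothing is compiled: like the real function, it only describes actions and returns DefaultInfo-like data. The dict shapes and the `go_binary_plan` name are my own. Real code works with ctx, File objects, and depsets.

```python
def go_binary_plan(name, srcs, stdlib = "stdlib"):
    """Models the actions and DefaultInfo that _go_binary_impl declares.

    name: the target name, e.g. "hello".
    srcs: source file names for the main package.
    stdlib: placeholder for the compiled standard library directory.
    """
    archive = "{name}.a".format(name = name)  # same naming as declare_file above
    actions = [
        {"mnemonic": "GoCompile", "inputs": srcs + [stdlib], "outputs": [archive]},
        {"mnemonic": "GoLink", "inputs": [archive, stdlib], "outputs": [name]},
    ]
    # Only the executable is listed in files; the archive is an intermediate.
    default_info = {"files": [name], "executable": name}
    return actions, default_info
```

For a target named hello, bazel build still has to produce hello.a because the link action reads it. Only hello appears in files, though, so it is the output Bazel reports for the target.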
go_compile and go_link actions
I chose to define the go_compile and go_link actions in separate functions. They could easily be inlined in the rule above. However, actions are frequently shared by multiple rules. In later articles, when we define go_library and go_test rules, we'll need to compile more packages, and we'll need to link a new kind of binary. We can't call go_binary from those rules, so it makes sense to pull these actions out into functions in actions.bzl.
Here's go_compile.
# At the top of internal/actions.bzl: shell.quote comes from Skylib.
load("@bazel_skylib//lib:shell.bzl", "shell")

def go_compile(ctx, *, srcs, importpath, stdlib, out):
    """Compiles a single Go package from sources.

    Args:
        ctx: analysis context.
        srcs: list of source Files to be compiled.
        importpath: the path other libraries may use to import this package.
        stdlib: a File for the compiled standard library directory.
        out: output .a File.
    """
    cmd = r"""
importcfg=$(mktemp)
pushd {stdlib} >/dev/null
for file in $(find -L . -type f); do
    without_suffix="${{file%.a}}"
    pkg_path="${{without_suffix#./}}"
    abs_file="$PWD/$file"
    printf "packagefile %s=%s\n" "$pkg_path" "$abs_file" >>"$importcfg"
done
popd >/dev/null
go tool compile -o {out} -p {importpath} -importcfg "$importcfg" -- {srcs}
""".format(
        stdlib = shell.quote(stdlib.path),
        out = shell.quote(out.path),
        importpath = shell.quote(importpath),
        srcs = " ".join([shell.quote(src.path) for src in srcs]),
    )
    ctx.actions.run_shell(
        mnemonic = "GoCompile",
        outputs = [out],
        inputs = srcs + [stdlib],
        command = cmd,
        env = {"GOPATH": "/dev/null"},  # suppress warning
        use_default_shell_env = True,
    )
This function builds a Bash command to invoke the compiler, then calls ctx.actions.run_shell to declare an action that runs that command. run_shell takes our command, a list of input files that will be made available in the sandbox, and a list of output files that Bazel will expect. (Don't worry too much about the Bash command itself; it's mostly Go-specific, and we'll clean it up in a later article.)
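The loop at the top of the command writes an importcfg file that maps each standard library import path to its compiled archive. For readers less comfortable with Bash parameter expansion, this Python function mirrors what that loop does, line for line. `importcfg_lines` and the example paths are hypothetical.

```python
def importcfg_lines(stdlib_abs_dir, found_files):
    """Mimics the Bash loop in go_compile and go_link.

    stdlib_abs_dir: the absolute path of the stdlib directory ($PWD after pushd).
    found_files: paths as printed by `find -L . -type f` inside that
        directory, e.g. "./fmt.a" or "./encoding/json.a".
    """
    lines = []
    for f in found_files:
        # ${file%.a}: drop a trailing ".a" if present.
        without_suffix = f[:-len(".a")] if f.endswith(".a") else f
        # ${without_suffix#./}: drop a leading "./" if present.
        if without_suffix.startswith("./"):
            pkg_path = without_suffix[2:]
        else:
            pkg_path = without_suffix
        # "$PWD/$file": note the "./" from find is kept in the path.
        abs_file = stdlib_abs_dir + "/" + f
        lines.append("packagefile %s=%s" % (pkg_path, abs_file))
    return lines
```

With the stdlib at /tmp/stdlib, "./encoding/json.a" becomes `packagefile encoding/json=/tmp/stdlib/./encoding/json.a`. The "/./" is harmless since the operating system resolves it like any other path.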
Our go_link function is similar.
def go_link(ctx, *, main, stdlib, out):
    """Links a Go executable.

    Args:
        ctx: analysis context.
        main: archive file for the main package.
        stdlib: a File for the compiled standard library directory.
        out: output executable file.
    """
    cmd = r"""
importcfg=$(mktemp)
pushd {stdlib} >/dev/null
for file in $(find -L . -type f); do
    without_suffix="${{file%.a}}"
    pkg_path="${{without_suffix#./}}"
    abs_file="$PWD/$file"
    printf "packagefile %s=%s\n" "$pkg_path" "$abs_file" >>"$importcfg"
done
popd >/dev/null
go tool link -o {out} -importcfg "$importcfg" -- {main}
""".format(
        stdlib = shell.quote(stdlib.path),
        main = shell.quote(main.path),
        out = shell.quote(out.path),
    )
    ctx.actions.run_shell(
        mnemonic = "GoLink",
        outputs = [out],
        inputs = [main, stdlib],
        command = cmd,
        env = {"GOPATH": "/dev/null"},  # suppress warning
        use_default_shell_env = True,
    )
I wanted to keep this article from getting too absurdly long, so I chose to keep things simple instead of doing it the Right Way. In general, I'd caution against using Bash commands in Bazel actions for several reasons.
- It's hard to write portable commands. macOS has different versions of most shell commands than Linux, with different flags. On Windows, you'll need to rewrite everything in PowerShell or Batch.
- It's hard to get quoting and escaping right. (Definitely use shell.quote from @bazel_skylib.)
- You can't assume most commands are installed in the host environment. This tutorial assumes go is installed, but we don't normally want to depend on that, and we'll fix it in a later article.
- To help isolate actions from the host environment, Bazel clears most environment variables and hides files using the sandbox. To make our commands work anyway, we set use_default_shell_env = True, which plumbs PATH and a few other environment variables into the action environment. This has a major drawback: if PATH changes, it invalidates your cache, and you need to rebuild everything.
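Here is the single-quoting technique behind shell.quote, sketched in Python. The string is wrapped in single quotes, and each embedded single quote becomes '\''. The `quote` function below is my own rendering of the technique, not Skylib's source. The `roundtrips` helper uses Python's shlex module as a stand-in POSIX shell parser to check the result.

```python
import shlex

def quote(s):
    """Quotes s so a POSIX shell reads it as one literal word."""
    # Inside single quotes the shell interprets nothing, so only a single
    # quote itself needs care: close the quoted part, add an escaped
    # quote, then reopen. In shell text, ' becomes '\''.
    return "'" + s.replace("'", "'\\''") + "'"

def roundtrips(s):
    """Returns True if a POSIX-style parser reads quote(s) back as exactly s."""
    return shlex.split(quote(s)) == [s]
```

A path with spaces, a quote, or shell syntax like "$HOME; echo pwned" comes through as a single literal argument. Forgetting to quote it would let the shell split or expand it.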
Instead of writing Bash commands, it's better to compile tools with Bazel and use those. That lets you write more sophisticated and hermetic actions in your language of choice.
Exposing a public interface
It's useful to have declarations for all public symbols in one file. This way, you can refactor your rules without requiring users to update load statements in their projects. A load statement imports public symbols from another .bzl file into the current file. So all we have to do is create one file that loads our public symbols. That's def.bzl.
load("//internal:rules.bzl", _go_binary = "go_binary")
go_binary = _go_binary
In very old versions of Bazel, simply loading a symbol in a .bzl file would make it available for loading in other files. In modern versions, a symbol must be defined in order for it to be loadable. It's still a good practice to put your public definitions in one file, but it takes a little more work. Above, we load the internal go_binary as _go_binary, then redefine that as go_binary.
Testing the go_binary rule
To test go_binary, we can define an sh_test target that runs a go_binary target and checks its output. Here's our build file, tests/BUILD.bazel:
load("//:def.bzl", "go_binary")

sh_test(
    name = "hello_test",
    srcs = ["hello_test.sh"],
    args = ["$(rootpath :hello)"],
    data = [":hello"],
)

go_binary(
    name = "hello",
    srcs = [
        "hello.go",
        "message.go",
    ],
)
Our go_binary rule has two sources, hello.go and message.go. It just prints "Hello, world!".
Our test has a data dependency on the hello binary. This means that when the test is run, Bazel will build hello and make it available. To avoid hardcoding the location of the binary in the test, we pass it in as an argument. See Predefined source/output path variables for how this works.
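For a target in the main repository, $(rootpath :hello) expands to the file's path relative to the main repository's runfiles directory. That directory is also the test's working directory, so the script can execute the path as given. Here's a small Python sketch of that expansion. It covers main-repository targets only (targets in other repositories expand to paths that start with ../ and the repository name, which this skips), and `rootpath` here is a hypothetical helper, not a Bazel API.

```python
def rootpath(package, name):
    """Sketches $(rootpath //package:name) for a file in the main repository.

    The result is relative to the main repository's runfiles directory,
    which is the working directory Bazel gives a test.
    """
    return package + "/" + name if package else name
```

So for //tests:hello, the test receives tests/hello as $1.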
Here's our test script:
#!/bin/bash

set -euo pipefail

program="$1"
got=$("$program")
want="Hello, world!"

if [ "$got" != "$want" ]; then
  cat >&2 <<EOF
got:
$got
want:
$want
EOF
  exit 1
fi
You can test this out with bazel test //tests:hello_test.
For more complicated rules, Skylib also has a unittest module and build_test and analysis_test rules. However, there's no common framework for running end-to-end tests that invoke Bazel.