Writing Bazel rules: simple binary rule

Published on 2018-07-31
Edited on 2025-09-08
Tagged: bazel go

This article is part of the series "Writing Bazel rules".

Bazel is an open source build tool created by Google. It has a number of strengths that make it a good fit for large projects: highly parallel remote build and execution, remote caching, mature support for many languages, and the ability to scale to monorepos with billions of lines of code.

Bazel's learning curve can be steep compared with simple build tools like Make and language-specific tools like Maven, especially when you want to extend Bazel to do something new.

This tutorial is the first in a series on writing Bazel rules. If you want to extend Bazel to support a new language or tool, this is a great place to start. In this article, we'll write a Bazel rule that builds a simple binary written in Go. In later articles, we'll cover libraries, tests, toolchains, modules, and more.

I'll assume you're familiar with Bazel basics. If you're new to Bazel, head over to Getting Started and follow the First build guide in your preferred language.

How Bazel works

In each of these articles, I'll cover some of the theory of how Bazel works. Since this is the first article in the series, we'll start with the basics.

Starlark

Starlark is Bazel's configuration and extension language. It's essentially Python without some of the advanced features: Starlark has no classes, exceptions, or generators, and the module system is quite different. Starlark avoids being Turing complete by dynamically forbidding recursion and only allowing loops over data structures with fixed size. You can find a full list of differences in Bazel's documentation and in the language spec. These limitations prevent Starlark extensions from getting too complicated; most of the complexity should be pushed out into tools.
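
To make the recursion restriction concrete, here is a minimal sketch of the usual workaround: an explicit stack driven by a bounded for loop. It is written as plain Python so it can be run outside Bazel, and it deliberately sticks to constructs that Starlark should also accept (no while loop, no recursion, no exceptions). The function name flatten and the max_steps bound are illustrative choices, not part of any Bazel API.

```python
def flatten(nested, max_steps = 10000):
    """Flattens arbitrarily nested lists without recursion.

    Starlark forbids recursion and has no while loop, so the traversal
    uses an explicit stack and a for loop over a fixed-size range.
    If max_steps is too small, the result is silently truncated; real
    Starlark code would typically call fail() in that case.
    """
    stack = [nested]
    flat = []
    for _ in range(max_steps):
        if not stack:
            break
        item = stack.pop()
        if type(item) == type([]):
            # Push children in reverse so they pop in original order.
            stack.extend(reversed(item))
        else:
            flat.append(item)
    return flat

print(flatten(["a.go", ["b.go", ["c.go", "d.go"]], "e.go"]))
```

The bound on the loop is what keeps evaluation guaranteed to terminate; the cost is that the author has to pick a limit up front.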

Aside: Starlark can be used on its own outside of Bazel. Facebook's Buck build tool also uses Starlark. Alan Donovan gave a talk on Starlark at GothamGo 2017 with an example of using Starlark to configure a web server. He's also published an embeddable Starlark interpreter written in Go.

Concepts: rules, packages, and more

To build things in Bazel, you need to write build files (named BUILD or BUILD.bazel). They look like this:

load("@rules_go//go:def.bzl", "go_binary", "go_library", "go_test")

go_library(
    name = "fetch_repo_lib",
    srcs = [
        "fetch_repo.go",
        "module.go",
        "vcs.go",
    ],
    importpath = "github.com/bazelbuild/bazel-gazelle/cmd/fetch_repo",
    visibility = ["//visibility:private"],
    deps = ["@org_golang_x_tools_go_vcs//:vcs"],
)

go_binary(
    name = "fetch_repo",
    embed = [":fetch_repo_lib"],
    visibility = ["//visibility:public"],
)

go_test(
    name = "fetch_repo_test",
    srcs = ["fetch_repo_test.go"],
    embed = [":fetch_repo_lib"],
    deps = ["@org_golang_x_tools_go_vcs//:vcs"],
)

A build file contains a number of targets, written as Starlark function calls. The syntax is declarative: you say what you want to build, not how to build it. In this example, we define a Go library ("fetch_repo_lib") with a handful of source files. A binary ("fetch_repo") is built from that library. We also have a test ("fetch_repo_test") built from that library and an additional source file ("fetch_repo_test.go").

The thing being called to define a target is a rule. go_binary is one of the rules in the example above. A rule has a number of attributes like srcs and deps used to describe a target. It also has an implementation function (not shown yet) that tells Bazel what commands to run and what output files to expect when building a target.

Each build file implicitly defines a Bazel package. A package consists of the targets declared in the build file and all of the files in the package's directory and subdirectories, excluding targets and files defined in other packages' subdirectories. Visibility restrictions are usually applied at the package level, and globs (wildcard patterns used to match source files) only match files within a package. Frequently (not always), you'll have one package per directory.

Targets and files are named using labels, which are strings that look like "@rules_go//go:def.bzl". Labels have three parts: a repository name (rules_go), a package name (go), and a file or target name (def.bzl). The repository name and the package name may be omitted when a label refers to something in the same repository or package.
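
The three-part structure of a label can be sketched with a small parser. This is a simplified illustration in plain Python, not Bazel's actual label parser: the function name parse_label and its parameters are assumptions made for this example, and it only handles the label forms that appear in this article.

```python
def parse_label(label, current_repo="", current_package=""):
    """Splits a label into (repository, package, name).

    Omitted parts are filled in from the repository and package of the
    build file that contains the label.
    """
    repo, package = current_repo, current_package
    if label.startswith("@"):
        repo, sep, rest = label[1:].partition("//")
        if not sep:
            raise ValueError("unsupported label form: " + label)
        label = "//" + rest
    if label.startswith("//"):
        package, sep, name = label[2:].partition(":")
        if not sep:
            # "//foo/bar" is shorthand for "//foo/bar:bar".
            name = package.rsplit("/", 1)[-1]
    else:
        # A relative label such as ":fetch_repo_lib" or "vcs.go".
        name = label.lstrip(":")
    return repo, package, name

print(parse_label("@rules_go//go:def.bzl"))
print(parse_label(":fetch_repo_lib", current_package="cmd/fetch_repo"))
```

In the second call, the repository and package come from the context of the build file, which is why they can be left out of the label itself.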

Load, analysis, and execution phases

Bazel builds targets in three phases: load, analysis, and execution. (Actually, it's more complicated, but for the purpose of writing simple rules, these are all you need to know about.)

In the loading phase, Bazel reads and evaluates build files. It builds a graph of targets and dependencies. For example, if you build fetch_repo_test above, Bazel constructs a graph with a fetch_repo_test node that depends on fetch_repo_test.go, :fetch_repo_lib, and @org_golang_x_tools_go_vcs//:vcs via srcs, embed, and deps edges, respectively.

In the analysis phase, Bazel evaluates rules in the target graph. Each rule implementation function declares output files and actions to produce those files. The result of the analysis phase is the file-action graph.

In the execution phase, Bazel runs actions in the file-action graph needed to produce output files. Bazel has several strategies for running actions. By default, it runs actions within a sandbox that only exposes declared inputs and hides other files. This makes builds more hermetic, since it's harder to accidentally depend on system files that vary from machine to machine. Bazel may also run actions on remote build servers where this isolation happens automatically.

It's important to note that a rule cannot directly perform any I/O during the loading or analysis phases. A rule merely tells Bazel how it should execute commands to build targets. Only the commands themselves can read and write files, during the execution phase. This means a rule can't make any decisions based on the contents of source files. Build files need to explicitly declare dependencies; it's not possible for rules to automatically discover dependencies during the build.
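
The separation between the phases can be sketched with a toy model. This is plain Python, not Bazel code, and every name in it (TARGETS, RULES, analyze, execute) is made up for illustration. The point it tries to show is that analysis only describes actions as data, and nothing "runs" until execution walks the file-action graph.

```python
# Loading phase: evaluating build files yields a graph of targets.
TARGETS = {
    "hello": {"rule": "go_binary", "srcs": ["hello.go", "message.go"]},
}

def analyze_go_binary(name, attrs):
    """Analysis phase: declare outputs and actions, but run nothing."""
    archive = name + ".a"
    return [
        {"mnemonic": "GoCompile", "inputs": list(attrs["srcs"]), "outputs": [archive]},
        {"mnemonic": "GoLink", "inputs": [archive], "outputs": [name]},
    ]

RULES = {"go_binary": analyze_go_binary}

def analyze(targets):
    """Builds the file-action graph: a map from output file to its action."""
    producers = {}
    for name, attrs in targets.items():
        for action in RULES[attrs["rule"]](name, attrs):
            for out in action["outputs"]:
                producers[out] = action
    return producers

def execute(producers, wanted, done=None, log=None):
    """Execution phase: 'run' only the actions needed to produce wanted.

    Running is simulated by recording the action's mnemonic in log.
    """
    done = set() if done is None else done
    log = [] if log is None else log
    action = producers.get(wanted)
    if action is None or wanted in done:
        return log  # a source file, or an output that was already built
    for inp in action["inputs"]:
        execute(producers, inp, done, log)
    log.append(action["mnemonic"])
    done.update(action["outputs"])
    return log

print(execute(analyze(TARGETS), "hello"))
```

Note that analyze_go_binary never opens hello.go; it only sees file names. That mirrors the restriction described above: decisions that depend on file contents have to happen inside an action.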

Setting up rules_go_simple

Okay, we've gotten all the theory out of the way for today. Let's dive into the code. We're going to write "rules_go_simple", a simplified version of rules_go. Don't worry if you don't know Go — it's just used as an example. We won't be writing any Go code today.

I've created an example repository at github.com/jayconrod/rules_go_simple. For this article, we'll be looking at the v1 branch. In later articles, we'll add features to branches with higher version numbers.

The first thing we need is a MODULE.bazel file. Every Bazel project should have one of these in the project root directory. MODULE.bazel declares external dependencies on other Bazel modules that we need to build and test our project. In our case, we have one dependency, bazel_skylib, a module with several Starlark libraries that are useful for writing and testing rules. We'll use it to quote strings in shell commands.

Here's our MODULE.bazel file (with most comments removed).

module(name = "rules_go_simple")

# bazel_skylib is a common library for writing and testing Bazel rules.
bazel_dep(name = "bazel_skylib", version = "1.7.1")

Declaring a dependency causes Bazel to download that module's MODULE.bazel file but not its source code. Bazel will only download the source if it needs something inside.

The first time you run a build command, Bazel will generate MODULE.bazel.lock, a JSON file that contains some metadata about dependencies. You should check this file into version control, but you don't need to edit it manually. For bazel_skylib, it contains SHA256 sums for MODULE.bazel and source.json. It's important to record these sums to detect if a module is tampered with later; a released version of a Bazel module cannot be modified once published. You can run bazel mod tidy to completely fill out MODULE.bazel.lock and to remove any metadata that's not needed anymore.

Declaring the go_binary rule

To define our binary rule, we'll create a new file, internal/rules.bzl. We'll start with a declaration like this:

go_binary = rule(
    implementation = _go_binary_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile for the main package of this binary",
        ),
        "_stdlib": attr.label(
            allow_single_file = True,
            default = "//internal:stdlib",
        ),
    },
    doc = "Builds an executable program from Go source code",
    executable = True,
)

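You may want to refer to the Bazel documentation for rule and attr. There's a lot here, so let's break it down.

go_binary is created by calling rule. The implementation argument names the function (shown in the next section) that Bazel calls during analysis to declare output files and actions. attrs is a dictionary of the attributes a target may set. srcs is a list of labels, restricted by allow_files to files ending in .go. _stdlib is a single-file label whose default points at //internal:stdlib; the leading underscore makes it private, so it can't be set in a build file. doc provides documentation strings, and executable = True marks targets of this rule as runnable with bazel run.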

Note that all rules automatically support a set of common attributes like name, visibility, and tags. These don't need to be declared explicitly.
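
As a rough mental model of what the attrs dictionary buys you, the sketch below checks a target's attribute values against a schema before any implementation code would run. It is plain Python with made-up names (SCHEMA, COMMON, check_attrs), it only handles source file names rather than labels of other targets, and the messages are not Bazel's real diagnostics.

```python
# A toy schema mirroring the public part of go_binary's attrs.
SCHEMA = {
    "srcs": {"allow_files": [".go"]},
}

# Attributes every rule supports without declaring them.
COMMON = ["name", "visibility", "tags"]

def check_attrs(schema, values):
    """Returns a list of problems with the attribute values a target sets."""
    errors = []
    for name, value in values.items():
        if name in COMMON:
            continue
        if name.startswith("_"):
            errors.append(name + ": private attributes cannot be set in a build file")
        elif name not in schema:
            errors.append(name + ": no such attribute")
        else:
            allowed = schema[name]["allow_files"]
            for f in value:
                if not any([f.endswith(ext) for ext in allowed]):
                    errors.append("{}: {} does not end in one of {}".format(name, f, allowed))
    return errors

print(check_attrs(SCHEMA, {"name": "hello", "srcs": ["hello.go", "message.go"]}))
print(check_attrs(SCHEMA, {"name": "bad", "srcs": ["hello.c"], "_stdlib": ["x"]}))
```

Because this kind of checking happens before the implementation function is called, the implementation can assume that ctx.files.srcs only contains .go files.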

Implementing go_binary

Let's look at our implementation function next.

def _go_binary_impl(ctx):
    # Declare an output file for the main package and compile it from srcs.
    main_archive = ctx.actions.declare_file("{name}.a".format(name = ctx.label.name))
    go_compile(
        ctx,
        srcs = ctx.files.srcs,
        importpath = "main",
        stdlib = ctx.file._stdlib,
        out = main_archive,
    )

    # Declare an output file for the executable and link it.
    executable = ctx.actions.declare_file(ctx.label.name)
    go_link(
        ctx,
        main = main_archive,
        stdlib = ctx.file._stdlib,
        out = executable,
    )

    # Return the DefaultInfo provider. This tells Bazel what files should be
    # built when someone asks to build a go_binary rule. It also says which
    # file is executable (in this case, there's only one).
    return [DefaultInfo(
        files = depset([executable]),
        executable = executable,
    )]

An implementation function takes a single argument, a ctx object. This provides an API used to access rule attributes and to declare files and actions. It also exposes lots of useful metadata.

The first thing we do here is compile the main package. (For readers unfamiliar with Go, the package is the compilation unit; multiple .go source files may be compiled into a single .a package file.) We declare an output file for the archive, named after the target with a .a suffix, using ctx.actions.declare_file, which returns a File object. We then call go_compile to declare the compile action (which we'll get to in just a minute).

Next, we'll link the main archive into a standalone executable. We declare our executable file, then call go_link (which we'll also define in just a minute).

Finally, we need to tell Bazel what we've done by returning a list of providers. A provider is a struct value returned by a rule that contains information needed by other rules and by Bazel itself. DefaultInfo is a special provider that all rules should return. Here, we store two useful pieces of information. files is a depset (more on depsets another time) that lists the files that should be built when another rule depends on our rule or when someone runs bazel build on our rule. No one cares about the .a file, so we just return the binary file here. And executable points to our executable file. If someone runs bazel run on our rule, this is the file that Bazel should execute.

go_compile and go_link actions

I chose to define the go_compile and go_link actions in separate functions. They could easily be inlined in the rule above. However, actions are frequently shared by multiple rules. In later articles, when we define go_library and go_test rules, we'll need to compile more packages, and we'll need to link a new kind of binary. We can't call go_binary from those rules, so it makes sense to pull these actions out into functions in actions.bzl.

Here's go_compile.

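# shell.quote comes from bazel_skylib; load statements go at the top of actions.bzl.
load("@bazel_skylib//lib:shell.bzl", "shell")

def go_compile(ctx, *, srcs, importpath, stdlib, out):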
    """Compiles a single Go package from sources.

    Args:
        ctx: analysis context.
        srcs: list of source Files to be compiled.
        importpath: the path other libraries may use to import this package.
        stdlib: a File for the compiled standard library directory.
        out: output .a File.
    """

    cmd = r"""
    importcfg=$(mktemp)
    pushd {stdlib} >/dev/null
    for file in $(find -L . -type f); do
      without_suffix="${{file%.a}}"
      pkg_path="${{without_suffix#./}}"
      abs_file="$PWD/$file"
      printf "packagefile %s=%s\n" "$pkg_path" "$abs_file" >>"$importcfg"
    done
    popd >/dev/null
    go tool compile -o {out} -p {importpath} -importcfg "$importcfg" -- {srcs}
    """.format(
        stdlib = shell.quote(stdlib.path),
        out = shell.quote(out.path),
        importpath = shell.quote(importpath),
        srcs = " ".join([shell.quote(src.path) for src in srcs]),
    )
    ctx.actions.run_shell(
        mnemonic = "GoCompile",
        outputs = [out],
        inputs = srcs + [stdlib],
        command = cmd,
        env = {"GOPATH": "/dev/null"},  # suppress warning
        use_default_shell_env = True,
    )

This function builds a Bash command to invoke the compiler, then calls ctx.actions.run_shell to declare an action that runs that command. run_shell takes our command, a list of input files that will be made available in the sandbox, and a list of output files that Bazel will expect. (Don't worry too much about the Bash command itself; it's mostly Go-specific, and we'll clean it up in a later article.)
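
Two details of the command template are easy to miss: the doubled braces and the quoting. The sketch below reproduces both in plain Python, whose str.format behaves the same way for this purpose. The quote function is my approximation of what Skylib's shell.quote does (wrap in single quotes and escape embedded single quotes); TEMPLATE and compile_command are names made up for this example.

```python
import shlex

def quote(s):
    """Single-quotes s for a POSIX shell (a sketch of what shell.quote does)."""
    return "'" + s.replace("'", "'\\''") + "'"

# Doubled braces come out of format as literal braces, so Bash still sees
# ${file%.a}. Single braces are placeholders that format fills in.
TEMPLATE = 'pkg="${{file%.a}}"; go tool compile -o {out} -- {srcs}'

def compile_command(out, srcs):
    return TEMPLATE.format(
        out = quote(out),
        srcs = " ".join([quote(src) for src in srcs]),
    )

cmd = compile_command("bazel-out/bin/hello.a", ["hello world.go", "it's.go"])
print(cmd)

# Splitting the command the way a POSIX shell would shows that each
# source file arrives as exactly one argument.
print(shlex.split(cmd)[-2:])
```

Without the quoting, a source file named "hello world.go" would reach the compiler as two arguments, and a name containing a quote character could change the meaning of the whole command.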

Our go_link function is similar.

def go_link(ctx, *, main, stdlib, out):
    """Links a Go executable.

    Args:
        ctx: analysis context.
        main: archive file for the main package.
        stdlib: a File for the compiled standard library directory.
        out: output executable file.
    """
    cmd = r"""
    importcfg=$(mktemp)
    pushd {stdlib} >/dev/null
    for file in $(find -L . -type f); do
      without_suffix="${{file%.a}}"
      pkg_path="${{without_suffix#./}}"
      abs_file="$PWD/$file"
      printf "packagefile %s=%s\n" "$pkg_path" "$abs_file" >>"$importcfg"
    done
    popd >/dev/null
    go tool link -o {out} -importcfg "$importcfg" -- {main}
    """.format(
        stdlib = shell.quote(stdlib.path),
        main = shell.quote(main.path),
        out = shell.quote(out.path),
    )
    ctx.actions.run_shell(
        mnemonic = "GoLink",
        outputs = [out],
        inputs = [main, stdlib],
        command = cmd,
        env = {"GOPATH": "/dev/null"},  # suppress warning
        use_default_shell_env = True,
    )
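
Both commands above start by generating an importcfg file that maps each package path to the compiled archive for that package. Since the Bash loop is terse, here is a sketch of the same mapping in plain Python. The function name importcfg_lines is made up for this example, and unlike the Bash version it returns the lines sorted and builds absolute paths without the "./" component.

```python
import os

def importcfg_lines(stdlib_dir):
    """Returns one 'packagefile <pkg>=<archive>' line per file in stdlib_dir."""
    root = os.path.abspath(stdlib_dir)
    lines = []
    # followlinks=True corresponds to find -L in the Bash command.
    for dirpath, _, filenames in os.walk(root, followlinks=True):
        for filename in filenames:
            abs_file = os.path.join(dirpath, filename)
            rel = os.path.relpath(abs_file, root).replace(os.sep, "/")
            # Like ${file%.a}: strip the .a suffix to get the package path.
            pkg_path = rel[:-2] if rel.endswith(".a") else rel
            lines.append("packagefile {}={}".format(pkg_path, abs_file))
    return sorted(lines)
```

For a directory containing fmt.a and net/http.a, this yields one line for package fmt and one for package net/http, each pointing at the absolute path of its archive. The compiler and linker use those lines to resolve import statements.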

I wanted to keep this article from getting absurdly long, so I chose to keep things simple instead of doing it the Right Way. In general, I'd caution against using Bash commands in Bazel actions for several reasons, portability and hermeticity chief among them.

Instead of writing Bash commands, it's better to compile tools with Bazel and use those. That lets you write more sophisticated and hermetic actions in your language of choice.

Exposing a public interface

It's useful to have declarations for all public symbols in one file. This way, you can refactor your rules without requiring users to update load statements in their projects. A load statement imports public symbols from another .bzl file into the current file. So all we have to do is create one file that loads our public symbols. That's def.bzl.

load("//internal:rules.bzl", _go_binary = "go_binary")

go_binary = _go_binary

In very old versions of Bazel, simply loading a symbol in a .bzl file would make it available for loading in other files. In modern versions, a symbol must be defined in order for it to be loadable. It's still a good practice to put your public definitions in one file, but it takes a little more work. Above, we load the internal go_binary as _go_binary, then redefine that as go_binary.

Testing the go_binary rule

To test go_binary, we can define a sh_test rule that runs a go_binary rule and checks its output. Here's our build file, tests/BUILD.bazel:

load("//:def.bzl", "go_binary")

sh_test(
    name = "hello_test",
    srcs = ["hello_test.sh"],
    args = ["$(rootpath :hello)"],
    data = [":hello"],
)

go_binary(
    name = "hello",
    srcs = [
        "hello.go",
        "message.go",
    ],
)

Our go_binary rule has two sources, hello.go and message.go. It just prints "Hello, world!". Our test has a data dependency on the hello binary. This means that when the test is run, Bazel will build hello and make it available. To avoid hardcoding the location of the binary in the test, we pass it in as an argument. See Predefined source/output path variables for how this works.

Here's our test script:

#!/bin/bash

set -euo pipefail

program="$1"
got=$("$program")
want="Hello, world!"

if [ "$got" != "$want" ]; then
  cat >&2 <<EOF
got:
$got

want:
$want
EOF
  exit 1
fi

You can test this out with bazel test //tests:hello_test.

For more complicated rules, Skylib also has a unittest module and build_test and analysis_test rules. However, there's no common framework for running end-to-end tests that invoke Bazel.