Writing Bazel rules: library rule, depsets, providers

Published on 2018-08-15
Tagged: bazel go

In the last article, we built a go_binary rule that compiled and linked a Go executable from a list of sources. This time, we'll define a go_library rule that compiles a Go package so that other libraries and binaries can depend on it.

This article focuses on rules that communicate with each other to build a dependency graph that can be used by a linker (or a linker-like action). All of the code is from github.com/jayconrod/rules_go_simple/v2.

Once again, you don't need to know Go to understand this. I'm just using Go as an example because that's what I work on.

Background

Before we jump in, we need to cover three important concepts: structs, providers, and depsets. They are data structures used to pass information between rules, and we'll need them to gather information about dependencies.

Structs

Structs are a basic data structure in Skylark (technically, structs are not part of the Skylark language; they are provided by Bazel). A struct value is essentially a tuple with a name for each value. You can create a struct value by calling the struct function:

my_value = struct(
    foo = 12,
    bar = 34,
)

You can access fields in the struct the same way you would access fields in an object in Python.

print(my_value.foo + my_value.bar)

You can use the dir function to get a list of field names of a struct. getattr and hasattr work the way you'd expect, but you can't modify or delete attributes after they're set because struct values are immutable. There are also to_json and to_proto methods on every struct, which you may find useful.
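
If you know Python better than Bazel, a rough analogy for a struct is collections.namedtuple. The snippet below is only a Python stand-in I'm using for illustration (MyValue is a made-up name); it is not how Bazel implements struct, but it behaves the same way for field access, hasattr, getattr, and immutability.

```python
from collections import namedtuple

# Hypothetical Python stand-in for Bazel's struct(foo = 12, bar = 34).
MyValue = namedtuple("MyValue", ["foo", "bar"])
my_value = MyValue(foo=12, bar=34)

# Fields are read with ordinary attribute access.
total = my_value.foo + my_value.bar

# hasattr and getattr behave the way you'd expect.
has_foo = hasattr(my_value, "foo")
missing = getattr(my_value, "qux", None)

# Like a struct, the value is immutable: assignment fails.
try:
    my_value.foo = 99
    mutated = True
except AttributeError:
    mutated = False
```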

Providers

A provider is a named struct that contains information about a rule. Rule implementation functions return provider structs when they're evaluated. Providers can be read by anything that depends on the rule. In the last article, our go_binary rule returned a DefaultInfo provider (one of the built-in providers). This time, we'll define a GoLibrary provider that carries metadata about our libraries.

You can define a new provider by calling the provider function.

MyProvider = provider(
    doc = "My custom provider",
    fields = {
        "foo": "A foo value",
        "bar": "A bar value",
    },
)
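
To make the "return a provider, read it from a dependency" flow concrete, here is a toy Python model. The provider function and the Target class below are simplified stand-ins I wrote for illustration, not Bazel's real implementation; the point is only that a dependent looks up a provider on a target by subscripting with the provider itself.

```python
from collections import namedtuple


def provider(name, fields):
    """Toy stand-in for Bazel's provider(): returns a constructor for named records."""
    return namedtuple(name, fields)


class Target:
    """Toy stand-in for a Bazel Target: providers are looked up by their type."""

    def __init__(self, providers):
        self._by_type = {type(p): p for p in providers}

    def __getitem__(self, provider_type):
        return self._by_type[provider_type]

    def __contains__(self, provider_type):
        return provider_type in self._by_type


MyProvider = provider("MyProvider", ["foo", "bar"])

# A rule implementation "returns" a list of provider instances...
dep = Target([MyProvider(foo=1, bar=2)])

# ...and anything that depends on the rule reads them with a subscript.
value = dep[MyProvider].foo
has_provider = MyProvider in dep
```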

Depsets

Bazel provides a special-purpose data structure called a depset. Like any set, a depset holds unique values. Depsets distinguish themselves from other kinds of sets by being fast to merge and having a well-defined iteration order.

Depsets are typically used to accumulate information like sources or header files over potentially large dependency graphs. In this article, we'll use depsets to accumulate information about dependencies. The linker will be able to use this information without needing to explicitly write all transitive dependencies in the go_binary rule.

A depset comprises a list of direct elements, a list of transitive children, and an iteration order.

[Figure: diagram of a depset]

Constructing a depset is fast because it just involves creating an object with direct and transitive lists. This takes O(D+T) time where D is the number of elements in the direct list and T is the number of transitive children. Bazel deduplicates elements of both lists when constructing sets. Iterating a depset or converting it to a list takes O(n) time where n is the number of elements in the set and all of its children, including duplicates.
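
The cost model above is easier to see in a toy version. The Depset class below is a simplified Python model I'm assuming for illustration, not Bazel's implementation: construction only stores the two lists, and flattening walks the whole graph. It uses one fixed traversal (children first, then direct elements), whereas real depsets let you choose among several orders.

```python
class Depset:
    """Toy model of a depset; not Bazel's implementation."""

    def __init__(self, direct=(), transitive=()):
        # Construction just stores the two lists: O(D + T).
        self.direct = list(direct)
        self.transitive = list(transitive)

    def to_list(self):
        # Flattening visits every child, then the direct elements,
        # skipping values that were already emitted.
        seen = set()
        out = []

        def visit(node):
            for child in node.transitive:
                visit(child)
            for item in node.direct:
                if item not in seen:
                    seen.add(item)
                    out.append(item)

        visit(self)
        return out


baz = Depset(direct=["baz"])
bar = Depset(direct=["bar"], transitive=[baz])
# baz is reachable twice from foo, but appears once in the flattened list.
foo = Depset(direct=["foo"], transitive=[bar, baz])
flat = foo.to_list()
```

Merging stays cheap because foo never copies the contents of bar or baz; it only keeps references to them until something calls to_list.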

Defining go_library

The GoLibrary provider

OK, the theory is out of the way. Let's get to the code.

First, we'll define a new provider. GoLibrary will carry information about each library and its dependencies. We'll define it in a new file, providers.bzl.

GoLibrary = provider(
    doc = "Contains information about a Go library",
    fields = {
        "info": """A struct containing information about this library.
        Has the following fields:
            importpath: Name by which the library may be imported.
            archive: The .a file compiled from the library's sources.
        """,
        "deps": "A depset of info structs for this library's dependencies",
    },
)

Technically, we don't need to list the fields or provide any documentation here, but we may be able to generate HTML documentation from this some day.

The go_library rule

Now we can define the go_library rule. It uses the same go_compile function as go_binary. Here's the new rule declaration in rules.bzl.

go_library = rule(
    _go_library_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile",
        ),
        "deps": attr.label_list(
            providers = [GoLibrary],
            doc = "Direct dependencies of the library",
        ),
        "importpath": attr.string(
            mandatory = True,
            doc = "Name by which the library may be imported",
        ),
    },
    doc = "Compiles a Go archive from Go sources and dependencies",
)

There are three attributes here. srcs is a list of labels that refer to source .go files or rules that generate .go files. deps is a list of labels that refer to other Go library rules. They don't have to be go_library specifically, but they have to return GoLibrary providers to be compatible. importpath is just a string. We'll use that to name the output files such that the Go compiler and linker can find them.

Here's the implementation of the rule.

def _go_library_impl(ctx):
    # Declare an output file for the library package and compile it from srcs.
    archive = declare_archive(ctx, ctx.attr.importpath)
    go_compile(
        ctx,
        srcs = ctx.files.srcs,
        deps = [dep[GoLibrary] for dep in ctx.attr.deps],
        out = archive,
    )

    # Return the output file and metadata about the library.
    return [
        DefaultInfo(files = depset([archive])),
        GoLibrary(
            info = struct(
                importpath = ctx.attr.importpath,
                archive = archive,
            ),
            deps = depset(
                direct = [dep[GoLibrary].info for dep in ctx.attr.deps],
                transitive = [dep[GoLibrary].deps for dep in ctx.attr.deps],
            ),
        ),
    ]

First, we use declare_archive, a new function defined in actions.bzl, to declare our output file. (For curious Go users, an archive with the import path github.com/foo/bar/baz will be named rule_label%/github.com/foo/bar/baz.a. We can pass the directory rule_label% to the compiler and linker with -I and -L flags respectively so that the archives may be found. -importcfg is a better mechanism for this, but I didn't want to complicate this article too much.)

Next, we compile the library using our go_compile function from before. We access the list of source files through ctx.files.srcs, which is a flat list of files from the srcs attribute. Individual targets in srcs may refer to multiple source files (for example, if we refer to a filegroup or a rule that generates source code), but we just want a flat list. We access dependencies through ctx.attr.deps, which is a list of Targets. Providers can be read from a Target with a subscript expression (dep[GoLibrary] above).

Finally, we return a list of two providers, DefaultInfo and GoLibrary. The GoLibrary.info field is a struct with information about the library being compiled. It's important that this struct is immutable and is relatively small, since it will be added to a depset (the GoLibrary.deps field of other libraries) and hashed.

go_compile and go_link

There was an important change to go_compile and go_link. Did you catch it? Both now accept a deps argument, a list of GoLibrary objects for direct dependencies.

go_compile uses this to generate -I flags for the compiler (import search paths). The compiler only needs search paths for compiled direct dependencies.

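# shell.quote comes from bazel-skylib. This assumes the file begins with a
# load statement such as: load("@bazel_skylib//lib:shell.bzl", "shell")
def go_compile(ctx, srcs, out, deps = []):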
    """Compiles a single Go package from sources.

    Args:
        ctx: analysis context.
        srcs: list of source Files to be compiled.
        out: output .a file. Should have the importpath as a suffix,
            for example, library "example.com/foo" should have the path
            "somedir/example.com/foo.a".
        deps: list of GoLibrary objects for direct dependencies.
    """
    dep_import_args = []
    dep_archives = []
    for dep in deps:
        dep_import_args.append("-I " + shell.quote(_search_dir(dep.info)))
        dep_archives.append(dep.info.archive)

    cmd = "go tool compile -o {out} {imports} -- {srcs}".format(
        out = shell.quote(out.path),
        imports = " ".join(dep_import_args),
        srcs = " ".join([shell.quote(src.path) for src in srcs]),
    )
    ctx.actions.run_shell(
        outputs = [out],
        inputs = srcs + dep_archives,
        command = cmd,
        mnemonic = "GoCompile",
        use_default_shell_env = True,
    )
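
The _search_dir helper isn't shown here. Given the naming convention described in the docstring (the archive path ends with the import path plus .a), one plausible reading is that it strips that suffix from the archive path to recover the directory passed to -I. The Python function below is a sketch under that assumption; the name search_dir, its signature, and the example path are mine, not taken from the repository.

```python
def search_dir(archive_path, importpath):
    """Strips "/<importpath>.a" from the archive path to get a search directory."""
    suffix = "/" + importpath + ".a"
    if not archive_path.endswith(suffix):
        raise ValueError("archive path does not end with " + suffix)
    return archive_path[:-len(suffix)]


# Hypothetical output path for a library with import path "example.com/foo".
directory = search_dir("bazel-out/bin/foo%/example.com/foo.a", "example.com/foo")
```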

go_link uses this to generate -L flags for the linker. Unlike the compiler, the linker needs to know about all transitive dependencies, not just the direct dependencies of the binary. That's why we needed GoLibrary.deps.

def go_link(ctx, out, main, deps = []):
    """Links a Go executable.

    Args:
        ctx: analysis context.
        out: output executable file.
        main: archive file for the main package.
        deps: list of GoLibrary objects for direct dependencies.
    """
    deps_set = depset(
        direct = [d.info for d in deps],
        transitive = [d.deps for d in deps],
    )
    dep_lib_args = []
    dep_archives = []
    for dep in deps_set.to_list():
        dep_lib_args.append("-L " + shell.quote(_search_dir(dep)))
        dep_archives.append(dep.archive)

    cmd = "go tool link -o {out} {libs} -- {main}".format(
        out = shell.quote(out.path),
        libs = " ".join(dep_lib_args),
        main = shell.quote(main.path),
    )
    ctx.actions.run_shell(
        outputs = [out],
        inputs = [main] + dep_archives,
        command = cmd,
        mnemonic = "GoLink",
        use_default_shell_env = True,
    )
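
Both actions build a shell command as a string, so every path has to be quoted before it's spliced in. Here is the same idea in plain Python, using shlex.quote in place of shell.quote. The link_command function and the paths are made up for illustration; only the command template comes from go_link above.

```python
import shlex


def link_command(out, main, search_dirs):
    """Builds a link command line, quoting every path for the shell."""
    libs = " ".join("-L " + shlex.quote(d) for d in search_dirs)
    return "go tool link -o {out} {libs} -- {main}".format(
        out=shlex.quote(out),
        libs=libs,
        main=shlex.quote(main),
    )


# The second search directory contains a space, so it must be quoted.
cmd = link_command("bin/hello", "bin/hello%/main.a", ["bin/foo%", "bin/my lib%"])
```

Without quoting, a directory name containing a space would be split into two arguments by the shell.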

go_binary now includes a deps attribute and calls go_link with GoLibrary providers from those targets. I won't reproduce the entire source here because it's a very small change from last time.

Exposing a public interface

All our definitions are in an internal directory, and we need to make them available for other people to use. So we load them in def.bzl, which just contains our public definitions. We expose both go_library and GoLibrary. The latter will be needed by anyone who wants to implement compatible rules.

load("//v2/internal:rules.bzl", "go_binary", "go_library")
load("//v2/internal:providers.bzl", "GoLibrary")

Testing the go_library rule

We'll test our new functionality the same way we did before, using an sh_test that runs a go_binary. This time, the binary depends on a small graph of libraries:

sh_test(
    name = "bin_with_libs_test",
    srcs = ["bin_with_libs_test.sh"],
    args = ["$(location :bin_with_libs)"],
    data = [":bin_with_libs"],
)

go_binary(
    name = "bin_with_libs",
    srcs = ["bin_with_libs.go"],
    deps = [":foo"],
)

go_library(
    name = "foo",
    srcs = ["foo.go"],
    importpath = "rules_go_simple/v2/tests/foo",
    deps = [
        ":bar",
        ":baz",
    ],
)

go_library(
    name = "bar",
    srcs = ["bar.go"],
    importpath = "rules_go_simple/v2/tests/bar",
    deps = [":baz"],
)

go_library(
    name = "baz",
    srcs = ["baz.go"],
    importpath = "rules_go_simple/v2/tests/baz",
)

You can test this out by running bazel test //v2/tests/... (the three dots are part of the target pattern).
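
To see why listing only :foo in the binary's deps is enough, here is a plain Python walk-through of the same graph. It mimics the shape of the GoLibrary provider (an info value plus a deps set with direct and transitive parts) using dictionaries and tuples. The helper names make_library and flatten are mine, and the traversal order is just one reasonable choice.

```python
def make_library(importpath, deps):
    """Mimics GoLibrary: own info plus a (direct, transitive) pair for deps."""
    return {
        "info": importpath,
        "deps": ([d["info"] for d in deps], [d["deps"] for d in deps]),
    }


def flatten(depset):
    """Flattens a (direct, transitive) pair, children first, without duplicates."""
    direct, transitive = depset
    out = []
    for child in transitive:
        for item in flatten(child):
            if item not in out:
                out.append(item)
    for item in direct:
        if item not in out:
            out.append(item)
    return out


baz = make_library("rules_go_simple/v2/tests/baz", [])
bar = make_library("rules_go_simple/v2/tests/bar", [baz])
foo = make_library("rules_go_simple/v2/tests/foo", [bar, baz])

# The binary lists only foo, as in the BUILD file above.
binary_deps = [foo]
link_set = ([d["info"] for d in binary_deps], [d["deps"] for d in binary_deps])
linked = flatten(link_set)
```

Even though the binary names a single dependency, the flattened set handed to the linker contains all three libraries, each exactly once.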