Writing Bazel rules: library rule, depsets, providers
In the last article, we built a go_binary rule that compiled and linked a Go executable from a list of sources. This time, we'll define a go_library rule that compiles a Go package that other libraries and binaries can depend on.
This article focuses on rules that communicate with each other to build a dependency graph that can be used by a linker (or a linker-like action). All of the code is from github.com/jayconrod/rules_go_simple on the v2 branch.
Once again, you don't need to know Go to understand this. I'm just using Go as an example because that's what I work on.
Background
Before we jump in, we need to cover three important concepts: structs, providers, and depsets. They are data structures used to pass information between rules, and we'll need them to gather information about dependencies.
Structs
Structs are a basic data structure in Starlark (technically, structs are not part of the Starlark language; they are provided by Bazel). A struct value is essentially a tuple with a name for each value. You can create a struct value by calling the struct function:

    my_value = struct(
        foo = 12,
        bar = 34,
    )

You can access fields in the struct the same way you would access fields in an object in Python.

    print(my_value.foo + my_value.bar)
You can use the dir function to get a list of field names of a struct. getattr and hasattr work the way you'd expect, but you can't modify or delete attributes after they're set because struct values are immutable. Every struct also has to_json and to_proto methods, which you may find useful, though recent Bazel releases deprecate them in favor of json.encode and proto.encode_text.
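Structs only exist inside Bazel, so the behavior described above is hard to try in isolation. As a rough analogy only, a frozen Python dataclass behaves in a similar way. The class name MyValue is hypothetical, and this is not how Bazel implements struct.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class MyValue:
    """Plain-Python stand-in for struct(foo = 12, bar = 34)."""
    foo: int
    bar: int


my_value = MyValue(foo=12, bar=34)

# Field access, hasattr, and getattr work as they would on a struct.
total = my_value.foo + my_value.bar
has_foo = hasattr(my_value, "foo")
missing = getattr(my_value, "baz", None)

# Assigning to a field fails, mirroring struct immutability.
try:
    my_value.foo = 99
    mutated = True
except FrozenInstanceError:
    mutated = False

print(total, has_foo, missing, mutated)  # 46 True None False
```

Immutability matters later in the article, where small struct values are stored inside depsets.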
Providers
A provider is a named struct that contains information about a rule. Rule implementation functions return provider structs when they're evaluated. Providers can be read by anything that depends on the rule. In the last article, our go_binary rule returned a DefaultInfo provider (one of the built-in providers). This time, we'll define a GoLibraryInfo provider that carries metadata about our libraries.
You can define a new provider by calling the provider function.

    MyProvider = provider(
        doc = "My custom provider",
        fields = {
            "foo": "A foo value",
            "bar": "A bar value",
        },
    )
Depsets
Bazel provides a special purpose data structure called a depset. Like any set, a depset is a set of unique values. Depsets distinguish themselves from other kinds of sets by being fast to merge and having a well-defined iteration order.
Depsets are typically used to accumulate information like sources or header files over potentially large dependency graphs. In this article, we'll use depsets to accumulate information about dependencies. The linker will be able to use this information without needing to explicitly write all transitive dependencies in the go_binary rule.
A depset comprises a list of direct elements, a list of transitive children, and an iteration order.
Constructing a depset is fast because it just involves creating an object with direct and transitive lists. This takes O(D+T) time where D is the number of elements in the direct list and T is the number of transitive children. Bazel deduplicates elements of both lists when constructing sets. Iterating a depset or converting it to a list takes O(n) time where n is the number of elements in the set and all of its children, including duplicates.
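To make the direct/transitive split concrete, here is a toy Python model of a depset. It is a sketch under assumptions, not Bazel's implementation: the Depset class is hypothetical, it always iterates in postorder (one of the orders Bazel offers), and it removes duplicates while flattening rather than at construction.

```python
class Depset:
    """Toy model of a depset: direct elements plus child depsets."""

    def __init__(self, direct=(), transitive=()):
        # Construction only stores references: O(D + T), no flattening.
        self.direct = tuple(direct)
        self.transitive = tuple(transitive)

    def to_list(self):
        """Flattens in postorder (children first), keeping first occurrences."""
        seen = set()
        out = []

        def visit(node):
            for child in node.transitive:
                visit(child)
            for item in node.direct:
                if item not in seen:
                    seen.add(item)
                    out.append(item)

        visit(self)
        return out


# baz has no dependencies; bar depends on baz; foo depends on bar and baz.
baz = Depset(direct=["baz.a"])
bar = Depset(direct=["bar.a"], transitive=[baz])
foo = Depset(direct=["foo.a"], transitive=[bar, baz])

flat = foo.to_list()
print(flat)  # ['baz.a', 'bar.a', 'foo.a']
```

Merging stays cheap because foo only holds references to bar and baz. The cost of walking the whole graph is paid once, when something finally calls to_list.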
Defining go_library
The GoLibraryInfo provider
OK, the theory is out of the way. Let's get to the code.
First, we'll define a new provider. GoLibraryInfo will carry information about each library and its dependencies. We'll define it in a new file, providers.bzl.

    GoLibraryInfo = provider(
        doc = "Contains information about a Go library",
        fields = {
            "info": """A struct containing information about this library.
            Has the following fields:
                importpath: Name by which the library may be imported.
                archive: The .a file compiled from the library's sources.
            """,
            "deps": "A depset of info structs for this library's dependencies",
        },
    )
Technically, we don't need to list the fields or provide any documentation here, but we may be able to generate HTML documentation from this some day.
The go_library rule
Now we can define the go_library rule. It uses the same go_compile function as go_binary. Here's the new rule declaration in rules.bzl.

    go_library = rule(
        _go_library_impl,
        attrs = {
            "srcs": attr.label_list(
                allow_files = [".go"],
                doc = "Source files to compile",
            ),
            "deps": attr.label_list(
                providers = [GoLibraryInfo],
                doc = "Direct dependencies of the library",
            ),
            "importpath": attr.string(
                mandatory = True,
                doc = "Name by which the library may be imported",
            ),
        },
        doc = "Compiles a Go archive from Go sources and dependencies",
    )
There are three attributes here. srcs is a list of labels that refer to source .go files or rules that generate .go files. deps is a list of labels that refer to other Go library rules. They don't have to be go_library specifically, but they have to return GoLibraryInfo providers to be compatible. importpath is just a string. We'll use that to name the output files such that the Go compiler and linker can find them.
Here's the implementation of the rule.
    def _go_library_impl(ctx):
        # Declare an output file for the library package and compile it from srcs.
        archive = declare_archive(ctx, ctx.attr.importpath)
        go_compile(
            ctx,
            srcs = ctx.files.srcs,
            deps = [dep[GoLibraryInfo] for dep in ctx.attr.deps],
            out = archive,
        )

        # Return the output file and metadata about the library.
        return [
            DefaultInfo(files = depset([archive])),
            GoLibraryInfo(
                info = struct(
                    importpath = ctx.attr.importpath,
                    archive = archive,
                ),
                deps = depset(
                    direct = [dep[GoLibraryInfo].info for dep in ctx.attr.deps],
                    transitive = [dep[GoLibraryInfo].deps for dep in ctx.attr.deps],
                ),
            ),
        ]
First, we use declare_archive, a new function defined in actions.bzl, to declare our output file. (For curious Go users, an archive with the import path github.com/foo/bar/baz will be named rule_label%/github.com/foo/bar/baz.a. We can pass the directory rule_label% to the compiler and linker with -I and -L flags respectively so that the archives may be found. -importcfg is a better mechanism for this, but I didn't want to complicate this article too much.)
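The naming scheme is easier to follow with a small model. The sketch below is plain Python, not the actual declare_archive or _search_dir code, which this article doesn't show. The function names and the exact layout are assumptions based on the description above.

```python
def archive_path(rule_name, importpath):
    """Assumed layout: '<rule name>%/<importpath>.a'."""
    return "{}%/{}.a".format(rule_name, importpath)


def search_dir(path, importpath):
    """Strips '/<importpath>.a' to recover the directory passed to -I or -L."""
    suffix = "/" + importpath + ".a"
    if not path.endswith(suffix):
        raise ValueError("archive path does not end with " + suffix)
    return path[: -len(suffix)]


path = archive_path("rule_label", "example.com/foo")
print(path)                                 # rule_label%/example.com/foo.a
print(search_dir(path, "example.com/foo"))  # rule_label%
```

Because the import path is a suffix of the archive path, the compiler can resolve an import of example.com/foo by looking for example.com/foo.a under the search directory.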
Next, we compile the library using our go_compile function from before. We access the list of source files through ctx.files.srcs, which is a flat list of files from the srcs attribute. Individual targets in srcs may refer to multiple source files (for example, if we refer to a filegroup or a rule that generates source code), but we just want a flat list. We access dependencies through ctx.attr.deps, which is a list of Targets. Providers can be read from a Target with a subscript expression (dep[GoLibraryInfo] above).
Finally, we return a list of two providers, DefaultInfo and GoLibraryInfo. The GoLibraryInfo.info field is a struct with information about the library being compiled. It's important that this struct is immutable and is relatively small, since it will be added to a depset (the GoLibraryInfo.deps field of other libraries) and hashed.
go_compile and go_link
There was an important change to go_compile and go_link. Did you catch it? Both now accept a deps argument, a list of GoLibraryInfo objects for direct dependencies.
go_compile uses this to generate -I flags for the compiler (import search paths). The compiler only needs search paths for compiled direct dependencies.

    def go_compile(ctx, srcs, out, deps = []):
        """Compiles a single Go package from sources.

        Args:
            ctx: analysis context.
            srcs: list of source Files to be compiled.
            out: output .a file. Should have the importpath as a suffix,
                for example, library "example.com/foo" should have the path
                "somedir/example.com/foo.a".
            deps: list of GoLibraryInfo objects for direct dependencies.
        """
        dep_import_args = []
        dep_archives = []
        for dep in deps:
            dep_import_args.append("-I " + shell.quote(_search_dir(dep.info)))
            dep_archives.append(dep.info.archive)

        cmd = "go tool compile -o {out} {imports} -- {srcs}".format(
            out = shell.quote(out.path),
            imports = " ".join(dep_import_args),
            srcs = " ".join([shell.quote(src.path) for src in srcs]),
        )
        ctx.actions.run_shell(
            outputs = [out],
            inputs = srcs + dep_archives,
            command = cmd,
            mnemonic = "GoCompile",
            use_default_shell_env = True,
        )
go_link uses this to generate -L flags for the linker. The linker needs to know about all transitive dependencies, not just the direct dependencies of the binary. That's why we needed GoLibraryInfo.deps; the linker needs to know about everything.

    def go_link(ctx, out, main, deps = []):
        """Links a Go executable.

        Args:
            ctx: analysis context.
            out: output executable file.
            main: archive file for the main package.
            deps: list of GoLibraryInfo objects for direct dependencies.
        """
        deps_set = depset(
            direct = [d.info for d in deps],
            transitive = [d.deps for d in deps],
        )
        dep_lib_args = []
        dep_archives = []
        for dep in deps_set.to_list():
            dep_lib_args.append("-L " + shell.quote(_search_dir(dep)))
            dep_archives.append(dep.archive)

        cmd = "go tool link -o {out} {libs} -- {main}".format(
            out = shell.quote(out.path),
            libs = " ".join(dep_lib_args),
            main = shell.quote(main.path),
        )
        ctx.actions.run_shell(
            outputs = [out],
            inputs = [main] + dep_archives,
            command = cmd,
            mnemonic = "GoLink",
            use_default_shell_env = True,
        )
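The difference between the two functions shows up clearly on a small graph. The following Python sketch is only a model: the LIBS table, its search directories, and both helper functions are hypothetical, and shlex.quote stands in for shell.quote. It assumes a library foo that depends on bar and baz, where bar also depends on baz.

```python
import shlex

# Hypothetical graph: importpath -> (search directory, direct dependencies).
LIBS = {
    "tests/foo": ("foo%", ["tests/bar", "tests/baz"]),
    "tests/bar": ("bar%", ["tests/baz"]),
    "tests/baz": ("baz%", []),
}


def compile_flags(direct_deps):
    """-I flags: one per direct dependency only."""
    return ["-I " + shlex.quote(LIBS[d][0]) for d in direct_deps]


def link_flags(direct_deps):
    """-L flags: one per library in the transitive closure, deduplicated."""
    seen = []

    def visit(name):
        for dep in LIBS[name][1]:
            visit(dep)
        if name not in seen:
            seen.append(name)

    for d in direct_deps:
        visit(d)
    return ["-L " + shlex.quote(LIBS[n][0]) for n in seen]


# A binary that depends directly on foo only.
print(compile_flags(["tests/foo"]))  # ['-I foo%']
print(link_flags(["tests/foo"]))     # ['-L baz%', '-L bar%', '-L foo%']
```

Compiling the binary needs one search path, while linking it needs three, even though the binary names a single dependency. That gap is the information the deps depset carries.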
go_binary now includes a deps attribute and calls go_link with GoLibraryInfo providers from those targets. I won't reproduce the entire source here because it's a very small change from last time.
Exposing a public interface
All our definitions are in an internal directory, and we need to make them available for other people to use. So we load them in def.bzl, which just contains our public definitions. We expose both go_library and GoLibraryInfo. The latter will be needed by anyone who wants to implement compatible rules.

    load(
        "//internal:rules.bzl",
        _go_binary = "go_binary",
        _go_library = "go_library",
    )
    load(
        "//internal:providers.bzl",
        _GoLibraryInfo = "GoLibraryInfo",
    )

    go_binary = _go_binary
    go_library = _go_library
    GoLibraryInfo = _GoLibraryInfo
Testing the go_library rule
We'll test our new functionality the same way we did before: using an sh_test that runs a go_binary built with our new functionality:

    sh_test(
        name = "bin_with_libs_test",
        srcs = ["bin_with_libs_test.sh"],
        args = ["$(location :bin_with_libs)"],
        data = [":bin_with_libs"],
    )

    go_binary(
        name = "bin_with_libs",
        srcs = ["bin_with_libs.go"],
        deps = [":foo"],
    )

    go_library(
        name = "foo",
        srcs = ["foo.go"],
        importpath = "rules_go_simple/tests/foo",
        deps = [
            ":bar",
            ":baz",
        ],
    )

    go_library(
        name = "bar",
        srcs = ["bar.go"],
        importpath = "rules_go_simple/tests/bar",
        deps = [":baz"],
    )

    go_library(
        name = "baz",
        srcs = ["baz.go"],
        importpath = "rules_go_simple/tests/baz",
    )
You can test this out with bazel test //tests/... .