Writing Bazel rules: library rule, depsets, providers
In the last article, we built a go_binary rule that compiled and linked a Go executable from a list of sources. This time, we'll define a go_library rule that can compile a Go package that other libraries and binaries can depend on.
This article focuses on rules that communicate with each other to build a dependency graph that can be used by a linker (or a linker-like action). All of the code is from rules_go_simple on the v2 branch.
Once again, you don't need to know Go to understand this. I'm just using Go as an example because that's what I like to work in.
Background
Before we jump in, we need to cover three important concepts: structs, providers, and depsets. They are data structures used to pass information between rules, and we'll need them to gather information about dependencies.
Structs
A struct value is a collection of named fields, kind of like an object in Python or JavaScript. Although the struct is a basic data structure, it's provided by Bazel and is not technically part of the Starlark language. You can create a struct value by calling the struct function:
my_value = struct(
    foo = 12,
    bar = 34,
)
You can access fields in the struct the same way you would access fields in an object in Python.
print(my_value.foo + my_value.bar)
You can use the dir function to get a list of a struct's field names. getattr and hasattr work the way you'd expect, but you can't modify or delete fields after they're set because struct values are immutable. You can convert a struct (or most other values) to JSON with json.encode, and parse JSON back into Starlark values with json.decode (JSON objects come back as dicts, not structs).
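struct only exists inside Bazel, so it can't be demonstrated outside a build. As a rough analogy, here is a hypothetical plain-Python class (Struct, with a to_json method standing in for json.encode) that mimics the behavior described above: attribute access, dir, getattr and hasattr, immutability, and JSON encoding. It's a sketch for intuition, not how Bazel implements structs.

```python
import json


class Struct:
    """Hypothetical, immutable stand-in for Bazel's struct (not Bazel's implementation)."""

    def __init__(self, **fields):
        # Bypass our own __setattr__ guard once, during construction.
        object.__setattr__(self, "_fields", dict(fields))

    def __getattr__(self, name):
        # Python only calls this when normal lookup fails, i.e. for struct fields.
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        raise AttributeError("struct values are immutable")

    def __delattr__(self, name):
        raise AttributeError("struct values are immutable")

    def __dir__(self):
        # dir() lists only the field names, like dir() on a Starlark struct.
        return sorted(self._fields)

    def to_json(self):
        # Stand-in for json.encode; keys are sorted to keep the output stable.
        return json.dumps(self._fields, sort_keys=True)


my_value = Struct(foo=12, bar=34)
print(my_value.foo + my_value.bar)  # 46
print(dir(my_value))                # ['bar', 'foo']
print(hasattr(my_value, "baz"))     # False
print(my_value.to_json())           # {"bar": 34, "foo": 12}
```

Assigning my_value.foo = 1 raises AttributeError here, mirroring the error you'd get when trying to mutate a struct in Starlark.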
Providers
A provider is a named struct type that conveys information about a rule. A rule implementation function returns provider structs when it's evaluated, and rules that depend on the resulting target can read those providers. In the last article, our go_binary rule returned a DefaultInfo provider (one of the built-in providers). In this article, we'll define a GoLibraryInfo provider that carries metadata about our libraries.
You can define a new provider by calling the provider function:
MyProvider = provider(
    doc = "My custom provider",
    fields = {
        "foo": "A foo value",
        "bar": "A bar value",
    },
)
You can create a provider value just like a struct value:
my_provider = MyProvider(foo = 12, bar = 34)
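To make the relationship between a provider and its values concrete, here is a toy Python model. The provider factory, the ProviderValue class, and the dict standing in for a Target are all hypothetical. Two behaviors are borrowed from Bazel: a provider declared with fields rejects field names it doesn't list, and a target's providers are looked up by the provider symbol itself, as in dep[GoLibraryInfo].

```python
def provider(doc, fields):
    """Hypothetical model of Bazel's provider(): returns a constructor for a
    named, immutable record type. Not Bazel's implementation."""
    allowed = frozenset(fields)

    class ProviderValue:
        __slots__ = ("_values",)

        def __init__(self, **kwargs):
            # Like a provider declared with `fields`, reject undeclared names.
            unknown = sorted(set(kwargs) - allowed)
            if unknown:
                raise TypeError("unexpected field(s): " + ", ".join(unknown))
            object.__setattr__(self, "_values", dict(kwargs))

        def __getattr__(self, name):
            # Only reached for names that aren't real attributes, i.e. fields.
            # Declared fields that were omitted are simply absent.
            try:
                return self._values[name]
            except KeyError:
                raise AttributeError(name) from None

        def __setattr__(self, name, value):
            raise AttributeError("provider values are immutable")

    ProviderValue.__doc__ = doc
    return ProviderValue


MyProvider = provider(
    doc="My custom provider",
    fields={"foo": "A foo value", "bar": "A bar value"},
)
my_provider = MyProvider(foo=12, bar=34)

# A Target is indexed by provider; a plain dict stands in for one here.
target = {MyProvider: my_provider}
print(target[MyProvider].foo)  # 12
```

The provider symbol doubles as a lookup key, which is why other rules need access to GoLibraryInfo itself (and not just its field names) to read it from a dependency.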
Depsets
Bazel provides a special-purpose data structure called a depset. Like any set, a depset holds unique values. Depsets distinguish themselves by being fast to merge and by having a well-defined iteration order.
Depsets are typically used to accumulate information like sources or header files over large dependency graphs. A dependency graph may contain hundreds of thousands of nodes, so it's important that all depset operations run in linear time and space. In this article, we'll use depsets to accumulate information about Go dependencies like import paths and compiled file names. The linker will be able to use this information without requiring go_binary to explicitly list all transitive dependencies.
A depset comprises a list of direct elements, a list of transitive depset children, and an iteration order.
Constructing a depset is fast because it just involves creating an object with direct and transitive lists. This takes O(D+T) time where D is the number of elements in the direct list and T is the number of transitive children. Bazel deduplicates elements of both lists when constructing sets. Iterating a depset or converting it to a list takes O(n) time where n is the number of elements in the set and all of its children, including duplicates.
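The following toy Python model (a hypothetical Depset class, not Bazel's Java implementation) shows why construction is cheap: it only stores references to its direct elements and child depsets, and all the real work happens when the graph is flattened. It implements only postorder iteration (children left to right, then direct elements); Bazel's depset also accepts other order values such as "preorder" and "topological". The example graph mirrors the foo, bar, and baz libraries used in the tests later in this article.

```python
class Depset:
    """Hypothetical model of a Bazel depset with postorder iteration."""

    def __init__(self, direct=(), transitive=()):
        # O(D + T): keep references only; child depsets are shared, not copied.
        # dict.fromkeys deduplicates while preserving the original order.
        self.direct = tuple(dict.fromkeys(direct))
        # Depset objects hash by identity, so duplicate children collapse too.
        self.transitive = tuple(dict.fromkeys(transitive))

    def to_list(self):
        """Children left to right, then direct elements. Each element appears
        once, and each child depset is visited once even if it's shared."""
        out = []
        seen_elements = set()
        seen_sets = set()

        def visit(node):
            if node in seen_sets:
                return
            seen_sets.add(node)
            for child in node.transitive:
                visit(child)
            for element in node.direct:
                if element not in seen_elements:
                    seen_elements.add(element)
                    out.append(element)

        visit(self)
        return out


baz = Depset(direct=["baz"])
bar = Depset(direct=["bar"], transitive=[baz])
foo = Depset(direct=["foo"], transitive=[bar, baz])

print(foo.to_list())                            # ['baz', 'bar', 'foo']
print(foo.transitive[0] is bar)                 # True: shared, not copied
print(Depset(direct=["a", "a", "b"]).to_list())  # ['a', 'b']
```

Because baz is reachable from foo both directly and through bar, it's visited once and listed once. Deduplication also means elements must be hashable, which matters once we start storing structs in depsets.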
Defining go_library
The GoLibraryInfo provider
OK, the theory is out of the way. Let's get to the code.
First, we define a new provider. GoLibraryInfo carries information about each library and its dependencies. We define it in a new file, providers.bzl.
GoLibraryInfo = provider(
    doc = "Contains information about a Go library",
    fields = {
        "info": """A struct containing information about this library.
        Has the following fields:
            importpath: Name by which the library may be imported.
            archive: The .a file compiled from the library's sources.
        """,
        "deps": "A depset of info structs for this library's dependencies",
    },
)
doc sets a documentation string for Stardoc. fields lists the allowed fields in the provider, along with their documentation strings. You can set the init argument to a custom constructor function, which may be useful if you want to perform more advanced validation or initialization. See provider for details.
The go_library rule
Now we can define the go_library rule. Here's the new rule declaration in rules.bzl.
go_library = rule(
    implementation = _go_library_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile",
        ),
        "deps": attr.label_list(
            providers = [GoLibraryInfo],
            doc = "Direct dependencies of the library",
        ),
        "importpath": attr.string(
            mandatory = True,
            doc = "Name by which the library may be imported",
        ),
        "_stdlib": attr.label(
            allow_single_file = True,
            default = "//internal:stdlib",
            doc = "Hidden dependency on the Go standard library",
        ),
    },
    doc = "Compiles a Go archive from Go sources and dependencies",
)
There are four attributes here. srcs is a list of labels that refer to source .go files or rules that generate .go files. deps is a list of labels that refer to other Go library rules. They don't have to be go_library specifically, but they have to return GoLibraryInfo providers to be compatible. importpath is just a string. We'll use that to generate the importcfg files that the compiler and linker use to map import strings to compiled .a files. And finally, _stdlib is a hidden dependency (its name starts with _) on the compiled standard library, same as in go_binary.
Here's the implementation of the rule.
def _go_library_impl(ctx):
    # Declare an output file for the library package and compile it from srcs.
    archive = ctx.actions.declare_file("{name}.a".format(name = ctx.label.name))
    go_compile(
        ctx,
        srcs = ctx.files.srcs,
        importpath = ctx.attr.importpath,
        stdlib = ctx.file._stdlib,
        deps = [dep[GoLibraryInfo] for dep in ctx.attr.deps],
        out = archive,
    )

    # Return the output file and metadata about the library.
    return [
        DefaultInfo(files = depset([archive])),
        GoLibraryInfo(
            info = struct(
                importpath = ctx.attr.importpath,
                archive = archive,
            ),
            deps = depset(
                direct = [dep[GoLibraryInfo].info for dep in ctx.attr.deps],
                transitive = [dep[GoLibraryInfo].deps for dep in ctx.attr.deps],
            ),
        ),
    ]
First, we use ctx.actions.declare_file to declare our compiled output file, then go_compile to declare the compile command. We did the same thing in go_binary.
Look at the different ways we access our attributes here.
ctx.files.srcs gives us a list of all files from the srcs attribute. This list may not be the same length as the list of labels passed to srcs: for example, one of those labels might be a filegroup containing any number of files.
ctx.attr.importpath gives us a string value, since importpath is a string attribute.
ctx.file._stdlib gives us a single File (actually a directory) for _stdlib, which is allowed because it was declared with allow_single_file = True.
ctx.attr.deps gives us a list of Target objects. The subscript expression dep[GoLibraryInfo] gives us the GoLibraryInfo provider returned by that target.
Finally, we return a list of two providers, DefaultInfo and GoLibraryInfo. The GoLibraryInfo.info field is a struct with information about the library being compiled. It's important that this struct is immutable and is relatively small, since it will be added to a depset (the GoLibraryInfo.deps field of other libraries) and hashed.
go_compile and go_link
There was an important change to go_compile and go_link. Did you catch it? Both now accept a deps argument, a list of GoLibraryInfo providers for direct dependencies. In both cases, we use this list to generate an importcfg file. The Go compiler and linker use importcfg files to map import strings to compiled .a files.
For go_compile, the importcfg file only needs to list direct dependencies, so we generate its content like this:
dep_importcfg_text = "\n".join([
    "packagefile {importpath}={filepath}".format(
        importpath = dep.info.importpath,
        filepath = dep.info.archive.path,
    )
    for dep in deps
])
For go_link, the importcfg file needs to contain all transitive dependencies, so we create a depset first, then iterate over that. This is why we needed GoLibraryInfo.deps.
deps_set = depset(
    direct = [d.info for d in deps],
    transitive = [d.deps for d in deps],
)
dep_importcfg_text = "\n".join([
    "packagefile {importpath}={filepath}".format(
        importpath = dep.importpath,
        filepath = dep.archive.path,
    )
    for dep in deps_set.to_list()
])
I'll skip over the actual bash script this is injected into since it's ugly and not relevant to writing rules for other languages. We'll clean it up in a later article.
go_binary has one other change: it now includes a deps attribute and calls go_link with GoLibraryInfo providers from those targets. I won't reproduce the entire source here because it's a very small change from last time.
Exposing a public interface
All our definitions are in an internal directory, and we need to make them available for other people to use. So we load them in def.bzl, which contains only our public definitions. Alongside go_binary, we now expose go_library and GoLibraryInfo. The latter will be needed by anyone who wants to implement compatible rules.
load(
    "//internal:rules.bzl",
    _go_binary = "go_binary",
    _go_library = "go_library",
)
load(
    "//internal:providers.bzl",
    _GoLibraryInfo = "GoLibraryInfo",
)

go_binary = _go_binary
go_library = _go_library
GoLibraryInfo = _GoLibraryInfo
Testing the go_library rule
We'll test our new functionality the same way we did before: using an sh_test that runs a go_binary, this time one that depends on a small graph of go_library targets:
sh_test(
    name = "bin_with_libs_test",
    srcs = ["bin_with_libs_test.sh"],
    args = ["$(rootpath :bin_with_libs)"],
    data = [":bin_with_libs"],
)

go_binary(
    name = "bin_with_libs",
    srcs = ["bin_with_libs.go"],
    deps = [":foo"],
)

go_library(
    name = "foo",
    srcs = ["foo.go"],
    importpath = "rules_go_simple/tests/foo",
    deps = [
        ":bar",
        ":baz",
    ],
)

go_library(
    name = "bar",
    srcs = ["bar.go"],
    importpath = "rules_go_simple/tests/bar",
    deps = [":baz"],
)

go_library(
    name = "baz",
    srcs = ["baz.go"],
    importpath = "rules_go_simple/tests/baz",
)
You can test this out with bazel test //tests/...