Writing Bazel rules: data and runfiles
Bazel has a neat feature that can simplify a lot of work with tests and executables: the ability to make data files available at run-time using data attributes. You may have seen these in rules like this:
```starlark
cc_library(
    name = "server_lib",
    srcs = ["server.cc"],
    data = ["private.key"],
)
```
When a file is listed in a data attribute (or something that behaves like a data attribute), Bazel makes that file available at run-time to tests and to executables started with bazel run. This is useful for all kinds of things such as plugins, configuration files, certificates and keys, and resources.
In this article, we'll add data attributes to the go_library and go_binary rules in rules_go_simple, the set of rules we've been working on. We'll be working on the v3 branch. This won't take long: we only need to add a few lines of code for each rule.
Data and runfiles
We can start by adding a data attribute to our rules. Here's the new declaration for go_library. The attribute in go_binary is similar.
```starlark
go_library = rule(
    implementation = _go_library_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile",
        ),
        "deps": attr.label_list(
            providers = [GoLibraryInfo],
            doc = "Direct dependencies of the library",
        ),
        "data": attr.label_list(
            allow_files = True,
            doc = "Data files available to binaries using this library",
        ),
        "importpath": attr.string(
            mandatory = True,
            doc = "Name by which the library may be imported",
        ),
        "_stdlib": attr.label(
            default = "//internal:stdlib",
            providers = [GoStdLibInfo],
            doc = "Hidden dependency on the Go standard library",
        ),
    },
    doc = "Compiles a Go archive from Go sources and dependencies",
)
```
Bazel tracks files that should be made available at run-time using runfiles objects. You can create new runfiles objects with ctx.runfiles. In order to actually make files available, you need to put one of these in the runfiles field in the DefaultInfo provider returned by your rule. Recall that DefaultInfo is used to list the output files and executables produced by a rule.
Here's how we create the DefaultInfo provider for go_library. Again, go_binary is similar.
```starlark
return [
    DefaultInfo(
        files = depset([archive]),
        runfiles = ctx.runfiles(collect_data = True),
    ),
    ...
]
```
The expression ctx.runfiles(collect_data = True) gathers the files listed in the data attribute and the runfiles returned by rules in the deps and srcs attributes. That means any library can have data files, and they will be available to tests and binaries run with bazel run that link that library.
There are a few different ways to call ctx.runfiles. If you set collect_data = True, as we did above, Bazel will collect data runfiles from dependencies in the srcs, deps, and data attributes. If you set collect_default = True, Bazel will collect default runfiles from the same dependencies. I have no idea what the distinction is between data and default runfiles, but when you construct DefaultInfo, you can set the data_runfiles or default_runfiles fields explicitly. If you just set runfiles, your files will be treated as both data and default.
What if you want to build the list of files explicitly? This is useful if you want to collect files from non-standard attributes, or if you create files within your rule. ctx.runfiles accepts a files argument, which is a simple list of files. You can access runfiles from your dependencies with an expression like dep[DefaultInfo].data_runfiles, where dep is a Target. You can combine runfiles objects using runfiles.merge, which returns a new runfiles object.
So we could have implemented go_library like this:
```starlark
# Gather runfiles.
runfiles = ctx.runfiles(files = ctx.files.data)
for dep in ctx.attr.deps:
    runfiles = runfiles.merge(dep[DefaultInfo].data_runfiles)

# Return the output file and metadata about the library.
return [
    DefaultInfo(
        files = depset([archive]),
        runfiles = runfiles,
    ),
    ...
]
```
NOTE: When you have an attribute that is a label or label_list, you can access a list of all the files from all the labels using ctx.files (for example: ctx.files.data). This is almost always more convenient than going through ctx.attr (which gives you a Target or a list of Targets), since each target may have multiple files. If your label has allow_single_file = True set, you can also access the file through ctx.file. And if executable = True, you can access it through ctx.executable.
Testing data and runfiles
We test our new support for runfiles with a simple binary that depends on a library. Both binary and library have data files, and the test verifies they are present.
```starlark
sh_test(
    name = "data_test",
    srcs = ["data_test.sh"],
    args = ["$(location :list_data_bin)"],
    data = [":list_data_bin"],
)

go_binary(
    name = "list_data_bin",
    srcs = ["list_data_bin.go"],
    deps = [":list_data_lib"],
    data = ["foo.txt"],
)

go_library(
    name = "list_data_lib",
    srcs = ["list_data_lib.go"],
    data = ["bar.txt"],
    importpath = "rules_go_simple/tests/list_data_lib",
)
```
You can run this test with bazel test //tests/...
Accessing runfiles, cross-platform
You should use a library to find and open runfiles, especially in tests. When Bazel executes a binary on Unix platforms, it creates a tree of symbolic links to the binary's runfiles. If your code only ever runs on Unix platforms, you can open a runfile by opening its relative path within the workspace.
This is not generally safe because Bazel handles runfiles differently on Windows. In versions of Windows before about 2019, Windows required you to be an administrator to create symbolic links. Even now in consumer versions of Windows, you need to enable "Developer Mode" to create symbolic links, which requires administrator access. Creating a symbolic link on Windows is also surprisingly slow. To avoid these problems, Bazel uses another strategy: it creates a manifest file that maps logical runfile paths to absolute paths for the real files in Bazel's cache. The manifest is pointed to by the RUNFILES_MANIFEST_FILE environment variable, which is set for tests. Nothing points to the manifest file for binaries run with bazel run, but you should find a file named MANIFEST in the initial working directory of the binary. (Incidentally, you can override this and force symbolic links with the Bazel flag --enable_runfiles.)
It is best to use a library if one is available for your language, rather than parsing the manifest file on your own. Bazel's runfile semantics change over time, as they are changing now with bzlmod, and using a library will keep your code working. Most languages provide such a library:
- C++: @bazel_tools//tools/cpp/runfiles
- Bash: @bazel_tools//tools/bash/runfiles
- Java: @bazel_tools//tools/java/runfiles
- Python: @rules_python//python/runfiles
- Go: @io_bazel_rules_go//go/runfiles
- Rust: @rules_rust//tools/runfiles
You can learn more by watching "Runfiles and where to find them", Fabian Meumertzheim's excellent talk from BazelCon 2022.