Writing Bazel rules: data and runfiles

Published on 2020-02-01
Edited on 2023-10-12
Tagged: bazel go

This article is part of the series "Writing Bazel rules".

Writing Bazel rules: simple binary rule
Writing Bazel rules: library rule, depsets, providers
Writing Bazel rules: data and runfiles
Writing Bazel rules: moving logic to execution
Writing Bazel rules: repository rules
Writing Bazel rules: platforms and toolchains

Bazel has a neat feature that can simplify a lot of work with tests and executables: the ability to make data files available at run-time using data attributes. You may have seen these in rules like this:

cc_library(
    name = "server_lib",
    srcs = ["server.cc"],
    data = ["private.key"],
)

When a file is listed in a data attribute (or something that behaves like a data attribute), Bazel makes that file available at run-time to tests and to executables started with bazel run. This is useful for all kinds of things, such as plugins, configuration files, certificates and keys, and resources.

In this article, we'll add data attributes to the go_library and go_binary rules in rules_go_simple, the set of rules we've been working on. We'll be working on the v3 branch. This won't take long: we only need to add a few lines of code for each rule.

Data and runfiles

We can start by adding a data attribute to our rules. Here's the new declaration for go_library. The attribute in go_binary is similar.

go_library = rule(
    implementation = _go_library_impl,
    attrs = {
        "srcs": attr.label_list(
            allow_files = [".go"],
            doc = "Source files to compile",
        ),
        "deps": attr.label_list(
            providers = [GoLibraryInfo],
            doc = "Direct dependencies of the library",
        ),
        "data": attr.label_list(
            allow_files = True,
            doc = "Data files available to binaries using this library",
        ),
        "importpath": attr.string(
            mandatory = True,
            doc = "Name by which the library may be imported",
        ),
        "_stdlib": attr.label(
            default = "//internal:stdlib",
            providers = [GoStdLibInfo],
            doc = "Hidden dependency on the Go standard library",
        ),
    },
    doc = "Compiles a Go archive from Go sources and dependencies",
)

Bazel tracks files that should be made available at run-time using runfiles objects. You can create new runfiles objects with ctx.runfiles. In order to actually make files available, you need to put one of these in the runfiles field in the DefaultInfo provider returned by your rule. Recall that DefaultInfo is used to list the output files and executables produced by a rule.

Here's how we create the DefaultInfo provider for go_library. Again, go_binary is similar.

return [
    DefaultInfo(
        files = depset([archive]),
        runfiles = ctx.runfiles(collect_data = True),
    ),
    ...
]

The expression ctx.runfiles(collect_data = True) gathers the files listed in the data attribute and the runfiles returned by rules in the deps and srcs attributes. That means any library can have data files, and they will be available to tests and to binaries run with bazel run that link that library.

There are a few different ways to call ctx.runfiles. Setting collect_data = True, as we did above, collects data runfiles from dependencies in the srcs, deps, and data attributes. Setting collect_default = True collects default runfiles from the same dependencies. The distinction is a legacy one: according to Bazel's documentation, data runfiles are the files a target contributes when it is depended on through a data attribute, and default runfiles are the files it contributes in every other case. When you construct DefaultInfo, you can set the data_runfiles or default_runfiles fields explicitly. If you just set runfiles, your files will be treated as both data and default. Bazel's documentation now recommends avoiding both collect modes and both explicit fields in new rules, in favor of setting runfiles alone.

What if you want to build the list of files explicitly? This is useful if you want to collect files from non-standard attributes, or if you create files within your rule. ctx.runfiles accepts a files argument, which is a simple list of files. You can access runfiles from your dependencies with an expression like dep[DefaultInfo].data_runfiles, where dep is a Target. You can combine runfiles objects using runfiles.merge, which returns a new runfiles object. So we could have implemented go_library like this:

# Gather runfiles.
runfiles = ctx.runfiles(files = ctx.files.data)
for dep in ctx.attr.deps:
    runfiles = runfiles.merge(dep[DefaultInfo].data_runfiles)

# Return the output file and metadata about the library.
return [
    DefaultInfo(
        files = depset([archive]),
        runfiles = runfiles,
    ),
    ...
]

NOTE: When you have an attribute that is a label or label_list, you can access a list of all the files from all the labels using ctx.files (for example: ctx.files.data). This is almost always more convenient than going through ctx.attr (which gives you a Target or a list of Targets), since each target may have multiple files. If the attribute is declared with allow_single_file = True, you can also access its single file through ctx.file. And if it is declared with executable = True, you can access the executable through ctx.executable.

Testing data and runfiles

We test our new support for runfiles with a simple binary that depends on a library. Both binary and library have data files, and the test verifies they are present.

sh_test(
    name = "data_test",
    srcs = ["data_test.sh"],
    args = ["$(location :list_data_bin)"],
    data = [":list_data_bin"],
)

go_binary(
    name = "list_data_bin",
    srcs = ["list_data_bin.go"],
    deps = [":list_data_lib"],
    data = ["foo.txt"],
)

go_library(
    name = "list_data_lib",
    srcs = ["list_data_lib.go"],
    data = ["bar.txt"],
    importpath = "rules_go_simple/tests/list_data_lib",
)

You can run this test with bazel test //tests/... (the three dots are part of the target pattern and match every target under //tests).

Accessing runfiles, cross-platform

You should use a library to find and open runfiles, especially in tests. When Bazel executes a binary on Unix platforms, it creates a tree of symbolic links to the binary's runfiles. If your code only ever runs on Unix platforms, you can open a runfile by opening its relative path within the workspace.

Opening a relative path is not generally safe, because Bazel handles runfiles differently on Windows. Before about 2019, Windows required you to be an administrator to create symbolic links. Even now, on consumer versions of Windows, you need to enable "Developer Mode" to create symbolic links, which requires administrator access. Creating a symbolic link on Windows is also surprisingly slow.

To avoid these problems, Bazel uses another strategy on Windows: it creates a manifest file that maps logical runfile paths to absolute paths of the real files in Bazel's cache. The manifest is pointed to by the RUNFILES_MANIFEST_FILE environment variable, which is set for tests. Nothing points to the manifest file for binaries run with bazel run, but you should find a file named MANIFEST in the initial working directory of the binary. (Incidentally, you can override this and force symbolic links with the Bazel flag --enable_runfiles.)

It is best to use a library if one is available for your language, rather than parsing the manifest file on your own. Bazel's runfile semantics change over time, as they are changing now with bzlmod, and using a library will keep your code working. Most languages provide such a library:

C++: @bazel_tools//tools/cpp/runfiles
Bash: @bazel_tools//tools/bash/runfiles
Java: @bazel_tools//tools/java/runfiles
Python: @rules_python//python/runfiles
Go: @io_bazel_rules_go//go/runfiles
Rust: @rules_rust//tools/runfiles

You can learn more by watching "Runfiles and where to find them", Fabian Meumertzheim's excellent talk from BazelCon 2022.