Export data, the secret of Go's fast builds

Published on 2020-03-07
Tagged: compilers go

Quick build speed is one of Go's key advantages. It's actually one of the reasons the language was created. Robert Griesemer and Rob Pike mentioned in an interview on Go Time that they started brainstorming about Go after being frustrated by 45-minute build times for a C++ program.

There are many reasons that Go builds are fast. The language itself is simple compared with languages like C++, Rust, or Scala. The compiler doesn't need to work as hard to parse and type-check code. The compiler performs relatively few optimizations, since build speed is a priority. A new optimization needs to "pay for itself" by making the compiler faster in order to be considered worthwhile.

There's another clever trick I'd like to focus on, though: each compiled Go package contains something called export data, a binary description of its exported definitions. When the compiler handles an import, it can quickly scan the imported package's export data to learn everything it needs to know about its definitions. This is much faster than parsing and type-checking the sources of imported packages, so building large programs is very fast.

Exploring export data

When the Go compiler builds a package, it produces an archive file (.a). You can extract the archive's contents using the ar tool installed on your system or using go tool pack, which takes most of the same arguments.

$ go list -export -f '{{.Export}}' fmt
/opt/go/go1.14/pkg/linux_amd64/fmt.a

$ ar x /opt/go/go1.14/pkg/linux_amd64/fmt.a
$ ls
_go_.o  __.PKGDEF

Typically, a package archive contains two files. _go_.o contains the package's compiled code, which is used by the linker to construct a binary. __.PKGDEF contains the package's export data, which is used by the compiler when building packages that import the compiled package.

Export data is binary type information about definitions that may be relevant to importing packages. It serves the same purpose as a header in C or C++, essentially containing a list of declarations. All exported definitions are described in the export data (hence the name). Non-exported definitions are included as well if they're mentioned in exported definitions, for example, in a function return type. Definitions from other imported packages may be included for the same reason, but it's important to note that a package's export data doesn't include all the information from its transitively imported packages' export data. Only definitions needed to describe a package's own exported definitions are included in its export data.

Export data can be read and written using the golang.org/x/tools/go/gcexportdata package. You can use the gopackages tool to read and print type information from export data on the command line. (gopackages can do many other things, too; it's the command-line interface for golang.org/x/tools/go/packages.)

$ go get golang.org/x/tools/go/packages/gopackages
$ gopackages -mode types io/ioutil
Go package "io/ioutil":
	package ioutil
	has complete exported type info
	file /opt/go/go1.14/src/io/ioutil/ioutil.go
	file /opt/go/go1.14/src/io/ioutil/tempfile.go
	import "bytes"
	import "io"
	import "os"
	import "path/filepath"
	import "sort"
	import "strconv"
	import "strings"
	import "sync"
	import "time"
	var Discard io.Writer
	func NopCloser(r io.Reader) io.ReadCloser
	func ReadAll(r io.Reader) ([]byte, error)
	func ReadDir(dirname string) ([]os.FileInfo, error)
	func ReadFile(filename string) ([]byte, error)
	func TempDir(dir string, pattern string) (name string, err error)
	func TempFile(dir string, pattern string) (f *os.File, err error)
	func WriteFile(filename string, data []byte, perm os.FileMode) error
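You can do something similar in your own program. The sketch below sticks to the standard library: it uses go/importer instead of gcexportdata, and it shells out to go list -export to find the export data file, just like the command shown earlier. It assumes the go command is on your PATH and matches the toolchain your program was built with. The exportFile and loadFromExportData helpers are names I made up.

```go
package main

import (
	"fmt"
	"go/importer"
	"go/token"
	"go/types"
	"io"
	"os"
	"os/exec"
	"strings"
)

// exportFile asks the go command where the export data for path lives.
// go list -export builds the package first if necessary.
func exportFile(path string) (string, error) {
	out, err := exec.Command("go", "list", "-export", "-f", "{{.Export}}", path).Output()
	if err != nil {
		return "", fmt.Errorf("go list %s: %w", path, err)
	}
	file := strings.TrimSpace(string(out))
	if file == "" {
		return "", fmt.Errorf("no export data for %s", path)
	}
	return file, nil
}

// loadFromExportData returns type information for the package at path,
// read from compiled export data rather than from source.
func loadFromExportData(path string) (*types.Package, error) {
	fset := token.NewFileSet()
	lookup := func(p string) (io.ReadCloser, error) {
		file, err := exportFile(p)
		if err != nil {
			return nil, err
		}
		return os.Open(file)
	}
	return importer.ForCompiler(fset, "gc", lookup).Import(path)
}

func main() {
	pkg, err := loadFromExportData("io/ioutil")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("package %s\n", pkg.Name())
	qualifier := types.RelativeTo(pkg)
	for _, name := range pkg.Scope().Names() {
		fmt.Println("\t" + types.ObjectString(pkg.Scope().Lookup(name), qualifier))
	}
}
```

The program never parses any of io/ioutil's source files. Types like io.Reader and os.FileInfo still print correctly because the export data describes what it needs from those packages.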

Comparison with C++

In C++, public definitions are declared in header files that are textually included in source files that need them. Headers are written in regular C++. The compiler needs to parse and type-check headers every time they're included. (It's possible to precompile headers, but it's a lot of work to set up, and the benefit is less than you'd think.)

Of course, a header file typically only contains declarations for definitions in the library or .cc file it corresponds to. Those declarations frequently need to reference types declared in other header files, which means most headers include other headers. This means that when a .cc file includes a header to use a specific type or function, the compiler ends up parsing and type-checking a number (possibly a large number) of transitively included header files that are necessary but not actually relevant. C++ is a complicated language, so this isn't cheap. There's no mechanism that trims down headers to only definitions that are needed, and there's nothing that prevents you from including unnecessary headers.

C++20 introduces modules (no relation to Go modules), which are intended to solve this problem. I haven't used them yet, so I can't speak with authority, but the concept seems similar to export data in Go. .cc files are translated to a precompiled module file, which contains type information that can be imported by other files. It's no longer necessary to textually include header files, at least for libraries written to take advantage of the new feature.

Bonus: caching export data

By default, the Go compiler writes compiled code and export data to the same .a file. The -linkobj flag can be used to write the compiled code to a separate file.

$ go tool compile -o hello.x -linkobj hello.a hello.go
$ ar t hello.a
_go_.o
$ ar t hello.x
__.PKGDEF

Google's internal build system, Blaze, takes advantage of this. When you make a small change that doesn't affect a package's exported types, only the compiled code will be updated; the export data remains the same. Each package only depends on the export data of its imports, so importing packages don't need to be recompiled after a small change. This makes incremental builds scale much better, which is especially valuable in a build environment with globally shared remote caching and execution.
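Here's a rough sketch of why that works. I don't know how Blaze computes its cache keys internally, so treat this as an illustration of the idea rather than a description of any real build system: if the key for compiling a package is derived from its own sources plus the export data of its direct imports, then a change that leaves a dependency's export data untouched leaves the key untouched, too. The actionKey function and the fake export data strings are invented for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// actionKey returns a hypothetical cache key for compiling a package.
// It depends only on the package's own sources and on the export data
// of its direct imports, not on the imports' compiled code.
func actionKey(sources []byte, importExports map[string][]byte) string {
	h := sha256.New()
	// Length prefixes keep adjacent fields from running together.
	fmt.Fprintf(h, "sources %d\n", len(sources))
	h.Write(sources)

	// Hash imports in sorted order so the key is deterministic.
	paths := make([]string, 0, len(importExports))
	for p := range importExports {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Fprintf(h, "import %s %d\n", p, len(importExports[p]))
		h.Write(importExports[p])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	appSrc := []byte("package app")

	// Stand-ins for real export data, which is a binary format.
	original := map[string][]byte{"lib": []byte("func F() int")}
	bodyEdit := map[string][]byte{"lib": []byte("func F() int")}    // only F's body changed
	sigEdit := map[string][]byte{"lib": []byte("func F() string")} // F's signature changed

	before := actionKey(appSrc, original)
	fmt.Println("recompile after body edit:", actionKey(appSrc, bodyEdit) != before)
	fmt.Println("recompile after signature edit:", actionKey(appSrc, sigEdit) != before)
}
```

In a real build, lib's compiled code would be rebuilt in both cases and the final link would run again, but app itself would only be recompiled in the second case.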

cmd/go and Bazel don't take advantage of this yet, but they may some day.

Conclusion

I hope this discussion on export data is interesting, if not useful. If you're designing a programming language that needs a packaging format, consider using something like export data. If you're building Go tools, you can use golang.org/x/tools/go/packages or golang.org/x/tools/go/gcexportdata to read type information about a large number of packages.