Export data, the secret of Go's fast builds
Quick build speed is one of Go's key advantages. It's actually one of the reasons the language was created. Robert Griesemer and Rob Pike mentioned in an interview on Go Time that they started brainstorming about Go after being frustrated by 45-minute build times for a C++ program.
There are many reasons that Go builds are fast. The language itself is simple compared with languages like C++, Rust, or Scala, so the compiler doesn't need to work as hard to parse and type-check code. The compiler also performs relatively few optimizations, since build speed is a priority. A new optimization needs to "pay for itself" by making the compiler (which is itself written in Go) faster in order to be considered worthwhile.
There's another clever trick I'd like to focus on though: each compiled Go package contains something called export data, a binary description of its exported definitions. When the compiler handles an import, it can quickly scan the imported package's export data to learn everything it needs to know about its definitions. This is much faster than parsing and type-checking sources of imported packages, so building large programs is very fast.
Exploring export data
When the Go compiler builds a package, it produces an archive file (.a). You can extract the files inside an archive using the ar tool installed on your system or using go tool pack, which takes most of the same arguments.
    $ go list -export -f '{{.Export}}' fmt
    /opt/go/go1.14/pkg/linux_amd64/fmt.a
    $ ar x /opt/go/go1.14/pkg/linux_amd64/fmt.a
    $ ls
    _go_.o  __.PKGDEF
Typically, a package archive contains two files. _go_.o contains the package's compiled code, which is used by the linker to construct a binary. __.PKGDEF contains the package's export data, which is used by the compiler when building packages that import the compiled package.
Export data is binary type information about definitions that may be relevant to importing packages. It serves the same purpose as a header in C or C++, essentially containing a list of declarations. All exported definitions are described in the export data (hence the name). Non-exported definitions are included as well if they're mentioned in exported definitions, for example, in a function return type. Definitions from other imported packages may be included for the same reason, but it's important to note that a package's export data doesn't include all the information from its transitively imported packages' export data. Only definitions needed to describe a package's own exported definitions are included in its export data.
Export data can be read and written using the golang.org/x/tools/go/gcexportdata package. You can use the gopackages tool to read and print type information from export data on the command line. (gopackages can do many other things, too; it's the command line interface for golang.org/x/tools/go/packages.)
    $ go get golang.org/x/tools/go/packages/gopackages
    $ gopackages -mode types io/ioutil
    Go package "io/ioutil":
        package ioutil
        has complete exported type info
        file /opt/go/go1.14/src/io/ioutil/ioutil.go
        file /opt/go/go1.14/src/io/ioutil/tempfile.go
        import "bytes"
        import "io"
        import "os"
        import "path/filepath"
        import "sort"
        import "strconv"
        import "strings"
        import "sync"
        import "time"
        var Discard io.Writer
        func NopCloser(r io.Reader) io.ReadCloser
        func ReadAll(r io.Reader) ([]byte, error)
        func ReadDir(dirname string) ([]os.FileInfo, error)
        func ReadFile(filename string) ([]byte, error)
        func TempDir(dir string, pattern string) (name string, err error)
        func TempFile(dir string, pattern string) (f *os.File, err error)
        func WriteFile(filename string, data []byte, perm os.FileMode) error
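A similar listing can be produced from Go code. The sketch below swaps in the standard library's go/importer for gcexportdata so that it has no external dependencies; the default importer reads export data written by the gc compiler. It assumes a working Go installation with the go command available at run time, and the helper name describeExports is hypothetical. The exact list of definitions depends on the Go version installed.

```go
package main

import (
	"fmt"
	"go/importer"
	"go/types"
)

// describeExports loads type information for the package at the given
// import path through the default importer and returns one line per
// exported package-level definition.
func describeExports(path string) ([]string, error) {
	pkg, err := importer.Default().Import(path)
	if err != nil {
		return nil, err
	}
	// Print names from this package unqualified, as gopackages does.
	qualifier := types.RelativeTo(pkg)
	scope := pkg.Scope()
	var lines []string
	for _, name := range scope.Names() { // Names returns sorted names
		obj := scope.Lookup(name)
		if obj.Exported() {
			lines = append(lines, types.ObjectString(obj, qualifier))
		}
	}
	return lines, nil
}

func main() {
	lines, err := describeExports("io/ioutil")
	if err != nil {
		panic(err)
	}
	for _, line := range lines {
		fmt.Println(line)
	}
}
```

Note that no source file of io/ioutil is parsed here; the signatures come from the compiled package's export data.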
Comparison with C++
In C++, public definitions are declared in header files that are textually included in source files that need them. Headers are written in regular C++. The compiler needs to parse and type-check headers every time they're included. (It's possible to precompile headers, but it's a lot of work to set up, and the benefit is less than you'd think.)
Of course, a header file typically only contains declarations for definitions in the library or .cc file it corresponds to. Those declarations frequently need to reference types declared in other header files, which means most headers include other headers. As a result, when a .cc file includes a header to use a specific type or function, the compiler ends up parsing and type-checking a number (possibly a large number) of transitively included header files that are necessary but not actually relevant. C++ is a complicated language, so this isn't cheap. There's no mechanism that trims down headers to only definitions that are needed, and there's nothing that prevents you from including unnecessary headers.
C++20 introduces modules (no relation to Go modules), which are intended to solve this problem. I haven't used them yet, so I can't speak with authority, but the concept seems similar to export data in Go. .cc files are translated to a precompiled module file, which contains type information that can be imported by other files. It's no longer necessary to textually include header files, at least for libraries written to take advantage of the new feature.
Bonus: caching export data
By default, the Go compiler writes compiled code and export data to the same .a file. The -linkobj flag can be used to write the compiled code to a separate file.
    $ go tool compile -o hello.x -linkobj hello.a hello.go
    $ ar t hello.a
    _go_.o
    $ ar t hello.x
    __.PKGDEF
Google's internal build system, Blaze, takes advantage of this. When you make a small change that doesn't affect a package's exported types, only the compiled code will be updated; the export data remains the same. Each package only depends on the export data of its imports, so importing packages don't need to be recompiled after a small change. This makes incremental builds scale much better, which is especially valuable in a build environment with globally shared remote caching and execution.
cmd/go and Bazel don't take advantage of this yet, but they may some day.
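The caching idea can be sketched in a few lines. This is a toy model under stated assumptions, not how Blaze is implemented: the pkgOutput type, the actionKey function, and the byte strings standing in for export data and object code are all hypothetical. The point is only that a cache key built from a package's sources plus the export data of its imports ignores changes confined to an import's compiled code.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// pkgOutput is a hypothetical stand-in for the two files produced by
// compiling a package with -linkobj.
type pkgOutput struct {
	exportData []byte // read by the compiler when building importers
	objectCode []byte // read only by the linker
}

// actionKey hashes a package's own sources together with the export data
// of its direct imports. The imports' object code is deliberately ignored.
func actionKey(sources []byte, imports ...pkgOutput) [sha256.Size]byte {
	h := sha256.New()
	h.Write(sources)
	for _, imp := range imports {
		sum := sha256.Sum256(imp.exportData)
		h.Write(sum[:])
	}
	var key [sha256.Size]byte
	copy(key[:], h.Sum(nil))
	return key
}

func main() {
	app := []byte(`package main; import "lib"; func main() { lib.Hello() }`)

	before := pkgOutput{exportData: []byte("func Hello()"), objectCode: []byte("body v1")}
	// An edit that leaves the export data untouched.
	bodyEdit := pkgOutput{exportData: []byte("func Hello()"), objectCode: []byte("body v2")}
	// An edit that changes an exported signature.
	apiEdit := pkgOutput{exportData: []byte("func Hello(name string)"), objectCode: []byte("body v3")}

	fmt.Println(actionKey(app, before) == actionKey(app, bodyEdit)) // prints true: cached result can be reused
	fmt.Println(actionKey(app, before) == actionKey(app, apiEdit))  // prints false: importer must be recompiled
}
```

With a shared remote cache, the first comparison is the common case for small edits, which is why separating export data from compiled code helps incremental builds.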
Conclusion
I hope this discussion on export data is interesting, if not useful. If you're designing a programming language that needs a packaging format, consider using something like export data. If you're building Go tools, you can use golang.org/x/tools/go/packages or golang.org/x/tools/go/gcexportdata to read type information about a large number of packages.