Go Editor Support in Bazel Workspaces
A Talk from BazelCon 2022
I spoke at BazelCon 2022 in New York! This article is the script I wrote for that talk, together with my slides. If you'd prefer to watch the video, it's embedded below:
You can also find a PDF version of my slides here: Go Editor Support in Bazel Workspaces
Hey everyone. Today I'm going to tell the story of how Go editor support was implemented in Bazel workspaces. You've already heard about Java in VSCode yesterday and IntelliJ just now. We have a theme going! A lot of the same lessons will come up.
Before we get started, a bit of my history with Go and Bazel. Today I'm a software engineer at EngFlow, where we do remote execution for Bazel. Previously, I was on the Go Team at Google, where I worked on the implementation of Go modules, fuzzing, and rules_go and Gazelle for Bazel. This talk is mostly about that team's work, along with contributions from many people in the Go and Bazel open source communities.
So before we get to Bazel support, let's go back in time, roughly 6 years. Learning the history of a project, or in this case a whole ecosystem, is a great way to understand how things came to be the way they are.
Most Go code was (and still is) built with the go tool, a dedicated build system that ships with the Go toolchain. At the time, the go tool used a very simple scheme for organizing packages, the GOPATH environment variable.
GOPATH was simply a list of directories. When you imported a package, the go tool would look in each GOPATH directory for a subdirectory matching the import path. That logic was simple to implement, and it was easy to write tools on top of. There were a lot of small command-line tools written for many different purposes. Several are listed here, though there were many more. I tried to pick a variety: you have static analysis, refactoring, fuzzing, documentation, a debugger...
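To make that concrete, here's a toy version of that lookup in Go, roughly what a tool of the era would do (the real go tool handled more cases, like vendor directories and GOROOT):

```go
// gopath_lookup.go: a toy version of GOPATH-style import resolution.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func resolve(importPath string) (string, error) {
	for _, root := range filepath.SplitList(os.Getenv("GOPATH")) {
		// Source for an import path lives under $GOPATH/src/<import path>.
		dir := filepath.Join(root, "src", filepath.FromSlash(importPath))
		if info, err := os.Stat(dir); err == nil && info.IsDir() {
			return dir, nil
		}
	}
	return "", fmt.Errorf("package %q not found in any GOPATH entry", importPath)
}

func main() {
	dir, err := resolve("github.com/example/project/util")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(dir)
}
```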
Some of these tools stood on their own, but many were used in editor plugins to implement basic features. If you were setting up Emacs, you'd install several of these. If you were setting up VSCode, the plugin would install them for you. Either way, the editor would delegate some basic functions to these programs. For example, godef here was used to implement go-to-definition. You'd run it with a file name and byte offset of a symbol, and it would tell you the file location of that symbol's definition. Or rather, your editor would run godef when you press whatever key was bound to go-to-definition.
Let's talk about what Bazel editor support was like back then. rules_go got started at the end of 2015, the same year Bazel was open sourced. I started working on it in 2017.
There was basically no direct support for Bazel, but if you organized your code the way the go tool expected with GOPATH, the tools worked okay. Everyone using Bazel outside Google was pretty happy. Folks inside Google exporting code built with Blaze were less happy, since Blaze has a different set of conventions. For example, it allows multiple packages in the same directory; the go tool does not.
Whichever world you lived in, you'd have a difficult time working with generated code that wasn't checked into your repository, like the .pb.go files generated by go_proto_library. You wouldn't be able to see that code in your editor, since it might not have been built yet. Tools that depended on type checking stopped working, too, since all source files need to be present for that. Go developers usually ended up checking in generated code to get around these problems. If you were a library developer, checking in generated code was also needed for compatibility with the go tool, which has no support for code generation.
At this point, you might be thinking: "If I can't generate code during the build, why would I even use Bazel?" There were a few advantages in the early days, especially caching test results and remote execution. But the go tool got better, and there were fewer and fewer reasons to switch to Bazel.
In 2018, we got Go modules.
Modules are a versioning and dependency management system built directly into the go tool. So now instead of GOPATH, you have a go.mod file at the top of your project with a list of your dependencies and the versions you require.
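A minimal go.mod looks like this (the module path and dependencies here are just for illustration):

```
module github.com/example/project

go 1.19

require (
	github.com/google/go-cmp v0.5.9
	golang.org/x/sync v0.1.0
)
```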
This was a totally different approach to file layout. It was not simply a scheme for copying files into your GOPATH or vendor directory; it was essentially a new build system. The go tool needs to parse go.mod files, select which versions to use, download files at those versions if they're not in the cache, and cryptographically verify those downloads. And that's just the simple case. It's too much to re-implement in every tool, and it's hard to do all of that even in a library. So none of the tools worked. We had a list of about 50 tools we needed to migrate, rewrite, or replace.
We needed to make sure all the editors continued working in projects that used modules. And while we were doing this, we needed GOPATH to keep working since this was a gradual migration over a couple years. And at the same time we needed to support Bazel, Blaze, and maybe Buck.
This was all with a team of 9 people, all with multiple responsibilities. So how could we get all that done quickly?
The quote on this slide is David Wheeler's famous line: "All problems in computer science can be solved by another level of indirection." It's usually a joke, but it's exactly what we needed to do and what we actually did. And fun fact: Wheeler also invented the subroutine, in 1951.
So let's get into what we actually built.
I'll show a quick overview, then we'll go into detail on each component.
Starting at the top, we have our editor plugins. These are usually Go-specific things that users interact with directly, our public interface. Since there are a lot of editors, these plugins should be as small and simple as possible.
Each of those plugins communicates with gopls, our implementation of the Language Server Protocol, LSP. LSP was developed by Microsoft for VSCode, but it's used by other editors, too. That server implements all the editor features for a language like Go without being tied to a specific editor. The name is pronounced "go please", as in "go, please do this for me". Within our team we'd affectionately call it "gahpls". Or, on a bad day, "gopeless".
Most of the interesting logic is in gopls, but we don't want to hard code support for every build system, and there are other tools out there besides editors. For example, if you're writing a code generation tool like wire, you really don't need gopls. So we have another layer of abstraction, our package loading library.
Below that is gopackagesdriver, a command-line tool that implements the package loading library's API. The Bazel implementation ships with rules_go, but you can write your own implementation for other systems. I've heard there's a Buck implementation, but I don't think it's open source.
Finally, below that, rules_go emits package metadata for every go_library. That gets collected and reformatted by gopackagesdriver.
At this point in the talk, I switched to a demo where I installed gopackagesdriver in a small workspace I prepared ahead of time. I demonstrated the use of editor features like hover and go-to-definition. I followed the instructions on the rules_go wiki.
Let's now get into the details. We'll start at the package loading library and work our way down, then we'll go back up to gopls at the end.
So let's think for a moment: what information does gopls need to work?
In order to find the definition of a symbol, the editor needs to parse and type-check a package and everything it imports. To draw red squigglies, gopls needs very thorough parsing and type checking. In fact, gopls uses the same libraries as the Go compiler, so it literally is the front half of the compiler. We can't take shortcuts.
First, given a file name, we need the identity of the package containing it. "Package" usually means go_library, but it could also be go_proto_library or anything compatible.
Second, for a given package, we need some metadata. At minimum, a list of files in that package and a list of imports.
Third, given an import string from that list, we need to be able to find the package it corresponds to.
Here's the API of the package loading library. We start with a configuration, which has the current directory, environment variables, and bazel flags.
We pass that to the Load function with a list of patterns. Each pattern is either the name of a file or an import string identifying a package.
The Load function gives us metadata about each package, including its Bazel label, its package path, source files, and imports, mapped to other packages.
I'm leaving out a lot of detail; these are just the highlights. For example, you can ask the Load function to load the syntax tree and all the type information for you after it gets the package metadata. gopls doesn't make use of that, but it's really handy if you're writing your own standalone tool.
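Concretely, this is the golang.org/x/tools/go/packages library; a minimal use of it looks like this:

```go
package main

import (
	"fmt"
	"log"

	"golang.org/x/tools/go/packages"
)

func main() {
	cfg := &packages.Config{
		// Ask only for what we need: names, files, and the import graph.
		// Add packages.NeedSyntax | packages.NeedTypes for ASTs and type info.
		Mode: packages.NeedName | packages.NeedFiles | packages.NeedImports,
		Dir:  ".",
	}
	// Patterns can be import paths, "./..."-style wildcards, or "file=" queries.
	pkgs, err := packages.Load(cfg, "file=./main.go")
	if err != nil {
		log.Fatal(err)
	}
	for _, pkg := range pkgs {
		// With a Bazel driver, pkg.ID is the Bazel label.
		fmt.Println(pkg.ID, pkg.PkgPath, pkg.GoFiles)
		for imp, dep := range pkg.Imports {
			fmt.Printf("  imports %s -> %s\n", imp, dep.ID)
		}
	}
}
```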
The Load function can delegate to a gopackagesdriver binary. As we saw a minute ago, there is such a binary in rules_go, though we built and ran it using a wrapper script.
To use this binary, you set the GOPACKAGESDRIVER environment variable to its path, and when Load is called, it runs that program. If you don't set GOPACKAGESDRIVER by the way, Load has its own native implementation that runs the go tool.
The arguments passed to gopackagesdriver are the patterns given to Load. The configuration is encoded as JSON and passed in through stdin. The packages are written as JSON on stdout.
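So a driver is just a program with that shape. Here's a heavily simplified skeleton (the struct fields are illustrative, not the complete wire format):

```go
// A stub gopackagesdriver: patterns on the command line,
// JSON config on stdin, JSON packages on stdout.
package main

import (
	"encoding/json"
	"log"
	"os"
)

// request mirrors the configuration the package loader sends on stdin.
type request struct {
	Mode       uint64   `json:"mode"`
	Env        []string `json:"env"`
	BuildFlags []string `json:"build_flags"`
}

type pkg struct {
	ID      string            `json:"ID"`      // e.g. the Bazel label
	PkgPath string            `json:"PkgPath"` // the Go import path
	GoFiles []string          `json:"GoFiles"`
	Imports map[string]string `json:"Imports"` // import string -> package ID
}

type response struct {
	Roots    []string `json:"Roots"` // package IDs matching the patterns
	Packages []pkg    `json:"Packages"`
}

func main() {
	var req request
	if err := json.NewDecoder(os.Stdin).Decode(&req); err != nil {
		log.Fatal(err)
	}
	patterns := os.Args[1:]

	// The Bazel driver would now run bazel query to map patterns to targets,
	// build with the aspect, and read and merge the emitted .json files.
	resp := loadPackages(req, patterns)

	if err := json.NewEncoder(os.Stdout).Encode(resp); err != nil {
		log.Fatal(err)
	}
}

func loadPackages(req request, patterns []string) *response {
	return &response{} // left as an exercise for your build system
}
```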
Now we ask once again "how does that work?"
First, the driver maps command line patterns to Bazel targets using bazel query. It builds expressions to find the right go_library target for a file or import path. It runs bazel query once and unions the results together.
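The queries look something like this (a paraphrase, not the exact expressions in rules_go; the file and import path are made up):

```
# Which go_library in the same package includes this file?
bazel query 'kind("go_library", same_pkg_direct_rdeps("//cmd/app:main.go"))'

# Which target provides this import path?
bazel query 'attr(importpath, "^example.com/project/util$", deps(//...))'
```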
Second, the driver builds the targets it found using an aspect. If you haven't seen Bazel aspects before, they're one of the most complicated, mind-melting things about Bazel. An aspect is a bit of Starlark code you can use in Bazel's analysis phase. It runs on each target in a subgraph of dependencies, recursing through certain attributes. It can read provider metadata for each target, it can return its own providers, and it can declare new output files and actions. gopackagesdriver does exactly that. For each target, it reads the GoArchive provider and writes a .json file. It also builds generated source files and type information that might be needed by the editor.
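To give a feel for the shape of it, here's a minimal sketch of an aspect like that (names and provider fields are illustrative, not the actual rules_go code):

```python
load("@io_bazel_rules_go//go:def.bzl", "GoArchive")

def _pkg_json_aspect_impl(target, ctx):
    if GoArchive not in target:
        return []
    archive = target[GoArchive]
    out = ctx.actions.declare_file(target.label.name + ".pkg.json")
    ctx.actions.write(out, json.encode({
        "id": str(target.label),
        "pkg_path": archive.data.importpath,
        "go_files": [f.path for f in archive.data.srcs],
    }))
    return [OutputGroupInfo(pkg_json = depset([out]))]

pkg_json_aspect = aspect(
    implementation = _pkg_json_aspect_impl,
    # Recurse through these attributes to cover the whole dependency subgraph.
    attr_aspects = ["deps", "embed"],
)
```

You'd then build with Bazel's --aspects and --output_groups flags to get the .json files produced alongside the normal outputs.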
Third, gopackagesdriver reads those .json files, processes them a little, then prints them on stdout. For example, it needs to convert relative file names to absolute paths, since rules and aspects don't know absolute paths but the editor needs them.
Now we come to the bottom of the stack, at least as far as we're concerned for editor support.
rules_go doesn't actually need to do anything special to support gopackagesdriver.
Each Go-compatible rule already emits a GoArchive provider, which contains all the metadata the editor cares about, specifically the name, import path, and list of sources.
Now let's work our way back up to gopls and VSCode.
We could easily fill an hour talking about gopls, but I'm going to try to get through the highlights in a few minutes.
As I alluded to earlier, gopls is a standalone program that runs in a separate process. It communicates with the editor using the Language Server Protocol, which sends JSON requests, responses, and notifications over stdin and stdout. That lets gopls be decoupled from any specific editor.
When gopls starts, it builds a graph of all packages in the workspace. That means it needs to ask the package loader to give it metadata for all packages. Once that's done, it can load diagnostics for any package, which mostly means parsing and type checking. That process is lazy, but keep in mind that if you need types for one package, you also need types for everything it imports.
After gopls has started, the editor can send it commands like "definition", which require a response. To locate a definition, gopls looks in the metadata graph for a package containing the current file. It gets type information for that package, then finds the symbol under the cursor, which has a reference back to its definition. gopls then responds with the definition's location. Most commands can be implemented simply, using the package metadata graph and type information.
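For example, a go-to-definition exchange looks roughly like this on the wire (positions are zero-based; the paths and positions here are made up):

```json
// editor -> gopls
{"jsonrpc": "2.0", "id": 7, "method": "textDocument/definition",
 "params": {"textDocument": {"uri": "file:///work/cmd/app/main.go"},
            "position": {"line": 41, "character": 12}}}

// gopls -> editor
{"jsonrpc": "2.0", "id": 7,
 "result": [{"uri": "file:///work/lib/util/util.go",
             "range": {"start": {"line": 9, "character": 5},
                       "end": {"line": 9, "character": 13}}}]}
```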
The editor can also send notifications like "didChange", meaning the user typed something. That might mean invalidating some of the information we keep in memory.
This all sounds like a lot of work. How can it possibly be fast enough to use interactively, and how can it scale for larger workspaces?
To start, Go is a language with simple syntax and few features, and gopls leans on the parser and type checker, which are heavily optimized. Parsing in particular is very fast: you can parse a typical Go package in tens of microseconds, which is barely worth caching. There are a few tricks beyond that, though.
First, gopls has a concept of a snapshot, which is a view of the workspace at a point in time. A new snapshot is created after the user types something, but only after a short pause, not after every keystroke. The package metadata graph and the type information for every package are computed from a snapshot. Each snapshot reuses as much information as it can from the previous one. In functional programming terms, it's a "persistent" data structure, by which we mostly mean copy-on-write.
In particular, we avoid reloading the package metadata graph if we can, since we have to call the package loader and that takes a while. That only needs to be done after a change that might affect it, like adding a new file or a new import. We don't need to reload it after a change to the interior of a function or a comment.
Underlying all this is a generic in-memory cache for reusing the results of deterministic functions. Keys are hashes of all the inputs. Values are the outputs: for example, diagnostics and type information.
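A toy version of that cache in Go might look like this (gopls's real implementation handles invalidation and concurrency far more carefully):

```go
package memo

import "sync"

// Key is a hash over every input to a deterministic function:
// file contents, configuration, compiler version, and so on.
type Key [32]byte

type Cache struct {
	mu      sync.Mutex
	results map[Key]any
}

// Get returns the cached value for k, computing and storing it on a miss.
func (c *Cache) Get(k Key, compute func() any) any {
	c.mu.Lock()
	if c.results == nil {
		c.results = make(map[Key]any)
	}
	if v, ok := c.results[k]; ok {
		c.mu.Unlock()
		return v
	}
	c.mu.Unlock()

	v := compute() // may run twice under contention; harmless if deterministic

	c.mu.Lock()
	c.results[k] = v
	c.mu.Unlock()
	return v
}
```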
This might sound a little familiar to you. It's very much like Bazel's action cache, which takes a hash of the command you want to run and all its inputs and hopefully gives you the hash of the outputs. That turns out to be a good analogy: gopls is essentially a fast in-memory build system with a very specific purpose.
Let's take our last step up the stack, to the Go extension for VSCode. This is a fairly small one, so I'm just going to share some quick facts.
There are lots of editors that use gopls, but I'm picking on VSCode in particular since it's the most popular. IntelliJ / GoLand is a close second, but they have their own implementation that's entirely separate. Vim is the next most popular, then Emacs, then a long tail of others.
vscode-go was originally written by people at Microsoft. The Go team worked directly with them during the modules transition, then eventually adopted the project.
Like most VSCode extensions, this one is written entirely in TypeScript. I've heard it's a great language, but of course the Go team prefers its own. That's good, though, since it keeps this layer small and simple.
Mostly, the purpose of the Go extension is to integrate into the editor and expose features. It installs and updates gopls and other tools, and provides Go-specific functions and key bindings. For example, in regular Go workspaces, it has support for modules and can help you update your dependencies.
And that's pretty much it! I hope you found this interesting.
Actually, I hope you are inspired by this talk to improve Bazel editor support, in Go and in other languages. If you work in Go, get involved in open source!
If you work in other languages, please feel free to steal this whole architecture. Making editor support really solid will be a huge step toward making Bazel a useful tool for everyone.
Last thing before I go, I want to thank everyone who worked on this.