Life of a Go module
Go's module system is designed to be decentralized. Although there are public
mirrors like proxy.golang.org, there is no central module registry. An author
can publish a new version of their module by creating a tag in the module's
source repository.
$ git tag v1.2.3
$ git push --tags
A user can download and use that new version right away.
$ go get -d example.com/mod@v1.2.3
It's cool this is so automatic, but how exactly does it work? What does the go
command download, and from where?
Configuring module downloads
When the go command needs a module that's not in its local cache, it can
either download the module from the source repository (direct mode),
or it can download the module from a proxy, also known as a mirror. You can
control how modules are downloaded by setting GOPROXY, GOPRIVATE, and a few
other environment variables.
The default setting of GOPROXY is:
GOPROXY=https://proxy.golang.org,direct
This tells the go command to attempt to download modules first from
proxy.golang.org, the public module mirror operated by Google. If a module
isn't available there (indicated by a 404 or 410 HTTP status), the go command
falls back to direct mode. That usually happens because the module is in a
private repository that's not visible to proxy.golang.org.
You can enable direct mode for specific modules by setting GOPRIVATE to a list
of patterns matching prefixes of those modules (for example,
GOPRIVATE=corp.example.com). You can enable direct mode for all modules by
setting GOPROXY=direct.
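To make the fallback behavior concrete, here's a minimal sketch of how a client might walk a comma-separated GOPROXY list. The names download, fetchFunc, and errNotFound are made up for illustration and aren't the go command's internals. The sketch ignores GOPRIVATE and the other settings.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errNotFound stands in for a 404 or 410 response from a proxy.
// It is a hypothetical sentinel used only in this sketch.
var errNotFound = errors.New("not found")

// fetchFunc is a hypothetical callback that fetches a module from one
// source: either a proxy URL or the keyword "direct".
type fetchFunc func(source, module string) error

// download tries each comma-separated GOPROXY entry in order. It moves on
// to the next entry only when the current one reports "not found"; any
// other error stops the search. It returns the source that succeeded.
func download(goproxy, module string, fetch fetchFunc) (string, error) {
	for _, source := range strings.Split(goproxy, ",") {
		source = strings.TrimSpace(source)
		if source == "" {
			continue
		}
		err := fetch(source, module)
		if err == nil {
			return source, nil
		}
		if !errors.Is(err, errNotFound) {
			return "", err
		}
	}
	return "", fmt.Errorf("module %s: %w", module, errNotFound)
}

func main() {
	// A fake fetcher: the proxy can't see the module, direct mode can.
	fetch := func(source, module string) error {
		if source == "direct" {
			return nil
		}
		return errNotFound
	}
	source, err := download("https://proxy.golang.org,direct", "corp.example.com/mod", fetch)
	fmt.Println(source, err)
}
```

With the fake fetcher above, the module ends up being served by direct mode, which is what happens for a private repository under the default setting.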
Downloading in direct mode
Let's look at how direct mode works before we get into proxy details. Direct mode is the basis for most proxy implementations. Module files have to come from somewhere, after all.
Finding the module repository
The go command needs to clone a repository into the module cache. Before
it can do that, it needs to look up the repository's URL.
Many modules are hosted on GitHub, and their URLs are derived directly from
module paths. For example, github.com/foo/bar becomes
https://github.com/foo/bar.git. This rule is hard-coded into the go command,
along with rules for a couple of other services.
For modules outside those services, there are two ways to find the repository
URL. First, the URL may be encoded directly into the module path. A fully
qualified path has an element that ends with .git, .hg, .svn,
.bzr, or .fossil. The go command can derive the repository URL directly
from one of these paths. For example, example.com/repo.git/mod would be hosted
at https://example.com/repo.git or ssh://example.com/repo.git.
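Here's a rough sketch of how the repository root could be pulled out of a qualified path. The function name repoRoot is hypothetical, and the real go command does more validation than this.

```go
package main

import (
	"fmt"
	"strings"
)

// vcsSuffixes lists the path element suffixes that mark a qualified path.
var vcsSuffixes = []string{".git", ".hg", ".svn", ".bzr", ".fossil"}

// repoRoot returns the prefix of modulePath that ends at the first element
// with a version control suffix, along with the name of the tool.
// ok is false if the path is not qualified.
func repoRoot(modulePath string) (root, vcs string, ok bool) {
	elems := strings.Split(modulePath, "/")
	for i, elem := range elems {
		for _, suffix := range vcsSuffixes {
			if strings.HasSuffix(elem, suffix) && len(elem) > len(suffix) {
				root = strings.Join(elems[:i+1], "/")
				return root, strings.TrimPrefix(suffix, "."), true
			}
		}
	}
	return "", "", false
}

func main() {
	root, vcs, ok := repoRoot("example.com/repo.git/mod")
	if ok {
		// The scheme (https or ssh) still has to be chosen separately.
		fmt.Println(vcs, "https://"+root)
	}
}
```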
Second, the go command can also look up the URL for a custom module
path (also known as a vanity path) by sending an HTTP GET
request to a URL derived from the module's path. You've likely seen this for
modules at golang.org, gopkg.in, or k8s.io. The request has the query
string ?go-get=1 to distinguish it from other queries. For example, for the
module golang.org/x/net, the go command sends a request for
https://golang.org/x/net?go-get=1. It looks for an HTML <meta> tag in the
response with the attribute name="go-import".
$ curl -L https://golang.org/x/net?go-get=1 | grep go-import
<meta name="go-import" content="golang.org/x/net git https://go.googlesource.com/net">
The content string in this tag has three fields separated by spaces: the root
path (the prefix of the module path corresponding to the repository root), the
version control tool (git, hg, svn, bzr, fossil, or mod), and the
repository URL.
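As a small illustration, the content string can be split into its three fields like this. The type and function names are made up for this sketch, and extracting the meta tag from the HTML is left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// goImport holds the three fields of a go-import meta tag's content.
type goImport struct {
	RootPath string // prefix of the module path for the repository root
	VCS      string // version control tool, for example "git"
	RepoURL  string // where the repository lives
}

// parseGoImport splits the content attribute into its three fields.
func parseGoImport(content string) (goImport, error) {
	fields := strings.Fields(content)
	if len(fields) != 3 {
		return goImport{}, errors.New("go-import content must have three fields")
	}
	return goImport{RootPath: fields[0], VCS: fields[1], RepoURL: fields[2]}, nil
}

// matches reports whether the tag applies to modulePath, meaning the root
// path is equal to the module path or is a prefix of it at a slash boundary.
func (g goImport) matches(modulePath string) bool {
	return modulePath == g.RootPath || strings.HasPrefix(modulePath, g.RootPath+"/")
}

func main() {
	g, err := parseGoImport("golang.org/x/net git https://go.googlesource.com/net")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(g.VCS, g.RepoURL, g.matches("golang.org/x/net/http2"))
}
```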
Custom paths are a nice option since you can change where your module is hosted without renaming it. However, if you have a private module, and you can't easily stand up an HTTP server (for example, on a restricted corporate network), then a qualified path is probably your best option.
Extracting an archive from a repository
After the go command has located the repository, it makes a local clone of the
repository in the module cache using the appropriate tool, like git, which must
be installed and configured. Configuration is especially important for user
credentials since the go command invokes git non-interactively, and you
won't have a chance to enter a password. The Go
FAQ has some advice on this.
Once the go command has cloned the repository, it creates an archive for the
requested version using a command like git archive. This archive may contain
unnecessary files, especially if the module is in a subdirectory of the repository,
or if there are other nested modules. To remedy this, the go command copies
each of the module's files from the repository archive into a separate zip file.
After the module zip file is verified, the go command extracts it
into the module cache. The module's packages can then be built.
If you're curious or need to debug this process, you can see the git commands
run by go get and go mod download by passing in the -x flag. You can also
read more about version control systems in
the module reference documentation.
Downloading from a proxy
The go command can download modules from a proxy using an HTTP-based protocol.
This is often faster than downloading modules from a source repository.
The GOPROXY protocol was
designed to be stateless and is simple enough to be implemented with a static
file server. The path structure matches the directories in the module cache, so
you can actually use a module cache as a proxy with a file:// URL.
Proxies support the following endpoints:
| Path | Description |
|---|---|
| $module/@v/list | Returns a list of known versions of the given module in plain text, one per line. This list should not include pseudo-versions. |
| $module/@v/$version.info | Returns JSON-formatted metadata about a version or a branch or tag name that resolves to a version. The JSON data contains a canonical version, and an optional timestamp. |
| $module/@v/$version.mod | Returns the go.mod file for a specific version of the module. If the module doesn't have a go.mod file, this endpoint should return a file with a module directive and nothing else. |
| $module/@v/$version.zip | Returns the content of the module for a specific version. |
| $module/@latest | Returns JSON metadata for the version of a module that the go command should use as @latest if the $module/@v/list endpoint is empty or contains no suitable versions. The returned metadata is in the same format as $module/@v/$version.info. This endpoint is optional. Not all proxies implement it. |
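Here's a sketch of how a client might build these endpoint URLs. The function names are hypothetical. One detail not covered above: as I understand the module reference, uppercase letters in the module path and version are escaped as an exclamation mark followed by the lowercase letter, so that the paths work on case-insensitive file systems. The sketch applies that rule.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeCase replaces each uppercase ASCII letter with '!' followed by the
// lowercase letter, for example "Azure" becomes "!azure".
func escapeCase(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + 'a' - 'A')
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// listURL returns the URL of the $module/@v/list endpoint.
func listURL(proxy, modulePath string) string {
	return strings.TrimSuffix(proxy, "/") + "/" + escapeCase(modulePath) + "/@v/list"
}

// versionURL returns the URL of a per-version endpoint.
// ext is "info", "mod", or "zip".
func versionURL(proxy, modulePath, version, ext string) string {
	return strings.TrimSuffix(proxy, "/") + "/" + escapeCase(modulePath) +
		"/@v/" + escapeCase(version) + "." + ext
}

func main() {
	const proxy = "https://proxy.golang.org"
	fmt.Println(listURL(proxy, "golang.org/x/mod"))
	fmt.Println(versionURL(proxy, "golang.org/x/mod", "v0.4.2", "mod"))
}
```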
Downloading a module from a proxy may be much faster than downloading the same
module from its source repository for two reasons evident from this protocol.
First, the go command doesn't need to download an entire repository or even an
entire commit. The .zip endpoint provides a snapshot of one module at one
version and nothing more. Second, unless a module's packages are actually needed
for a build, the go command only needs to download the .mod file for version
selection; it can skip downloading the .zip file.
Let's pretend we're the go command and walk through the process of fetching
the latest version of a module with curl. You can also visit these URLs
in your browser. Suppose we're running the command go get golang.org/x/mod@latest.
First, we need the list of versions.
$ curl -L https://proxy.golang.org/golang.org/x/mod/@v/list
v0.3.0
v0.4.0
v0.4.1
v0.1.0
v0.2.0
v0.4.2
I don't know why they're not sorted. Anyway, v0.4.2 is the highest version
at the time of this writing.
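Since the list comes back unsorted, the client has to pick the highest version itself. Here's a minimal sketch that handles only plain vMAJOR.MINOR.PATCH versions and skips anything else, such as pre-releases. Real code would more likely use the golang.org/x/mod/semver package; the helper names here are made up.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a version like "v0.4.2" into three numbers.
// It rejects pre-release and build metadata, which this sketch ignores.
func parseVersion(v string) ([3]int, bool) {
	var nums [3]int
	if !strings.HasPrefix(v, "v") {
		return nums, false
	}
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return nums, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return nums, false
		}
		nums[i] = n
	}
	return nums, true
}

// less reports whether version a sorts before version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// highest returns the highest parseable version in the list, or "".
func highest(versions []string) string {
	best := ""
	var bestNums [3]int
	for _, v := range versions {
		nums, ok := parseVersion(v)
		if !ok {
			continue
		}
		if best == "" || less(bestNums, nums) {
			best, bestNums = v, nums
		}
	}
	return best
}

func main() {
	list := "v0.3.0\nv0.4.0\nv0.4.1\nv0.1.0\nv0.2.0\nv0.4.2\n"
	fmt.Println(highest(strings.Fields(list)))
}
```

Comparing numbers instead of strings matters here: as a string, v0.10.0 would sort before v0.9.0.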
We'll fetch its metadata. Note that for a canonicalized version like v0.4.2,
the metadata isn't that useful, and it's not strictly necessary for the go
command to fetch it. It would be more important if we wanted to check what
version a branch name corresponds to.
$ curl -L https://proxy.golang.org/golang.org/x/mod/@v/v0.4.2.info
{"Version":"v0.4.2","Time":"2021-03-09T22:22:12Z"}
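For illustration, the .info response can be decoded with encoding/json. The struct below is my own sketch with the two fields shown in the response above; fetching the data over HTTP is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// info mirrors the fields seen in a .info response.
// Time is optional, so it may be the zero value.
type info struct {
	Version string
	Time    time.Time
}

// parseInfo decodes the JSON body of a .info response.
func parseInfo(data []byte) (info, error) {
	var i info
	err := json.Unmarshal(data, &i)
	return i, err
}

func main() {
	body := []byte(`{"Version":"v0.4.2","Time":"2021-03-09T22:22:12Z"}`)
	i, err := parseInfo(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(i.Version, i.Time.Format(time.RFC3339))
}
```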
Next, we'll fetch the .mod file:
$ curl -L https://proxy.golang.org/golang.org/x/mod/@v/v0.4.2.mod
module golang.org/x/mod
go 1.12
require (
	golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550
	golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e
	golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898
)
And finally, the .zip file:
$ curl -L -O https://proxy.golang.org/golang.org/x/mod/@v/v0.4.2.zip
$ unzip -l v0.4.2.zip | head
Archive: v0.4.2.zip
Length Date Time Name
--------- ---------- ----- ----
1479 1980-00-00 00:00 golang.org/x/mod@v0.4.2/LICENSE
1303 1980-00-00 00:00 golang.org/x/mod@v0.4.2/PATENTS
660 1980-00-00 00:00 golang.org/x/mod@v0.4.2/README.md
21 1980-00-00 00:00 golang.org/x/mod@v0.4.2/codereview.cfg
214 1980-00-00 00:00 golang.org/x/mod@v0.4.2/go.mod
1476 1980-00-00 00:00 golang.org/x/mod@v0.4.2/go.sum
5224 1980-00-00 00:00 golang.org/x/mod@v0.4.2/gosumcheck/main.go
Implementing a proxy
At this point, you might be wondering where a proxy gets the modules it serves,
perhaps so you can run your own proxy. There are a few different ways. You could
build a static proxy that serves modules from a directory you populate manually
with go mod download in direct mode.
export GOMODCACHE=/srv/modcache
mkdir -p $GOMODCACHE
export GOPROXY=direct
go mod download example.com/mod@v1.2.3
# serve files from /srv/modcache/cache/download
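Because the protocol works with a static file server, serving that directory takes very little code. Here's a minimal sketch using net/http. The listen address is an assumption, and a real deployment would want TLS and access control.

```go
package main

import (
	"log"
	"net/http"
)

// newProxyHandler returns a handler that serves the files under dir.
// dir should be the cache/download directory of a module cache.
func newProxyHandler(dir string) http.Handler {
	return http.FileServer(http.Dir(dir))
}

func main() {
	// The directory matches the commands above; the address is made up.
	handler := newProxyHandler("/srv/modcache/cache/download")
	log.Fatal(http.ListenAndServe("localhost:8080", handler))
}
```

With this running, setting GOPROXY=http://localhost:8080 should point the go command at the static proxy.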
You could also build a proxy that serves files from a module cache and runs go mod download on each cache miss. You could scale that with multiple instances
using shared storage.
If you're interested in running a private proxy on your own network, check out The Athens Project.
Verifying downloaded modules
By default, the go command downloads publicly available modules from
proxy.golang.org, a module mirror operated by
Google. Anyone can operate a proxy though, which leads to an interesting
security question: how can you verify that the modules you download from a proxy
are genuine? Actually, the same question applies in direct mode: how do you
know the repository you cloned hasn't been tampered with?
The go command uses two mechanisms to ensure downloaded files haven't changed
since they were first downloaded from the source repository: go.sum files,
and the global checksum database.
go.sum
Each module has a go.sum file stored next to its go.mod file. go.sum
contains a list of hashes of .mod and .zip files for the module's
dependencies. It looks like this:
golang.org/x/mod v0.4.1 h1:Kvvh58BN8Y9/lBi7hTekvtMpm07eUZ0ck5pRHpsMWrY=
golang.org/x/mod v0.4.1/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
Each line has three fields: a module path, a version, and a base64-encoded
SHA-256 sum. If the version has a /go.mod suffix, the sum is for the .mod
file; otherwise it's for the .zip file. Instead of hashing the .zip
file itself, the go command hashes each file inside it, builds a list of
those hashes sorted by file name, then hashes that list. Consequently, the
hash isn't sensitive to file order, compression, alignment, or metadata.
Unfortunately, this violates the Cryptographic Doom Principle: the go command
has to parse the .zip file before it can verify it.
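Here's a sketch of the h1 hash as I understand it from the golang.org/x/mod/sumdb/dirhash package. It takes files as a map from name to content instead of reading a .zip file, which is a simplification, and the function name hash1 is my own.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hash1 computes an "h1:" hash over a set of files, given as a map from
// file name to content. Names are sorted first, so the result doesn't
// depend on the order the files were stored in.
func hash1(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	summary := sha256.New()
	for _, name := range names {
		// One line per file: the hex SHA-256 of the content,
		// two spaces, then the file name.
		fmt.Fprintf(summary, "%x  %s\n", sha256.Sum256(files[name]), name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(summary.Sum(nil))
}

func main() {
	// Hypothetical module contents, named the way entries appear in a
	// module zip file.
	files := map[string][]byte{
		"example.com/mod@v1.2.3/go.mod": []byte("module example.com/mod\n"),
		"example.com/mod@v1.2.3/mod.go": []byte("package mod\n"),
	}
	fmt.Println(hash1(files))
}
```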
When the go command downloads a file, it hashes it and checks go.sum. If
go.sum contains a different hash, the go command reports a security error.
If go.sum does not contain a hash for the file, the go command trusts the
file (perhaps after consulting the checksum database) and adds the hash.
This ensures that if multiple people are working together on the same module,
they'll be downloading the same set of dependencies. The go command reports an
error if a malicious proxy serves different files, or if a repository is taken
over and its version tags are changed.
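The check described above can be sketched as follows. The names sumEntry, parseSumLine, and check are hypothetical. The sketch assumes one hash algorithm per entry and doesn't consult the checksum database.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// sumEntry is one parsed line of a go.sum file.
type sumEntry struct {
	Path    string
	Version string
	Hash    string
	GoMod   bool // true if the hash covers the .mod file, not the .zip file
}

// parseSumLine splits a go.sum line into its three fields and strips the
// /go.mod suffix from the version, recording it in GoMod.
func parseSumLine(line string) (sumEntry, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return sumEntry{}, errors.New("go.sum line must have three fields")
	}
	e := sumEntry{Path: fields[0], Version: fields[1], Hash: fields[2]}
	if strings.HasSuffix(e.Version, "/go.mod") {
		e.Version = strings.TrimSuffix(e.Version, "/go.mod")
		e.GoMod = true
	}
	return e, nil
}

// check looks for an entry matching got. It returns an error if a matching
// entry has a different hash. found is false if there is no entry at all,
// in which case the caller decides whether to trust and record the hash.
func check(known []sumEntry, got sumEntry) (found bool, err error) {
	for _, e := range known {
		if e.Path != got.Path || e.Version != got.Version || e.GoMod != got.GoMod {
			continue
		}
		if e.Hash != got.Hash {
			return true, fmt.Errorf("%s@%s: checksum mismatch", got.Path, got.Version)
		}
		return true, nil
	}
	return false, nil
}

func main() {
	e, err := parseSumLine("golang.org/x/mod v0.4.1/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(check([]sumEntry{e}, e))
}
```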
Checksum database
go.sum doesn't completely address the threat. How can you verify a file is
authentic the first time you download it, when go.sum doesn't have a hash?
To answer this, Google operates sum.golang.org, an
auditable checksum database. The checksum database functions a little like a
giant go.sum file for all versions of all publicly available modules.
The go command consults this database when downloading files that don't
have hashes in go.sum. (Modules matched by GOPRIVATE or GONOSUMDB won't
be checked.)
If you're interested in learning more, check out Proposal: Secure the Public Go Module Ecosystem, which describes how the system works and discusses engineering tradeoffs.
Conclusion
Go modules are a lot easier to manage than GOPATH was, at least in my
opinion. But that ease comes with a tradeoff: magic. There's a lot of
hidden complexity that makes problems difficult to fix (or to explain) when
something goes wrong.
I've been working on modules in the go command for a little over two years
now. We've improved the user experience quite a bit in that time, and I've
personally written a lot of documentation. I still
think it's hard for people to understand what's going on, particularly when
things don't "just work". We'll keep improving though, and I think we'll end up
with a great experience while preserving that ease.
If you're interested in learning more about Google's module proxy and checksum database, I'd highly recommend Katie Hockman's GopherCon 2019 talk, Go Module Proxy: Life of a Query. Katie led the team that built these services, and she presents their design in a very accessible form.