Life of a Go module
Go's module system is designed to be decentralized. Although there are public
mirrors like proxy.golang.org, there is no central module registry. An author
can publish a new version of their module by creating a tag in the module's
source repository.
$ git tag v1.2.3
$ git push --tags
A user can download and use that new version right away.
$ go get -d example.com/mod@v1.2.3
It's cool this is so automatic, but how exactly does it work? What does the go
command download, and from where?
Configuring module downloads
When the go command needs a module that's not in its local cache, it can
either download the module from the source repository (direct mode),
or it can download the module from a proxy, also known as a mirror. You can
control how modules are downloaded by setting GOPROXY, GOPRIVATE, and a few
other environment variables.
The default setting of GOPROXY is:
GOPROXY=https://proxy.golang.org,direct
This tells the go command to attempt to download modules first from
proxy.golang.org, the public module mirror operated by Google. If a module
isn't available there (indicated by a 404 or 410 HTTP status), the go command
falls back to direct mode. That usually happens because the module is in a
private repository that's not visible to proxy.golang.org.
You can enable direct mode for specific modules by setting GOPRIVATE to a list
of patterns matching prefixes of those modules (for example,
GOPRIVATE=corp.example.com). You can enable direct mode for all modules by
setting GOPROXY=direct.
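To make the prefix matching concrete, here is a minimal sketch of how a GOPRIVATE-style list could be checked against a module path. It assumes the patterns are comma-separated globs in path.Match syntax that are compared against a leading prefix of the module path. The function name matchesPrivate and the second pattern in the example are made up for illustration; this is not the go command's own code.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchesPrivate reports whether any leading prefix of modPath matches one of
// the comma-separated glob patterns. (Hypothetical helper for illustration.)
func matchesPrivate(patterns, modPath string) bool {
	elems := strings.Split(modPath, "/")
	for _, glob := range strings.Split(patterns, ",") {
		glob = strings.TrimSpace(glob)
		if glob == "" {
			continue
		}
		// Compare the pattern with a prefix of modPath that has the same
		// number of path elements as the pattern itself.
		n := strings.Count(glob, "/") + 1
		if len(elems) < n {
			continue
		}
		prefix := strings.Join(elems[:n], "/")
		if ok, err := path.Match(glob, prefix); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	// The second pattern is an invented example of a glob with a wildcard.
	const goprivate = "corp.example.com,*.internal.example.org/team"
	for _, m := range []string{
		"corp.example.com/tools/lint",
		"git.internal.example.org/team/app",
		"golang.org/x/mod",
	} {
		fmt.Printf("%-36s direct=%v\n", m, matchesPrivate(goprivate, m))
	}
}
```

Under these assumptions, the first two paths would be fetched in direct mode and the third would still go through the proxy.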
Downloading in direct mode
Let's look at how direct mode works before we get into proxy details. Direct mode is the basis for most proxy implementations. Module files have to come from somewhere, after all.
Finding the module repository
The go command needs to clone a repository into the module cache. Before
it can do that, it needs to look up the repository's URL.
Many modules are hosted on GitHub, and their URLs are derived directly from
module paths. For example, github.com/foo/bar becomes
https://github.com/foo/bar.git. This rule is hard-coded into the go command,
along with rules for a couple of other services.
For modules outside those services, there are two ways to find the repository
URL. First, the URL may be encoded directly into the module path. A fully
qualified path has an element that ends with .git, .hg, .svn, .bzr, or
.fossil. The go command can derive the repository URL directly from one of
these paths. For example, example.com/repo.git/mod would be hosted at
https://example.com/repo.git or ssh://example.com/repo.git.
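As a rough illustration of the qualified-path rule, the sketch below splits a module path at the first element that ends in one of the qualifiers listed above. The function name splitQualifiedPath is hypothetical, and the real go command applies more validation than this.

```go
package main

import (
	"fmt"
	"strings"
)

// vcsSuffixes lists the qualifiers a fully qualified path may contain.
var vcsSuffixes = []string{".git", ".hg", ".svn", ".bzr", ".fossil"}

// splitQualifiedPath looks for the first path element (after the host) that
// ends in a VCS qualifier. It returns the repository root, the VCS name, and
// the subdirectory of the module within the repository.
func splitQualifiedPath(modPath string) (root, vcs, subdir string, ok bool) {
	elems := strings.Split(modPath, "/")
	for i, elem := range elems {
		if i == 0 {
			continue // the first element is the host, not a repository
		}
		for _, suffix := range vcsSuffixes {
			if strings.HasSuffix(elem, suffix) && len(elem) > len(suffix) {
				root = strings.Join(elems[:i+1], "/")
				subdir = strings.Join(elems[i+1:], "/")
				return root, strings.TrimPrefix(suffix, "."), subdir, true
			}
		}
	}
	return "", "", "", false
}

func main() {
	root, vcs, subdir, ok := splitQualifiedPath("example.com/repo.git/mod")
	fmt.Println(root, vcs, subdir, ok) // example.com/repo.git git mod true

	// Candidate repository URLs, as described in the text.
	fmt.Println("https://" + root)
	fmt.Println("ssh://" + root)
}
```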
Second, the go command can also look up the URL for a custom module
path (also known as a vanity path) by sending an HTTP GET
request to a URL derived from the module's path. You've likely seen this for
modules at golang.org, gopkg.in, or k8s.io. The request has the query
string ?go-get=1 to distinguish it from other queries. For example, for the
module golang.org/x/net, the go command sends a request for
https://golang.org/x/net?go-get=1. It looks for an HTML <meta> tag in the
response with the attribute name="go-import".
$ curl -L https://golang.org/x/net?go-get=1 | grep go-import
<meta name="go-import" content="golang.org/x/net git https://go.googlesource.com/net">
The content string in this tag has three fields separated by spaces: the root
path (the prefix of the module path corresponding to the repository root), the
version control tool (git, hg, svn, bzr, fossil, or mod), and the
repository URL.
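The three fields are simple to pull apart. Here is a small sketch that parses the content string from the example response and checks that the root path really is a prefix of the module path. The type goImport and both function names are invented for this example, and extracting the tag from the HTML is left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// goImport holds the three fields of a go-import meta tag's content string.
type goImport struct {
	RootPath string // prefix of the module path at the repository root
	VCS      string // git, hg, svn, bzr, fossil, or mod
	RepoURL  string // where the repository (or proxy, for mod) lives
}

// parseGoImport splits the content attribute on whitespace.
func parseGoImport(content string) (goImport, error) {
	f := strings.Fields(content)
	if len(f) != 3 {
		return goImport{}, errors.New("go-import content must have three fields")
	}
	return goImport{RootPath: f[0], VCS: f[1], RepoURL: f[2]}, nil
}

// covers reports whether the tag's root path is a prefix of modPath.
func (g goImport) covers(modPath string) bool {
	return modPath == g.RootPath || strings.HasPrefix(modPath, g.RootPath+"/")
}

func main() {
	g, err := parseGoImport("golang.org/x/net git https://go.googlesource.com/net")
	if err != nil {
		panic(err)
	}
	fmt.Println(g.VCS, g.RepoURL, g.covers("golang.org/x/net/http2"))
}
```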
Custom paths are a nice option since you can change where your module is hosted without renaming it. However, if you have a private module, and you can't easily stand up an HTTP server (for example, on a restricted corporate network), then a qualified path is probably your best option.
Extracting an archive from a repository
After the go command has located the repository, it makes a local clone of the
repository in the module cache using the appropriate tool like git, which must
be installed and configured. Configuration is especially important for user
credentials since the go command invokes git non-interactively, and you
won't have a chance to enter a password. The Go FAQ has some advice on this.
Once the go command has cloned the repository, it creates an archive for the
requested version using a command like git archive. This archive may contain
unnecessary files, especially if the module is in a subdirectory of the
repository, or if there are other nested modules. To remedy this, the go
command copies each of the module's files from the repository archive into a
separate zip file.
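The filtering step might look roughly like the sketch below. It takes a listing of repository files, keeps those inside the module's subdirectory, drops anything under a nested directory that has its own go.mod, and renames the rest with the module@version/ prefix used inside module zip files. The function name moduleZipFiles and the file listing are made up, and the real go command applies additional rules that are not modeled here.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// moduleZipFiles returns the zip entry names for the module rooted at subdir.
// (Simplified sketch; file names use forward slashes.)
func moduleZipFiles(repoFiles []string, subdir, modPath, version string) []string {
	subdir = path.Clean(subdir) // "" becomes ".", meaning the repository root

	// rel returns file's path relative to dir, if file is inside dir.
	rel := func(file, dir string) (string, bool) {
		if dir == "." {
			return file, true
		}
		if strings.HasPrefix(file, dir+"/") {
			return file[len(dir)+1:], true
		}
		return "", false
	}

	// Directories below subdir that contain go.mod are nested modules.
	var nested []string
	for _, f := range repoFiles {
		if _, ok := rel(f, subdir); ok && path.Base(f) == "go.mod" && path.Dir(f) != subdir {
			nested = append(nested, path.Dir(f))
		}
	}

	var out []string
	for _, f := range repoFiles {
		r, ok := rel(f, subdir)
		if !ok {
			continue // outside the module's directory
		}
		inNested := false
		for _, n := range nested {
			if _, under := rel(f, n); under {
				inNested = true
				break
			}
		}
		if !inNested {
			out = append(out, modPath+"@"+version+"/"+r)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	repo := []string{
		"go.mod", "README.md",
		"sub/go.mod", "sub/a.go", "sub/pkg/c.go",
		"sub/nested/go.mod", "sub/nested/b.go",
	}
	for _, name := range moduleZipFiles(repo, "sub", "example.com/repo/sub", "v1.2.3") {
		fmt.Println(name)
	}
}
```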
After the module zip file is verified, the go command extracts it
into the module cache. The module's packages can then be built.
If you're curious or need to debug this process, you can see the git commands
run by go get and go mod download by passing in the -x flag. You can also
read more about version control systems in
the module reference documentation.
Downloading from a proxy
The go command can download modules from a proxy using an HTTP-based protocol.
This is typically 5-20x faster than downloading modules from a source
repository.
The GOPROXY protocol was designed to be stateless and is simple enough to be
implemented with a static file server. The path structure matches the
directories in the module cache, so you can actually use a module cache as a
proxy with a file:// URL.
Proxies support the following endpoints:
Path | Description |
---|---|
$module/@v/list | Returns a list of known versions of the given module in plain text, one per line. This list should not include pseudo-versions. |
$module/@v/$version.info | Returns JSON-formatted metadata about a version or a branch or tag name that resolves to a version. The JSON data contains a canonical version, and an optional timestamp. |
$module/@v/$version.mod | Returns the go.mod file for a specific version of the module. If the module doesn't have a go.mod file, this endpoint should return a file with a module directive and nothing else. |
$module/@v/$version.zip | Returns the content of the module for a specific version. |
$module/@latest | Returns JSON metadata for the version of a module that the go command should use as @latest if the $module/@v/list endpoint is empty or contains no suitable versions. The returned metadata is in the same format as $module/@v/$version.info. This endpoint is optional. Not all proxies implement it. |
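Building these URLs takes only a few lines. The sketch below assembles the endpoint paths from the table for a given proxy, module, and version. One detail the table doesn't show is that module paths and versions are case-encoded in proxy URLs: each uppercase letter is replaced by an exclamation mark and its lowercase form, so the paths stay unambiguous on case-insensitive file systems. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// escapePath case-encodes s: every uppercase ASCII letter becomes '!'
// followed by the corresponding lowercase letter.
func escapePath(s string) string {
	var b strings.Builder
	for _, r := range s {
		if 'A' <= r && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + 'a' - 'A')
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// versionURL builds $base/$module/@v/$version.$ext, where ext is
// "info", "mod", or "zip".
func versionURL(base, modPath, version, ext string) string {
	return fmt.Sprintf("%s/%s/@v/%s.%s",
		strings.TrimSuffix(base, "/"), escapePath(modPath), escapePath(version), ext)
}

// listURL builds $base/$module/@v/list.
func listURL(base, modPath string) string {
	return strings.TrimSuffix(base, "/") + "/" + escapePath(modPath) + "/@v/list"
}

func main() {
	const proxy = "https://proxy.golang.org"
	fmt.Println(listURL(proxy, "golang.org/x/mod"))
	for _, ext := range []string{"info", "mod", "zip"} {
		fmt.Println(versionURL(proxy, "golang.org/x/mod", "v0.4.2", ext))
	}
}
```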
Downloading a module from a proxy may be much faster than downloading the same
module from its source repository for two reasons evident from this protocol.
First, the go command doesn't need to download an entire repository or even an
entire commit. The .zip endpoint provides a snapshot of one module at one
version and nothing more. Second, unless a module's packages are actually needed
for a build, the go command only needs to download the .mod file for version
selection; it can skip downloading the .zip file.
Let's pretend we're the go command and walk through the process of fetching
the latest version of a module with curl. You can also visit these URLs
in your browser. Suppose we're running the command
go get golang.org/x/mod@latest.
First, we need the list of versions.
$ curl -L https://proxy.golang.org/golang.org/x/mod/@v/list
v0.3.0
v0.4.0
v0.4.1
v0.1.0
v0.2.0
v0.4.2
I don't know why they're not sorted. Anyway, v0.4.2 is the highest version
at the time of this writing.
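Since the list isn't sorted, a client has to compare versions itself. Here is a minimal sketch that picks the highest version from the response above. It only understands plain vMAJOR.MINOR.PATCH versions and skips anything else, such as pre-releases. A full comparison would follow semantic versioning precedence rules, which this sketch does not attempt.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse splits "vMAJOR.MINOR.PATCH" into three integers. It rejects
// pre-release and build suffixes to keep the sketch short.
func parse(v string) ([3]int, bool) {
	var out [3]int
	if !strings.HasPrefix(v, "v") {
		return out, false
	}
	parts := strings.Split(v[1:], ".")
	if len(parts) != 3 {
		return out, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

// less compares two parsed versions element by element.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// highest returns the highest version in a newline-separated list,
// or "" if the list has no version this sketch can parse.
func highest(list string) string {
	var best string
	var bestParsed [3]int
	for _, v := range strings.Fields(list) {
		p, ok := parse(v)
		if !ok {
			continue
		}
		if best == "" || less(bestParsed, p) {
			best, bestParsed = v, p
		}
	}
	return best
}

func main() {
	list := "v0.3.0\nv0.4.0\nv0.4.1\nv0.1.0\nv0.2.0\nv0.4.2\n"
	fmt.Println(highest(list)) // v0.4.2
}
```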
We'll fetch its metadata. Note that for a canonicalized version like v0.4.2,
the metadata isn't that useful, and it's not strictly necessary for the go
command to fetch it. It would be more important if we wanted to check what
version a branch name corresponds to.
$ curl -L https://proxy.golang.org/golang.org/x/mod/@v/v0.4.2.info
{"Version":"v0.4.2","Time":"2021-03-09T22:22:12Z"}
Next, we'll fetch the .mod file:
$ curl -L https://proxy.golang.org/golang.org/x/mod/@v/v0.4.2.mod
module golang.org/x/mod
go 1.12
require (
	golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550
	golang.org/x/tools v0.0.0-20191119224855-298f0cb1881e
	golang.org/x/xerrors v0.0.0-20191011141410-1b5146add898
)
And finally, the .zip file:
$ curl -L -O https://proxy.golang.org/golang.org/x/mod/@v/v0.4.2.zip
$ unzip -l v0.4.2.zip | head
Archive: v0.4.2.zip
Length Date Time Name
--------- ---------- ----- ----
1479 1980-00-00 00:00 golang.org/x/mod@v0.4.2/LICENSE
1303 1980-00-00 00:00 golang.org/x/mod@v0.4.2/PATENTS
660 1980-00-00 00:00 golang.org/x/mod@v0.4.2/README.md
21 1980-00-00 00:00 golang.org/x/mod@v0.4.2/codereview.cfg
214 1980-00-00 00:00 golang.org/x/mod@v0.4.2/go.mod
1476 1980-00-00 00:00 golang.org/x/mod@v0.4.2/go.sum
5224 1980-00-00 00:00 golang.org/x/mod@v0.4.2/gosumcheck/main.go
Implementing a proxy
At this point, you might be wondering where a proxy gets the modules it serves,
perhaps so you can run your own proxy. There are a few different ways. You could
build a static proxy that serves modules from a directory you populate manually
with go mod download in direct mode.
export GOMODCACHE=/srv/modcache
mkdir -p $GOMODCACHE
export GOPROXY=direct
go mod download example.com/mod@v1.2.3
# serve files from /srv/modcache/cache/download
You could also build a proxy that serves files from a module cache and runs
go mod download on each cache miss. You could scale that with multiple
instances using shared storage.
If you're interested in running a private proxy on your own network, check out The Athens Project.
Verifying downloaded modules
By default, the go command downloads publicly available modules from
proxy.golang.org, a module mirror operated by
Google. Anyone can operate a proxy though, which leads to an interesting
security question: how can you verify that the modules you download from a proxy
are genuine? Actually, the same question applies in direct mode: how do you
know the repository you cloned hasn't been tampered with?
The go command uses two mechanisms to ensure downloaded files haven't changed
since they were first downloaded from the source repository: go.sum files,
and the global checksum database.
go.sum
Each module has a go.sum file stored next to its go.mod file. go.sum
contains a list of hashes of .mod and .zip files for the module's
dependencies. It looks like this:
golang.org/x/mod v0.4.1 h1:Kvvh58BN8Y9/lBi7hTekvtMpm07eUZ0ck5pRHpsMWrY=
golang.org/x/mod v0.4.1/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
Each line has three fields: a module path, a version, and a base64-encoded
SHA-256 sum. If the version has a /go.mod suffix, the sum is for the .mod
file; otherwise it's for the .zip file. Instead of hashing the .zip
file itself, the go command hashes its files in a deterministic
order. Consequently, the hash isn't sensitive to file order, compression,
alignment, or metadata. Unfortunately, this violates the Cryptographic Doom
Principle.
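To show what hashing files in a deterministic order can mean in practice, here is a sketch modeled on the h1: scheme as implemented in the golang.org/x/mod/sumdb/dirhash package. It sorts the file names, writes one line per file containing the file's SHA-256 sum and its name, and then hashes that summary. The in-memory map and the function name hash1 are simplifications for this example, and the file contents are invented.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"sort"
)

// hash1 computes an "h1:" style hash over a set of files, given as a map
// from file name (for example "mod@v1.0.0/go.mod") to file content.
func hash1(files map[string][]byte) string {
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order, independent of zip layout

	summary := sha256.New()
	for _, name := range names {
		// One line per file: hex SHA-256 of the content, two spaces, name.
		fmt.Fprintf(summary, "%x  %s\n", sha256.Sum256(files[name]), name)
	}
	return "h1:" + base64.StdEncoding.EncodeToString(summary.Sum(nil))
}

func main() {
	files := map[string][]byte{
		"example.com/mod@v1.2.3/go.mod": []byte("module example.com/mod\n"),
		"example.com/mod@v1.2.3/mod.go": []byte("package mod\n"),
	}
	fmt.Println(hash1(files))
}
```

Because only file names and contents feed the hash, repacking the zip with different compression or timestamps would leave the result unchanged.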
When the go command downloads a file, it hashes it and checks go.sum. If
go.sum contains a different hash, the go command reports a security error.
If go.sum does not contain a hash for the file, the go command trusts the
file (perhaps after consulting the checksum database) and adds the hash.
This ensures that if multiple people are working together on the same module,
they'll be downloading the same set of dependencies. The go command reports an
error if a malicious proxy serves different files, or if a repository is taken
over and its version tags are changed.
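The check itself amounts to a map lookup. The sketch below parses go.sum lines into a map and applies the rule described above: a matching hash passes, a different hash is an error, and a missing hash is recorded on first use. The names sumKey, parseGoSum, and verify are hypothetical, and the checksum database lookup is left out.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// sumKey identifies a go.sum entry. Version carries the "/go.mod" suffix
// when the hash covers the .mod file rather than the .zip file.
type sumKey struct{ Path, Version string }

// parseGoSum reads "path version hash" lines into a map.
func parseGoSum(data string) map[sumKey]string {
	sums := map[sumKey]string{}
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) != 3 {
			continue // skip blank or malformed lines
		}
		sums[sumKey{Path: f[0], Version: f[1]}] = f[2]
	}
	return sums
}

// verify compares a freshly computed hash with the recorded one.
// An unknown entry is trusted and added, as on first download.
func verify(sums map[sumKey]string, key sumKey, hash string) error {
	want, ok := sums[key]
	if !ok {
		sums[key] = hash
		return nil
	}
	if want != hash {
		return fmt.Errorf("verifying %s@%s: checksum mismatch", key.Path, key.Version)
	}
	return nil
}

func main() {
	sums := parseGoSum(`golang.org/x/mod v0.4.1 h1:Kvvh58BN8Y9/lBi7hTekvtMpm07eUZ0ck5pRHpsMWrY=
golang.org/x/mod v0.4.1/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
`)
	key := sumKey{Path: "golang.org/x/mod", Version: "v0.4.1"}
	fmt.Println(verify(sums, key, "h1:Kvvh58BN8Y9/lBi7hTekvtMpm07eUZ0ck5pRHpsMWrY="))
	fmt.Println(verify(sums, key, "h1:tampered"))
}
```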
Checksum database
go.sum doesn't completely address the threat. How can you verify a file is
authentic the first time you download it, when go.sum doesn't have a hash?
To answer this, Google operates sum.golang.org, an
auditable checksum database. The checksum database functions a little like a
giant go.sum file for all versions of all publicly available modules.
The go command consults this database when downloading files that don't
have hashes in go.sum. (Modules matched by GOPRIVATE or GONOSUMDB won't
be checked.)
If you're interested in learning more, check out Proposal: Secure the Public Go Module Ecosystem, which describes how the system works and discusses engineering tradeoffs.
Conclusion
Go modules are a lot easier to manage than GOPATH was, at least in my
opinion. But that ease comes with a tradeoff: magic. There's a lot of
hidden complexity that makes problems difficult to fix (or to explain) when
something goes wrong.
I've been working on modules in the go command for a little over two years
now. We've improved the user experience quite a bit in that time, and I've
personally written a lot of documentation. I still
think it's hard for people to understand what's going on, particularly when
things don't "just work". We'll keep improving though, and I think we'll end up
with a great experience while preserving that ease.
If you're interested in learning more about Google's module proxy and checksum database, I'd highly recommend Katie Hockman's GopherCon 2019 talk, Go Module Proxy: Life of a Query. Katie led the team that built these services, and she presents their design in a very accessible form.