stdlib: zip sdk file inputs when executing remotely #2460
Thanks for working on this. The speedups you're seeing are awesome.
Architecturally, this isn't quite the solution I have in mind for this problem though. Here's what I'm thinking:
- `GoStdLib` should produce a .zip file, whether it's using precompiled .a files or a cross-compiled configuration.
  - For the precompiled configuration, the .zip file should be produced by the repository rule using the system `zip` tool (if available), and the `stdlib` rule can return that without any action. This avoids the need for the standard library sources to be inputs.
  - For the cross-compiled configuration, the .zip should be produced by `GoStdLib`. It may need to consume a .zip of std sources, again produced by the system `zip` tool in the repository rule.
- Actions like `GoLink` and `GoCompilePkg` can consume the standard library .zip directly. They won't need std sources or .a files as inputs. `GoCompilePkg` already reads sources to find out what packages are imported, so it can extract just the packages it needs.
- I'm not sure if this will benefit local builds. Probably not on Linux, possibly yes on Windows. So this would need to be behind a flag.
If the zip file is built by the repository rule on the host, that would dramatically simplify Bazel's file-action graph, making things easier for remote execution. This is basically what Blaze does within Google. The main difference is that the standard library zip is checked into the monorepo.
@@ -459,6 +459,7 @@ def go_context(ctx, attr = None):
         importpath_aliases = importpath_aliases,
         pathtype = pathtype,
         cgo_tools = cgo_tools,
+        zipper = toolchain._zipper,
I'd rather not expose anything in the `go_context` public API. We might want an `is_sdk_archive` flag on the toolchain, but let's see how far we can get without it.
# that we are executing under remote execution. In this case an additional
# action is created to package the sdk files into a single zip file rather
# than 6100+ individual files. An additional flag naming the zip file is
# passed to the builder that will unpackage it.
I don't understand why this speeds things up. Why doesn't `GoStdlibZip` execute on the remote worker, taking just as long as `GoStdLib` did previously?
Point taken. I didn't put the mnemonic behind a strategy flag, so it calls into question why I saw any speedup at all! Perhaps in the course of the work flipping between local and remote configurations I was able to build it locally and the zip was pushed up into the cache. I'll have to study this better.
def go_toolchain(**kwargs):
    """Macro wrapping go_toolchain (to be able to use select expressions)."""
    kwargs.setdefault("zipper", select({
        "@bazel_tools//src/conditions:remote": "@bazel_tools//third_party/ijar:zipper",
Having a `cc_binary` as a dependency here means that there must be a configured C/C++ toolchain in remote configurations. That may not always be the case, and I'd rather not make it a hard requirement.
Makes sense. Will avoid a `cc_binary` requirement.
You can also depend on `@bazel_tools//tools/zip:zipper`. Or, worst case, we could reimplement zipper as a builder tool (it's not that hard), or perhaps even switch to a nicer format that can be streamed, such as `.tar.gz`.
It looks like when remote execution is enabled, `@bazel_tools//tools/zip:zipper` is `//third_party/ijar:zipper`.
A zip tool will still be needed if we generate a .zip in the repository rule, which would cut the `GOROOT` sources out of the file-action graph in the default configuration.
I'd rather not do `.tar`. Random access is a major advantage of `.zip`. `GoCompilePkg` and other actions could just pull out the `.a` files they need without extracting the whole thing.
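To illustrate the random-access point, here is a minimal Go sketch that reads a single member out of a zip archive without decompressing any other entry. The helper names `readMember` and `buildArchive` and the entry paths are hypothetical, not existing rules_go code; only the standard `archive/zip` package is used.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// buildArchive (hypothetical helper) creates a small in-memory zip that
// stands in for a standard library archive.
func buildArchive(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, body := range files {
		fw, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := io.WriteString(fw, body); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readMember (hypothetical helper) returns the contents of one named entry.
// zip.NewReader only parses the central directory up front; the data of an
// entry is read and decompressed only when that entry is opened.
func readMember(archive []byte, name string) ([]byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(archive), int64(len(archive)))
	if err != nil {
		return nil, err
	}
	for _, f := range zr.File {
		if f.Name != name {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		defer rc.Close()
		return io.ReadAll(rc)
	}
	return nil, fmt.Errorf("%s: not found in archive", name)
}

func main() {
	archive, err := buildArchive(map[string]string{
		"pkg/linux_amd64/fmt.a": "fmt archive bytes",
		"pkg/linux_amd64/os.a":  "os archive bytes",
	})
	if err != nil {
		panic(err)
	}
	data, err := readMember(archive, "pkg/linux_amd64/fmt.a")
	if err != nil {
		panic(err)
	}
	fmt.Printf("read %d bytes\n", len(data))
}
```

A streamed format such as `.tar.gz` has no central directory, so finding one member generally means decompressing everything that precedes it.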
Thanks @jayconrod, let me re-think the solution as a repository rule and study the
Wouldn't this solution also speed up sandboxing, given the number of files?
@steeve It's hard to say without measurement. In every compile / link action, we'd need to extract .a files, so there's I/O overhead. We'd generally only need a small number of files though. I'm not sure how that would compare with creating a symlink forest.
I'm going to close this for now as I don't have capacity to work on it ATM.
This PR adds the "zipper" tool to the go_toolchain when `--define=EXECUTOR=remote`.
When the zipper tool is present in the go toolchain, it acts as a signal to package the sdk as a single zip archive and simplify the inputs to the `GoStdlib` action from ~6140 files to a handful. The stdlib builder utility has been given a new `-sdkzip` flag, and the replicator.go code has a new responsibility to replicate the requisite files from that zip file rather than the filesystem.
Fixes #2188
In my (rather puny) remote execution setup, the GoStdlib action never was able to finish (given > 30m) in a cross-compilation scenario. With this PR, it was able to finish in about 3m.
The compressed sdk.zip is ~83M.