Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdlib: zip sdk file inputs when executing remotely #2460

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
40 changes: 35 additions & 5 deletions go/private/actions/stdlib.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -77,11 +77,41 @@ def _build_stdlib(go):
"CGO_CFLAGS": " ".join(go.cgo_tools.c_compile_options),
"CGO_LDFLAGS": " ".join(extldflags_from_cc_toolchain(go)),
})
inputs = (go.sdk.srcs +
go.sdk.headers +
go.sdk.tools +
[go.sdk.go, go.sdk.package_list, go.sdk.root_file] +
go.crosstool)

sdk_inputs = go.sdk.srcs + go.sdk.headers + go.sdk.tools + [go.sdk.go, go.sdk.root_file]

# Existence of the go.zipper executable being defined here is the signal
# that we are executing under remote execution. In this case an additional
# action is created to package the sdk files into a single zip file rather
# than 6100+ individual files. An additional flag naming the zip file is
# passed to the builder that will unpackage it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this speeds things up. Why doesn't GoStdlibZip execute on the remote worker, taking just as long as GoStdLib did previously?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point taken. I didn't put the mnemonic behind a strategy flag, so it calls into question why I saw any speedup at all! Perhaps in the course of the work flipping between local and remote configurations I was able to build it locally and the zip was pushed up into the cache. I'll have to study this better.

if go.zipper:
sdkzip = go.actions.declare_file("sdk.zip")

# for zipper usage see
# https://github.com/bazelbuild/bazel/blob/master/third_party/ijar/zip_main.cc#L354
zipargs = go.actions.args()
zipargs.add("cC", sdkzip.path)

# builder expects zip entries relative to GOROOT
prefixlen = len(go.sdk.root_file.dirname) + 1
for f in sdk_inputs:
rel = f.path[prefixlen:]
zipargs.add(rel + "=" + f.path)

archive = go.actions.run(
inputs = sdk_inputs,
outputs = [sdkzip],
mnemonic = "GoStdlibZip",
progress_message = "Packaging %s stdlib files" % len(sdk_inputs),
executable = go.zipper,
arguments = [zipargs],
)
sdk_inputs = [sdkzip]
args.add("-sdkzip", sdkzip.path)

inputs = sdk_inputs + go.crosstool + [go.sdk.package_list]

outputs = [pkg, src]
go.actions.run(
inputs = inputs,
Expand Down
1 change: 1 addition & 0 deletions go/private/context.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -459,6 +459,7 @@ def go_context(ctx, attr = None):
importpath_aliases = importpath_aliases,
pathtype = pathtype,
cgo_tools = cgo_tools,
zipper = toolchain._zipper,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not expose anything in the go_context public API. We might want an is_sdk_archive flag on the toolchain, but let's see how far we can get without it.

nogo = nogo,
coverdata = coverdata,
coverage_enabled = ctx.configuration.coverage_enabled,
Expand Down
17 changes: 16 additions & 1 deletion go/private/go_toolchain.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -54,9 +54,10 @@ def _go_toolchain_impl(ctx):

# Internal fields -- may be read by emit functions.
_builder = ctx.executable.builder,
_zipper = ctx.executable.zipper,
)]

go_toolchain = rule(
_go_toolchain = rule(
_go_toolchain_impl,
attrs = {
# Minimum requirements to specify a toolchain
Expand Down Expand Up @@ -87,11 +88,25 @@ go_toolchain = rule(
"cgo_link_flags": attr.string_list(
doc = "Flags passed to the external linker (if it is used)",
),
# helper tool for the stdlib action
"zipper": attr.label(
cfg = "exec",
executable = True,
doc = "The zipper helper utility",
),
},
doc = "Defines a Go toolchain based on an SDK",
provides = [platform_common.ToolchainInfo],
)

def go_toolchain(**kwargs):
"""Macro wrapping go_toolchain (to be able to use select expressions)."""
kwargs.setdefault("zipper", select({
"@bazel_tools//src/conditions:remote": "@bazel_tools//third_party/ijar:zipper",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a cc_binary as a dependency here means that there must be a configured C/C++ toolchain in remote configurations. That may not always be the case, and I'd rather not make it a hard requirement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Will avoid a cc_binary requirement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also depend on @bazel_tools//tools/zip:zipper. Or perhaps worst case scenario we could reimplement zipper as a builder tool (it's not that hard) or perhaps even switch to a nicer format that can be streamed, such as .tar.gz.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like when remote execution is enabled, @bazel_tools//tools/zip:zipper is //third_party/ijar:zipper.

A zip tool will still be needed if we generate a .zip in the repository rule, which would cut the GOROOT sources out of the file-action graph in the default configuration.

I'd rather not do .tar. Random access is a major advantage of .zip. GoCompilePkg and other actions could just pull out the .a files they need without extracting the whole thing.

"//conditions:default": None,
}))
_go_toolchain(**kwargs)

def declare_toolchains(host, sdk, builder):
"""Declares go_toolchain and toolchain targets for each platform."""

Expand Down
5 changes: 5 additions & 0 deletions go/tools/builders/env.go
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,10 @@ type env struct {
// platform. This may be different than GOROOT.
sdk string

// sdk is the path to the Go SDK package as a zip file. This is only used
// when building the stdlib, as an optimization for remote execution.
sdkzip string

// installSuffix is the name of the directory below GOROOT/pkg that contains
// the .a files for the standard library we should build against.
// For example, linux_amd64_race.
Expand All @@ -66,6 +70,7 @@ type env struct {
func envFlags(flags *flag.FlagSet) *env {
env := &env{}
flags.StringVar(&env.sdk, "sdk", "", "Path to the Go SDK.")
flags.StringVar(&env.sdkzip, "sdkzip", "", "Path to the Go SDK (as a zip archive)")
flags.Var(&tagFlag{}, "tags", "List of build tags considered true.")
flags.StringVar(&env.installSuffix, "installsuffix", "", "Standard library under GOROOT/pkg")
flags.BoolVar(&env.verbose, "v", false, "Whether subprocess command lines should be printed")
Expand Down
151 changes: 130 additions & 21 deletions go/tools/builders/replicate.go
Original file line number Diff line number Diff line change
Expand Up @@ -16,10 +16,12 @@
package main

import (
"archive/zip"
"fmt"
"io"
"os"
"path/filepath"
"strings"
)

type replicateMode int
Expand All @@ -36,8 +38,24 @@ type replicateConfig struct {
fileMode replicateMode
dirMode replicateMode
paths []string
zip string
}

// replicator implementations are capable to copying a filetree from src to dst
// under the specified configuration settings
type replicator interface {
Replicate(src, dst string, config *replicateConfig) error
}

// replicateFromZip is a replicateOption that sets the configuration zip file
// name.
func replicateFromZip(zip string) replicateOption {
return func(config *replicateConfig) {
config.zip = zip
}
}

// replicatePaths is a replicateOption that sets the configuration file paths
func replicatePaths(paths ...string) replicateOption {
return func(config *replicateConfig) {
config.paths = append(config.paths, paths...)
Expand All @@ -56,38 +74,44 @@ func replicatePrepare(dst string, config *replicateConfig) error {
return nil
}

// createFile takes a file source reader and FileInfo, creates at file at dst,
// and updates the file mode.
func createFile(in io.Reader, stat os.FileInfo, dst string) error {
out, err := os.Create(dst)
if err != nil {
return err
}
_, err = io.Copy(out, in)
closeerr := out.Close()
if err != nil {
return err
}
if closeerr != nil {
return closeerr
}
if err := os.Chmod(dst, stat.Mode()); err != nil {
return err
}
return nil
}

// replicateFile is called internally by replicate to map a single file from src into dst.
func replicateFile(src, dst string, config *replicateConfig) error {
if err := replicatePrepare(dst, config); err != nil {
return err
}
switch config.fileMode {
case copyMode:
in, err := os.Open(src)
if err != nil {
return err
}
defer in.Close()
out, err := os.Create(dst)
if err != nil {
return err
}
_, err = io.Copy(out, in)
closeerr := out.Close()
if err != nil {
return err
}
if closeerr != nil {
return closeerr
}
s, err := os.Stat(src)
if err != nil {
return err
}
if err := os.Chmod(dst, s.Mode()); err != nil {
in, err := os.Open(src)
if err != nil {
return err
}
return nil
defer in.Close()
return createFile(in, s, dst)
case hardlinkMode:
return os.Link(src, dst)
case softlinkMode:
Expand Down Expand Up @@ -153,15 +177,100 @@ func replicate(src, dst string, options ...replicateOption) error {
for _, option := range options {
option(&config)
}

var replicator replicator
if config.zip == "" {
replicator = &filesystemReplicator{}
} else {
replicator = &zipReplicator{}
}

return replicator.Replicate(src, dst, &config)
}

// filesystemReplicator implements the replicator interface when source paths
// represent pre-existing entries in the filesystem.
type filesystemReplicator struct {
}

// Replicate is called for each single src dst pair.
func (r *filesystemReplicator) Replicate(src, dst string, config *replicateConfig) error {
if len(config.paths) == 0 {
return replicateTree(src, dst, &config)
return replicateTree(src, dst, config)
}
for _, base := range config.paths {
from := filepath.Join(src, base)
to := filepath.Join(dst, base)
if err := replicateTree(from, to, &config); err != nil {
if err := replicateTree(from, to, config); err != nil {
return err
}
}
return nil
}

// zipReplicator implements the replicator interface when source paths represent
// entries in a zip archive.
type zipReplicator struct {
}

// Replicate is called for each single src dst pair.
func (r *zipReplicator) Replicate(src, dst string, config *replicateConfig) error {
in, err := zip.OpenReader(config.zip)
if err != nil {
return err
}
defer in.Close()

dirs := make(map[string]bool)
files := make([]*zip.File, 0)

// Collect all the zipfile entries of interest based on path prefixes in the
// config.
// TODO: construct a prefix trie here to remove this nested loop
for _, f := range in.File {
for _, path := range config.paths {
if strings.HasPrefix(f.Name, path) {
if f.FileInfo().IsDir() {
// Although this check for IsDir is done, in practice the
// bazel zipper utility does not create zip *directory*
// entries, so in the usual case this branch is never
// executed.
dirs[f.Name] = true
} else {
files = append(files, f)
dirs[filepath.Dir(f.Name)] = true
}
break
}
}
}

extract := func(file *zip.File) error {
to := filepath.Join(dst, file.Name)
if err := os.MkdirAll(filepath.Dir(to), os.ModePerm); err != nil {
return err
}
f, err := file.Open()
if err != nil {
return fmt.Errorf("could not open zip file entry %s: %v", file.Name, err)
}
defer f.Close()

return createFile(f, file.FileInfo(), to)
}

for dir := range dirs {
to := filepath.Join(dst, dir)
if err := replicatePrepare(to, config); err != nil {
return err
}
}

for _, f := range files {
if err := extract(f); err != nil {
return err
}
}

return nil
}
14 changes: 12 additions & 2 deletions go/tools/builders/stdlib.go
Original file line number Diff line number Diff line change
Expand Up @@ -44,8 +44,18 @@ func stdlib(args []string) error {
}
output := abs(*out)

// Link in the bare minimum needed to the new GOROOT
if err := replicate(goroot, output, replicatePaths("src", "pkg/tool", "pkg/include")); err != nil {
// Link in the bare minimum needed to the new GOROOT. In the case of
// sdk-in-a-zip, we need to unpack additional bin tools and relocate the
// goenv.sdk to point to the unpacked location.
options := make([]replicateOption, 0)
sdkPaths := []string{"src", "pkg/tool", "pkg/include"}
if goenv.sdkzip != "" {
sdkPaths = append(sdkPaths, "bin")
options = append(options, replicateFromZip(goenv.sdkzip))
goenv.sdk = output
}
options = append(options, replicatePaths(sdkPaths...))
if err := replicate(goroot, output, options...); err != nil {
return err
}

Expand Down
2 changes: 1 addition & 1 deletion tests/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,9 @@ load(":buildifier_test.bzl", "buildifier_test")

buildifier_test(
name = "buildifier_test",
size = "small",
data = [
":buildifier_test.bzl",
"//:all_files",
],
size = "small",
)
2 changes: 1 addition & 1 deletion tests/core/c_linkmodes/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,7 @@ cc_test(
}),
copts = select({
"@io_bazel_rules_go//go/platform:windows": [],
"//conditions:default": ['-DSO=\\"$(rootpath :crypto)\\"'],
"//conditions:default": ["-DSO=\\\"$(rootpath :crypto)\\\""],
}),
data = select({
"@io_bazel_rules_go//go/platform:windows": [],
Expand Down
2 changes: 1 addition & 1 deletion tests/core/go_path/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,6 @@ go_test(
":link_path",
":nodata_path",
],
deps = ["//go/tools/bazel:go_default_library"],
rundir = ".",
deps = ["//go/tools/bazel:go_default_library"],
)
2 changes: 1 addition & 1 deletion tests/core/go_path/pkg/lib/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ go_library(
go_library(
name = "vendored",
srcs = ["vendored.go"],
importpath = "example.com/repo2",
importmap = "example.com/repo/vendor/example.com/repo2",
importpath = "example.com/repo2",
visibility = ["//visibility:public"],
)
2 changes: 1 addition & 1 deletion tests/core/stdlib/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,8 @@ load(":stdlib_files.bzl", "stdlib_files")
go_test(
name = "buildid_test",
srcs = ["buildid_test.go"],
rundir = ".",
data = [":stdlib_files"],
rundir = ".",
)

stdlib_files(name = "stdlib_files")
2 changes: 1 addition & 1 deletion tests/examples/executable_name/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -16,5 +16,5 @@ sh_test(
name = "executable_name",
size = "small",
srcs = ["name_test.sh"],
data = [ ":normalised_binary" ],
data = [":normalised_binary"],
)