Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

campaigns: add and use volume mounts by default on Intel macOS #412

Merged
merged 52 commits into from
Jan 7, 2021

Conversation

LawnGnome
Copy link
Contributor

@LawnGnome LawnGnome commented Dec 25, 2020

Background

As we now know all too well, the default bind mount behaviour we use to manage workspaces when executing campaigns is slow on macOS: any file I/O in the workspace has to go through the loopback interface, and this causes problems with I/O heavy operations such as pretty much anything to do with npm or yarn.

The steps we perform within a workspace are extremely simple, however: we unzip the repository archive and run a variety of git commands to generate the diff. It's handy for this to be done on the host filesystem, since it allows the user to inspect what happens to their repository if a campaign step fails, but it's not a technical requirement.

What's in the box PR

This PR generalises our existing concept of a workspace, and expands it significantly. Workspace modes are now defined by implementing a WorkspaceCreator interface, which knows how to create Workspace implementations which implement the common operations we need to set up, diff, and tear down workspaces. This is used to add a new workspace mode that uses a Docker volume to contain the workspace, rather than bind mounting the workspace from the host. This is controlled by the new -workspace flag, and the default for Intel macOS has been switched to the new volume mode.

For this to work, we have to perform the workspace management commands — the aforementioned unzipping and git malarkey — within a container, since that's the only way to access the volume. As such, part of this PR adds a Docker image that extends alpine to always have git available and configured appropriately, along with unzip and curl. A GitHub Action has been added that will rebuild and push this image on src release.

There's also a lot of new testing stuff: the volume workspace code is copiously unit tested. In order to support this, there's a (very) minimal implementation of a mock API client, and an expanded ripoff implementation of the method Go's os/exec test suite uses to mock external commands, which means the new tests don't need dummydocker and work on Windows.

Adam's guess at obvious questions

Why is CI failing?

The os/exec method for mocking external commands basically redirects exec.Cmd calls back into the test binary, which gains a little bit of logic in its TestMain to handle those correctly.

This works fine cross-platform, but something about AppVeyor specifically makes this not work on Windows. The errors are fairly inscrutable. However, the approach doesn't have any systemic issue on Windows: it works fine both for me locally, and in GitHub Actions, which now has Windows runner support.

I've opened #415, which is included in this PR as well, to remove our AppVeyor support and only use GitHub Actions for CI. If we decide to go ahead with that, then we can remove the AppVeyor integration and this PR will start passing.

How much faster is this on macOS?

It depends.

For campaigns that generate small-to-medium sized diffs and do little I/O, there isn't much difference: maybe a few percent here and there. For campaigns that generate large diffs and do little I/O, this can actually be slower. (That feels like a really weird case, though.)

For campaigns that perform lots of I/O, this is a big win. A test campaign that upgrades TypeScript in Sourcegraph completes in approximately a quarter of the time in volume mode compared to bind mode. (~11 minutes compared to ~42.)

Why change the default on macOS only?

Volume mode isn't universally better. On Linux (assuming nothing weird like a remote Docker server), there's no reason not to use bind mount mode: the performance is the same, and you have the advantage that you can inspect what happened in your workspace if a step fails.

Why change the default on Intel macOS only?

Conservatism. I've put in a bunch of effort to make multi-architecture builds of the Docker image needed by volume mode work, but I don't have an M1 Mac sitting around to test this. It should work, but I'd prefer to find that out by testing it using -workspace volume explicitly, rather than finding out that it doesn't work after making it the default.

What about Windows?

My suspicion is that this will provide similar performance improvements on Windows, but I don't have good numbers on this right now, don't have an environment ready to go to prove that out, and I'd prefer to focus on macOS for now. It's easy enough to change the default later.

That said, I think there's another reason we should consider this for Windows sooner rather than later: it effectively removes the requirement to have git in your PATH, and removes any potential for Windows-specific git weirdness.

What future work could we do to improve this further?

Glad you asked! I had more ideas, but this PR was more than big enough already.

  1. Bind mode tests: now that we have comprehensive tests for volume workspaces, we could retrofit the same techniques onto the bind workspace tests.
  2. Volume mode debugging: right now, if a step fails, the volume is removed, which means the user would have to do some hackery using something like sleep to exec into the container to inspect the workspace if that was necessary. We should probably provide some sort of interactive debugging mode where you get dropped into a container with the workspace already set up and running on error. (I mean, this is probably a good idea for bind mode, too.)
  3. Volume container communication: I have a suspicion that the large diff slowness noted above might be more to do with extracting the diff from the Docker container via stdout than anything endemic to the approach. Another way of doing this would be to have the utility container running the whole time while executing a campaign on a repository, and provide some sort of RPC interface to perform setup and teardown and (most importantly) get diffs without having to go through Docker.
  4. Exec output: to allow mocking external commands, the volume workspace uses the new internal/exec package instead of os/exec directly. If we migrate other modules to use this, then we could use that down the track to provide verbose logging of every command that's run, since command execution will go through a central place.

PR links

Skeletal end user documentation is provided by sourcegraph/sourcegraph#16979.

Fixes sourcegraph/sourcegraph#16809.

@LawnGnome LawnGnome added this to the Campaigns Sprint 7 milestone Dec 25, 2020
@LawnGnome LawnGnome self-assigned this Dec 25, 2020
LawnGnome added a commit to sourcegraph/sourcegraph-public-snapshot that referenced this pull request Jan 1, 2021
@LawnGnome LawnGnome changed the title campaigns: add and use volume mounts by default on macOS campaigns: add and use volume mounts by default on Intel macOS Jan 1, 2021
@LawnGnome LawnGnome requested a review from a team January 1, 2021 02:25
@LawnGnome LawnGnome marked this pull request as ready for review January 1, 2021 02:25
Copy link
Contributor

@mrnugget mrnugget left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic, thorough work - docs, tests, scripts, code. And things can get up to 4 times faster. Good start into the new year :)

CHANGELOG.md Outdated Show resolved Hide resolved
DEVELOPMENT.md Outdated Show resolved Hide resolved
internal/api/mock/mock.go Outdated Show resolved Hide resolved
internal/campaigns/bind_workspace.go Outdated Show resolved Hide resolved
internal/campaigns/bind_workspace_test.go Outdated Show resolved Hide resolved
internal/campaigns/volume_workspace.go Outdated Show resolved Hide resolved
internal/campaigns/volume_workspace.go Show resolved Hide resolved
internal/campaigns/volume_workspace.go Outdated Show resolved Hide resolved
internal/campaigns/volume_workspace_test.go Show resolved Hide resolved
internal/exec/expect/expect.go Show resolved Hide resolved
@LawnGnome LawnGnome force-pushed the aharvey/in-docker-workspace branch from 211151e to 155e213 Compare January 6, 2021 01:14
@LawnGnome
Copy link
Contributor Author

OK, so this has changed around a bit in response to @mrnugget's review. Most notably: there's now a RepoFetcher type that is responsible for fetching repos (gasp!) and is called by runSteps(), which means that repo fetching logic is no longer within the workspace types (and, as a side benefit, means that the volume workspace uses the archive cache correctly).

PTAL!

Copy link
Member

@eseliger eseliger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice work 🌟

.github/workflows/docker.yml Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
cmd/src/campaigns_common.go Outdated Show resolved Hide resolved
cmd/src/campaigns_common.go Show resolved Hide resolved
internal/campaigns/repo_fetcher_test.go Outdated Show resolved Hide resolved
internal/campaigns/run_steps.go Show resolved Hide resolved
internal/campaigns/volume_workspace.go Show resolved Hide resolved
internal/campaigns/workspace.go Outdated Show resolved Hide resolved
LawnGnome and others added 5 commits January 7, 2021 13:29
@LawnGnome LawnGnome merged commit 7ce5b18 into main Jan 7, 2021
@LawnGnome LawnGnome deleted the aharvey/in-docker-workspace branch January 7, 2021 21:43
LawnGnome added a commit to sourcegraph/sourcegraph-public-snapshot that referenced this pull request Jan 7, 2021
scjohns pushed a commit that referenced this pull request Apr 24, 2023
Co-authored-by: Thorsten Ball <mrnugget@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cli: add a mode to avoid bind mounts
3 participants