Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow remote-caching for Symlink and SolibSymlink actions #11119

Closed
Tracked by #12665
sohaibiftikhar opened this issue Apr 13, 2020 · 15 comments
Closed
Tracked by #12665

Allow remote-caching for Symlink and SolibSymlink actions #11119

sohaibiftikhar opened this issue Apr 13, 2020 · 15 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

Comments

@sohaibiftikhar
Copy link

sohaibiftikhar commented Apr 13, 2020

Description of the problem / feature request:

Bazel does not cache Symlink and SolibSymlink actions. Since the remote-cache API allows caching of symlinks this should be possible to implement.

Feature requests: what underlying problem are you trying to solve with this feature?

When used on a clean output base but with the entire build/test cached remotely this leads to unnecessary downloads or action outputs even when the actual build/test action is already cached. Coupled with the --remote_download_minimal this leads to repeated downloads for a fully cached build without even shutting down the bezel server. For larger binaries/libraries this leads to significant time wastage.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

  1. Create a file main.cpp
#include <stdio.h>
#include <thread>
#include <chrono>

template<int I> void f() { printf("%d\n", I); }

template<int N, int I>
struct C {
  void operator()() {
    f<I>();
    C<N/2, I+N>()();
    C<N/2, I-N>()();
  }
};

template<int I>
struct C<0, I> {
  void operator()() {}
};

int main() {
  // This is just to create a sufficiently sized binary
  C<65535, 0>()();
  printf("Sleeping\n");
  std::this_thread::sleep_for(std::chrono::seconds(5));
  printf("done\n");
}

And create a build file

load("@rules_cc//cc:defs.bzl", "cc_binary")

cc_binary(
    name = "big",
    srcs = ["main.cpp"]
)

sh_test(
    name = "big_test",
    size = "small",
    srcs = ["big"],
)

Now run the command:

bazel test --remote_cache=grpc://mycache:8980" --remote_download_minimal  //:big_test

Everything should be sent to the remote cache now. Now clean your output base to simulate another system running the same build. Then re run the build.

bazel clean && bazel shutdown
bazel test --remote_cache="--remote_cache=grpc//mycache:8980" --remote_download_minimal  //:big_test

Bazel will download the remote binaries again on repeated invocations (remote_download_minimal ensures that we don't even need a clean anymore).

What operating system are you running Bazel on?

Linux

What's the output of bazel info release?

3.0.0

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

NA

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

NA

Have you found anything relevant by searching the web?

There is #6862 but it doesn't cover this particular usage pattern.

Any other information, logs, or outputs that you want to share?

The problem can be circumvented by using --remote_download_outputs=all or even toplevel in some scenarios depending on the dependency graph but this is suboptimal.

@aiuto aiuto added team-Remote-Exec Issues and PRs for the Execution (Remote) team untriaged labels Apr 13, 2020
@ulfjack
Copy link
Contributor

ulfjack commented Apr 27, 2020

I am not sure that what you're proposing is the right solution. Caching symlinks remotely is generally pure overhead because the amount of data in the symlink itself is minimal, and they still have to be created locally. Can you say why Bazel is downloading file repeatedly? What does that have to do with symlinks?

@sohaibiftikhar
Copy link
Author

sohaibiftikhar commented Apr 27, 2020

Thanks for the reply. I will try to present the use case again:
For a clean output base where the build is already fully remotely cached, no intermediate outputs need to be downloaded when using build without the bytes. Without having these actions cached remotely the inputs to locally executed actions would always be staged (since inputs are a prerequisite to executing an action) even if the action itself has almost no cost (such as a Symlink). This can have rather severe effects for dynamic CPP libraries because of the SolibSymlinkAction or for any other action such as sh_binary/sh_test that makes use of the SymlinkAction.

What does that have to do with symlinks

It doesn't necessarily have to do with symlinks but rather to all native actions that bypass the AbstractSpawnStrategy. With symlinks this is more severe cause the symlink itself is kind of a meta action that still gets treated as its own layer in the action graph.

Can you say why Bazel is downloading file repeatedly?

Because the staged inputs from the RemoteActionInputFetcher get deleted in RemoteModule#afterCommand. However, this seems to be expected behaviour as seen in the remote execution tests.

and they still have to be created locally

Not necessarily. One example would be for builds running purely with remote-execution, there is no need for any kind of local execution.

pure overhead because of the amount of data in the symlink itself

I don't understand what is meant by pure overhead here? Do you mean uploading the result of the symlink is an overhead? I agree but that would still happen only once. And while it is overhead, it is has a very small cost to it I believe?

@ulfjack
Copy link
Contributor

ulfjack commented Jun 2, 2020

I don't understand what you're saying. I think you're saying that symlink actions have their inputs staged locally when build-without-the-bytes is active, i.e., the inputs are unnecessarily downloaded to the local machine. How about we don't do that?

I would expect uploading a symlink to a remote cache to be several orders more expensive than a kernel call to create a symlink locally, and similarly, making a remote call to download the symlinks to be infinitely more expensive than ... not doing that, because we have all the information in Bazel already, so there's really no point in making any remote call at all.

@sohaibiftikhar
Copy link
Author

sohaibiftikhar commented Jun 2, 2020

How about we don't do that?

That would be the best thing to do I suppose. This is not easily achieved I think. Because Bazel expects all inputs to be staged locally before executing an action (which would happen in the case of the the symlink not being cached).

If you could give me some pointer on how this can be implemented I would be happy to come up with some changes for review.

@ulfjack
Copy link
Contributor

ulfjack commented Jun 3, 2020

Except it doesn't expect inputs to be staged when executing an action remotely?

@sohaibiftikhar
Copy link
Author

For remotely executed actions it doesn't indeed. There are two issues though:

  1. Symlink actions cannot be executed remotely currently (this change is actually a step in that direction).
  2. Even if that would be the case the problem continues to exist when you use only remote-caching (without remote-execution).

@ulfjack
Copy link
Contributor

ulfjack commented Jun 3, 2020

My point is that executing symlink actions remotely is pointless because you'd still need to re-create the symlinks locally. Symlink actions don't actually need the inputs available locally - (on Linux) it's perfectly fine to create a dangling symlink.

@sohaibiftikhar
Copy link
Author

I see what you mean. I will try this out and check the consequence of this on actions that depend on the symlink action.
Example:
Action 1: compile a binary. Executed remotely cached remotely.
Action 2: create an executable symbolic link (creates dangling link) locally.
Action 3: run the executable from action 2 locally (will it follow links and download the required file?)

@philwo
Copy link
Member

philwo commented Sep 23, 2020

@coeuvre Could you please check this issue and associated PR #11120 whether we should go forward with them or not?

@sohaibiftikhar
Copy link
Author

I don't think the solution I proposed is very good. I have noticed this with other actions as well and it leads to a lot of inefficiencies. I think the strategy to unlink files after each command needs to be revisited.

@coeuvre
Copy link
Member

coeuvre commented Sep 25, 2020

@sohaibiftikhar

This is not easily achieved I think. Because Bazel expects all inputs to be staged locally before executing an action (which would happen in the case of the the symlink not being cached).

I am working on the action rewinding feature. Once landed we can discover new inputs during executing phase which means we don't have to ensure all the inputs are presented before execution. This might be helpful with the performance of symlink actions.

I have noticed this with other actions as well and it leads to a lot of inefficiencies.

Can you share some details of what actions and why they are inefficient?

I think the strategy to unlink files after each command needs to be revisited.

Can you share your concerns and suggestions?

@ulfjack

My point is that executing symlink actions remotely is pointless because you'd still need to re-create the symlinks locally.

It seems that we already have a need for symlinking remotely. #11942 (comment)

@sohaibiftikhar
Copy link
Author

sohaibiftikhar commented Sep 25, 2020

Can you share some details of what actions and why they are inefficient?

Let's say you have a large binary that can be compiled/built remotely but requires some hardware to run so the test execution needs to be local. Let's say the test is super simple (like a smoke test) but you want to force execution so you do --cache_test_results=no (maybe because you want to check for flakiness). When you run the test the binary will be fetched from the remote cache and then run. Trying running it again and it will again fetch the binary. If the time to fetch a binary is large compared to the actual execution time of the test this makes it rather slow. This holds true for any action that is forced to execute locally and has inputs that exist only remotely.

Can you share your concerns and suggestions?

What would be reasonable might be to not waste download efforts. If Bazel fetches the input artifact to the local cache then it should stay there. The reason it doesn't stay there currently I think is because of the local action cache not being aware of those artifacts being present.

@sohaibiftikhar
Copy link
Author

@coeuvre

I am working on the action rewinding feature. Once landed we can discover new inputs during executing phase which means we don't have to ensure all the inputs are presented before execution. This might be helpful with the performance of symlink actions.

This would be awesome btw. Is there someplace I can track progress on this?

@coeuvre
Copy link
Member

coeuvre commented Oct 12, 2020

Sorry for the delay. I was OOO.

If Bazel fetches the input artifact to the local cache then it should stay there. The reason it doesn't stay there currently I think is because of the local action cache not being aware of those artifacts being present.

Good point! I will look into this.

This would be awesome btw. Is there someplace I can track progress on this?

This is still under the design stage and will publish once finalized.

@tjgq
Copy link
Contributor

tjgq commented Oct 21, 2022

I believe this was fixed in 32b0f5a. Internal actions that create a symlink should no longer cause the symlink's target to be downloaded when --remote_download_minimal is enabled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug
Projects
None yet
Development

No branches or pull requests

6 participants