Allow remote-caching for Symlink and SolibSymlink actions #11119

sohaibiftikhar · 2020-04-13T17:09:25Z

Description of the problem / feature request:

Bazel does not cache Symlink and SolibSymlink actions. Since the remote-cache API allows caching of symlinks this should be possible to implement.

Feature requests: what underlying problem are you trying to solve with this feature?

When used on a clean output base but with the entire build/test cached remotely this leads to unnecessary downloads or action outputs even when the actual build/test action is already cached. Coupled with the --remote_download_minimal this leads to repeated downloads for a fully cached build without even shutting down the bezel server. For larger binaries/libraries this leads to significant time wastage.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Create a file main.cpp

#include <stdio.h>
#include <thread>
#include <chrono>

template<int I> void f() { printf("%d\n", I); }

template<int N, int I>
struct C {
  void operator()() {
    f<I>();
    C<N/2, I+N>()();
    C<N/2, I-N>()();
  }
};

template<int I>
struct C<0, I> {
  void operator()() {}
};

int main() {
  // This is just to create a sufficiently sized binary
  C<65535, 0>()();
  printf("Sleeping\n");
  std::this_thread::sleep_for(std::chrono::seconds(5));
  printf("done\n");
}

And create a build file

load("@rules_cc//cc:defs.bzl", "cc_binary")

cc_binary(
    name = "big",
    srcs = ["main.cpp"]
)

sh_test(
    name = "big_test",
    size = "small",
    srcs = ["big"],
)

Now run the command:

bazel test --remote_cache=grpc://mycache:8980" --remote_download_minimal  //:big_test

Everything should be sent to the remote cache now. Now clean your output base to simulate another system running the same build. Then re run the build.

bazel clean && bazel shutdown
bazel test --remote_cache="--remote_cache=grpc//mycache:8980" --remote_download_minimal  //:big_test

Bazel will download the remote binaries again on repeated invocations (remote_download_minimal ensures that we don't even need a clean anymore).

What operating system are you running Bazel on?

Linux

What's the output of `bazel info release`?

3.0.0

If `bazel info release` returns "development version" or "(@non-git)", tell us how you built Bazel.

NA

What's the output of `git remote get-url origin ; git rev-parse master ; git rev-parse HEAD` ?

NA

Have you found anything relevant by searching the web?

There is #6862 but it doesn't cover this particular usage pattern.

Any other information, logs, or outputs that you want to share?

The problem can be circumvented by using --remote_download_outputs=all or even toplevel in some scenarios depending on the dependency graph but this is suboptimal.

The text was updated successfully, but these errors were encountered:

ulfjack · 2020-04-27T11:17:21Z

I am not sure that what you're proposing is the right solution. Caching symlinks remotely is generally pure overhead because the amount of data in the symlink itself is minimal, and they still have to be created locally. Can you say why Bazel is downloading file repeatedly? What does that have to do with symlinks?

sohaibiftikhar · 2020-04-27T16:42:45Z

Thanks for the reply. I will try to present the use case again:
For a clean output base where the build is already fully remotely cached, no intermediate outputs need to be downloaded when using build without the bytes. Without having these actions cached remotely the inputs to locally executed actions would always be staged (since inputs are a prerequisite to executing an action) even if the action itself has almost no cost (such as a Symlink). This can have rather severe effects for dynamic CPP libraries because of the SolibSymlinkAction or for any other action such as sh_binary/sh_test that makes use of the SymlinkAction.

What does that have to do with symlinks

It doesn't necessarily have to do with symlinks but rather to all native actions that bypass the AbstractSpawnStrategy. With symlinks this is more severe cause the symlink itself is kind of a meta action that still gets treated as its own layer in the action graph.

Can you say why Bazel is downloading file repeatedly?

Because the staged inputs from the RemoteActionInputFetcher get deleted in RemoteModule#afterCommand. However, this seems to be expected behaviour as seen in the remote execution tests.

and they still have to be created locally

Not necessarily. One example would be for builds running purely with remote-execution, there is no need for any kind of local execution.

pure overhead because of the amount of data in the symlink itself

I don't understand what is meant by pure overhead here? Do you mean uploading the result of the symlink is an overhead? I agree but that would still happen only once. And while it is overhead, it is has a very small cost to it I believe?

ulfjack · 2020-06-02T12:19:11Z

I don't understand what you're saying. I think you're saying that symlink actions have their inputs staged locally when build-without-the-bytes is active, i.e., the inputs are unnecessarily downloaded to the local machine. How about we don't do that?

I would expect uploading a symlink to a remote cache to be several orders more expensive than a kernel call to create a symlink locally, and similarly, making a remote call to download the symlinks to be infinitely more expensive than ... not doing that, because we have all the information in Bazel already, so there's really no point in making any remote call at all.

sohaibiftikhar · 2020-06-02T14:20:54Z

How about we don't do that?

That would be the best thing to do I suppose. This is not easily achieved I think. Because Bazel expects all inputs to be staged locally before executing an action (which would happen in the case of the the symlink not being cached).

If you could give me some pointer on how this can be implemented I would be happy to come up with some changes for review.

ulfjack · 2020-06-03T11:35:59Z

Except it doesn't expect inputs to be staged when executing an action remotely?

sohaibiftikhar · 2020-06-03T13:10:30Z

For remotely executed actions it doesn't indeed. There are two issues though:

Symlink actions cannot be executed remotely currently (this change is actually a step in that direction).
Even if that would be the case the problem continues to exist when you use only remote-caching (without remote-execution).

ulfjack · 2020-06-03T14:57:20Z

My point is that executing symlink actions remotely is pointless because you'd still need to re-create the symlinks locally. Symlink actions don't actually need the inputs available locally - (on Linux) it's perfectly fine to create a dangling symlink.

sohaibiftikhar · 2020-06-03T18:49:12Z

I see what you mean. I will try this out and check the consequence of this on actions that depend on the symlink action.
Example:
Action 1: compile a binary. Executed remotely cached remotely.
Action 2: create an executable symbolic link (creates dangling link) locally.
Action 3: run the executable from action 2 locally (will it follow links and download the required file?)

philwo · 2020-09-23T14:49:59Z

@coeuvre Could you please check this issue and associated PR #11120 whether we should go forward with them or not?

sohaibiftikhar · 2020-09-24T19:17:41Z

I don't think the solution I proposed is very good. I have noticed this with other actions as well and it leads to a lot of inefficiencies. I think the strategy to unlink files after each command needs to be revisited.

coeuvre · 2020-09-25T02:38:34Z

@sohaibiftikhar

This is not easily achieved I think. Because Bazel expects all inputs to be staged locally before executing an action (which would happen in the case of the the symlink not being cached).

I am working on the action rewinding feature. Once landed we can discover new inputs during executing phase which means we don't have to ensure all the inputs are presented before execution. This might be helpful with the performance of symlink actions.

I have noticed this with other actions as well and it leads to a lot of inefficiencies.

Can you share some details of what actions and why they are inefficient?

I think the strategy to unlink files after each command needs to be revisited.

Can you share your concerns and suggestions?

@ulfjack

My point is that executing symlink actions remotely is pointless because you'd still need to re-create the symlinks locally.

It seems that we already have a need for symlinking remotely. #11942 (comment)

sohaibiftikhar · 2020-09-25T11:08:03Z

Can you share some details of what actions and why they are inefficient?

Let's say you have a large binary that can be compiled/built remotely but requires some hardware to run so the test execution needs to be local. Let's say the test is super simple (like a smoke test) but you want to force execution so you do --cache_test_results=no (maybe because you want to check for flakiness). When you run the test the binary will be fetched from the remote cache and then run. Trying running it again and it will again fetch the binary. If the time to fetch a binary is large compared to the actual execution time of the test this makes it rather slow. This holds true for any action that is forced to execute locally and has inputs that exist only remotely.

Can you share your concerns and suggestions?

What would be reasonable might be to not waste download efforts. If Bazel fetches the input artifact to the local cache then it should stay there. The reason it doesn't stay there currently I think is because of the local action cache not being aware of those artifacts being present.

sohaibiftikhar · 2020-10-04T08:55:16Z

@coeuvre

I am working on the action rewinding feature. Once landed we can discover new inputs during executing phase which means we don't have to ensure all the inputs are presented before execution. This might be helpful with the performance of symlink actions.

This would be awesome btw. Is there someplace I can track progress on this?

coeuvre · 2020-10-12T07:20:55Z

Sorry for the delay. I was OOO.

If Bazel fetches the input artifact to the local cache then it should stay there. The reason it doesn't stay there currently I think is because of the local action cache not being aware of those artifacts being present.

Good point! I will look into this.

This would be awesome btw. Is there someplace I can track progress on this?

This is still under the design stage and will publish once finalized.

tjgq · 2022-10-21T12:27:57Z

I believe this was fixed in 32b0f5a. Internal actions that create a symlink should no longer cause the symlink's target to be downloaded when --remote_download_minimal is enabled.

sohaibiftikhar mentioned this issue Apr 13, 2020

Allow remote caching for Symlink Actions #11120

Closed

aiuto added team-Remote-Exec Issues and PRs for the Execution (Remote) team untriaged labels Apr 13, 2020

ulfjack mentioned this issue Jun 4, 2020

Tracking issue for "Remote Builds without the Bytes" #6862

Closed

19 tasks

exoson mentioned this issue Aug 13, 2020

Build event file creation fails with symlinks when using --remote_download_minimal #11942

Closed

philwo assigned coeuvre Sep 23, 2020

coeuvre added P2 We'll consider working on this in future. (Assignee optional) type: bug and removed untriaged labels Dec 9, 2020

coeuvre mentioned this issue Dec 9, 2020

Separate downloads from action execution to provide more download options #12665

Closed

17 tasks

castler mentioned this issue Aug 9, 2022

Remote Execution: Symlinks created by ctx.actions.symlink are not represented as symlinks remotely #16070

Closed

tjgq closed this as completed Oct 21, 2022

eagleonhill mentioned this issue Dec 21, 2022

Remote download minimal will download all input files for symlink #17063

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow remote-caching for Symlink and SolibSymlink actions #11119

Allow remote-caching for Symlink and SolibSymlink actions #11119

sohaibiftikhar commented Apr 13, 2020 •

edited

Loading

ulfjack commented Apr 27, 2020

sohaibiftikhar commented Apr 27, 2020 •

edited

Loading

ulfjack commented Jun 2, 2020

sohaibiftikhar commented Jun 2, 2020 •

edited

Loading

ulfjack commented Jun 3, 2020

sohaibiftikhar commented Jun 3, 2020

ulfjack commented Jun 3, 2020

sohaibiftikhar commented Jun 3, 2020

philwo commented Sep 23, 2020

sohaibiftikhar commented Sep 24, 2020

coeuvre commented Sep 25, 2020

sohaibiftikhar commented Sep 25, 2020 •

edited

Loading

sohaibiftikhar commented Oct 4, 2020

coeuvre commented Oct 12, 2020

tjgq commented Oct 21, 2022

Allow remote-caching for Symlink and SolibSymlink actions #11119

Allow remote-caching for Symlink and SolibSymlink actions #11119

Comments

sohaibiftikhar commented Apr 13, 2020 • edited Loading

Description of the problem / feature request:

Feature requests: what underlying problem are you trying to solve with this feature?

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

What operating system are you running Bazel on?

What's the output of bazel info release?

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

What's the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

Have you found anything relevant by searching the web?

Any other information, logs, or outputs that you want to share?

ulfjack commented Apr 27, 2020

sohaibiftikhar commented Apr 27, 2020 • edited Loading

ulfjack commented Jun 2, 2020

sohaibiftikhar commented Jun 2, 2020 • edited Loading

ulfjack commented Jun 3, 2020

sohaibiftikhar commented Jun 3, 2020

ulfjack commented Jun 3, 2020

sohaibiftikhar commented Jun 3, 2020

philwo commented Sep 23, 2020

sohaibiftikhar commented Sep 24, 2020

coeuvre commented Sep 25, 2020

sohaibiftikhar commented Sep 25, 2020 • edited Loading

sohaibiftikhar commented Oct 4, 2020

coeuvre commented Oct 12, 2020

tjgq commented Oct 21, 2022

sohaibiftikhar commented Apr 13, 2020 •

edited

Loading

What's the output of `bazel info release`?

If `bazel info release` returns "development version" or "(@non-git)", tell us how you built Bazel.

What's the output of `git remote get-url origin ; git rev-parse master ; git rev-parse HEAD` ?

sohaibiftikhar commented Apr 27, 2020 •

edited

Loading

sohaibiftikhar commented Jun 2, 2020 •

edited

Loading

sohaibiftikhar commented Sep 25, 2020 •

edited

Loading