Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing starlark build setting always discards the analysis cache. #13591

Open
uri-canva opened this issue Jun 19, 2021 · 9 comments
Open

Changing starlark build setting always discards the analysis cache. #13591

uri-canva opened this issue Jun 19, 2021 · 9 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Configurability platforms, toolchains, cquery, select(), config transitions type: feature request

Comments

@uri-canva
Copy link
Contributor

Description of the problem / feature request:

Changing starlark build settings always discards the analysis cache. It would be nice to have a way of disabling that for certain build settings.

Feature requests: what underlying problem are you trying to solve with this feature?

When using starlark build settings that are limited to certain targets, it would be nice if the analysis cache could be maintained. Similar to --test_filter, for certain build settings, the values of the settings might be changed often, but would only cause a small number of additional configured targets to have to be cached. That's why --trim_test_configuration was introduced.

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Create a starlark build setting, change its value, observe INFO: Build option --//tools/settings:your_setting has changed, discarding analysis cache.

What operating system are you running Bazel on?

> sw_vers
ProductName:	macOS
ProductVersion:	11.3.1
BuildVersion:	20E241

What's the output of bazel info release?

> bazel info release
INFO: Invocation ID: 781ccd26-b559-444f-976d-678b37d89b18
release 4.1.0- (@non-git)

If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.

From https://github.com/NixOS/nixpkgs/blob/0e3229d628b3a1c56cc9279c39e6a40720794773/pkgs/development/tools/build-managers/bazel/bazel_4/default.nix.

Have you found anything relevant by searching the web?

#7466
#6011
#6842 (comment)

@sventiffe sventiffe added team-Configurability platforms, toolchains, cquery, select(), config transitions untriaged labels Jun 22, 2021
@gregestren
Copy link
Contributor

This is true for most flags (not just Starlark flags).

I agree that we should expand the benefit of --test_filter / --trim_test_configuration further. My understanding is it's not quite trivial. For example:

// Note that clearing the analysis cache is currently required for correctness. It is also
// helpful to save memory.
//
// If we had more memory, fixing the correctness issue (see also b/144932999) would allow us
// to not invalidate the cache, leading to potentially better performance on incremental
// builds.

The referenced bug refers to some arcane technical details about Bazel's execution engine. Basically a previous build's action output may not be triggered to reflect the flag change, which would harm incremental builds. I can elaborate if you like. But short story is we'd have to address that before expanding this behavior.

Where in particular do you consume your flags? If you know only top-level targets (i.e. the binaries requested at their command line, not their deps), read the flags, that could open up an easier path. --trim_test_configuration relies on exactly that property.

@uri-canva
Copy link
Contributor Author

We have two use cases:

  1. We use https://github.com/ash2k/bazel-tools/tree/master/multirun to install multiple docker images in the same bazel run command. In this case the build setting is used by the top level target.
  2. We use build settings to customise some docker images (currently not part of the multirun). They are also top level targets.

If we wanted to combine 1 and 2 then we might have use for configured non-top level targets, but we haven't needed that yet.

@katre katre removed the untriaged label Jul 19, 2021
@gregestren gregestren added the P2 We'll consider working on this in future. (Assignee optional) label Jul 22, 2021
@gregestren
Copy link
Contributor

CC @sdtwigg , who's leading --trim_test_configuration and related.

@github-actions
Copy link

Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 14 days unless any other activity occurs or one of the following labels is added: "not stale", "awaiting-bazeler". Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.

@github-actions github-actions bot added the stale Issues or PRs that are stale (no activity for 30 days) label May 28, 2023
@uri-canva
Copy link
Contributor Author

In the meantime we've found a workaround for this: --repo_env does not cause the analysis cache to be discarded, so by creating repository rules that read the environment we can pass information into the analysis phase in a way that will only invalidate the rules that use it. Note that the repository rules should be granular for the specific environment variables you need so you can change them individually without invalidating all rules that use the environment every time you change any environment variable. You can create toolchains to expose those values as make vars where it's more convenient.

@github-actions github-actions bot removed the stale Issues or PRs that are stale (no activity for 30 days) label Jun 3, 2023
@fmeum
Copy link
Collaborator

fmeum commented Jun 3, 2023

@gregestren I would be interested in learning more about the details in b/144932999.

@gregestren
Copy link
Contributor

We have two use cases:

  1. We use https://github.com/ash2k/bazel-tools/tree/master/multirun to install multiple docker images in the same bazel run command. In this case the build setting is used by the top level target.
  2. We use build settings to customise some docker images (currently not part of the multirun). They are also top level targets.

If we wanted to combine 1 and 2 then we might have use for configured non-top level targets, but we haven't needed that yet.

I realize this is old. But if we're talking about flags we know are only used by the top-level target we have some stronger options. The challenge is providing a generic enough API to capture this. But this particularly is an interesting use case.

@louis925
Copy link

Is it possible to keep at least two or N caches? I found most of the scenario I only have two compilation_mode one with -c opt for running program and the other one for running test.

@fmeum
Copy link
Collaborator

fmeum commented Dec 5, 2024

I've been thinking about this more and have the following idea. I haven't prototyped it yet, so it's definitely possible that is has a fatal flaw.

The goal is to implement the equivalent of Starlark's unused_input_lists for Starlark flags and the analysis cache without just keeping all entries around indefinitely. The latter would essentially amount to a memory leak when flags change repeatedly and remote analysis caching may very well be the better solution to config A -> config B -> config A type workflows. Instead, I want to speed up analysis for config A -> config B workflows assuming that the changed Starlark flags are not relevant to most of the configured targets in the build:

  1. Every configured target is augmented with a special built-in provider (let's call it UsedStarlarkFlagsInfo) that tracks a NestedSet<Label> of all Starlark build flags that may affect its transitive closure. The existing logic around RequiredConfigFragmentsProvider essentially already does this, except that the new provider would also need to track settings read by transitions. I would expect this to result in minor memory overhead and essentially no CPU time overhead.

  2. InMemoryGraph gains a new method (let's call it graft) that takes a function (SkyKey, SkyValue) -> @Nullable SkyValue and performs the following modification to the entire graph (assuming no concurrent modifications) based on what the function returns for each existing entry:

    • If it returns null, the entry is removed from the graph.
    • If it returns the same key, the entry is kept as-is.
    • If it returns a different key, a new entry is created under it with the old value attached, including a copy of its entire state. The reverse deps of all deps of the old key are updated to point to the new key and likewise for the deps of the reverse deps. Finally, the entry under the old key is removed.
  3. When Starlark flags have changed from the last build, but the native flags have not, the graft method is called on the graph before any other invalidation is performed with the following function of a SkyKey K and its corresponding SkyValue V:

    • If the key does not involve the build configuration, it is returned as-is.
    • Otherwise, compute the set D of Starlark flags that differ between the old and new top-level configuration (this only needs to be computed once). Also compute the set F_K of Starlark flags whose values agree between K's configuration and the old top-level configuration (aka the baseline configuration) to filter out flags that differ only because they are set by transitions. Then, check whether the intersection of D and F_K is disjoint from the set of flags tracked in V's UsedStarlarkFlagsInfo.
      • If the intersection is non-empty, the transitive closure of K is affected by the flag change and null is returned.
      • Otherwise, a new key icreated from K by applying the changes between the old and new top-level configuration is returned.

Since configured targets only depend on their diff against the respective baseline configuration, Skyframe's change pruning should ensure that the new entries are reused for those targets whose transitive closure does not depend on the changed flags. At the same time, entries that aren't obviously relevant in the new configuration are removed from the graph, thus avoiding the graph from growing indefinitely and, possibly, resolving the remaining correctness issues with analysis cache preservation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Configurability platforms, toolchains, cquery, select(), config transitions type: feature request
Projects
None yet
Development

No branches or pull requests

7 participants