Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ToolchainContext to ConfiguredTarget. #7969

Closed
wants to merge 1 commit into from

Conversation

katre
Copy link
Member

@katre katre commented Apr 8, 2019

Part of work on execution transitions, #7935.

Part of work on execution transitions, bazelbuild#7935.
@katre katre requested a review from gregestren April 8, 2019 13:24
@katre katre requested a review from lberki as a code owner April 8, 2019 13:24
@@ -75,4 +75,7 @@ default String getConfigurationChecksum() {
default SourceArtifact getSourceArtifact() {
return null;
}

@Nullable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this needed for? Looking at ToolchainContext, it's not a particularly lightweight object and this change would make it so that it's kept in RAM permanently as opposed to being discarded after the rule in question is analyzed. It's also called a "Context" which makes it doubly surprising that it's kept in RAM after analysis.

Can you run a memory benchmark on a larg-ish target with C++ and Java toolchain resolution enabled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason: the execution transition and toolchain transition both need to know about the toolchain context, and this is part of plumbing that through. The full change is here: https://github.com/katre/bazel/tree/et-05-tc-in-atd

I did the memory benchmarking last week, actually. The short version is that adding this increases memory use for analyzing //gws:gws_and_dev_fileset by slightly less than 1%:

Before: 6036MB (blaze memory reported by used-hea-size-after-gc)
After: 6086MB

One option is to store the UnloadedToolchainContext, not the ResolvedToolchainContext. UTC also implements the ToolchainContext interface, but doesn't have the map from toolchain type to ToolchainInfo (or the list of TemplateVariableInfo).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(/me acting as the Guardian Of Memory this time)

I looked at the above fork and it looks like that the the new field in ConfiguredTarget is used for the sole purpose of passing to dependentNodeMap() in production code? If that's the case, that method in itself does a lot of dependency resolution already, so I was wondering if it would be possible to make callers also do toolchain resolution: technically, both toolchains and dependencies are, well, prerequisites and it feels weird that we cache the resolved values for one of them, but not for the other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the main application be in ConfiguredTargetFunction, which already has access to the context?

I see calls in PostConfiguredTargetFunction and TransitionsOutputFormatterCallback (for query). Are these the motivating dependencies?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of my motivation was also to remove a lot of hacky /* toolchainsLabels= */ ImmutableSet.of() style code from around the codebase.

I can change PostConfiguredTargetFunction to re-compute the toolchain resolution process, but TransitionsOutputFormatterCallback can't do that (it doesn't have a skyframe Environment handy). That means that cquery will end up potentially reporting the wrong transitions (as it won't be able to properly set up the execution and toolchain transitions).

The change to use UnloadedToolchainContext instead of the ToolchainContext in ConfiguredTarget gets the correct functionality and reduces the unbounded part of the memory usage (UnloadedToolchainContext has a String, two PlatformInfo objects, a set of ToolchainTypeInfo objects, and a map from ToolchainTypeInfo to Label). Is that acceptable, memory-wise?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking further, I don't think $resolved_toolchains_internal is kept in the ConfiguredTarget instance, so that is definitely a memory regression to be avoided.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not kept in the ConfiguredTarget instance to supply

// Note: Being able to pull the $resolved_toolchain_internal attr unconditionally from the
// mapper relies on the fact that {@link PlatformSemantics.RESOLVED_TOOLCHAINS_ATTR} exists
// in every rule. Also, we don't actually use fromOptions in our implementation of
// DependencyResolver but passing to avoid passing a null and since we have the information
// anyway.
? Or you mean it would no longer be necessary if this is refactored to avoid that dependency?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of exploring a refactoring of how toolchains relate to DependencyResolver / dependency resolution before following through on this change. But it's not clear to me how easy it is to un-special-case them since there are some fundamental ways (maybe?) that toolchain dependencies really are special.

Today in DependencyResolver I think the only integration is to set a special TOOLCHAIN_DEPENDENCY on the toolchain labels with a host configuration. We could easily set that directly outside DependencyResolver, or as a special case at the beginning of dependentNodeMap without getting further littered through the class.. But then, @lberki would that get in the way of your intentions of creating distinct phases for partially resolved vs. fully resolve dependencies?

And with John's current changes there's a new role for the toolchain context: providing the execution platform. It's possible to just pass the execution platform directly similar to how we're currently passing in the host config. So I think it's still possible to completely disentangle the toolchain context and DependencyResolver. But I'm not sure that puts us in a better state, nor makes the broader issue of special-casing toolchain dependencies less deep?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, this is all getting complicated. To recap, at core:

  • PostConfiguredTarget can just re-call toolchain resolution if it has the unloaded context, which it would have to get from the ConfiguredTarget. This should be quick since that should already be cached in Skyframe. Maybe there's an opportunity to consolidate that same lookup code with other places that resolve toolchains so we're not duplicating a bunch of logic?

  • TransitionsOutputFormatterCallback just fits poorly here. It has Skyframe access but through a different interface, which is already an awkward compromise since the current SkyframeExecutor.getConfiguration() is replicating similar logic called from ConfiguredTargetFunction but in a "different" way. Note this is also used for getting the configurations for top-level targets. This is a necessary evil and expanding this with yet another variation for toolchain resolution would be an unfortunate design direction.

But... does TransitionsOutputFormatterCallback actually need to resolve toolchains? If we just pass it the exec platform label via a ConfiguredTarget, I think that's all it needs to function the rest of the way. It doesn't actually apply transitions - it just spits out the transition objects. So it should be possible to avoid any extra new Skyframe evaluation needs here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to explore the change where I revert most of this, and just pass the execution platform label directly down the call chain. I'll send it when I have some time.

@katre katre closed this Apr 18, 2019
@katre katre deleted the et-03-tc-in-ct branch April 19, 2019 15:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants