Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix --build_event_publish_all_actions with --remote_download_minimal #10972

Closed
wants to merge 1 commit into from

Conversation

scele
Copy link
Contributor

@scele scele commented Mar 16, 2020

The ActionExecutedEvent references the primaryOutput of the executed
action (see ActionExecutedEvent::referencedLocalFiles).
ByteStreamBuildEventArtifactUploader::upload will try to ensure that
this referenced file is present on the remote cache. It calls
file.isDirectory(), which will eventually go to
RemoteActionFileSystem::getRemoteInputMetadata. However,
RemoteActionFileSystem::inputArtifactData only contains metadata about
the input files, not action outputs.

This change proposes a fix, where we make RemoteActionFileSystem
implement MetadataInjector, and inject information about output files
from ActionMetadataHandler::injectRemoteFile.

The implementation is likely not correct for all cases, but demonstrates
the issue and a possible approach to fix it.

Fixes #10971

@aiuto aiuto added the team-Core Skyframe, bazel query, BEP, options parsing, bazelrc label Mar 24, 2020
@aiuto aiuto requested a review from janakdr March 24, 2020 03:47
Copy link
Contributor

@janakdr janakdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, and sorry for the delay in reviewing here! The Google-internal version of this is a bit different: we insert output metadata through a totally different scheme. I'm consulting internally because on the surface, this seems like a more natural way to do it, so perhaps we could unify the codepaths. It may be a few more days, though, with all the disruptions going on right now. Apologies again!

RemoteFileArtifactValue rm = injectedMetadata.get(execPathString);
if (rm != null) {
return rm;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can't you just return rm unconditionally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Done.

The ActionExecutedEvent references the primaryOutput of the executed
action (see ActionExecutedEvent::referencedLocalFiles).
ByteStreamBuildEventArtifactUploader::upload will try to ensure that
this referenced file is present on the remote cache.  It calls
file.isDirectory(), which will eventually go to
RemoteActionFileSystem::getRemoteInputMetadata.  However,
RemoteActionFileSystem::inputArtifactData only contains metadata about
the input files, not action outputs.

This change proposes a fix, where we make RemoteActionFileSystem
implement MetadataInjector, and inject information about output files
from ActionMetadataHandler::injectRemoteFile.

The implementation is likely not correct for all cases, but demonstrates
the issue and a possible approach to fix it.
@scele scele force-pushed the fix_be_minimal_download branch from f031332 to 0358947 Compare March 26, 2020 20:10
@scele
Copy link
Contributor Author

scele commented Mar 26, 2020

I need for apologies @janakdr, I know we're all busy. :) Thanks for the update, and the initial review!

@janakdr
Copy link
Contributor

janakdr commented Mar 31, 2020

As I mentioned above, the Google-internal version of this filesystem injects metadata earlier, while the Spawn is running. The equivalent in open source would be somewhere in RemoteCache.java, I think, inside the RemoteCache#download* methods. After a discussion, we think that is the right place for it here as well. The idea is that the filesystem should be the source of truth for the build: metadata should hit it before anywhere else.

Moreover, internally at least, there are cases where the same Bazel "action" runs two commands. In that case, the second command consumes files produced by the first one, and so those files must be available on the filesystem, but those intermediate files are not considered action outputs, and so aren't ever injected into the MetadataHandler.

Finally, it would be great to have a test for the situation you're fixing. Thanks!

@meisterT
Copy link
Member

What's the state of this PR?

@scele
Copy link
Contributor Author

scele commented Oct 26, 2020

Unfortunately I have not had time to rebase and add tests for this. I'll close this for now, and re-open when I get some time to spend on this again.

@scele scele closed this Oct 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes team-Core Skyframe, bazel query, BEP, options parsing, bazelrc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

--remote_download_minimal does not work with --build_event_publish_all_actions
5 participants