Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify sandbox/remote handling of empty TreeArtifact inputs #15276

Closed
wants to merge 1 commit into from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Apr 18, 2022

Actions that take a TreeArtifact as input should see a corresponding
directory even if the TreeArtifact is empty.

Previously, SandboxHelpers contained special handling for the case of
empty TreeArtifact action inputs to ensure that they are added to the
sandbox as empty directories. As pointed out in a comment, this logic
should live in SpawnInputExpander, where it would also apply to remote
execution.

This commit adds a integration tests for this previously untested case,
extracts the logic into SpawnInputExpander and adapts
DirectoryTreeBuilder to handle empty TreeArtifacts when creating the
Merkle trees.

Note: The Merkle tree builder now reports an error when it encounters a
non-empty TreeArtifact. Such an artifact should have been expanded by
SpawnInputExpander and if it wasn't, e.g. because it wasn't properly
registered with Skyframe, it can't be expanded at this point anyway.
The tests uncovered that the spawn for split coverage postprocessing declared the
coverage dir artifact as such an input. In this case, it can simply be
removed as the coverage script creates the coverage dir if it doesn't
exist.

@sgowroji sgowroji added the team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website label Apr 18, 2022
@fmeum fmeum force-pushed the refactor-spawn-input-expander branch 2 times, most recently from 89f1519 to d698d7e Compare April 18, 2022 12:09
@fmeum fmeum changed the title TMP: Move logic for empty TreeArtifacts to SpawnInputExpander Unify sandbox/remote handling of empty TreeArtifacts Apr 18, 2022
@fmeum fmeum force-pushed the refactor-spawn-input-expander branch 3 times, most recently from 94325a4 to 8cc1f77 Compare April 18, 2022 13:11
@brentleyjones
Copy link
Contributor

@bazelbuild/remote-execution

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 18, 2022

Still working on this one, there are test failures I don't understand yet.

@sgowroji sgowroji added team-Remote-Exec Issues and PRs for the Execution (Remote) team and removed team-OSS Issues for the Bazel OSS team: installation, release processBazel packaging, website labels Apr 18, 2022
@fmeum fmeum force-pushed the refactor-spawn-input-expander branch 8 times, most recently from 43fbd19 to ce8e17c Compare April 19, 2022 10:43
@fmeum fmeum marked this pull request as ready for review April 19, 2022 10:43
@fmeum fmeum requested a review from a team as a code owner April 19, 2022 10:43
@fmeum
Copy link
Collaborator Author

fmeum commented Apr 19, 2022

@brentleyjones The tests are passing now, had to fix one case of an unexpanded TreeArtifact in coverage postprocessing.

@fmeum fmeum force-pushed the refactor-spawn-input-expander branch from ce8e17c to d465fa9 Compare April 19, 2022 23:58
@sgowroji sgowroji added the awaiting-review PR is awaiting review from an assigned reviewer label Apr 21, 2022
@fmeum
Copy link
Collaborator Author

fmeum commented Apr 22, 2022

@cushon I just learned about #6393, which seems quite related to this PR. Would you appreciate a PR that fixes #6393 by having Bazel (rather than the remote executor) create the output directories? Fixing these issues would help ongoing work of @gregestren and me on #6526.

@cushon
Copy link
Contributor

cushon commented Apr 22, 2022

Thanks for looping me in @fmeum! I am still interested in a fix for #6393, but I haven't paged in all of the discussion since it was filed in 2018, and it's not a change I can approve by myself. But if you're interested in proposing a solution I can try to at least help find reviewers for it.

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 26, 2022

@coeuvre Would you be available to review this PR or should I ask someone else? It's my first PR in the area, so I don't know the usual process yet.

@coeuvre
Copy link
Member

coeuvre commented Apr 26, 2022

Yes, I can take this one. Thanks for your PR!

@coeuvre coeuvre self-requested a review April 26, 2022 10:33
Copy link
Member

@coeuvre coeuvre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR only covers the case of empty dir as input.

@@ -586,7 +586,6 @@ private static Spawn createCoveragePostProcessingSpawn(
.addTransitive(action.getInputs())
.addAll(expandedCoverageDir)
.add(action.getCollectCoverageScript())
.add(action.getCoverageDirectoryTreeArtifact())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why this is removed. Can you explain a bit more?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is passed the expanded coverage dir TreeArtifact in expandedCoverageDir and adds the contained files to the spawn inputs. In addition, it also used to add the TreeArtifact itself, which doesn't really make sense to me: Either SpawnInputExpander can expand the TreeArtifact, in which case expanding it manually wouldn't be necessary here, or it can't, in which case it would expand to nothing just like an empty TreeArtifact prior to this change.

It turns out that the latter is the case: For some reason beyond my understanding, the coverage directory TreeArtifact is created in a way so that SpawnInputExpander fails to see as non-empty, but

ImmutableSet<? extends ActionInput> expandedCoverageDir =
can expand it.

Since the coverage script creates the coverage directory anyway in

mkdir -p "$COVERAGE_DIR"
, I found deleting this line to be the most direct way to fix the precondition failure this would otherwise hit in https://github.com/bazelbuild/bazel/pull/15276/files#diff-749caa742500c63a4012e0e57f908fb3df3b664bae30f93d9e59d86888f1633bR148.

The commit that introduced the coverage directory TreeArtifact expansion logic is a445cda. If you would like a better explanation for why this artifact can't be expanded in the usual way, I could ping the author.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. SpawnInputExpander can only expand inputs which are existing prior to action execution. This post-processing spawn is collecting inputs generated by previous spawn which are not visible to SpawnInputExpander so it is always expand to nothing -- that's the reason why we manually expand tree artifact here by reading directly from metadata provider.

I think it's safe to remove this line here. but @oquenchil can you confirm since you originally wrote this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it looks safe to me.

@fmeum fmeum changed the title Unify sandbox/remote handling of empty TreeArtifacts Unify sandbox/remote handling of empty TreeArtifact inputs Apr 28, 2022
@fmeum fmeum force-pushed the refactor-spawn-input-expander branch from d465fa9 to 7f94842 Compare April 28, 2022 15:18
Copy link
Collaborator Author

@fmeum fmeum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review. I addressed the comments and also updated the commit/PR title to reflect the fact that it only deals with inputs.

@@ -586,7 +586,6 @@ private static Spawn createCoveragePostProcessingSpawn(
.addTransitive(action.getInputs())
.addAll(expandedCoverageDir)
.add(action.getCollectCoverageScript())
.add(action.getCoverageDirectoryTreeArtifact())
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is passed the expanded coverage dir TreeArtifact in expandedCoverageDir and adds the contained files to the spawn inputs. In addition, it also used to add the TreeArtifact itself, which doesn't really make sense to me: Either SpawnInputExpander can expand the TreeArtifact, in which case expanding it manually wouldn't be necessary here, or it can't, in which case it would expand to nothing just like an empty TreeArtifact prior to this change.

It turns out that the latter is the case: For some reason beyond my understanding, the coverage directory TreeArtifact is created in a way so that SpawnInputExpander fails to see as non-empty, but

ImmutableSet<? extends ActionInput> expandedCoverageDir =
can expand it.

Since the coverage script creates the coverage directory anyway in

mkdir -p "$COVERAGE_DIR"
, I found deleting this line to be the most direct way to fix the precondition failure this would otherwise hit in https://github.com/bazelbuild/bazel/pull/15276/files#diff-749caa742500c63a4012e0e57f908fb3df3b664bae30f93d9e59d86888f1633bR148.

The commit that introduced the coverage directory TreeArtifact expansion logic is a445cda. If you would like a better explanation for why this artifact can't be expanded in the usual way, I could ping the author.

@fmeum fmeum force-pushed the refactor-spawn-input-expander branch from 7f94842 to ce38834 Compare April 28, 2022 15:23
Copy link
Member

@coeuvre coeuvre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me, but I will wait for @oquenchil to confirm the change of coverage before importing. Thanks!

@fmeum
Copy link
Collaborator Author

fmeum commented Apr 29, 2022

@coeuvre I am investigating an alternative approach that may be overall more elegant, especially when it comes to also fixing #6393. Could you wait for a bit before importing? I will ping you on an alternative PR or post here if the other approach doesn't work.

@fmeum
Copy link
Collaborator Author

fmeum commented May 1, 2022

@coeuvre I pushed a small commit to fix an issue with --experimental_sibling_repository_layout and added test coverage for this flag.

The alternative approach I pursued in #15371 - expanding empty TreeArtifacts to a synthetic TreeFileArtifact to leverage the existing logic to create parent directories of input & output files - ended up not being as promising as expected. It requires too many changes to other users of SpawnInputExpander and would probably only be worth it if Bazel had many more affected strategies than just sandboxed and remote. The current PR is thus my best attempt, feel free to import once @oquenchil has given the okay.

Actions that take a TreeArtifact as input should see a corresponding
directory even if the TreeArtifact is empty.

Previously, SandboxHelpers contained special handling for the case of
empty TreeArtifact action inputs to ensure that they are added to the
sandbox as empty directories. As pointed out in a comment, this logic
should live in SpawnInputExpander, where it would also apply to remote
execution.

This commit adds a integration tests for this previously untested case,
extracts the logic into SpawnInputExpander and adapts
DirectoryTreeBuilder to handle empty TreeArtifacts when creating the
Merkle trees.

Note: The Merkle tree builder now reports an error when it encounters a
non-empty TreeArtifact. Such an artifact should have been expanded by
SpawnInputExpander and if it wasn't, e.g. because it wasn't properly
registered with Skyframe, it can't be expanded at this point anyway.
The tests uncovered that the spawn for split coverage postprocessing
declared the coverage dir artifact as such an input. In this case, it
can simply be removed as the coverage script creates the coverage dir
if it doesn't exist.
@fmeum fmeum force-pushed the refactor-spawn-input-expander branch from ba8bf13 to ab82887 Compare May 2, 2022 19:43
@fmeum
Copy link
Collaborator Author

fmeum commented May 2, 2022

I resolved merge conflicts and used the opportunity to squash the commits.

@bazel-io bazel-io closed this in c30ffb0 May 4, 2022
@brentleyjones
Copy link
Contributor

@bazel-io flag

@bazel-io bazel-io added the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label May 4, 2022
@ckolli5
Copy link

ckolli5 commented May 9, 2022

@bazel-io fork 5.2.0

@bazel-io bazel-io removed the potential release blocker Flagged by community members using "@bazel-io flag". Should be added to a release blocker milestone label May 9, 2022
@sgowroji sgowroji removed the awaiting-review PR is awaiting review from an assigned reviewer label May 9, 2022
ckolli5 added a commit that referenced this pull request May 10, 2022
Actions that take a TreeArtifact as input should see a corresponding
directory even if the TreeArtifact is empty.

Previously, SandboxHelpers contained special handling for the case of
empty TreeArtifact action inputs to ensure that they are added to the
sandbox as empty directories. As pointed out in a comment, this logic
should live in SpawnInputExpander, where it would also apply to remote
execution.

This commit adds a integration tests for this previously untested case,
extracts the logic into SpawnInputExpander and adapts
DirectoryTreeBuilder to handle empty TreeArtifacts when creating the
Merkle trees.

Note: The Merkle tree builder now reports an error when it encounters a
non-empty TreeArtifact. Such an artifact should have been expanded by
SpawnInputExpander and if it wasn't, e.g. because it wasn't properly
registered with Skyframe, it can't be expanded at this point anyway.
The tests uncovered that the spawn for split coverage postprocessing declared the
coverage dir artifact as such an input. In this case, it can simply be
removed as the coverage script creates the coverage dir if it doesn't
exist.

Closes #15276.

PiperOrigin-RevId: 446452587

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
meteorcloudy pushed a commit that referenced this pull request May 10, 2022
Actions that take a TreeArtifact as input should see a corresponding
directory even if the TreeArtifact is empty.

Previously, SandboxHelpers contained special handling for the case of
empty TreeArtifact action inputs to ensure that they are added to the
sandbox as empty directories. As pointed out in a comment, this logic
should live in SpawnInputExpander, where it would also apply to remote
execution.

This commit adds a integration tests for this previously untested case,
extracts the logic into SpawnInputExpander and adapts
DirectoryTreeBuilder to handle empty TreeArtifacts when creating the
Merkle trees.

Note: The Merkle tree builder now reports an error when it encounters a
non-empty TreeArtifact. Such an artifact should have been expanded by
SpawnInputExpander and if it wasn't, e.g. because it wasn't properly
registered with Skyframe, it can't be expanded at this point anyway.
The tests uncovered that the spawn for split coverage postprocessing declared the
coverage dir artifact as such an input. In this case, it can simply be
removed as the coverage script creates the coverage dir if it doesn't
exist.

Closes #15276.

PiperOrigin-RevId: 446452587

Co-authored-by: Fabian Meumertzheim <fabian@meumertzhe.im>
@fmeum fmeum deleted the refactor-spawn-input-expander branch July 1, 2022 07:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants