WIP: Clear workspace paths before writing codegen outputs #18871

Closed
wants to merge 4 commits into from

Conversation

huonw
Contributor

@huonw huonw commented May 1, 2023

Fixes #18941

Plan: allow passing clear_paths to the write_digest call by tracking which parts of the export-codegen are 'real' outputs, and which are just location (e.g. path/to:some_target shouldn't clear dist/codegen/path or dist/codegen/path/to). Build on infrastructure in #18930.
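One way to picture the "real outputs vs. location" split is the sketch below. It is only an illustration under stated assumptions: `paths_to_clear` is a hypothetical helper, not part of Pants, and it assumes that each top-level entry generated directly under the target's directory counts as a "real" output, while the directories leading to the target are just location.

```python
from pathlib import PurePosixPath


def paths_to_clear(location: str, outputs: list[str]) -> list[str]:
    """Hypothetical helper: pick the paths that are safe to clear.

    `location` is the directory of the target (e.g. "path/to" for
    path/to:some_target). `outputs` are generated files, relative to the
    same root. Only the first path component *below* the location is
    returned, so "path" and "path/to" themselves are never cleared.
    """
    loc = PurePosixPath(location)
    to_clear = set()
    for out in outputs:
        # relative_to raises ValueError if the output is outside the location.
        rel = PurePosixPath(out).relative_to(loc)
        if not rel.parts:
            raise ValueError(f"{out!r} is the location itself, not an output")
        to_clear.add(str(loc / rel.parts[0]))
    return sorted(to_clear)


cleared = paths_to_clear(
    "path/to",
    ["path/to/gen/a.py", "path/to/gen/b.py", "path/to/c.py"],
)
```

Here `cleared` contains `path/to/gen` and `path/to/c.py`, but neither `path` nor `path/to`, so sibling targets writing under the same parent directories are left alone.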

Work remaining:

@huonw huonw added needs-cherrypick category:bugfix Bug fixes for released features labels May 1, 2023
@huonw huonw added this to the 2.16.x milestone May 1, 2023
@huonw
Contributor Author

huonw commented May 1, 2023

Hm, this is getting too large for a backport I think: to do the plan effectively, every single goal/target/rule that does something that might be exported needs to start propagating "these are the files/directories I manage", which touches a lot of places.

@huonw huonw removed this from the 2.16.x milestone May 1, 2023
huonw added a commit that referenced this pull request May 1, 2023
This applies a workaround that fixes #18809, for 2.16: before this PR,
repeated commands that write the exact same contents to `dist/` will
fail, if those contents include a symlink. After this patch, they will
succeed. For instance, running `pants export-codegen ::` twice, if any
codegen creates a symlink.

The particular problem of failing when re-materialising an entry only
surfaces with symlinks, because directories are created in "exists okay"
mode, and files are truncated if they already exist.
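
The asymmetry between the three entry kinds can be reproduced with plain filesystem calls. The sketch below uses Python's standard library on a POSIX system as a stand-in; it is not the Pants engine code, and `rematerialise` is a name made up for this illustration.

```python
import os
import tempfile


def rematerialise(root: str) -> dict[str, str]:
    """Write a directory, a file and a symlink under `root`, reporting each outcome."""
    results = {}

    # Directories: "exists okay" mode, so a second run is a no-op.
    os.makedirs(os.path.join(root, "a_dir"), exist_ok=True)
    results["directory"] = "ok"

    # Files: opening with "w" truncates any existing file.
    with open(os.path.join(root, "a_file"), "w") as f:
        f.write("contents")
    results["file"] = "ok"

    # Symlinks: creation fails if anything already exists at the link path.
    try:
        os.symlink("a_file", os.path.join(root, "a_link"))
        results["symlink"] = "ok"
    except FileExistsError:
        results["symlink"] = "already exists"

    return results


with tempfile.TemporaryDirectory() as root:
    first = rematerialise(root)
    second = rematerialise(root)
```

On the first run all three entries are created; on the second run only the symlink reports a failure, which matches the behaviour described above.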

However, directories and files _do_ have problems when being
materialised over an entry of a different kind (#17758), but fixing that
seems like a broader issue, and likely too large to target 2.16 at this
point. After the change in this PR, we're at least back to the behaviour
in 2.15:

- directly rerunning commands that write to the workspace will always
succeed
- rerunning after changes may or may not (and, if it does, may or may
not give a valid result: #18849)

I've started on a potential fix for #17758 and #18849 in #18871, but, as
mentioned, it felt like it was getting too large and too "feature"-y to
land for 2.16. If/when a fix along those lines lands, this workaround
can likely be reverted.
huonw added a commit to huonw/pants that referenced this pull request May 1, 2023
…8873)

huonw added a commit that referenced this pull request May 2, 2023
#18873) (#18878)

huonw added a commit that referenced this pull request May 8, 2023
This patch has the package goal clear out the artefact(s) it is about to
write to `dist/`, if they already exist.

For instance, if running `pants package path/to:target` on a
`pex_binary` (outputting `path.to/target.pex`), pants will now first
remove anything that's already at `dist/path.to/target.pex` before
writing the new output.

This resolves two problems:

- if there are existing contents of a different kind (e.g. a directory in
`dist/` and writing a file), the package call would explode. For
instance, switching a target like `pex_binary(..., format="zipapp")`
(file) to `pex_binary(..., format="packed")` (directory).

- if the package output is a directory, stale files already in that
location in `dist/` would remain. For instance, a `pex_binary(...,
format="packed")` where a file was removed.

This fixes #17758 and fixes #18849, respectively.
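
The clear-then-write idea can be sketched with the standard library as follows. This is a minimal illustration, not the code from the patch: `clear_path`, `write_file`, `write_directory` and `demo` are hypothetical names, and the file contents are placeholders.

```python
import os
import shutil
import tempfile


def clear_path(path: str) -> None:
    """Remove whatever is at `path` (file, symlink or directory), if anything."""
    if os.path.islink(path) or os.path.isfile(path):
        os.remove(path)
    elif os.path.isdir(path):
        shutil.rmtree(path)


def write_file(dest: str, contents: str) -> None:
    """Write a single-file artefact, clearing the destination first."""
    clear_path(dest)
    with open(dest, "w") as f:
        f.write(contents)


def write_directory(dest: str, files: dict[str, str]) -> None:
    """Write a directory artefact, clearing the destination first."""
    clear_path(dest)
    os.makedirs(dest)
    for name, contents in files.items():
        with open(os.path.join(dest, name), "w") as f:
            f.write(contents)


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as dist:
        dest = os.path.join(dist, "target.pex")
        # File first, then a directory at the same path: a change of kind.
        write_file(dest, "zipapp placeholder")
        write_directory(dest, {"a.py": "", "b.py": ""})
        # Rewrite the directory with one file removed: no stale b.py remains.
        write_directory(dest, {"a.py": ""})
        return sorted(os.listdir(dest))
```

Calling `demo()` exercises both problems from the list above: the file-to-directory switch does not fail, and the final listing contains only `a.py`.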

This only fixes `package`, not any other goals that also write to fixed
paths (like `export` and `export-codegen`). In #18871, I start on
`export-codegen`, but it's a bit fiddlier (requires propagating "this is
the artefact" paths around) and it's best to land the infrastructure in
this PR first. I'll file follow-up issues covering them specifically.
@huonw huonw changed the title WIP: Clear workspace paths before writing outputs WIP: Clear workspace paths before writing codegen outputs May 14, 2023
@benjyw
Contributor

benjyw commented Sep 17, 2023

Hey @huonw, what's the status of this?

@huonw
Contributor Author

huonw commented Sep 18, 2023

I'm not going to get to this in the near future. I'll close and leave some breadcrumbs on #18941.

Labels
category:bugfix Bug fixes for released features
Development

Successfully merging this pull request may close these issues.

export-codegen is not resilient to existing contents in dist/codegen
2 participants