-
Notifications
You must be signed in to change notification settings - Fork 681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
layer: clarify wording around applying changesets #317
Conversation
@vbatts 4u |
|
||
### Changeset over existing files | ||
|
||
This section covers applying an entry in a layer changeset, if the file path already exists. | ||
This section describes applying an entry from a layer changeset if the file path already exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
covers/describes → specifies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And maybe:
This section specifies entry application if the target path already exists.
* set attributes on the filepath | ||
If the entry and the existing path are both directories, then the existing path's attributes MUST be replaced by those of the entry in the changeset. | ||
In all other cases, the implementation MUST do the semantic equivalent of the following: | ||
- removing the file path (e.g. [`unlink(2)`](http://linux.die.net/man/2/unlink) on Linux systems) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“removing the file” → “(recursively) removing the file”? Most of the discussion in the GNU docs you linked is for “folks would be mad if we blew away a full tree and replaced it with a broken symlink”. That makes sense for working filesystems, but we're building the rootfs from scratch here, so losing information is not a concern. However, you don't want folks blindly using unlink
, having it fail because the target is a directory and either dying or leaving it in place. We do want them to get that old thing off the filesystem (whatever it takes ;), and the “(recursively)” hint may be enough to get that across.
Or maybe it's sufficient as it stands :p.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cannot act recursively. It must only act on the immediate resource. For example, a link should not be followed.
There are some ordering requirements for this to work correctly. Let's take the following directories from the applying layer:
/a/b
/a/b/c
/a/b
should probably come before /a/b/c
for this to work correctly. If /a/b/d
is encountered, it needs to be left alone, unless the layer above specifies a removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stevvooe what is supposed to happen if /a/b/
is a directory and a changeset contains /a/b
as a file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I supposed I missed the distinction here. I think we need to be clear that this is recursive only when the lstat (shallow) type of the resource is dir.
If /a/b/...
exists, a non-directory entry /a/b
would cause removal, recursively. Further entries prefixed with an already encountered, non-directory entry, would be ignored or cause an error.
If /a/b/
exists and a new directory entries /a/b/...
are encountered, the result will be a union of the two directory trees, unless whiteouts apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to be clear that this is recursive only when the lstat (shallow) type of the resource is dir.
No, it's only not a recursive delete when both the existing entry (lstat) and tar entry are dirs. See the implementation in opencontainers/image-tools#42 for machine-readable phrasing.
If
/a/b/…
exists, a non-directory entry/a/b
would cause removal, recursively.
Yes.
Further entries prefixed with an already encountered, non-directory entry, would be ignored or cause an error.
Already-encountered-ness has nothing to do with it. I think we should just unpack as we read through the tarball. Only the current .wh.*
handling jumps the tar order, and the remove-before-unpacking behavior is for non-whiteout entries.
If
/a/b/
exists and a new directory entries/a/b/…
are encountered, the result will be a union of the two directory trees, unless whiteouts apply.
This is true even without the “directory” limit on /a/b/…
. If the filesystem has /a/b/c
and an /a/b/not-c
entry is found in the tarball, then the result will have both /a/b/c
and /a/b/not-c
. If an /a/b
entry is found in the tarball, then it's just a metadata clobber (the result will still have /a/b/c
).
@@ -166,31 +166,29 @@ The resulting tar archive for `rootfs-c9d-v1.s1` has the following entries: | |||
./etc/.wh.my-app-config | |||
``` | |||
|
|||
Where the basename name of `./etc/my-app-config` is now prefixed with `.wh.`, and will therefore be removed when the changeset is applied. | |||
where the basename name of `./etc/my-app-config` is now prefixed with `.wh.`, and will therefore be removed when the changeset is applied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we reword this so it is not a partial sentence? It doesn't even really complete the clause above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tweaked the wording
* set attributes on the filepath | ||
If the entry and the existing path are both directories, then the existing path's attributes MUST be replaced by those of the entry in the changeset. | ||
In all other cases, the implementation MUST do the semantic equivalent of the following: | ||
- removing the file path (e.g. [`unlink(2)`](http://linux.die.net/man/2/unlink) on Linux systems) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cannot act recursively. It must only act on the immediate resource. For example, a link should not be followed.
There are some ordering requirements for this to work correctly. Let's take the following directories from the applying layer:
/a/b
/a/b/c
/a/b
should probably come before /a/b/c
for this to work correctly. If /a/b/d
is encountered, it needs to be left alone, unless the layer above specifies a removal.
On Tue, Sep 20, 2016 at 02:11:20PM -0700, Stephen Day wrote:
How about “Where the” → “The”?
Following links (which I think should never be done) and recursively
Where /a/b is a directory, I see no constraint on the tar-entry /a/b to result in a tree with /a/b/c, where /a/b has unspecified /a/b/c to result in a tree with /a/b, where /a/b is the file from the |
Needs an LGTM from @stevvooe since he requested changes |
@@ -204,31 +204,29 @@ The resulting tar archive for `rootfs-c9d-v1.s1` has the following entries: | |||
./etc/.wh.my-app-config | |||
``` | |||
|
|||
Where the basename name of `./etc/my-app-config` is now prefixed with `.wh.`, and will therefore be removed when the changeset is applied. | |||
The basename name of `./etc/my-app-config` is prefixed with `.wh.`, which signifies that it MUST be removed when the changeset is applied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still fairly awkward. Do you intend to signify or to prefix? I think this sentence can get better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonboulle Maybe this:
To signify that the resource
./etc/my-app-config
MUST be removed, the basename of the entry is prefixed with.wh.
.
@jonboulle The first sentence is still quite awkward. If you're stuck, let me know and I'll suggest a wording. The rest looks fine. |
Implementing the logic that is in-flight with [1], but using recursive removal [2]. GNU tar has a --recursive-unlink option that's not enabled by default, with the motivation being something like "folks would be mad if we blew away a full tree and replaced it with a broken symlink" [3]. That makes sense for working filesystems, but we're building the rootfs from scratch here so losing information is not a concern. This commit always uses recursive removal to get that old thing off the filesystem (whatever it takes ;). The exception to the removal is if both the tar entry and existing path occupant are directories. In this case we want to use GNU tar's default --overwrite-dir behavior, but unpackLayer's metadata handling is currently very weak so I've left it at "don't delete the old directory". The reworked directory case also fixes a minor bug from 44210d0 (cmd/oci-image-tool: fix unpacking..., 2016-07-22, opencontainers#177) where the: if fi, err := os.Lstat(path); !(err == nil && fi.IsDir()) { block would not error out if the Lstat failed for a reason besides the acceptable IsNotExist. Instead, it would attempt to call MkdirAll, which would probably fail for the same reason that Lstat failed (permissions?). But it's better to handle the Lstat errors directly. [1]: opencontainers/image-spec#317 [2]: https://github.com/opencontainers/image-spec/pull/317/files#r79214718 [3]: https://www.gnu.org/software/tar/manual/html_node/Dealing-with-Old-Files.html Signed-off-by: W. Trevor King <wking@tremily.us>
Implementing the logic that is in-flight with [1], but using recursive removal [2]. GNU tar has a --recursive-unlink option that's not enabled by default, with the motivation being something like "folks would be mad if we blew away a full tree and replaced it with a broken symlink" [3]. That makes sense for working filesystems, but we're building the rootfs from scratch here so losing information is not a concern. This commit always uses recursive removal to get that old thing off the filesystem (whatever it takes ;). The exception to the removal is if both the tar entry and existing path occupant are directories. In this case we want to use GNU tar's default --overwrite-dir behavior, but unpackLayer's metadata handling is currently very weak so I've left it at "don't delete the old directory". The reworked directory case also fixes a minor bug from 44210d0 (cmd/oci-image-tool: fix unpacking..., 2016-07-22, opencontainers#177) where the: if fi, err := os.Lstat(path); !(err == nil && fi.IsDir()) { block would not error out if the Lstat failed for a reason besides the acceptable IsNotExist. Instead, it would attempt to call MkdirAll, which would probably fail for the same reason that Lstat failed (e.g. ENOTDIR). But it's better to handle the Lstat errors directly. [1]: opencontainers/image-spec#317 [2]: https://github.com/opencontainers/image-spec/pull/317/files#r79214718 [3]: https://www.gnu.org/software/tar/manual/html_node/Dealing-with-Old-Files.html Signed-off-by: W. Trevor King <wking@tremily.us>
@stevvooe really not sure how to make this clearer ("signifies" is used in a similar way elsewhere in the document) so suggestions welcome! |
Signed-off-by: Jonathan Boulle <jonathanboulle@gmail.com>
Signed-off-by: Jonathan Boulle <jonathanboulle@gmail.com>
adopted @stevvooe's wording |
1 similar comment
We have to use Docker's symlink pkg to make unpacking safe when it comes to symlinks inside path components (and also scoping in some senses). I also included some cleaning of hdr.Name just for safety. I will add unit tests for this. VOTE STERLING ARCHER 2020 Signed-off-by: Aleksa Sarai <asarai@suse.com>
The existing wording was extremely confusing so this is an attempt at
clearing it up. Still not convinced this is quite right but putting it
up for feedback. Alternatively we could scrap the explanation and lean on
something like tar's:
https://www.gnu.org/software/tar/manual/html_node/Dealing-with-Old-Files.html