Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

layer: clarify wording around applying changesets #317

Merged
merged 2 commits into from
Oct 20, 2016

Conversation

jonboulle
Copy link
Contributor

The existing wording was extremely confusing so this is an attempt at
clearing it up. Still not convinced this is quite right but putting it
up for feedback. Alternatively we could scrap the explanation and lean on
something like tar's:
https://www.gnu.org/software/tar/manual/html_node/Dealing-with-Old-Files.html

@jonboulle
Copy link
Contributor Author

@vbatts 4u


### Changeset over existing files

This section covers applying an entry in a layer changeset, if the file path already exists.
This section describes applying an entry from a layer changeset if the file path already exists.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

covers/describes → specifies?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe:

This section specifies entry application if the target path already exists.

* set attributes on the filepath
If the entry and the existing path are both directories, then the existing path's attributes MUST be replaced by those of the entry in the changeset.
In all other cases, the implementation MUST do the semantic equivalent of the following:
- removing the file path (e.g. [`unlink(2)`](http://linux.die.net/man/2/unlink) on Linux systems)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“removing the file” → “(recursively) removing the file”? Most of the discussion in the GNU docs you linked is for “folks would be mad if we blew away a full tree and replaced it with a broken symlink”. That makes sense for working filesystems, but we're building the rootfs from scratch here, so losing information is not a concern. However, you don't want folks blindly using unlink, having it fail because the target is a directory and either dying or leaving it in place. We do want them to get that old thing off the filesystem (whatever it takes ;), and the “(recursively)” hint may be enough to get that across.

Or maybe it's sufficient as it stands :p.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to adding "recursively" as long as that is actually what is expected here - @vbatts @stevvooe ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot act recursively. It must only act on the immediate resource. For example, a link should not be followed.

There are some ordering requirements for this to work correctly. Let's take the following directories from the applying layer:

/a/b
/a/b/c

/a/b should probably come before /a/b/c for this to work correctly. If /a/b/d is encountered, it needs to be left alone, unless the layer above specifies a removal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevvooe what is supposed to happen if /a/b/ is a directory and a changeset contains /a/b as a file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I supposed I missed the distinction here. I think we need to be clear that this is recursive only when the lstat (shallow) type of the resource is dir.

If /a/b/... exists, a non-directory entry /a/b would cause removal, recursively. Further entries prefixed with an already encountered, non-directory entry, would be ignored or cause an error.

If /a/b/ exists and a new directory entries /a/b/... are encountered, the result will be a union of the two directory trees, unless whiteouts apply.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to be clear that this is recursive only when the lstat (shallow) type of the resource is dir.

No, it's only not a recursive delete when both the existing entry (lstat) and tar entry are dirs. See the implementation in opencontainers/image-tools#42 for machine-readable phrasing.

If /a/b/… exists, a non-directory entry /a/b would cause removal, recursively.

Yes.

Further entries prefixed with an already encountered, non-directory entry, would be ignored or cause an error.

Already-encountered-ness has nothing to do with it. I think we should just unpack as we read through the tarball. Only the current .wh.* handling jumps the tar order, and the remove-before-unpacking behavior is for non-whiteout entries.

If /a/b/ exists and a new directory entries /a/b/… are encountered, the result will be a union of the two directory trees, unless whiteouts apply.

This is true even without the “directory” limit on /a/b/…. If the filesystem has /a/b/c and an /a/b/not-c entry is found in the tarball, then the result will have both /a/b/c and /a/b/not-c. If an /a/b entry is found in the tarball, then it's just a metadata clobber (the result will still have /a/b/c).

@wking wking mentioned this pull request Sep 18, 2016
@@ -166,31 +166,29 @@ The resulting tar archive for `rootfs-c9d-v1.s1` has the following entries:
./etc/.wh.my-app-config
```

Where the basename name of `./etc/my-app-config` is now prefixed with `.wh.`, and will therefore be removed when the changeset is applied.
where the basename name of `./etc/my-app-config` is now prefixed with `.wh.`, and will therefore be removed when the changeset is applied.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we reword this so it is not a partial sentence? It doesn't even really complete the clause above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tweaked the wording

* set attributes on the filepath
If the entry and the existing path are both directories, then the existing path's attributes MUST be replaced by those of the entry in the changeset.
In all other cases, the implementation MUST do the semantic equivalent of the following:
- removing the file path (e.g. [`unlink(2)`](http://linux.die.net/man/2/unlink) on Linux systems)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot act recursively. It must only act on the immediate resource. For example, a link should not be followed.

There are some ordering requirements for this to work correctly. Let's take the following directories from the applying layer:

/a/b
/a/b/c

/a/b should probably come before /a/b/c for this to work correctly. If /a/b/d is encountered, it needs to be left alone, unless the layer above specifies a removal.

@wking
Copy link
Contributor

wking commented Sep 20, 2016

On Tue, Sep 20, 2016 at 02:11:20PM -0700, Stephen Day wrote:

-Where the basename name of ./etc/my-app-config is now prefixed with .wh., and will therefore be removed when the changeset is applied.
+where the basename name of ./etc/my-app-config is now prefixed with .wh., and will therefore be removed when the changeset is applied.

Could we reword this so it is not a partial sentence? It doesn't
even really complete the clause above.

How about “Where the” → “The”?

-If the file path is a directory, then the existing path just has it's attribute set from the layer changeset for that filepath.
-If the file path is any other file type (regular file, FIFO, etc), then the:
-* file path is unlinked (See unlink(2))
-* create the file

  • * If a regular file then content written.
    -* set attributes on the filepath
    +If the entry and the existing path are both directories, then the existing path's attributes MUST be replaced by those of the entry in the changeset.
    +In all other cases, the implementation MUST do the semantic equivalent of the following:
    +- removing the file path (e.g. unlink(2) on Linux systems)

This cannot act recursively. It must only act on the immediate
resource. For example, a link should not be followed.

Following links (which I think should never be done) and recursively
removing existing conflicts before applying a new entry (which I think
should be done unless both the new enty and existing filesystem object
are both directories) seem like orthogonal considerations.

There are some ordering requirements for this to work
correctly. Let's take the following directories from the applying
layer:

/a/b
/a/b/c

/a/b should probably come before /a/b/c for this to work
correctly. If /a/b/d is encountered, it needs to be left alone,
unless the layer above specifies a removal.

Where /a/b is a directory, I see no constraint on the tar-entry
ordering. Where /a/b is not a directory (e.g. it's a regular file),
then I expect:

/a/b
/a/b/c

to result in a tree with /a/b/c, where /a/b has unspecified
attributes. And I expect:

/a/b/c
/a/b

to result in a tree with /a/b, where /a/b is the file from the
tar-entry.

@vbatts
Copy link
Member

vbatts commented Oct 6, 2016

yep
LGTM

Approved with PullApprove

@jonboulle
Copy link
Contributor Author

Needs an LGTM from @stevvooe since he requested changes

@@ -204,31 +204,29 @@ The resulting tar archive for `rootfs-c9d-v1.s1` has the following entries:
./etc/.wh.my-app-config
```

Where the basename name of `./etc/my-app-config` is now prefixed with `.wh.`, and will therefore be removed when the changeset is applied.
The basename name of `./etc/my-app-config` is prefixed with `.wh.`, which signifies that it MUST be removed when the changeset is applied.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still fairly awkward. Do you intend to signify or to prefix? I think this sentence can get better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonboulle Maybe this:

To signify that the resource ./etc/my-app-config MUST be removed, the basename of the entry is prefixed with .wh..

@stevvooe
Copy link
Contributor

@jonboulle The first sentence is still quite awkward. If you're stuck, let me know and I'll suggest a wording. The rest looks fine.

wking added a commit to wking/image-tools that referenced this pull request Oct 18, 2016
Implementing the logic that is in-flight with [1], but using recursive
removal [2].  GNU tar has a --recursive-unlink option that's not
enabled by default, with the motivation being something like "folks
would be mad if we blew away a full tree and replaced it with a broken
symlink" [3].  That makes sense for working filesystems, but we're
building the rootfs from scratch here so losing information is not a
concern.  This commit always uses recursive removal to get that old
thing off the filesystem (whatever it takes ;).

The exception to the removal is if both the tar entry and existing
path occupant are directories.  In this case we want to use GNU tar's
default --overwrite-dir behavior, but unpackLayer's metadata handling
is currently very weak so I've left it at "don't delete the old
directory".

The reworked directory case also fixes a minor bug from 44210d0
(cmd/oci-image-tool: fix unpacking..., 2016-07-22, opencontainers#177) where the:

  if fi, err := os.Lstat(path); !(err == nil && fi.IsDir()) {

block would not error out if the Lstat failed for a reason besides the
acceptable IsNotExist.  Instead, it would attempt to call MkdirAll,
which would probably fail for the same reason that Lstat failed
(permissions?).  But it's better to handle the Lstat errors directly.

[1]: opencontainers/image-spec#317
[2]: https://github.com/opencontainers/image-spec/pull/317/files#r79214718
[3]: https://www.gnu.org/software/tar/manual/html_node/Dealing-with-Old-Files.html

Signed-off-by: W. Trevor King <wking@tremily.us>
wking added a commit to wking/image-tools that referenced this pull request Oct 18, 2016
Implementing the logic that is in-flight with [1], but using recursive
removal [2].  GNU tar has a --recursive-unlink option that's not
enabled by default, with the motivation being something like "folks
would be mad if we blew away a full tree and replaced it with a broken
symlink" [3].  That makes sense for working filesystems, but we're
building the rootfs from scratch here so losing information is not a
concern.  This commit always uses recursive removal to get that old
thing off the filesystem (whatever it takes ;).

The exception to the removal is if both the tar entry and existing
path occupant are directories.  In this case we want to use GNU tar's
default --overwrite-dir behavior, but unpackLayer's metadata handling
is currently very weak so I've left it at "don't delete the old
directory".

The reworked directory case also fixes a minor bug from 44210d0
(cmd/oci-image-tool: fix unpacking..., 2016-07-22, opencontainers#177) where the:

  if fi, err := os.Lstat(path); !(err == nil && fi.IsDir()) {

block would not error out if the Lstat failed for a reason besides the
acceptable IsNotExist.  Instead, it would attempt to call MkdirAll,
which would probably fail for the same reason that Lstat failed
(e.g. ENOTDIR).  But it's better to handle the Lstat errors directly.

[1]: opencontainers/image-spec#317
[2]: https://github.com/opencontainers/image-spec/pull/317/files#r79214718
[3]: https://www.gnu.org/software/tar/manual/html_node/Dealing-with-Old-Files.html

Signed-off-by: W. Trevor King <wking@tremily.us>
@jonboulle
Copy link
Contributor Author

@stevvooe really not sure how to make this clearer ("signifies" is used in a similar way elsewhere in the document) so suggestions welcome!

Signed-off-by: Jonathan Boulle <jonathanboulle@gmail.com>
Signed-off-by: Jonathan Boulle <jonathanboulle@gmail.com>
@jonboulle
Copy link
Contributor Author

adopted @stevvooe's wording

@stevvooe
Copy link
Contributor

stevvooe commented Oct 20, 2016

LGTM

Approved with PullApprove

1 similar comment
@vbatts
Copy link
Member

vbatts commented Oct 20, 2016

LGTM

Approved with PullApprove

@vbatts vbatts merged commit 6179d36 into opencontainers:master Oct 20, 2016
@jonboulle jonboulle deleted the layer branch October 21, 2016 16:09
wking referenced this pull request in opencontainers/umoci Nov 10, 2016
We have to use Docker's symlink pkg to make unpacking safe when it comes
to symlinks inside path components (and also scoping in some senses). I
also included some cleaning of hdr.Name just for safety. I will add unit
tests for this.

VOTE STERLING ARCHER 2020

Signed-off-by: Aleksa Sarai <asarai@suse.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants