Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add support for overlay maps #106

Closed
wants to merge 11 commits into from

Conversation

maleadt
Copy link
Contributor

@maleadt maleadt commented Nov 19, 2022

Adds overlay maps to the usernamespace sandbox so that we can create overlay mounts (e.g. to make read-only mounts mutable) in a controlled manner (i.e. without having to peek into the executor's persistence_dir):

using Sandbox

overlay = mktempdir()
config = SandboxConfig(
    Dict("/" => Sandbox.debian_rootfs(),
         "/root/.julia/packages" => "/home/tim/.julia/packages");
    overlay_maps = Dict("/root/.julia/packages" => overlay),
    stdin, stdout, stderr, persist=false, verbose=true,
)

with_executor(UnprivilegedUserNamespacesExecutor) do exe
    run(exe, config, ignorestatus(`/bin/bash -c "touch /root/.julia/packages/foo"`))
end

# $overlay/upper/foo now exists
verbose sandbox enabled (running in unprivileged container mode)
Parsed --rootfs as "/home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f"
Parsed --cd as "/"
Parsed --map as "/home/tim/.julia/packages" -> "/root/.julia/packages"
Parsed --overlay as "/tmp/jl_XwYhld" -> "/root/.julia/packages"
Parsed --uid as "0"
Parsed --gid as "0"
Child Process PID is 1712994
--> Mapping 1000:1000 to 0:0 within container namespace
--> Creating overlay workdir at /root
--> Mounting overlay of /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f at /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f (modifications in /root/upper/rootfs, workspace in /root/work/rootfs)
--> Bind-mounting /home/tim/.julia/packages over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/root/.julia/packages (read-only)
--> Mounting procfs at /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/proc
--> Bind-mounting /dev/null over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f//dev/null (read-write)
--> Bind-mounting /dev/tty over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f//dev/tty (read-write)
--> Bind-mounting /dev/zero over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f//dev/zero (read-write)
--> Bind-mounting /dev/random over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f//dev/random (read-write)
--> Bind-mounting /dev/urandom over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f//dev/urandom (read-write)
--> Bind-mounting /dev/shm over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f//dev/shm (read-write)
--> Bind-mounting /sys over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f//sys (read-only)
--> Bind-mounting /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/dev/pts/ptmx over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/dev/ptmx (read-write)
--> Mounting overlay of /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/root/.julia/packages at /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/root/.julia/packages (modifications in /tmp/jl_XwYhld/upper, workspace in /tmp/jl_XwYhld/work)
Entering rootfs at /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f
--> pivot_root() succeeded and unmounted old root
About to run `/bin/bash` `-c` `touch /root/.julia/packages/foo`
Child Process 1712994 exited with code 0

Intended use case: PkgEval, where we need a way to mount the package store. That mount needs to be read-only, since concurrent access results in corruption, yet we need a way to catch changes and add them to the cache afterwards. We cannot use layered depots for this because compilecache files hard-code the path to the Julia sources (see JuliaCI/PkgEval.jl#158).

It would be cleaner if the overlay dir passed to the container would only be used as the upper dir, so that we can directly use the files in there, and not have to worry about removing the work directory (which has bad permissions that trigger a Julia bug, and may even be root-owned in the case of the PrivilegedUserNamespaceExecutor), but sadly the workdir has to live on the same filesystem as the upper dir.

@codecov
Copy link

codecov bot commented Nov 19, 2022

Codecov Report

Base: 79.71% // Head: 79.71% // No change to project coverage 👍

Coverage data is based on head (975c302) compared to base (da1ccc1).
Patch has no changes to coverable lines.

❗ Current head 975c302 differs from pull request most recent head a3d1210. Consider uploading reports for the commit a3d1210 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #106   +/-   ##
=======================================
  Coverage   79.71%   79.71%           
=======================================
  Files           6        6           
  Lines         567      567           
=======================================
  Hits          452      452           
  Misses        115      115           
Impacted Files Coverage Δ
src/SandboxConfig.jl 91.30% <ø> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@maleadt maleadt force-pushed the overlay_maps branch 6 times, most recently from 52763d4 to a957e3a Compare November 20, 2022 09:51
@maleadt
Copy link
Contributor Author

maleadt commented Nov 20, 2022

Fixed an issue: maps now need to be mounted before procfs or we aren't using the tmpfs meant for overlays.

The remaining CI failure is the test that read-only mounts are actually read-only (from the sandbox POV), which is of course what this PR is proposing to change.

@maleadt
Copy link
Contributor Author

maleadt commented Nov 21, 2022

Scratch that, I've redesigned the feature after some thinking, introducing an --overlay HOST_PATH:SANDBOX_PATH to be used in combination with read-only maps. This keeps everything working as it currently does, while introducing (UserNSExecutor-only) overlay functionality to make read-only maps mutable in a controlled manner (i.e. without having to peek into the persistence_dir):

using Sandbox

overlay = mktempdir()
config = SandboxConfig(
    Dict("/" => Sandbox.debian_rootfs(),
         "/root/.julia/packages" => "/home/tim/.julia/packages");
    overlay_maps = Dict("/root/.julia/packages" => overlay),
    stdin, stdout, stderr, persist=true, verbose=true,
)

with_executor(UnprivilegedUserNamespacesExecutor) do exe
    run(exe, config, ignorestatus(`/bin/bash -c "touch /root/.julia/packages/foo"`))
end
run(`find $overlay`)

The only annoying part is that this requires --persist, i.e., it doesn't work with the ephemeral one mounted at /proc because mount_overlays runs after procfs has been mounted again. We also cannot reorder that code, because it needs to run after mount_maps (since overlays are put on top of other mounts), yet mount_maps requires procfs to look into mtab.

@maleadt maleadt changed the title RFC: Mount (read-only) maps as overlays to support modifications RFC: Add support for overlay maps Nov 21, 2022
@maleadt maleadt force-pushed the overlay_maps branch 5 times, most recently from 3f0e4dd to a9e5778 Compare November 21, 2022 13:43
@maleadt
Copy link
Contributor Author

maleadt commented Nov 21, 2022

OK, made this work with persist=false too. This should be good to go, or good to review at least.

@staticfloat Could you explain why you were mounting the overlayfs-backing tmpfs on /proc? That breaks lots of things (and is why you quickly restore it), but I don't see why we're doing this in the first place. Many other directories are guaranteed to exist as per FSH, and this is outside of the sandbox' root we'll pivot into anyway, so should be invisible wherever we mount it. In this PR, I've changed it to mount at /root, which seems to work fine.

@maleadt maleadt marked this pull request as ready for review November 21, 2022 13:44
@maleadt maleadt force-pushed the overlay_maps branch 5 times, most recently from 77dc981 to 21b1e58 Compare November 21, 2022 14:18
These directories aren't visible anyway since they're outside of
the sandbox root.
Both work and upper need to be on the same mount anyway.
@maleadt
Copy link
Contributor Author

maleadt commented Nov 22, 2022

FWIW, I actually started out with something simple that made read-only mounts use overlays, 3e534e1, but because it would then behave differently from the Docker executor I went with a separate --overlays option.

@staticfloat
Copy link
Collaborator

Okay, I've been thinking about this, and while I like it, I'm not happy with the API. With this, we have the following:

  1. A way to create read-write mappings (--workspace)
  2. A way to create read-only mappings (--map)
  3. By default, an overlay that allows modifications to the rootfs (but not to read-only mounts), which can be persisted or not (--persist)
  4. A way to selectively apply an overlay to paths, granting similar functionality to --persist but to the read-only mounts (--overlay).

IMO, this is too complicated. In terms of realistic needs, I think we only actually care about the following things:

  1. I need to be able to modify files in the host through read-write mappings.
  2. I want to be able to do bad things in the sandbox without messing up anything in my outer system.
  3. I want to be able to persist changes.

Improved API

From this perspective, I think a better API would be to have the following pieces of functionality:

  1. --map and --workspace remain. (perhaps with a unified API, --map host:guest and --map host:guest:ro, if we wanted to copy docker)
  2. We apply overlays to all read-only mounts (including the rootfs) by default. We store them all in the persistence directory so --persist works with all of them at once.

What we lose with this is the ability to get read-only file system errors, which I think is fine. Regarding docker compatibility, I think we can get this working in docker as well by generating an entrypoint bash script that runs the moral equivalent of:

for mount in $ro_mount_list; do
    workdir=$persist_dir/$(hash $mount)
    mkdir -p $workdir
    mount -t overlay overlay -o lowerdir=$mount,upperdir=$workdir/upper,workdir=$workdir/work $mount
done
umount $persist_dir

Since we usually run with --privileged, that should work.

@maleadt
Copy link
Contributor Author

maleadt commented Nov 23, 2022

OK, I implemented (the core part) of that API. One problem that surfaced is that overlayfs doesn't support mounting a single file, breaking e.g. the internet tests here (which mount /etc/resolv.conf). Do we want to support that? I guess we could detect the file case, mount the parent dir in a temporary location and bind-mount the file, but that feels hacky.

EDIT: pushed that hack. Makes mount looks funny, but it works:

Child Process PID is 1914942
--> Mapping 1000:1000 to 0:0 within container namespace
--> Creating overlay workdir at /root
...
--> Mounting overlay of /tmp at /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/proc/etc/resolv.conf (modifications in /root/upper/etc/resolv.conf, workspace in /root/work/etc/resolv.conf)
--> Bind-mounting /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/proc/etc/resolv.conf/resolv.conf over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/etc/resolv.conf (read-write)
...
About to run `/bin/bash` `-c` `mount`
overlay on / type overlay (rw,relatime,lowerdir=/home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f,upperdir=/root/upper/rootfs,workdir=/root/work/rootfs,index=off,xino=off,metacopy=off)
overlay on /proc/etc/resolv.conf type overlay (rw,relatime,lowerdir=/tmp,upperdir=/root/upper/etc/resolv.conf,workdir=/root/work/etc/resolv.conf,index=off,xino=off,metacopy=off)
overlay on /etc/resolv.conf type overlay (rw,relatime,lowerdir=/tmp,upperdir=/root/upper/etc/resolv.conf,workdir=/root/work/etc/resolv.conf,index=off,xino=off,metacopy=off)

@maleadt
Copy link
Contributor Author

maleadt commented Nov 25, 2022

Pushed a commit that does the entrypoint hack as suggested by @staticfloat. I'm not particularly happy with that, as it only works when we have appropriate privileges, but on the other hand with --overlay we didn't support any of this using the Docker executor so I guess it's an improvement of some sort.


EDIT:

[  387.893369] overlayfs: filesystem on '/var/overlay//read_only/upper' not supported as upperdir

Gah. What if we used a host dir, and implement persistence just like how it's done with the UserNS executor?


Well, that works, except for the rootfs. Guess Linux doesn't like double overlayfs mounts? Using two separate persistence mechanisms works around this, but is not nice.

@maleadt maleadt force-pushed the overlay_maps branch 2 times, most recently from b119001 to c6ce387 Compare November 25, 2022 15:17
Otherwise we stop executing tests as soon as we fail one executor's tests.
@maleadt
Copy link
Contributor Author

maleadt commented Dec 2, 2022

I'm not happy with how this turned out; the Docker entrypoint solution (in order to preserve compatibility with the UserNS sandbox) too way too messy: using two persistence mechanisms, requiring privileged containers AND overlayfs support, and likely to break when using stripped container images (say, not containing a /bin/sh).

So instead I've opted to switch PkgEval over to using OCI container engines directly, JuliaCI/PkgEval.jl#177. That gives me the control I need (pretty easy, too; instead of generating a SandboxConfig I just need to build slightly more verbose JSON), as well as many other features I would have to add to the UserNS sandbox here (notably cgroup resource limits).

So I'll go ahead and close this. FWIW, I think it would make sense to remove the UserNS sandbox here and build Sandbox.jl on top of an OCI container runtime; the code from JuliaCI/PkgEval.jl#177 could help with that. It supports almost all features of Sandbox.jl, with the exception of whole-root persistence (mounting an overlay at / seems unsupported by both crun and runc).

@maleadt maleadt closed this Dec 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants