RFC: Add support for overlay maps #106

maleadt · 2022-11-19T13:59:22Z

Adds overlay maps to the usernamespace sandbox so that we can create overlay mounts (e.g. to make read-only mounts mutable) in a controlled manner (i.e. without having to peek into the executor's persistence_dir):

using Sandbox

overlay = mktempdir()
config = SandboxConfig(
    Dict("/" => Sandbox.debian_rootfs(),
         "/root/.julia/packages" => "/home/tim/.julia/packages");
    overlay_maps = Dict("/root/.julia/packages" => overlay),
    stdin, stdout, stderr, persist=false, verbose=true,
)

with_executor(UnprivilegedUserNamespacesExecutor) do exe
    run(exe, config, ignorestatus(`/bin/bash -c "touch /root/.julia/packages/foo"`))
end

# $overlay/upper/foo now exists

verbose sandbox enabled (running in unprivileged container mode)
Parsed --rootfs as "/home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f"
Parsed --cd as "/"
Parsed --map as "/home/tim/.julia/packages" -> "/root/.julia/packages"
Parsed --overlay as "/tmp/jl_XwYhld" -> "/root/.julia/packages"
Parsed --uid as "0"
Parsed --gid as "0"
Child Process PID is 1712994
--> Mapping 1000:1000 to 0:0 within container namespace
--> Creating overlay workdir at /root
--> Mounting overlay of /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f at /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f (modifications in /root/upper/rootfs, workspace in /root/work/rootfs)
--> Bind-mounting /home/tim/.julia/packages over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/root/.julia/packages (read-only)
--> Mounting procfs at /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/proc
--> Bind-mounting /dev/null over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f//dev/null (read-write)
--> Bind-mounting /dev/tty over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f//dev/tty (read-write)
--> Bind-mounting /dev/zero over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f//dev/zero (read-write)
--> Bind-mounting /dev/random over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f//dev/random (read-write)
--> Bind-mounting /dev/urandom over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f//dev/urandom (read-write)
--> Bind-mounting /dev/shm over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f//dev/shm (read-write)
--> Bind-mounting /sys over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f//sys (read-only)
--> Bind-mounting /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/dev/pts/ptmx over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/dev/ptmx (read-write)
--> Mounting overlay of /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/root/.julia/packages at /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/root/.julia/packages (modifications in /tmp/jl_XwYhld/upper, workspace in /tmp/jl_XwYhld/work)
Entering rootfs at /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f
--> pivot_root() succeeded and unmounted old root
About to run `/bin/bash` `-c` `touch /root/.julia/packages/foo`
Child Process 1712994 exited with code 0

Intended use case: PkgEval, where we need a way to mount the package store. That mount needs to be read-only, since concurrent access results in corruption, yet we need a way to catch changes and add them to the cache afterwards. We cannot use layered depots for this because compilecache files hard-code the path to the Julia sources (see JuliaCI/PkgEval.jl#158).

It would be cleaner if the overlay dir passed to the container would only be used as the upper dir, so that we can directly use the files in there, and not have to worry about removing the work directory (which has bad permissions that trigger a Julia bug, and may even be root-owned in the case of the PrivilegedUserNamespaceExecutor), but sadly the workdir has to live on the same filesystem as the upper dir.

codecov · 2022-11-19T14:04:25Z

Codecov Report

Base: 79.71% // Head: 79.71% // No change to project coverage 👍

Coverage data is based on head (975c302) compared to base (da1ccc1).
Patch has no changes to coverable lines.

❗ Current head 975c302 differs from pull request most recent head a3d1210. Consider uploading reports for the commit a3d1210 to get more accurate results

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #106   +/-   ##
=======================================
  Coverage   79.71%   79.71%           
=======================================
  Files           6        6           
  Lines         567      567           
=======================================
  Hits          452      452           
  Misses        115      115

Impacted Files	Coverage Δ
src/SandboxConfig.jl	`91.30% <ø> (ø)`

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

maleadt · 2022-11-20T09:54:02Z

~~Fixed an issue: maps now need to be mounted before procfs or we aren't using the tmpfs meant for overlays.~~

~~The remaining CI failure is the test that read-only mounts are actually read-only (from the sandbox POV), which is of course what this PR is proposing to change.~~

maleadt · 2022-11-21T09:43:49Z

Scratch that, I've redesigned the feature after some thinking, introducing an --overlay HOST_PATH:SANDBOX_PATH to be used in combination with read-only maps. This keeps everything working as it currently does, while introducing (UserNSExecutor-only) overlay functionality to make read-only maps mutable in a controlled manner (i.e. without having to peek into the persistence_dir):

using Sandbox

overlay = mktempdir()
config = SandboxConfig(
    Dict("/" => Sandbox.debian_rootfs(),
         "/root/.julia/packages" => "/home/tim/.julia/packages");
    overlay_maps = Dict("/root/.julia/packages" => overlay),
    stdin, stdout, stderr, persist=true, verbose=true,
)

with_executor(UnprivilegedUserNamespacesExecutor) do exe
    run(exe, config, ignorestatus(`/bin/bash -c "touch /root/.julia/packages/foo"`))
end
run(`find $overlay`)

The only annoying part is that this requires --persist, i.e., it doesn't work with the ephemeral one mounted at /proc because mount_overlays runs after procfs has been mounted again. We also cannot reorder that code, because it needs to run after mount_maps (since overlays are put on top of other mounts), yet mount_maps requires procfs to look into mtab.

maleadt · 2022-11-21T13:44:34Z

OK, made this work with persist=false too. This should be good to go, or good to review at least.

@staticfloat Could you explain why you were mounting the overlayfs-backing tmpfs on /proc? That breaks lots of things (and is why you quickly restore it), but I don't see why we're doing this in the first place. Many other directories are guaranteed to exist as per FSH, and this is outside of the sandbox' root we'll pivot into anyway, so should be invisible wherever we mount it. In this PR, I've changed it to mount at /root, which seems to work fine.

… modifications.

These directories aren't visible anyway since they're outside of the sandbox root.

Both work and upper need to be on the same mount anyway.

maleadt · 2022-11-22T21:10:40Z

FWIW, I actually started out with something simple that made read-only mounts use overlays, 3e534e1, but because it would then behave differently from the Docker executor I went with a separate --overlays option.

staticfloat · 2022-11-23T04:34:00Z

Okay, I've been thinking about this, and while I like it, I'm not happy with the API. With this, we have the following:

A way to create read-write mappings (--workspace)
A way to create read-only mappings (--map)
By default, an overlay that allows modifications to the rootfs (but not to read-only mounts), which can be persisted or not (--persist)
A way to selectively apply an overlay to paths, granting similar functionality to --persist but to the read-only mounts (--overlay).

IMO, this is too complicated. In terms of realistic needs, I think we only actually care about the following things:

I need to be able to modify files in the host through read-write mappings.
I want to be able to do bad things in the sandbox without messing up anything in my outer system.
I want to be able to persist changes.

Improved API

From this perspective, I think a better API would be to have the following pieces of functionality:

--map and --workspace remain. (perhaps with a unified API, --map host:guest and --map host:guest:ro, if we wanted to copy docker)
We apply overlays to all read-only mounts (including the rootfs) by default. We store them all in the persistence directory so --persist works with all of them at once.

What we lose with this is the ability to get read-only file system errors, which I think is fine. Regarding docker compatibility, I think we can get this working in docker as well by generating an entrypoint bash script that runs the moral equivalent of:

for mount in $ro_mount_list; do
    workdir=$persist_dir/$(hash $mount)
    mkdir -p $workdir
    mount -t overlay overlay -o lowerdir=$mount,upperdir=$workdir/upper,workdir=$workdir/work $mount
done
umount $persist_dir

Since we usually run with --privileged, that should work.

maleadt · 2022-11-23T09:17:37Z

OK, I implemented (the core part) of that API. One problem that surfaced is that overlayfs doesn't support mounting a single file, breaking e.g. the internet tests here (which mount /etc/resolv.conf). Do we want to support that? I guess we could detect the file case, mount the parent dir in a temporary location and bind-mount the file, but that feels hacky.

EDIT: pushed that hack. Makes mount looks funny, but it works:

Child Process PID is 1914942
--> Mapping 1000:1000 to 0:0 within container namespace
--> Creating overlay workdir at /root
...
--> Mounting overlay of /tmp at /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/proc/etc/resolv.conf (modifications in /root/upper/etc/resolv.conf, workspace in /root/work/etc/resolv.conf)
--> Bind-mounting /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/proc/etc/resolv.conf/resolv.conf over /home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f/etc/resolv.conf (read-write)
...
About to run `/bin/bash` `-c` `mount`
overlay on / type overlay (rw,relatime,lowerdir=/home/tim/Julia/depot/artifacts/4d66e139e0bcfdfa5ec6a8942a938e754e17860f,upperdir=/root/upper/rootfs,workdir=/root/work/rootfs,index=off,xino=off,metacopy=off)
overlay on /proc/etc/resolv.conf type overlay (rw,relatime,lowerdir=/tmp,upperdir=/root/upper/etc/resolv.conf,workdir=/root/work/etc/resolv.conf,index=off,xino=off,metacopy=off)
overlay on /etc/resolv.conf type overlay (rw,relatime,lowerdir=/tmp,upperdir=/root/upper/etc/resolv.conf,workdir=/root/work/etc/resolv.conf,index=off,xino=off,metacopy=off)

maleadt · 2022-11-25T15:00:06Z

Pushed a commit that does the entrypoint hack as suggested by @staticfloat. I'm not particularly happy with that, as it only works when we have appropriate privileges, but on the other hand with --overlay we didn't support any of this using the Docker executor so I guess it's an improvement of some sort.

EDIT:

[  387.893369] overlayfs: filesystem on '/var/overlay//read_only/upper' not supported as upperdir

Gah. What if we used a host dir, and implement persistence just like how it's done with the UserNS executor?

Well, that works, except for the rootfs. Guess Linux doesn't like double overlayfs mounts? Using two separate persistence mechanisms works around this, but is not nice.

…the rootfs.

Otherwise we stop executing tests as soon as we fail one executor's tests.

maleadt · 2022-12-02T12:01:18Z

I'm not happy with how this turned out; the Docker entrypoint solution (in order to preserve compatibility with the UserNS sandbox) too way too messy: using two persistence mechanisms, requiring privileged containers AND overlayfs support, and likely to break when using stripped container images (say, not containing a /bin/sh).

So instead I've opted to switch PkgEval over to using OCI container engines directly, JuliaCI/PkgEval.jl#177. That gives me the control I need (pretty easy, too; instead of generating a SandboxConfig I just need to build slightly more verbose JSON), as well as many other features I would have to add to the UserNS sandbox here (notably cgroup resource limits).

So I'll go ahead and close this. FWIW, I think it would make sense to remove the UserNS sandbox here and build Sandbox.jl on top of an OCI container runtime; the code from JuliaCI/PkgEval.jl#177 could help with that. It supports almost all features of Sandbox.jl, with the exception of whole-root persistence (mounting an overlay at / seems unsupported by both crun and runc).

maleadt force-pushed the overlay_maps branch 6 times, most recently from 52763d4 to a957e3a Compare November 20, 2022 09:51

maleadt changed the title ~~RFC: Mount (read-only) maps as overlays to support modifications~~ RFC: Add support for overlay maps Nov 21, 2022

maleadt mentioned this pull request Nov 21, 2022

Implement safe caching using overlays. JuliaCI/PkgEval.jl#160

Closed

maleadt force-pushed the overlay_maps branch 5 times, most recently from 3f0e4dd to a9e5778 Compare November 21, 2022 13:43

maleadt marked this pull request as ready for review November 21, 2022 13:44

maleadt force-pushed the overlay_maps branch from a9e5778 to 8b0109b Compare November 21, 2022 13:51

User-namespace sandbox: Mount (read-only) maps as overlays to support…

528101f

… modifications.

maleadt force-pushed the overlay_maps branch 5 times, most recently from 77dc981 to 21b1e58 Compare November 21, 2022 14:18

maleadt added 3 commits November 21, 2022 15:27

Simplify overlay mounting: use /root, not /proc.

c33be13

These directories aren't visible anyway since they're outside of the sandbox root.

Use the host overlay dir for the workddir too.

f46f442

Both work and upper need to be on the same mount anyway.

Work around JuliaLang/julia#47650.

2f4dee7

maleadt force-pushed the overlay_maps branch from 21b1e58 to 2f4dee7 Compare November 21, 2022 14:27

Simplify things: just mount read-only maps as mutable overlays.

8012b8f

Handle overlayfs mounts of files.

975c302

Emulate overlay-based maps on Docker using an entrypoint script.

3aba7bf

maleadt force-pushed the overlay_maps branch 2 times, most recently from b119001 to c6ce387 Compare November 25, 2022 15:17

Use a host persistence dir for overlaying read-only maps.

49e50ef

maleadt force-pushed the overlay_maps branch from c6ce387 to 49e50ef Compare November 25, 2022 15:49

Mount an explicit tmpfs to work around failure to mount overlayfs in …

b219cc4

…the rootfs.

maleadt force-pushed the overlay_maps branch from d5896d4 to b219cc4 Compare November 25, 2022 16:22

maleadt added 2 commits November 28, 2022 13:18

Encompass executor testsets in another testset.

d91fe85

Otherwise we stop executing tests as soon as we fail one executor's tests.

Entrypoint improvements.

a3d1210

maleadt mentioned this pull request Nov 29, 2022

Switch from Sandbox.jl to OCI runtime (crun) JuliaCI/PkgEval.jl#177

Merged

maleadt closed this Dec 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Add support for overlay maps #106

RFC: Add support for overlay maps #106

maleadt commented Nov 19, 2022 •

edited

Loading

codecov bot commented Nov 19, 2022 •

edited

Loading

maleadt commented Nov 20, 2022 •

edited

Loading

maleadt commented Nov 21, 2022 •

edited

Loading

maleadt commented Nov 21, 2022 •

edited

Loading

maleadt commented Nov 22, 2022

staticfloat commented Nov 23, 2022

maleadt commented Nov 23, 2022 •

edited

Loading

maleadt commented Nov 25, 2022 •

edited

Loading

maleadt commented Dec 2, 2022

RFC: Add support for overlay maps #106

RFC: Add support for overlay maps #106

Conversation

maleadt commented Nov 19, 2022 • edited Loading

codecov bot commented Nov 19, 2022 • edited Loading

Codecov Report

maleadt commented Nov 20, 2022 • edited Loading

maleadt commented Nov 21, 2022 • edited Loading

maleadt commented Nov 21, 2022 • edited Loading

maleadt commented Nov 22, 2022

staticfloat commented Nov 23, 2022

Improved API

maleadt commented Nov 23, 2022 • edited Loading

maleadt commented Nov 25, 2022 • edited Loading

maleadt commented Dec 2, 2022

maleadt commented Nov 19, 2022 •

edited

Loading

codecov bot commented Nov 19, 2022 •

edited

Loading

maleadt commented Nov 20, 2022 •

edited

Loading

maleadt commented Nov 21, 2022 •

edited

Loading

maleadt commented Nov 21, 2022 •

edited

Loading

maleadt commented Nov 23, 2022 •

edited

Loading

maleadt commented Nov 25, 2022 •

edited

Loading