-
Notifications
You must be signed in to change notification settings - Fork 556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
seccomp: allow to override default errno return code #1087
Conversation
This makes the As previously discussed, it is not entirely clear when that default should be applied. To any system call not listed in the policy? Is |
I have always believed (1) |
Probably not, I've picked it just for backward compatibility.
I personally think the default for other actions should still be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Actually had a branch with this, but didn't find the time yet to open 😂
thanks anyone for the reviews. Before merging let's wait for @cyphar to weight in as he was looking the most at this issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I'm fine with this PR, but as I've discussed before -- just to make it completely explicit -- this does not solve the I think it would've been nicer to do this as one big changeset -- I think it would be a bad idea to release a new runtime-spec which adds this and doesn't address the larger
I agree, and the reason I wanted to wait before we did this change is because I wanted us to include all of the other changes we need to solve this problem properly in the spec. But we can take this change now and do the other necessary changes later. |
By that argument, merging this PR would make the spec "unreleaseable". Given how thorny this problem seems to be, and how there have been calls for a new version for ~8 months, would it be feasible to release |
Why can't the spec just be fixed to specify what runc is doing to fix the problem? |
would this happen when a syscall is not fully specified in the profile and the defaultErrnoRet is used or are there other cases? |
That is the main issue -- but given that libseccomp today doesn't let you create fully-specified rules (depending on how complicated the allow rule is), that's sufficient to say that In addition, in my view we should not be relying on profile authors to solve the To answer @richfelker, I tihnk that the spec-level solution should be a little bit less janky than the current runc solution. (Ideally we would do this through minimum kernel version information.)
Yeah let's do that first. |
true that won't fully solve the issue with
do you see the kernel version information as a replacement for |
They can both co-exist -- the lack of |
Did you mean In the long term I really hope we can bypass the seccomp generation and add to the specs something like containers/crun#578 (in the seccomp section). Alternatively, at least agree on something nicer than JSON and that users are not expected to deal directly with the libseccomp limitations. |
Oops, yeah that's what I meant. And practically this is more critical than
I agree, though I am on the fence on whether we should still have HOWEVER there are upsides to the current system:
It should also be noted that while the current seccomp schema is based around libseccomp's limitations, nothing is stopping us from switching away from libseccomp internally. |
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
the specs already support overriding the errno code for the syscalls but the default value is hardcoded to EPERM. Add a new attribute to override the default value. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
58cac83
to
f7ef278
Compare
I still think it is useful to allow the raw BPF. I have already considered the pre-compilation of BPF in the past, since most containers use the same BPF profile, to avoid the cost of looking up syscalls in libseccomp. The lookup function in 2.4.0 had The raw BPF feature should not replace the existing mechanism, neither be the default; but it would open up the possibility to experiment with different ways of generating the BPF. IMO, users should be able to shoot themselves in the foot if they wish so.
Could we move forward with it as it is just for completeness and we can address the |
Yeah we can merge this first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
This allows application to detect whether the kernel supports syscall or not. Previously, an error unconditionally was EPERM. There are many issues about glibc failed to new syscalls in containerized environments for which host run on old kernel. More about motivation for ENOSYS over EPERM: opencontainers/runc#2151 opencontainers/runc#2750 See about defaultErrnoRet introduction: opencontainers/runtime-spec#1087 Previously, FreeIPA profile was vendored from https://github.com/containers/podman/blob/main/vendor/github.com/containers/common/pkg/seccomp/seccomp.json Now it is merged directly from https://github.com/containers/common/blob/main/pkg/seccomp/seccomp.json
This allows application to detect whether the kernel supports syscall or not. Previously, an error was unconditionally EPERM. There are many issues about glibc failed with new syscalls in containerized environments if their host run on old kernel. More about motivation for ENOSYS over EPERM: opencontainers/runc#2151 opencontainers/runc#2750 See about defaultErrnoRet introduction: opencontainers/runtime-spec#1087 Previously, FreeIPA profile was vendored from https://github.com/containers/podman/blob/main/vendor/github.com/containers/common/pkg/seccomp/seccomp.json Now it is merged directly from https://github.com/containers/common/blob/main/pkg/seccomp/seccomp.json Fixes: https://pagure.io/freeipa/issue/9008 Signed-off-by: Stanislav Levin <slev@altlinux.org>
This allows application to detect whether the kernel supports syscall or not. Previously, an error was unconditionally EPERM. There are many issues about glibc failed with new syscalls in containerized environments if their host run on old kernel. More about motivation for ENOSYS over EPERM: opencontainers/runc#2151 opencontainers/runc#2750 See about defaultErrnoRet introduction: opencontainers/runtime-spec#1087 Previously, FreeIPA profile was vendored from https://github.com/containers/podman/blob/main/vendor/github.com/containers/common/pkg/seccomp/seccomp.json Now it is merged directly from https://github.com/containers/common/blob/main/pkg/seccomp/seccomp.json Fixes: https://pagure.io/freeipa/issue/9008 Signed-off-by: Stanislav Levin <slev@altlinux.org> Reviewed-By: Alexander Bokovoy <abokovoy@redhat.com>
This allows application to detect whether the kernel supports syscall or not. Previously, an error unconditionally was EPERM. There are many issues about glibc failed to new syscalls in containerized environments for which host run on old kernel. More about motivation for ENOSYS over EPERM: opencontainers/runc#2151 opencontainers/runc#2750 See about defaultErrnoRet introduction: opencontainers/runtime-spec#1087 Previously, FreeIPA profile was vendored from https://github.com/containers/podman/blob/main/vendor/github.com/containers/common/pkg/seccomp/seccomp.json Now it is merged directly from https://github.com/containers/common/blob/main/pkg/seccomp/seccomp.json
Arg, we've run into some fun with this downstream in moby/moby thanks to errno values being a per-architecture thing: moby/moby#42836 (comment) / moby/moby#42681 (comment) $ git grep -inE 'ENOSYS[[:space:]]+=' | grep linux | column -t
unix/zerrors_linux_386.go:588: ENOSYS = syscall.Errno(0x26)
unix/zerrors_linux_amd64.go:588: ENOSYS = syscall.Errno(0x26)
unix/zerrors_linux_arm.go:594: ENOSYS = syscall.Errno(0x26)
unix/zerrors_linux_arm64.go:585: ENOSYS = syscall.Errno(0x26)
unix/zerrors_linux_mips.go:591: ENOSYS = syscall.Errno(0x59)
unix/zerrors_linux_mips64.go:591: ENOSYS = syscall.Errno(0x59)
unix/zerrors_linux_mips64le.go:591: ENOSYS = syscall.Errno(0x59)
unix/zerrors_linux_mipsle.go:591: ENOSYS = syscall.Errno(0x59)
unix/zerrors_linux_ppc.go:646: ENOSYS = syscall.Errno(0x26)
unix/zerrors_linux_ppc64.go:650: ENOSYS = syscall.Errno(0x26)
unix/zerrors_linux_ppc64le.go:650: ENOSYS = syscall.Errno(0x26)
unix/zerrors_linux_riscv64.go:575: ENOSYS = syscall.Errno(0x26)
unix/zerrors_linux_s390x.go:650: ENOSYS = syscall.Errno(0x26)
unix/zerrors_linux_sparc64.go:639: ENOSYS = syscall.Errno(0x5a) Perhaps this should've been a string? 😞 |
Hi! I love this change. Is there any plan to add a check for this to the runtime-tools tests? |
indeed, that was a mistake :/ Any suggestions on how we can fix it? Do we need a new field and obsolete the current one? |
The portable way would probably to have a string representation, so that the runtime that sets the actual profile can do the conversion 🤔 FWIW, I think this would only be a problem if a profile is read from a JSON, and the JSON is created on / with a non-matching architecture in mind, but I can see a string representation easier to use (when writing a profile). |
If we decide to add a new field, we could make the new filed take precedence (if new field is set, old field must be ignored) |
yes, I agree with it. I've used the same technique for containers/common seccomp profiles: containers/common#821 |
…MIPS The value of ENOSYS differs between MIPS and non-MIPS architectures. While this is not problematic for the embedded seccomp profile, it prevents the profile from being saved as a portable JSON file that can be used for both architectures. To work around this situation, we include conditional rules for both arches. and hard-code the value for ENOSYS in both. For more details, refer to moby#42836 (comment) and opencontainers/runtime-spec#1087 (comment) A test was added, which can be run with: go test --tags=seccomp -run TestLoadConditionalClone3 -v ./profiles/seccomp/ === RUN TestLoadConditionalClone3 === RUN TestLoadConditionalClone3/clone3_default_amd64 === RUN TestLoadConditionalClone3/clone3_default_mips === RUN TestLoadConditionalClone3/clone3_cap_sys_admin_amd64 === RUN TestLoadConditionalClone3/clone3_cap_sys_admin_mips --- PASS: TestLoadConditionalClone3 (0.01s) --- PASS: TestLoadConditionalClone3/clone3_default_amd64 (0.00s) --- PASS: TestLoadConditionalClone3/clone3_default_mips (0.00s) --- PASS: TestLoadConditionalClone3/clone3_cap_sys_admin_amd64 (0.00s) --- PASS: TestLoadConditionalClone3/clone3_cap_sys_admin_mips (0.00s) PASS ok github.com/docker/docker/profiles/seccomp 0.015s Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
For the meantime, for moby, I opened moby/moby#43005, which uses conditional rules for MIPS / non-MIPS, with hard-coded values for |
…MIPS The value of ENOSYS differs between MIPS and non-MIPS architectures. While this is not problematic for the embedded seccomp profile, it prevents the profile from being saved as a portable JSON file that can be used for both architectures. To work around this situation, we include conditional rules for both arches. and hard-code the value for ENOSYS in both. For more details, refer to moby#42836 (comment) and opencontainers/runtime-spec#1087 (comment) A test was added, which can be run with: go test --tags=seccomp -run TestLoadConditionalClone3 -v ./profiles/seccomp/ === RUN TestLoadConditionalClone3 === RUN TestLoadConditionalClone3/clone3_default_amd64 === RUN TestLoadConditionalClone3/clone3_default_mips === RUN TestLoadConditionalClone3/clone3_cap_sys_admin_amd64 === RUN TestLoadConditionalClone3/clone3_cap_sys_admin_mips --- PASS: TestLoadConditionalClone3 (0.01s) --- PASS: TestLoadConditionalClone3/clone3_default_amd64 (0.00s) --- PASS: TestLoadConditionalClone3/clone3_default_mips (0.00s) --- PASS: TestLoadConditionalClone3/clone3_cap_sys_admin_amd64 (0.00s) --- PASS: TestLoadConditionalClone3/clone3_cap_sys_admin_mips (0.00s) PASS ok github.com/docker/docker/profiles/seccomp 0.015s Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
From the perspective of the specification, this hasn't actually been released yet, so I think we have a small amount of latitude in fixing this "properly", right? I imagine actual users in the wild are limited to the two cases already identified, and probably only really (I doubt this has been an extremely popular feature for users to take advantage of in their custom profiles -- the number of times I've seen someone write a non-trivial custom seccomp profile that wasn't just "download the default from my runtime and change one minor thing" I can count on a single hand. 🙈) |
I think we can probably drop Would you be fine if I add two fields |
Good call; looks indeed like it was not in a release yet; v1.0.2...main From that perspective, I'd be 👍 to fix it before it's released. With the added note that we need to check what projects already shipped with this feature in use (mostly concerned if, e.g., |
Commit 491daf4 ("iv_fd_epoll: Add support for epoll_pwait2().") added support for epoll_pwait2(), with a fallback to epoll_wait() in case epoll_pwait2() is not supported by the kernel we are running on, which would be indicated by epoll_pwait2() returning -ENOSYS. Some reports (e.g. axoflow/axosyslog#85 , #33 (comment) ) suggest that some container technologies can cause -EPERM to be returned for epoll_pwait2(), independently of whether or not epoll_pwait2() is actually supported by the kernel we are running on, and this trips us up because we don't currently handle -EPERM gracefully, as we did not expect that we would have to do so. Making system calls return -EPERM to indicate that they were filtered out by a security policy framework seems somewhat dubious, especially when considering the amount of application and user confusion generated by system calls that are not documented as being able to fail with -EPERM now suddenly being able to fail with -EPERM, but there is not much we can do about this. I would be against adding EPERM-as-ENOSYS fallbacks for every current or future case where we handle ENOSYS, but: 1. it seems that this is the only case where this triggers; 2. upstream seems to agree that this EPERM behavior is a bug (see e.g. these links dug up by László Várady: containers/common#573 , containers/podman#10337 , opencontainers/runtime-spec#1087 ), so there will hopefully be no new cases of this in the future; 3. there's at least one container technology release (podman on CentOS 7) where this bug triggers and where the platform is sufficiently old to no longer be receiving updates, as pointed out by Balazs Scheidler, so this issue can't be fixed by users updating their container software. Under these circumstances, adding a workaround on our end seems reasonable, and this commit does so. This issue was originally reported by @mstopa-splunk on GitHub. Workaround originally by Balazs Scheidler. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
This allows application to detect whether the kernel supports syscall or not. Previously, an error was unconditionally EPERM. There are many issues about glibc failed with new syscalls in containerized environments if their host run on old kernel. More about motivation for ENOSYS over EPERM: opencontainers/runc#2151 opencontainers/runc#2750 See about defaultErrnoRet introduction: opencontainers/runtime-spec#1087 Previously, FreeIPA profile was vendored from https://github.com/containers/podman/blob/main/vendor/github.com/containers/common/pkg/seccomp/seccomp.json Now it is merged directly from https://github.com/containers/common/blob/main/pkg/seccomp/seccomp.json Fixes: https://pagure.io/freeipa/issue/9008 Signed-off-by: Stanislav Levin <slev@altlinux.org> Reviewed-By: Alexander Bokovoy <abokovoy@redhat.com>
opencontainers/runtime-spec#1087 added support for defaultErrnoRet to the OCI runtime specs. If a defaultErrnoRet is specified, disable patching the generated libseccomp cBPF. Closes: opencontainers/runc#2943 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
opencontainers/runtime-spec#1087 added support for defaultErrnoRet to the OCI runtime specs. If a defaultErrnoRet is specified, disable patching the generated libseccomp cBPF. Closes: opencontainers/runc#2943 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
opencontainers/runtime-spec#1087 added support for defaultErrnoRet to the OCI runtime specs. If a defaultErrnoRet is specified, disable patching the generated libseccomp cBPF. Closes: opencontainers/runc#2943 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
the specs already support overriding the errno code for the syscalls
but the default value is hardcoded to EPERM.
Add a new attribute to override the default value.
Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com