Kernel mounter refuses to mount when cephfs is built into kernel #4376

Closed
Infinoid opened this issue Jan 16, 2024 · 1 comment · Fixed by #4378

Comments

@Infinoid
Contributor

Describe the bug

When running on a kernel configured with CONFIG_CEPH_FS=y (not =m), the nodeplugin refuses to mount.

The mountKernel function begins with a call to modprobe ceph, even if the /proc/filesystems list already includes ceph. If that modprobe command fails, it doesn't even try to mount the volume.

When cephfs is built into the kernel rather than compiled as a module, there is no ceph.ko file for modprobe to find, so modprobe fails and the mount is never attempted.
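
To illustrate the /proc/filesystems check mentioned above, here is a minimal Go sketch. The function name `filesystemListed` is hypothetical and is not taken from ceph-csi. It assumes the usual /proc/filesystems layout, where each line is an optional `nodev` flag followed by a tab and the filesystem name.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// filesystemListed reports whether name appears as a filesystem type in
// text formatted like /proc/filesystems. Each line is either
// "nodev\t<name>" or "\t<name>", so the name is the last field.
// Hypothetical helper for illustration; not ceph-csi code.
func filesystemListed(procFilesystems, name string) bool {
	for _, line := range strings.Split(procFilesystems, "\n") {
		fields := strings.Fields(line)
		// Compare the whole field so "ceph" does not match a longer name.
		if len(fields) > 0 && fields[len(fields)-1] == name {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/filesystems")
	if err != nil {
		// Not running on Linux, or /proc is not mounted.
		fmt.Println("cannot read /proc/filesystems:", err)
		return
	}
	fmt.Println("ceph supported:", filesystemListed(string(data), "ceph"))
}
```

On a kernel with CONFIG_CEPH_FS=y, or with the ceph module already loaded, the list should contain a `nodev ceph` entry.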

Environment details

  • Image/version of Ceph CSI driver : 3.10.1
  • Helm chart version : ceph-csi-cephfs-3.10.1
  • Kernel version : 6.6.6
  • Mounter used for mounting PVC : kernel
  • Kubernetes cluster version : 1.29.0
  • Ceph cluster version : 18.2.1

Steps to reproduce

Steps to reproduce the behavior:

  1. Build a custom kernel for your k8s nodes, which builds cephfs support directly into the kernel image.
  2. Install it on your k8s nodes.
  3. Make a cephfs PVC and attach it to a pod.
  4. Witness the destruction.

Actual results

Pods fail to start. The node plugin logs report that modprobe failed.

Expected behavior

It mounts the thing and everything works.

Logs

I0116 12:10:16.999298    1227 nodeserver.go:312] ID: 18 Req-ID: 0001-0024-b5d35fba-7048-11ed-9ed9-001e0651162c-0000000000000006-a2253aa3-ef73-498b-acd5-89b97138e206 cephfs: mounting volume 0001-0024-b5d35fba-7048-11ed-9ed9-001e0651162c-0000000000000006-a2253aa3-ef73-498b-acd5-89b97138e206 with Ceph kernel client
I0116 12:10:17.000508    1227 cephcmds.go:98] ID: 18 Req-ID: 0001-0024-b5d35fba-7048-11ed-9ed9-001e0651162c-0000000000000006-a2253aa3-ef73-498b-acd5-89b97138e206 an error (exit status 1) occurred while running modprobe args: [ceph]
E0116 12:10:17.000518    1227 nodeserver.go:322] ID: 18 Req-ID: 0001-0024-b5d35fba-7048-11ed-9ed9-001e0651162c-0000000000000006-a2253aa3-ef73-498b-acd5-89b97138e206 failed to mount volume 0001-0024-b5d35fba-7048-11ed-9ed9-001e0651162c-0000000000000006-a2253aa3-ef73-498b-acd5-89b97138e206: an error (exit status 1) occurred while running modprobe args: [ceph] Check dmesg logs if required.
E0116 12:10:17.000533    1227 utils.go:169] ID: 18 Req-ID: 0001-0024-b5d35fba-7048-11ed-9ed9-001e0651162c-0000000000000006-a2253aa3-ef73-498b-acd5-89b97138e206 GRPC error: rpc error: code = Internal desc = an error (exit status 1) occurred while running modprobe args: [ceph]

Additional context

I was able to prove that modprobe is the only issue, by applying a hacky workaround. I exec'ed into the node plugin container and replaced the modprobe command with a symlink to /bin/true. After doing this, the node plugin mounted the cephfs volume successfully and the pods started, and everything seems to work great.

I had built my kernel this way in an attempt to optimize the kernel I use on k8s nodes, to minimize the attack surface and simplify the build/boot processes. I think these are worthwhile goals, and while building ceph as a module is extremely common, ceph csi's dependency on that is a bit problematic.

Maybe turn the error into a warning? Or check if ceph is in /proc/filesystems before doing the modprobe? Or try mounting anyway before returning the error?
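
The second suggestion (check first, modprobe only if needed) could look roughly like the Go sketch below. Everything in it is an assumption made for illustration: `ensureFilesystem`, its two callback parameters, and `errModprobeFailed` are hypothetical names, not ceph-csi APIs. The callbacks stand in for the real /proc/filesystems lookup and the real modprobe invocation so that the control flow can be exercised without touching the host.

```go
package main

import (
	"errors"
	"fmt"
)

// errModprobeFailed simulates the "exit status 1" seen in the logs.
var errModprobeFailed = errors.New("modprobe: exit status 1")

// ensureFilesystem returns nil when the kernel already supports the named
// filesystem, and only falls back to loading a module when it does not.
// Hypothetical helper for illustration; not ceph-csi code.
func ensureFilesystem(
	name string,
	isListed func(string) bool,
	loadModule func(string) error,
) error {
	if isListed(name) {
		// Built in or already loaded: skip modprobe entirely.
		return nil
	}
	if err := loadModule(name); err != nil {
		return fmt.Errorf("loading kernel module %q: %w", name, err)
	}
	return nil
}

func main() {
	// Simulate a kernel with cephfs built in and a modprobe that always fails.
	builtIn := func(string) bool { return true }
	failingModprobe := func(string) error { return errModprobeFailed }

	err := ensureFilesystem("ceph", builtIn, failingModprobe)
	fmt.Println("error:", err)
}
```

With this ordering, a failing modprobe only matters when the filesystem is actually missing, which is the case where the mount would fail anyway.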

@nixpanic nixpanic added bug Something isn't working component/cephfs Issues related to CephFS labels Jan 16, 2024
@nixpanic
Member

You are correct: if /proc/filesystems contains ceph, there is no need to even try to load the ceph kernel module.

@nixpanic nixpanic self-assigned this Jan 16, 2024
nixpanic added a commit to nixpanic/ceph-csi that referenced this issue Jan 16, 2024
By reading the contents of /proc/filesystems, and checking if "ceph" is
included there, running "modprobe ceph" can be skipped.

Fixes: ceph#4376
Signed-off-by: Niels de Vos <ndevos@ibm.com>
nixpanic added a commit to nixpanic/ceph-csi that referenced this issue Jan 17, 2024
By reading the contents of /proc/filesystems, and checking if "ceph" is
included there, running "modprobe ceph" can be skipped.

Fixes: ceph#4376
Signed-off-by: Niels de Vos <ndevos@ibm.com>
@mergify mergify bot closed this as completed in #4378 Jan 17, 2024
mergify bot pushed a commit that referenced this issue Jan 17, 2024
By reading the contents of /proc/filesystems, and checking if "ceph" is
included there, running "modprobe ceph" can be skipped.

Fixes: #4376
Signed-off-by: Niels de Vos <ndevos@ibm.com>
mergify bot pushed a commit that referenced this issue Jan 17, 2024
By reading the contents of /proc/filesystems, and checking if "ceph" is
included there, running "modprobe ceph" can be skipped.

Fixes: #4376
Signed-off-by: Niels de Vos <ndevos@ibm.com>
(cherry picked from commit ab87045)
mergify bot pushed a commit that referenced this issue Jan 18, 2024
By reading the contents of /proc/filesystems, and checking if "ceph" is
included there, running "modprobe ceph" can be skipped.

Fixes: #4376
Signed-off-by: Niels de Vos <ndevos@ibm.com>
(cherry picked from commit ab87045)