
rbd: add volume healer #2108

Merged (9 commits, Jul 16, 2021)
Conversation

pkalever

Describe what this PR does

Problem:

For rbd-nbd userspace mounter backends, all the mounts start seeing IO errors after a restart of the nodeplugin. This is because rbd-nbd backends run a userspace mount daemon per volume, and after a restart of the nodeplugin pod there is no way to bring those daemons back to life.

Solution:

The volume healer is a one-time activity that is triggered at the startup time of the rbd nodeplugin. It navigates through the list of volume attachments on the node and acts accordingly.

For now, it is limited to nbd type storage only, but it is flexible and can be extended in the future for other backend types as needed.

From a few feet above:
This solves a severe problem for nbd-backed CSI volumes. While walking the list of volume attachments on the node, if the healer finds a volume that is in attached state and is of type nbd, it attempts to fix the rbd-nbd volume by sending a NodeStageVolume request with the required volume attributes (secrets, device name, image attributes, etc.), which finally starts the required rbd-nbd daemons in the nodeplugin csi-rbdplugin container. This reattaches the backend images to the right nbd devices, allowing the applications to perform IO without any interruption even after a nodeplugin restart.
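For orientation only, here is a minimal sketch of that flow in Go. It is not the actual ceph-csi implementation: the package name and the sendNodeStageVolume helper are hypothetical, and it assumes the mounter type is visible as the PV's "mounter" volume attribute; only the client-go calls (listing VolumeAttachments, fetching the PersistentVolume) are standard API.

package healer

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// sendNodeStageVolume is a placeholder for re-issuing the NodeStageVolume
// request with the stashed attributes (secrets, device, image attributes).
func sendNodeStageVolume(ctx context.Context, pvName string) error { return nil }

// runVolumeHealer walks the VolumeAttachments on this node and re-stages the
// attached, rbd-nbd backed volumes.
func runVolumeHealer(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	vaList, err := cs.StorageV1().VolumeAttachments().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for i := range vaList.Items {
		va := &vaList.Items[i]
		// Only attachments on this node that are actually attached are candidates.
		if va.Spec.NodeName != nodeName || !va.Status.Attached || va.Spec.Source.PersistentVolumeName == nil {
			continue
		}
		pv, err := cs.CoreV1().PersistentVolumes().Get(ctx, *va.Spec.Source.PersistentVolumeName, metav1.GetOptions{})
		if err != nil {
			log.Printf("skipping %s: %v", va.Name, err)
			continue
		}
		// Heal only volumes backed by the rbd-nbd userspace mounter.
		if pv.Spec.CSI == nil || pv.Spec.CSI.VolumeAttributes["mounter"] != "rbd-nbd" {
			continue
		}
		if err := sendNodeStageVolume(ctx, pv.Name); err != nil {
			// An error for one attachment must not block healing the others.
			log.Printf("healing %s failed: %v", pv.Name, err)
		}
	}
	return nil
}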

Some points to be noted

Q: What is the route for the upgrade path?
A: For upgrade:

  • Make sure no volume attachments of type nbd exist, meaning all fresh NodeStageVolume requests should happen only after these changes are available, so that the device attribute is captured in the metadata stash file during that immediate NodeStageVolume.
    Or
  • Before upgrading, add the device attribute to the metadata stash file of nbd-type PVs; we could provide an automated script, or perform careful manual edits to the metadata stash (see the sketch just below).
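To make the "device attribute in the metadata stash file" concrete, here is a purely illustrative Go struct; the field names are hypothetical and do not reflect the actual ceph-csi stash format.

package healer

// volumeStash sketches a per-volume metadata stash that records the mapped
// device, so that the healer (or an upgrade script) can find it later.
type volumeStash struct {
	ImageName string `json:"imageName"`
	Mounter   string `json:"mounter"` // e.g. "rbd-nbd"
	Device    string `json:"device"`  // e.g. "/dev/nbd0", captured after a successful rbd-nbd map
}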

Q: Should this patch use rbd integrated CLI instead of rbd-nbd CLI?
A: Maybe once a release including ceph/ceph#41279 is available; for now, the attach/detach commands are not promoted to the rbd CLI.

Q: What if the Lock is acquired by some other op for a given attachment?
A: Return an error for that attachment and continue with the next attachment on the node.

Q: What is the rbd-nbd reattach timeout?
A: 5 minutes; this means the nodeplugin pod should come back up and the healer should finish triggering NodeStageVolume requests for all attachments on that node within 300 seconds.

Q: What version of rbd-nbd is needed on the node plugin pod?
A: The rbd-nbd version should be from Ceph Pacific (v16) or newer.

Q: What happens if attach command is absent?
A: You will see errors in the logs like 'NodeStageVolume failed, err: ... unknown option 'attach cephfs.a.data/csi-vol-6dd424b9-b971-11eb-af01-7a53e508ee3b''.

Q: What happens when there is an rbd-nbd command failure?
A: Similar to above

Q: What kind of logs will we see at the nodeplugin on a successful healer run?
A: Here are some logs:

I0531 12:20:35.455288 3103257 server.go:131] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I0531 12:20:35.540709 3103257 rbd_healer.go:91] sending nodeStageVolume for volID: 0001-0024-35f25a92-fce8-4ab9-b80a-69b7b35e2ef3-0000000000000003-80d14346-c1f9-11eb-898d-0a3d575e4f27, stagingPath: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-87b90909-cf74-4ae8-a27a-a87f39655f72/globalmount
I0531 12:20:35.549634 3103257 rbd_util.go:984] setting disableInUseChecks: false image features: [layering] mounter: rbd-nbd
I0531 12:20:35.550360 3103257 rbd_healer.go:91] sending nodeStageVolume for volID: 0001-0024-35f25a92-fce8-4ab9-b80a-69b7b35e2ef3-0000000000000003-80acce7a-c1f9-11eb-898d-0a3d575e4f27, stagingPath: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-98e4b639-4960-4091-ba10-80c3603da961/globalmount
I0531 12:20:35.557438 3103257 rbd_util.go:984] setting disableInUseChecks: false image features: [layering] mounter: rbd-nbd
I0531 12:20:35.561080 3103257 rbd_healer.go:91] sending nodeStageVolume for volID: 0001-0024-35f25a92-fce8-4ab9-b80a-69b7b35e2ef3-0000000000000003-8142bc1a-c1f9-11eb-898d-0a3d575e4f27, stagingPath: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-b8dcb0ae-74ca-422a-82a1-6331a0cf4b47/globalmount
I0531 12:20:35.567149 3103257 rbd_util.go:984] setting disableInUseChecks: false image features: [layering] mounter: rbd-nbd
I0531 12:20:35.572580 3103257 rbd_healer.go:91] sending nodeStageVolume for volID: 0001-0024-35f25a92-fce8-4ab9-b80a-69b7b35e2ef3-0000000000000003-80866a60-c1f9-11eb-898d-0a3d575e4f27, stagingPath: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-17c243b2-0e13-4880-95a5-5ff49b992509/globalmount
I0531 12:20:35.578949 3103257 rbd_util.go:984] setting disableInUseChecks: false image features: [layering] mounter: rbd-nbd
I0531 12:20:35.594686 3103257 rbd_healer.go:91] sending nodeStageVolume for volID: 0001-0024-35f25a92-fce8-4ab9-b80a-69b7b35e2ef3-0000000000000003-807853e6-c1f9-11eb-898d-0a3d575e4f27, stagingPath: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-0dce65fb-6f07-4627-9b99-eca888b46810/globalmount
I0531 12:20:35.607218 3103257 rbd_util.go:984] setting disableInUseChecks: false image features: [layering] mounter: rbd-nbd

[...]

I0531 12:20:37.102849 3103257 nodeserver.go:245] rbd volID: 0001-0024-35f25a92-fce8-4ab9-b80a-69b7b35e2ef3-0000000000000003-807853e6-c1f9-11eb-898d-0a3d575e4f27 was successfully attached to device: /dev/nbd2
I0531 12:20:37.105944 3103257 nodeserver.go:245] rbd volID: 0001-0024-35f25a92-fce8-4ab9-b80a-69b7b35e2ef3-0000000000000003-8142bc1a-c1f9-11eb-898d-0a3d575e4f27 was successfully attached to device: /dev/nbd0
I0531 12:20:37.106125 3103257 nodeserver.go:245] rbd volID: 0001-0024-35f25a92-fce8-4ab9-b80a-69b7b35e2ef3-0000000000000003-80866a60-c1f9-11eb-898d-0a3d575e4f27 was successfully attached to device: /dev/nbd3          
I0531 12:20:37.108396 3103257 nodeserver.go:245] rbd volID: 0001-0024-35f25a92-fce8-4ab9-b80a-69b7b35e2ef3-0000000000000003-80acce7a-c1f9-11eb-898d-0a3d575e4f27 was successfully attached to device: /dev/nbd4
I0531 12:20:37.157031 3103257 nodeserver.go:245] rbd volID: 0001-0024-35f25a92-fce8-4ab9-b80a-69b7b35e2ef3-0000000000000003-80d14346-c1f9-11eb-898d-0a3d575e4f27 was successfully attached to device: /dev/nbd1

Initial discussion summary:

We have captured the initial discussion at https://hackmd.io/mA8CtRUPS4SSV9oVcDFVeg. This PR has a few changes from that design; mainly, we don't start a new sidecar, but rather trigger the healer inside the csi-rbdplugin container itself.

Fixes: #667 #1929

@mergify mergify bot added the component/rbd Issues related to RBD label May 31, 2021
@pkalever pkalever force-pushed the volume-healer branch 7 times, most recently from 35faadc to e00874b Compare June 1, 2021 06:48
@pkalever
Author

pkalever commented Jun 2, 2021

@Madhu-1 @nixpanic @humblec Please help review this change. [I don't have access to add reviewers.]

Also, I would like to get this added to release-3.4.0 tracker.

Thanks!

@Rakshith-R
Contributor

@pkalever you can try adding nolint in the following manner and see if that solves the nolintlint issue; ref: https://golangci-lint.run/usage/false-positives/

//nolint:gocyclo // This legacy function is complex but the team too busy to simplify it
func someLegacyFunction() *string {
  // ...
}

@pkalever
Author

pkalever commented Jun 2, 2021

@pkalever you can try adding nolint in the following manner and see if that solves the nolintlint issue; ref: https://golangci-lint.run/usage/false-positives/

//nolint:gocyclo // This legacy function is complex but the team too busy to simplify it
func someLegacyFunction() *string {
  // ...
}

@Rakshith-R this will apply the nolint to the whole function (block of code), but we only want the nolint applied to one single line (func funcName(param1, param2 ...) {).

Hope it makes sense now.
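For reference, a small illustration (with hypothetical function names) of the difference being discussed, as documented by golangci-lint: a //nolint directive on its own line covers the whole following block, while an end-of-line directive covers only that single line.

package example

// Directive on its own line: the exception applies to the whole function.
//nolint:gocyclo // legacy function, refactor later
func wholeFunction(n int) int {
	return n
}

// Directive at the end of a line: the exception applies only to that line.
func oneLongSignature(firstParameter, secondParameter, thirdParameter, fourthParameter string) (string, error) { //nolint:lll // long signature kept on one line
	return firstParameter + secondParameter + thirdParameter + fourthParameter, nil
}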

@Madhu-1 Madhu-1 requested review from nixpanic, Madhu-1 and humblec June 2, 2021 07:14
@Rakshith-R
Contributor

The unparam linter is meant for functions: https://github.com/mvdan/unparam.
Can you format it by adding a new line like the following to prevent the lll linter issue (it also improves readability)?

func execCommandInContainer(f *framework.Framework, c, ns, cn string,
	opt *metav1.ListOptions) (string, string, error) {

@Madhu-1 Madhu-1 added this to the release-3.4.0 milestone Jun 2, 2021
@Madhu-1
Collaborator

Madhu-1 commented Jun 2, 2021

@pkalever Thanks for this one. Let's get confirmation on the approach, then we can do the PR review.

@pkalever
Author

pkalever commented Jun 2, 2021

@pkalever Thanks for this one. Let's get confirmation on the approach, then we can do the PR review.

[As discussed already]
This is the same design we discussed a couple of times in the earlier sync-up meetings (attendees: you, Niels, Yug, and others); there are only a few minor deviations as far as I can remember. But I respect your opinion to get this also validated by others who might have missed the meetings.

Many Thanks!

@Rakshith-R
Contributor

Rakshith-R commented Jun 3, 2021

How about using https://kubernetes.io/docs/concepts/workloads/pods/init-containers/ ?

(Or at the least have a knob to activate / deactivate volume healer )
cc @nixpanic @humblec @Madhu-1 @agarwal-mudit

@pkalever
Author

pkalever commented Jun 3, 2021

How about using https://kubernetes.io/docs/concepts/workloads/pods/init-containers/ ?

(Or at the least have a knob to activate / deactivate volume healer )
cc @nixpanic @humblec @Madhu-1 @agarwal-mudit

The ultimate goal is to bring the rbd-nbd processes back to life within the csi-rbdplugin pod.

We thought about init containers initially, but init containers must start and finish before the main container starts. Here our main container is csi-rbdplugin, where the rbd-nbd processes should start and keep running; what would we do in the init containers that could be completed before the main container starts?

Given that the csi-rbdplugin container will not be started until the init containers finish, we want something that runs in parallel with csi-rbdplugin, so that the main container can handle the NodeStageVolume calls.
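A minimal sketch of that idea (hypothetical names, not the ceph-csi API): the one-time healer runs in a goroutine inside the same csi-rbdplugin process while the CSI gRPC server keeps serving, so the healer's NodeStageVolume requests can be handled in parallel.

package healer

import (
	"context"
	"log"
)

// StartWithHealer starts the one-time healer alongside the node server.
// heal would be something like runVolumeHealer; serve blocks while the gRPC
// server handles NodeStageVolume and the other CSI RPCs.
func StartWithHealer(ctx context.Context, heal func(context.Context) error, serve func() error) error {
	go func() {
		if err := heal(ctx); err != nil {
			log.Printf("volume healer: %v", err)
		}
	}()
	return serve()
}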

Author

@pkalever pkalever left a comment


Thanks, @nixpanic for the review. I will respin ASAP.

@Rakshith-R
Contributor

How about using https://kubernetes.io/docs/concepts/workloads/pods/init-containers/ ?
(Or at the least have a knob to activate / deactivate volume healer )
cc @nixpanic @humblec @Madhu-1 @agarwal-mudit

The ultimate goal is to bring the rbd-nbd processes back to life within the csi-rbdplugin pod.

We thought about init containers initially, but init containers must start and finish before the main container starts. Here our main container is csi-rbdplugin, where the rbd-nbd processes should start and keep running; what would we do in the init containers that could be completed before the main container starts?

Given that the csi-rbdplugin container will not be started until the init containers finish, we want something that runs in parallel with csi-rbdplugin, so that the main container can handle the NodeStageVolume calls.

Thanks for the clarification! But a command-line argument to enable the volume healer (default being false) would be great, considering we don't want to run this operation for people not using rbd-nbd.

@nixpanic
Member

nixpanic commented Jun 3, 2021

Thanks for the clarification! But a command-line argument to enable the volume healer (default being false) would be great, considering we don't want to run this operation for people not using rbd-nbd.

We do not know in advance if users will configure a StorageClass with rbd-nbd or not. Having it as a global option/parameter would make this more error-prone. It is more user-friendly to detect the need for it automatically.

When a worker node does not have any VolumeAttachments with type mounter: rbd-nbd, the start of the NodeServer should not be delayed much.

@pkalever
Author

pkalever commented Jun 3, 2021

We do not know in advance if users will configure a StorageClass with rbd-nbd or not. Having it as a global option/parameter would make this more error-prone. It is more user-friendly to detect the need for it automatically.

When a worker node does not have any VolumeAttachments with type mounter: rbd-nbd, the start of the NodeServer should not be delayed much.

Completely agree with Niels here. I don't think we should have an option to enable Volume Healer.

@pkalever pkalever force-pushed the volume-healer branch 2 times, most recently from 1a259c8 to e6823bb Compare June 3, 2021 15:04
@pkalever
Author

FYI, it seems that we will have an update today. Meanwhile @pkalever we can rebase and be ready to go.

Done with the rebasing.

Given this is a new feature, we should give it some time. That will help soak the feature in the devel branch, and the CI should help us catch issues beforehand, if there are any. So let's not postpone the merge until the last day of the release.

@humblec Could you please remove the DNM flag? Given it's been close to 9-10 days that the DNM has been blocking us on this route, it would be nice if we could lift it and let this go.

Given the 3.4.0 release is very close, it is high time we make this decision now to avoid any last-minute regressions.
rbd-nbd is alpha support, so we should be fine making any improvements later.

@Madhu-1 @nixpanic with the latest rebase the approvals are lost; please help restore them.

Thanks!

@humblec
Collaborator

humblec commented Jul 16, 2021

FYI, it seems that we will have an update today. Meanwhile @pkalever we can rebase and be ready to go.

Done with the rebasing.
Given this is a new feature, we should give it some time. That will help soak the feature in the devel branch, and the CI should help us catch issues beforehand, if there are any. So let's not postpone the merge until the last day of the release.

@humblec Could you please remove the DNM flag? Given it's been close to 9-10 days that the DNM has been blocking us on this route, it would be nice if we could lift it and let this go.

@pkalever please accept the fact that, in between, the following has happened:

  • We got our first design doc (just 3-4 days back), which would ideally have been added before this PR itself for a better review of the design.

  • We have taken up the removal of locks from nodestage and nodeunstage just to help this feature get in (Get rid of locking at node {publish,unpublish} operations #2149).

  • Even though it would have been ideal to add test cases for the core areas this touches, we have only managed very basic ones at this stage, but agreed to improve upon them at the earliest.

  • We also tried to get a second pair of eyes on the design doc or the approach (it would have been given yesterday, but not yet), as we have seen many issues in the mount/unmount area with races in place, and there are some corner cases left. Additionally, we all accept the fact that there are race scenarios in our current volume healer path, which is being discussed in the design.

Given the 3.4.0 release is very close, it is high time we make this decision now to avoid any last-minute regressions.
rbd-nbd is alpha support, so we should be fine making any improvements later.

As discussed in yesterday's and previous calls, even though it is alpha, if there are regressions it is difficult to get fixes cycled out to users once Rook has a release with the old images; hence the extra care here.

When DNM was added, we added it with a timeout; considering the timeout was hit today, I am removing DNM for now.

@humblec humblec removed the DNM DO NOT MERGE label Jul 16, 2021
@Madhu-1
Collaborator

Madhu-1 commented Jul 16, 2021

@Mergifyio rebase

Madhu-1
Madhu-1 previously approved these changes Jul 16, 2021
@mergify
Contributor

mergify bot commented Jul 16, 2021

Command rebase: success

Branch has been successfully rebased

@mergify mergify bot dismissed Madhu-1’s stale review July 16, 2021 07:40

Pull request has been modified.

@pkalever
Author

@pkalever please accept the fact that, in between, the following has happened:

  • We got our first design doc (just 3-4 days back), which would ideally have been added before this PR itself for a better review of the design.

As mentioned earlier, it was a quick move!
I will keep this in mind for the next RFEs; I thought discussing the design with the team and having a rough doc like https://hackmd.io/mA8CtRUPS4SSV9oVcDFVeg and the Google doc that I shared in the meetings was good enough.

Yes, the conflicting part is resolved.

  • Even though it would have been ideal to add test cases for the core areas this touches, we have only managed very basic ones at this stage, but agreed to improve upon them at the earliest.

We still have this on our plate; we will make sure to get them in soon.
Here is the issue: #2262

  • We also tried to get a second pair of eyes on the design doc or the approach (it would have been given yesterday, but not yet), as we have seen many issues in the mount/unmount area with races in place, and there are some corner cases left. Additionally, we all accept the fact that there are race scenarios in our current volume healer path, which is being discussed in the design.

Yes, getting a second pair of eyes from others is always good, but I request that it happen at the earliest, when the PR is posted or the design is proposed/documented. (Again, I'm not saying that it wasn't my mistake to not document the design earlier.)

Given the 3.4.0 release is very close, it is high time we make this decision now to avoid any last-minute regressions.
rbd-nbd is alpha support, so we should be fine making any improvements later.

As discussed in yesterday's and previous calls, even though it is alpha, if there are regressions it is difficult to get fixes cycled out to users once Rook has a release with the old images; hence the extra care here.

Yes, which is why I want this to land ASAP, so that there is some chance for CI to catch the regressions, if there are any.

When DNM was added, we added it with a timeout; considering the timeout was hit today, I am removing DNM for now.

Yes, and thanks for removing the DNM now.

Cheers to the userspace mounters!!

@pkalever
Author

k8s-e2e-external-storage-1.21 fails

STEP: Creating Pod in namespace fsgroupchangepolicy-7379 with fsgroup 1000
Jul 16 09:08:01.889: INFO: Pod fsgroupchangepolicy-7379/pod-581d0be0-0104-4e86-a33b-0f742b9feee2 started successfully
STEP: Creating a sub-directory and file, and verifying their ownership is 1000
Jul 16 09:08:01.889: INFO: ExecWithOptions {Command:[/bin/sh -c touch /mnt/volume1/file1] Namespace:fsgroupchangepolicy-7379 PodName:pod-581d0be0-0104-4e86-a33b-0f742b9feee2 ContainerName:write-pod Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false Quiet:false}
Jul 16 09:08:01.889: INFO: >>> kubeConfig: /tmp/kubeconfig.pzFXDrGS
Jul 16 09:08:02.069: INFO: ExecWithOptions {Command:[/bin/sh -c ls -l /mnt/volume1/file1] Namespace:fsgroupchangepolicy-7379 PodName:pod-581d0be0-0104-4e86-a33b-0f742b9feee2 ContainerName:write-pod Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:false Quiet:false}
Jul 16 09:08:02.069: INFO: >>> kubeConfig: /tmp/kubeconfig.pzFXDrGS
Jul 16 09:08:02.251: INFO: pod fsgroupchangepolicy-7379/pod-581d0be0-0104-4e86-a33b-0f742b9feee2 exec for cmd ls -l /mnt/volume1/file1, stdout: -rw-r--r--    1 root     root             0 Jul 16 08:08 /mnt/volume1/file1, stderr: 
Jul 16 09:08:02.251: INFO: stdout split: [-rw-r--r-- 1 root root 0 Jul 16 08:08 /mnt/volume1/file1], expected gid: 1000
Jul 16 09:08:02.252: FAIL: Expected

@pkalever
Author

mini-e2e-helm_k8s-1.20 & mini-e2e_k8s-1.19

* Failed to start kvm2 VM. Running "minikube delete" may fix it: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute

X Exiting due to GUEST_PROVISION: Failed to start host: creating host: create: Error creating machine: Error in driver during machine creation: IP not available after waiting: machine minikube didn't return IP after 1 minute
* 

@pkalever
Author

STEP: create a thick-provisioned PVC-PVC clone and bind it to an app
Jul 16 11:15:36.112: INFO: Running '/usr/bin/kubectl --server=https://192.168.39.194:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-9d87da431905 --namespace=cephcsi-e2e-9d87da431905 delete -f -'
Jul 16 11:15:36.199: INFO: stderr: "warning: deleting cluster-scoped resources, not scoped to the provided namespace\n"
Jul 16 11:15:36.199: INFO: stdout: "storageclass.storage.k8s.io \"csi-rbd-sc\" deleted\n"
Jul 16 11:15:36.204: INFO: ExecWithOptions {Command:[/bin/sh -c ceph fsid] Namespace:rook-ceph PodName:rook-ceph-tools-7467d8bf8-skswj ContainerName:rook-ceph-tools Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Jul 16 11:15:36.204: INFO: >>> kubeConfig: /root/.kube/config
Jul 16 11:15:37.139: INFO: Waiting up to &PersistentVolumeClaim{ObjectMeta:{rbd-pvc  rbd-1318    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []  []},Spec:PersistentVolumeClaimSpec{AccessModes:[ReadWriteOnce],Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{storage: {{1073741824 0} {<nil>} 1Gi BinarySI},},},VolumeName:,Selector:nil,StorageClassName:*csi-rbd-sc,VolumeMode:nil,DataSource:nil,},Status:PersistentVolumeClaimStatus{Phase:,AccessModes:[],Capacity:ResourceList{},Conditions:[]PersistentVolumeClaimCondition{},},} to be in Bound state
Jul 16 11:15:37.140: INFO: waiting for PVC rbd-pvc (0 seconds elapsed)
Jul 16 11:15:39.145: INFO: waiting for PVC rbd-pvc (2 seconds elapsed)
Jul 16 11:15:55.144: INFO: waiting for PVC rbd-pvc (18 seconds elapsed)
Jul 16 11:16:19.144: INFO: waiting for PVC rbd-pvc (42 seconds elapsed)
Jul 16 11:17:03.144: INFO: waiting for PVC rbd-pvc (86 seconds elapsed)
Jul 16 11:17:15.910: INFO: Error getting pvc "rbd-pvc" in namespace "rbd-1318": etcdserver: request timed out
Jul 16 11:17:17.144: INFO: waiting for PVC rbd-pvc (100 seconds elapsed)
Jul 16 11:17:35.145: INFO: waiting for PVC rbd-pvc (118 seconds elapsed)
Jul 16 11:17:59.144: INFO: waiting for PVC rbd-pvc (142 seconds elapsed)
Jul 16 11:18:01.541: INFO: Error getting pvc "rbd-pvc" in namespace "rbd-1318": rpc error: code = Unknown desc = OK: HTTP status code 200; transport: missing content-type field
Jul 16 11:18:01.541: FAIL: failed to create PVC with error failed to get pvc: rpc error: code = Unknown desc = OK: HTTP status code 200; transport: missing content-type field

@pkalever
Author

/test ci/centos/mini-e2e-helm/k8s-1.21

Prasanna Kumar Kalever added 9 commits July 16, 2021 13:31
path is used by standard package.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
As part of the stage transaction, if the mounter is of type nbd, then
capture the device path after a successful rbd-nbd map.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
The nodeplugin needs the below cluster roles:
persistentvolumes: get
volumeattachments: list, get

These additional permissions are needed by the volume healer. The volume
healer aims at fixing volume health issues at the very startup time of
the nodeplugin. As part of its operations, the volume healer has to run
through the list of volume attachments and understand details about each
persistentvolume.

Later commits will use these additional cluster roles.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
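For illustration, the extra permissions described in this commit expressed as client-go RBAC policy rules (in the deployment they would live in the nodeplugin ClusterRole manifest); the variable name is hypothetical.

package healer

import rbacv1 "k8s.io/api/rbac/v1"

// healerRules lists the additional access the volume healer needs:
// reading PersistentVolumes and listing/reading VolumeAttachments.
var healerRules = []rbacv1.PolicyRule{
	{APIGroups: []string{""}, Resources: []string{"persistentvolumes"}, Verbs: []string{"get"}},
	{APIGroups: []string{"storage.k8s.io"}, Resources: []string{"volumeattachments"}, Verbs: []string{"list", "get"}},
}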
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Problem:
-------
For rbd-nbd userspace mounter backends, all the mounts start seeing IO
errors after a restart of the nodeplugin. This is because rbd-nbd
backends run a userspace mount daemon per volume, and after a restart
of the nodeplugin pod there is no way to bring those daemons back to
life.

Solution:
--------
The volume healer is a one-time activity that is triggered at the startup
time of the rbd nodeplugin. It navigates through the list of volume
attachments on the node and acts accordingly.

For now, it is limited to nbd type storage only, but it is flexible and
can be extended in the future for other backend types as needed.

From a few feet above:
This solves a severe problem for nbd-backed CSI volumes. While walking
the list of volume attachments on the node, if the healer finds a
volume that is in attached state and is of type nbd, it attempts to fix
the rbd-nbd volume by sending a NodeStageVolume request with the
required volume attributes (secrets, device name, image attributes,
etc.), which finally starts the required rbd-nbd daemons in the
nodeplugin csi-rbdplugin container. This reattaches the backend images
to the right nbd devices, allowing the applications to perform IO
without any interruption even after a nodeplugin restart.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
This will bring down the healer run time by a great factor.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
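One plausible way to bring the run time down is to issue the per-volume heal calls concurrently instead of serially; the sketch below assumes a hypothetical callNodeStageVolume helper and is not the actual ceph-csi code.

package healer

import (
	"context"
	"log"
	"sync"
)

// healConcurrently fires one goroutine per volume and waits for all of them,
// since the healer is a one-shot activity that should finish before returning.
func healConcurrently(ctx context.Context, volIDs []string, callNodeStageVolume func(context.Context, string) error) {
	var wg sync.WaitGroup
	for _, id := range volIDs {
		wg.Add(1)
		go func(volID string) {
			defer wg.Done()
			if err := callNodeStageVolume(ctx, volID); err != nil {
				log.Printf("heal of %s failed: %v", volID, err)
			}
		}(id)
	}
	wg.Wait()
}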
Now that the healer functionality for mounter processes is available,
let's start using it.

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
warnings from golangci-lint:

e2e/pod.go:207:122: directive `//nolint:unparam,lll // cn can be used
with different inputs later` is unused for linter unparam (nolintlint)
func execCommandInContainer(f *framework.Framework, c, ns, cn string,
opt *metav1.ListOptions) (string, string, error) { //nolint:unparam,lll
// cn can be used with different inputs later

e2e/pod.go:307:70: directive `//nolint:unparam // skipNotFound can be
used with different inputs later` is unused for linter unparam (nolintlint)
func deletePodWithLabel(label, ns string, skipNotFound bool) error {
//nolint:unparam // skipNotFound can be used with different inputs later

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
@mergify mergify bot merged commit 10fc639 into ceph:devel Jul 16, 2021
Labels: component/rbd (Issues related to RBD), Priority-2

Successfully merging this pull request may close these issues: Qualify or move RBD NBD to full support

6 participants