change node staging path for csi driver to PV agnostic #107065

saikat-royc · 2021-12-16T00:11:31Z

What type of PR is this?

/kind bug

What this PR does / why we need it:

Without this fix, PVs that point to the same underlying volume handle fail to mount on the same node, because kubelet mistakenly concludes that the volume is already node staged.
Fixes:

Making the staging path PV agnostic means a unique volume is node staged only once. That staging path is then bind-mounted onto the pod-specific path during pod bring-up on the node.
The global path uses a SHA-256 hash of the volume ID, so the node staging path can no longer be used to trace back the PV information (which is used to determine the driver and volume ID). This change therefore creates a hard dependency on vol_data.json existing with the correct driver and volume ID information.
As a fallback, if vol_data.json is missing, kubelet tries to parse the PV information from the old path format.

Which issue(s) this PR fixes:

Fixes #105899

Special notes for your reviewer:

CSI migration flows should not be impacted, because it is expected that the node is drained before migration is enabled.

An example staging path for the PD CSI driver looks like this:

saikatroyc@e2e-test-saikatroyc-minion-group-5h8f ~ $ sudo ls /var/lib/kubelet/plugins/kubernetes.io/csi/pd.csi.storage.gke.io/d7b18a91056b638c3459de6043a10982e468f3b400c42c8e467e023b24aeab08
globalmount  vol_data.json

Does this PR introduce a user-facing change?

Change node staging path for csi driver to use a PV agnostic path. Nodes must be drained before updating the kubelet with this change.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

saikat-royc · 2021-12-16T00:11:40Z

/hold not yet ready for review.

saikat-royc · 2021-12-16T06:46:35Z

/retest

saikat-royc · 2021-12-20T22:20:50Z

/assign @jingxu97

saikat-royc · 2021-12-20T22:48:48Z

/retest

saikat-royc · 2021-12-20T23:26:53Z

/retest

saikat-royc · 2021-12-21T02:39:30Z

@jingxu97 @jsafrane the PR is ready for review. Based on the initial review comments, I will add an e2e test as needed.

jsafrane

This looks good to me if we explicitly add a release note that nodes must be drained during update. It is implied by upgrade, but IMO an explicit note would help to avoid confusion.

And we can't ever cherry pick this into an older release.

It would be trivial to check in makeDeviceMountPath whether the old staging path exists and has a valid JSON file, and to use that path if so (although the unit tests would not be that simple).

pkg/volume/csi/csi_attacher.go

pkg/volume/csi/csi_attacher_test.go

saikat-royc · 2021-12-21T18:29:19Z

> This looks good to me if we explicitly add a release note that nodes must be drained during update. It is implied by upgrade, but IMO an explicit note would help to avoid confusion.
>
> And we can't ever cherry pick this into an older release.
>
> It would be trivial to check in makeDeviceMountPath whether the old staging path exists and has a valid JSON file, and to use that path if so (although the unit tests would not be that simple).

Added a release note. Also the UTs (TestAttacherUnmountDevice) cover the old and new path test cases.

saikat-royc · 2021-12-21T18:30:57Z

/hold cancel

saikat-royc · 2021-12-21T18:31:28Z

/kind bug

Kubernetes 1.24 and newer use a different path for staging the volume. That means the CSI driver is requested to mount the volume at another location compared to previous versions of Kubernetes. CSI drivers implementing the CSI-Addons NodeReclaimSpace procedure must receive the correct path; otherwise the driver will not be able to free space and may return an error. See-also: kubernetes/kubernetes#107065 See-also: https://bugzilla.redhat.com/2096209 Signed-off-by: Niels de Vos <ndevos@redhat.com>

Kubernetes 1.24 and newer use a different path for staging the volume. That means the CSI driver is requested to mount the volume at another location compared to previous versions of Kubernetes. CSI drivers implementing the CSI-Addons NodeReclaimSpace procedure must receive the correct path; otherwise the driver will not be able to free space and may return an error. See-also: kubernetes/kubernetes#107065 See-also: https://bugzilla.redhat.com/2096209 Signed-off-by: Niels de Vos <ndevos@redhat.com> (cherry picked from commit eb0718f)

Kubernetes 1.24 and newer use a different path for staging the volume. That means the CSI driver is requested to mount the volume at another location compared to previous versions of Kubernetes. CSI drivers implementing the CSI-Addons volumeHealer must receive the correct path; otherwise, after a nodeplugin restart, the NBD mounts will bail out while attempting the NodeStageVolume() call and return an error. See-also: kubernetes/kubernetes#107065 Fixes: ceph#3176 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

Kubernetes 1.24 and newer use a different path for staging the volume. That means the CSI driver is requested to mount the volume at another location compared to previous versions of Kubernetes. Backward compatibility should be taken care of by the CSI driver. See-also: kubernetes/kubernetes#107065 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

Kubernetes 1.24 and newer use a different path for staging the volume. That means the CSI driver is requested to mount the volume at another location compared to previous versions of Kubernetes. CSI drivers implementing the volumeHealer must receive the correct path; otherwise, after a nodeplugin restart, the NBD mounts will bail out while attempting the NodeStageVolume() call and return an error. See-also: kubernetes/kubernetes#107065 Fixes: ceph#3176 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

Kubernetes 1.24 and newer use a different path for staging the volume. That means the CSI driver is requested to mount the volume at another location compared to previous versions of Kubernetes. CSI drivers implementing the volumeHealer must receive the correct path; otherwise, after a nodeplugin restart, the NBD mounts will bail out while attempting the NodeStageVolume() call and return an error. See-also: kubernetes/kubernetes#107065 Fixes: #3176 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>

Kubernetes 1.24 and newer use a different path for staging the volume. That means the CSI driver is requested to mount the volume at another location compared to previous versions of Kubernetes. CSI drivers implementing the volumeHealer must receive the correct path; otherwise, after a nodeplugin restart, the NBD mounts will bail out while attempting the NodeStageVolume() call and return an error. See-also: kubernetes/kubernetes#107065 Fixes: #3176 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> (cherry picked from commit 1da446d)

Kubernetes 1.24 and newer use a different path for staging the volume. That means the CSI driver is requested to mount the volume at another location compared to previous versions of Kubernetes. Backward compatibility should be taken care of by the CSI driver. See-also: kubernetes/kubernetes#107065 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com> (cherry picked from commit 2627f32)

commit d9fcdae Merge: 6569b29 07f79d4 Author: weizhichen <weizhichen@microsoft.com> Date: Tue Apr 4 08:36:31 2023 +0000 Merge branch 'master' of https://github.com/kubernetes-sigs/blob-csi-driver into e2e-test
commit 6569b29 Author: weizhichen <weizhichen@microsoft.com> Date: Mon Apr 3 11:37:40 2023 +0000 parallel again
commit 9ed8a55 Merge: 551c409 a47bc07 Author: weizhichen <weizhichen@microsoft.com> Date: Mon Apr 3 10:27:26 2023 +0000 Merge branch 'master' of https://github.com/kubernetes-sigs/blob-csi-driver into e2e-test
commit 551c409 Author: weizhichen <weizhichen@microsoft.com> Date: Mon Apr 3 10:21:04 2023 +0000 another flaky test
commit 38e0b6a Author: weizhichen <weizhichen@microsoft.com> Date: Mon Apr 3 08:46:01 2023 +0000 fix panic
commit cce9102 Author: weizhichen <weizhichen@microsoft.com> Date: Mon Apr 3 07:40:40 2023 +0000 flaky: e2e: fix pre-provisioned test
commit 3b5a6ee Author: weizhichen <weizhichen@microsoft.com> Date: Mon Apr 3 03:19:55 2023 +0000 framework init
commit 6fa9cd4 Author: weizhichen <weizhichen@microsoft.com> Date: Sun Apr 2 08:31:37 2023 +0000 flake attempts
commit 7c752ae Author: weizhichen <weizhichen@microsoft.com> Date: Sun Apr 2 03:29:32 2023 +0000 cancel parallel
commit cffd14a Author: weizhichen <weizhichen@microsoft.com> Date: Sun Apr 2 00:31:23 2023 +0000 flakeattempts
commit c6a0266 Author: weizhichen <weizhichen@microsoft.com> Date: Sat Apr 1 20:10:30 2023 +0000 make nfs test serial
commit 9bba102 Author: weizhichen <weizhichen@microsoft.com> Date: Sat Apr 1 01:46:32 2023 +0000 make private endpoint test serial
commit fe1e3b8 Author: weizhichen <weizhichen@microsoft.com> Date: Sat Apr 1 00:52:47 2023 +0000 output-interceptor-mode=none
commit c4a3a91 Author: weizhichen <weizhichen@microsoft.com> Date: Fri Mar 31 10:05:40 2023 +0000 no flake-attempts
commit a94726a Author: weizhichen <weizhichen@microsoft.com> Date: Fri Mar 31 09:59:57 2023 +0000 gomega success
commit f9638e2 Author: weizhichen <weizhichen@microsoft.com> Date: Fri Mar 31 09:27:02 2023 +0000 pass project root to e2e test
commit 6b795c6 Author: weizhichen <weizhichen@microsoft.com> Date: Fri Mar 31 07:48:30 2023 +0000 fix
commit 2a14b30 Author: weizhichen <weizhichen@microsoft.com> Date: Fri Mar 31 07:07:39 2023 +0000 fix restart driver
commit 3553af8 Author: weizhichen <weizhichen@microsoft.com> Date: Fri Mar 31 06:37:30 2023 +0000 fix pre_provisioned_provided_credentials_tester
commit 0aa27dd Author: weizhichen <weizhichen@microsoft.com> Date: Fri Mar 31 06:10:13 2023 +0000 move verify examples to ginkgo Node container
commit 7ebd065 Author: weizhichen <weizhichen@microsoft.com> Date: Fri Mar 31 04:58:18 2023 +0000 add flake-attempts
commit 02a48ba Author: weizhichen <weizhichen@microsoft.com> Date: Fri Mar 31 04:57:40 2023 +0000 Revert "use seed to repro" This reverts commit 1c5fea8.
commit 1c5fea8 Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 16:27:03 2023 +0000 use seed to repro
commit c52b495 Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 13:57:37 2023 +0000 fix: container name
commit f1f55e4 Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 13:23:25 2023 +0000 fix dynamic inline volume and byok volume
commit 0e4b11e Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 12:24:47 2023 +0000 revert --delete-namespace
commit 7df5df4 Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 10:53:45 2023 +0000 fix: set framework flags
commit 103840c Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 09:52:05 2023 +0000 set delete-namespace=false to avoid deleting ns which is used by other specs during parallel testing
commit cda91f8 Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 08:36:47 2023 +0000 fix: should notify all goroutine channel by close
commit 0aa42e6 Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 07:08:10 2023 +0000 fix NodeTimeout, need context
commit 9c1fb1d Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 06:49:52 2023 +0000 fix defer cleanup order
commit b4d44b2 Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 06:43:29 2023 +0000 add 10min GracePeriod for AfterSuite to avoid exit too quick
commit 49fdec3 Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 05:22:55 2023 +0000 workaround the issue: kubernetes/kubernetes#107065
commit 30369fb Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 05:19:35 2023 +0000 fix restartDriverTest panic
commit 67ff546 Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 05:00:44 2023 +0000 fix: dump namespace info
commit 54a2d2f Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 04:25:58 2023 +0000 add log after driver pod is restarted
commit fd32cab Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 03:04:35 2023 +0000 fix
commit 7074288 Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 02:37:11 2023 +0000 adjust AccountCreationLeak check threshold
commit b86e385 Author: weizhichen <weizhichen@microsoft.com> Date: Thu Mar 30 02:08:51 2023 +0000 fix: reduce csi driver daemon restart times
commit 63e820f Author: weizhichen <weizhichen@microsoft.com> Date: Wed Mar 29 18:35:54 2023 +0000 fix pwd
commit 65c8c04 Author: weizhichen <weizhichen@microsoft.com> Date: Wed Mar 29 17:44:04 2023 +0000 fix: no log print out after blob daemonset is recreated
commit b9754e4 Author: weizhichen <weizhichen@microsoft.com> Date: Wed Mar 29 15:03:27 2023 +0000 fix
commit a9b913d Author: weizhichen <weizhichen@microsoft.com> Date: Wed Mar 29 14:55:11 2023 +0000 fix: createvolume and initialize volumeID in beforeeach
commit e6aa3a8 Author: weizhichen <weizhichen@microsoft.com> Date: Wed Mar 29 11:04:30 2023 +0000 fix: set azidentity.EnvironmentCredential for each process
commit 2dd2015 Author: weizhichen <weizhichen@microsoft.com> Date: Wed Mar 29 09:19:49 2023 +0000 fix init k8s client error
commit b319406 Author: weizhichen <weizhichen@microsoft.com> Date: Wed Mar 29 08:33:25 2023 +0000 fix BeforeSuite and AfterSuite
commit 1496638 Author: weizhichen <weizhichen@microsoft.com> Date: Wed Mar 29 07:49:07 2023 +0000 1. use e2e.test v1.26.0 2. upgrade ginkgo to v2.9.2 to use GinkgoHelper 3. add --fast-fail
commit d6d05ea Author: weizhichen <weizhichen@microsoft.com> Date: Wed Mar 29 03:49:51 2023 +0000 fix
commit 6e6d71f Author: weizhichen <weizhichen@microsoft.com> Date: Wed Mar 29 03:22:14 2023 +0000 test: speed up e2e test by running in parallel

k8s-ci-robot requested review from gnufied and Jiawei0227 December 16, 2021 00:12

saikat-royc force-pushed the fix-node-stage-path branch from 2858b66 to 06a3b70 December 17, 2021 07:44

jingxu97 assigned jsafrane Dec 20, 2021

saikat-royc force-pushed the fix-node-stage-path branch from 06a3b70 to d2c972a December 20, 2021 22:07

k8s-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) and removed the size/M label (denotes a PR that changes 30-99 lines, ignoring generated files) Dec 20, 2021

k8s-ci-robot assigned jingxu97 Dec 20, 2021

jsafrane reviewed Dec 21, 2021


saikat-royc force-pushed the fix-node-stage-path branch from d2c972a to cb10d1b December 21, 2021 18:24

k8s-ci-robot removed the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Dec 21, 2021

nixpanic mentioned this pull request Jun 14, 2022

reclaimspace: detect Kubernetes version for right StagingTargetPath csi-addons/kubernetes-csi-addons#165

Merged

pkalever mentioned this pull request Jun 23, 2022

rbd: healer detect Kubernetes version for right StagingTargetPath ceph/ceph-csi#3207

Merged

pkalever mentioned this pull request Jun 23, 2022

csi: fix stagingpath rook/rook#10490

Merged

leiyiz mentioned this pull request Aug 4, 2022

remove const that's not used #111707

Merged

jsafrane mentioned this pull request Aug 9, 2022

Kubernetes test "multiple PV pointing to the same storage on the same node" fails kubernetes-sigs/vsphere-csi-driver#1913

Closed

jsafrane mentioned this pull request Oct 13, 2022

Add capability for tests with multiple PVs with the same VolumeHandle #113046

Merged

mattcary mentioned this pull request Dec 7, 2022

the volume is not detached after the pod and PVC objects are deleted #114207

Closed

ejweber mentioned this pull request Jan 9, 2023

Failed mount after kubernetes worker node upgrade from v1.23.15 to v1.24.9 ThinkParQ/beegfs-csi-driver#14

Closed

cvvz added a commit to cvvz/blob-csi-driver that referenced this pull request Mar 30, 2023

workaround the issue: kubernetes/kubernetes#107065

49fdec3

saikat-royc mentioned this pull request Mar 31, 2023

fix: the volume is not detached after the pod and PVC objects are deleted #116138

Merged

Chaunceyctx mentioned this pull request Apr 14, 2023

the same device is mounted twice and generates two different globalmountpath #117336

Closed

zuzzas mentioned this pull request Jul 10, 2023

[candi] CSI old mount cleaner deckhouse/deckhouse#5153

Merged


deckhouse-BOaTswain mentioned this pull request Jul 12, 2023

Backport: [candi] CSI old mount cleaner deckhouse/deckhouse#5180

Merged


LastNight1997 mentioned this pull request Oct 11, 2023

After upgrading k8s to version above 1.24, PVC is blocked in the UmountDevice stage #121134

Closed
