
Kubernetes test "multiple PV pointing to the same storage on the same node" fails #1913

Closed
jsafrane opened this issue Aug 9, 2022 · 8 comments
Labels: kind/bug

jsafrane (Contributor) commented Aug 9, 2022

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:

Running Kubernetes e2e tests, this test fails about 50% of the time:

External Storage [Driver: csi.vsphere.vmware.com] [Testpattern: Dynamic PV (default fs)] provisioning should mount multiple PV pointing to the same storage on the same node

What you expected to happen:

The test passes.

How to reproduce it (as minimally and precisely as possible):

Run Kubernetes 1.24 CSI tests with vSphere CSI driver.

Anything else we need to know?:

The test is quite complicated; here are the individual steps:

  1. Create PVC1, let the driver dynamically provision PV1 for it, and run Pod1 with it.
  2. Inspect PV1 and manually create PV2 that points to the same volume, i.e. there are two PVs with the same volumeHandle (see the sketch after this list).
  3. Create PVC2 for PV2 and run Pod2 with it, on the same node as Pod1.
  4. Both Pod1 and Pod2 are now running on the same node, each with its own PVC+PV, but backed by the same volumeHandle. So far so good.
  5. Delete Pod2, PVC2, and PV2. PV2 has reclaimPolicy: Retain, so nothing should be deleted in the storage backend. Again, so far so good.
  6. Delete Pod1.
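
A sketch of what the manually created PV2 from step 2 roughly looks like (the PV name, capacity, and fsType are illustrative; only the driver and the volumeHandle copied from PV1 matter for the test):

    # Hypothetical PV2: a second PV reusing PV1's volumeHandle.
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv2-same-volume                     # illustrative name
    spec:
      capacity:
        storage: 2Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain     # step 5 relies on this
      storageClassName: ""                      # bound manually to PVC2
      csi:
        driver: csi.vsphere.vmware.com
        volumeHandle: ca71413b-6f03-48c9-aaa3-533545cc2d26   # same handle as PV1
        fsType: ext4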

At this point, the CSI driver is not able to detach PV1 from the node because of this error:

    detachError:
      message: 'rpc error: code = Internal desc = volumeID "ca71413b-6f03-48c9-aaa3-533545cc2d26"
        not found in QueryVolume'

I was able to see that in step 5 (after PV2 is deleted), the syncer deletes the volume from CNS:

    PVDeleted: PV: ... PV2
    PVDeleted: vSphere CSI Driver is deleting volume ... PV2
    DeleteVolume: volumeID: "ca71413b-6f03-48c9-aaa3-533545cc2d26",
    DeleteVolume: Volume deleted successfully. volumeID: "ca71413b-6f03-48c9-aaa3-533545cc2d26"
    internalDeleteVolume: returns fault "" for volume "ca71413b-6f03-48c9-aaa3-533545cc2d26"

But PV1 still exists at this time and the volume is still attached to the node. The attacher is then not able to find and detach the volume.
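
For reference, the detachError above surfaces in the VolumeAttachment object for PV1. A rough sketch of what that object looks like (the object name, node name, and PV name are made up for illustration; the message is the one quoted above):

    # Hypothetical VolumeAttachment left behind for PV1 after step 6.
    apiVersion: storage.k8s.io/v1
    kind: VolumeAttachment
    metadata:
      name: csi-0123456789abcdef                # illustrative name
    spec:
      attacher: csi.vsphere.vmware.com
      nodeName: worker-node-1                   # illustrative node name
      source:
        persistentVolumeName: pv1               # the dynamically provisioned PV1
    status:
      attached: true
      detachError:
        message: 'rpc error: code = Internal desc = volumeID "ca71413b-6f03-48c9-aaa3-533545cc2d26"
          not found in QueryVolume'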

The test was added in 1.24 in this PR to test for a regression.

Environment:

  • csi-vsphere version: 2.4.0, 2.6.0
  • vsphere-cloud-controller-manager version: n/a?
  • Kubernetes version: 1.24
  • vSphere version: 7.0.3
  • Install tools: OpenShift
k8s-ci-robot added the kind/bug label on Aug 9, 2022
divyenpatel (Member) commented:

This is expected behavior.
When you delete a PV with reclaimPolicy: Retain, we de-register the volume from vCenter. The back-end VMDK and FCD are still present on vCenter, but the volume is de-registered from CNS.

For the remaining PV, once CSI full sync determines that the volume needs to be registered as a container volume again, it re-registers it, and the volume then shows up in the QueryVolume call. Until that happens, detach will fail.

The vSphere CSI driver does not support creating multiple PVs with the same volume handle.

We recommend that customers use a ReadWriteMany (RWX) volume if there is a use case for sharing the same volume across many pods.
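
A minimal sketch of what such a shared (ReadWriteMany) claim could look like with the vSphere CSI driver; this assumes a file-volume-capable StorageClass (the name "vsphere-file-sc" and the availability of vSAN File Services are assumptions, not taken from this issue):

    # Hypothetical PVC requesting a ReadWriteMany file volume.
    # Assumes a StorageClass backed by csi.vsphere.vmware.com with
    # file services available; all names are illustrative.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: shared-data
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 5Gi
      storageClassName: vsphere-file-sc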

divyenpatel (Member) commented Aug 9, 2022

DeleteVolume: Volume deleted successfully. volumeID: "ca71413b-6f03-48c9-aaa3-533545cc2d26"

This delete is just a de-registration of the volume as a container volume. The FCD and VMDK are not deleted from the back end.

jsafrane (Contributor, Author) commented:

/close
I am making the failing test optional in kubernetes/kubernetes#113046

k8s-ci-robot (Contributor) commented:

@jsafrane: Closing this issue.

In response to this:

/close
I am making the failing test optional in kubernetes/kubernetes#113046

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jingxu97 commented Feb 18, 2023

So should it be considered a regression from in-tree to CSI, given that the in-tree test passes? Is there a public document about this issue?

@divyenpatel
@jsafrane
@xing-yang

xing-yang (Contributor) commented:

@jingxu97 The CSI driver supports CNS volumes. For the CSI driver, the volume handle in a PV points to an FCD UUID. When the PV is deleted with the Retain policy, the volume is deregistered from CNS. Since the volume is no longer a CNS volume, detach will fail until full sync happens, as explained here.

The in-tree volume plugin does not support CNS volumes. For the in-tree plugin, the volume handle in a PV points to a VMDK path. When the PV is deleted with the Retain policy, there is no step to deregister it from CNS.

It is not a regression; it works as expected. It is just that the in-tree plugin and the CSI driver have very different architectures.
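
To illustrate the architectural difference, here is a rough comparison of the two PV volume sources (the names, path, and capacity are made up for this example; the UUID is the one from this issue):

    # In-tree vSphere volume: the PV references a VMDK path directly,
    # so deleting the PV involves no CNS (de)registration.
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-intree-example                   # illustrative name
    spec:
      capacity:
        storage: 2Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      vsphereVolume:
        volumePath: "[datastore1] kubevols/example.vmdk"   # made-up path
        fsType: ext4
    ---
    # CSI vSphere volume: the PV references an FCD UUID via volumeHandle,
    # which must correspond to a volume registered in CNS for attach,
    # detach, and QueryVolume to work.
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-csi-example                      # illustrative name
    spec:
      capacity:
        storage: 2Gi
      accessModes:
        - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      csi:
        driver: csi.vsphere.vmware.com
        volumeHandle: ca71413b-6f03-48c9-aaa3-533545cc2d26  # FCD UUID
        fsType: ext4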

jingxu97 commented:

I think it would be good to have a document mentioning that vSphere does not support the use case "multiple PV pointing to the same storage on the same node".

jingxu97 commented:

I created an issue for it: #2248
