rbd: call undoStagingTransaction() when NodeStageVolume() fails #2618

Merged (1 commit) on Nov 17, 2021

nixpanic commented Nov 3, 2021

On line 341 a transaction is created. It is passed to the deferred
undoStagingTransaction() function, which runs when an error in the
NodeStageVolume procedure is detected. So far, so good.

However, on line 356 a new transaction is returned. This new
transaction is not the one used in the defer call.

By removing the empty transaction that is used in the defer call, and
calling undoStagingTransaction() when stageTransaction() returns an
error, the code is a little simpler, and the cleanup of the transaction
should now be done correctly.
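
To illustrate the kind of problem described above, here is a minimal, self-contained sketch. It is not the ceph-csi code: the `transaction` type and the `stage()`, `undo()`, `stageWithDefer()` and `stageExplicit()` functions are hypothetical stand-ins, and it assumes the deferred cleanup received the initial, empty transaction as an argument. In Go, the arguments of a deferred call are evaluated when the `defer` statement runs, so a transaction returned later is not seen by the cleanup.

```go
package main

import (
	"errors"
	"fmt"
)

// transaction is a hypothetical stand-in for the staging state; it records
// which steps completed so that they can be rolled back.
type transaction struct {
	steps []string
}

// stage is a hypothetical stand-in for stageTransaction(): it returns the
// transaction built so far together with an error.
func stage() (*transaction, error) {
	return &transaction{steps: []string{"map", "mount"}}, errors.New("staging failed")
}

// undo is a hypothetical stand-in for undoStagingTransaction(); it returns
// the steps it would roll back, in reverse order.
func undo(t *transaction) []string {
	var undone []string
	for i := len(t.steps) - 1; i >= 0; i-- {
		undone = append(undone, t.steps[i])
	}
	return undone
}

// stageWithDefer hands the initial, empty transaction to the deferred call.
// The argument is evaluated at the defer statement, so the transaction that
// stage() returns afterwards never reaches undo().
func stageWithDefer() (undone []string, err error) {
	t := &transaction{}
	defer func(t *transaction) {
		if err != nil {
			undone = undo(t)
		}
	}(t)
	t, err = stage()
	return undone, err
}

// stageExplicit calls undo() directly with the transaction that stage()
// returned, which is the shape the description proposes.
func stageExplicit() ([]string, error) {
	t, err := stage()
	if err != nil {
		return undo(t), err
	}
	return nil, nil
}

func main() {
	a, _ := stageWithDefer()
	b, _ := stageExplicit()
	fmt.Println(len(a), len(b)) // prints "0 2"
}
```

In this sketch the deferred variant rolls back nothing, while the explicit variant rolls back both recorded steps. Whether the original code hit exactly this case depends on how the deferred call was written there.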

Updates: #2610 (does not fix it)


Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)
  • /retest all: run this in case the CentOS CI failed to start/report any test
    progress or results

nixpanic requested review from humblec and pkalever November 3, 2021 13:22
mergify bot added the component/rbd label Nov 3, 2021

nixpanic commented Nov 3, 2021

/retest ci/centos/k8s-e2e-external-storage/1.21

nixpanic commented Nov 3, 2021

/retest ci/centos/k8s-e2e-external-storage/1.21

Failed due to #2264 (logs)

pkalever commented Nov 8, 2021

https://jenkins-ceph-csi.apps.ocp.ci.centos.org/job/mini-e2e-helm_k8s-1.20/2991/display/redirect

STEP: create a PVC and bind it to an app using rbd-nbd mounter with encryption
Nov  3 14:28:18.871: INFO: waiting for kubectl (delete -f args []) to finish
Nov  3 14:28:18.871: INFO: Running '/usr/bin/kubectl --server=https://192.168.39.242:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-6475a92e3dae delete -f -'
Nov  3 14:28:19.002: INFO: stderr: "warning: deleting cluster-scoped resources, not scoped to the provided namespace\n"
Nov  3 14:28:19.002: INFO: stdout: "storageclass.storage.k8s.io \"csi-rbd-sc\" deleted\n"
Nov  3 14:28:19.009: INFO: ExecWithOptions {Command:[/bin/sh -c ceph fsid] Namespace:rook-ceph PodName:rook-ceph-tools-7467d8bf8-tns74 ContainerName:rook-ceph-tools Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Nov  3 14:28:19.009: INFO: >>> kubeConfig: /root/.kube/config
Nov  3 14:28:21.239: INFO: Waiting up to &PersistentVolumeClaim{ObjectMeta:{rbd-pvc  rbd-3274    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []  []},Spec:PersistentVolumeClaimSpec{AccessModes:[ReadWriteOnce],Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{storage: {{1073741824 0} {<nil>} 1Gi BinarySI},},},VolumeName:,Selector:nil,StorageClassName:*csi-rbd-sc,VolumeMode:nil,DataSource:nil,DataSourceRef:nil,},Status:PersistentVolumeClaimStatus{Phase:,AccessModes:[],Capacity:ResourceList{},Conditions:[]PersistentVolumeClaimCondition{},},} to be in Bound state
Nov  3 14:28:21.239: INFO: waiting for PVC rbd-pvc (0 seconds elapsed)
Nov  3 14:28:23.242: INFO: waiting for PVC rbd-pvc (2 seconds elapsed)
Nov  3 14:28:23.249: INFO: Waiting for PV pvc-61377b8e-3394-419b-a1a4-cc59fff74d10 to bind to PVC rbd-pvc
Nov  3 14:28:23.249: INFO: Waiting up to timeout=10m0s for PersistentVolumeClaims [rbd-pvc] to have phase Bound
Nov  3 14:28:23.251: INFO: PersistentVolumeClaim rbd-pvc found and phase=Bound (2.528273ms)
Nov  3 14:28:23.251: INFO: Waiting up to 10m0s for PersistentVolume pvc-61377b8e-3394-419b-a1a4-cc59fff74d10 to have phase Bound
Nov  3 14:28:23.254: INFO: PersistentVolume pvc-61377b8e-3394-419b-a1a4-cc59fff74d10 found and phase=Bound (2.293117ms)
Nov  3 14:28:23.265: INFO: Waiting up to csi-rbd-demo-pod to be in Running state
Nov  3 14:38:23.275: FAIL: failed to validate encrypted pvc with error timed out waiting for the condition

https://jenkins-ceph-csi.apps.ocp.ci.centos.org/blue/organizations/jenkins/mini-e2e_k8s-1.20/detail/mini-e2e_k8s-1.20/3016/pipeline

STEP: create a PVC and bind it to an app using rbd-nbd mounter with encryption
Nov  3 14:16:26.795: INFO: waiting for kubectl (delete -f args []) to finish
Nov  3 14:16:26.795: INFO: Running '/usr/bin/kubectl --server=https://192.168.39.147:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-b85affa4 delete -f -'
Nov  3 14:16:26.926: INFO: stderr: "warning: deleting cluster-scoped resources, not scoped to the provided namespace\n"
Nov  3 14:16:26.926: INFO: stdout: "storageclass.storage.k8s.io \"csi-rbd-sc\" deleted\n"
Nov  3 14:16:26.931: INFO: ExecWithOptions {Command:[/bin/sh -c ceph fsid] Namespace:rook-ceph PodName:rook-ceph-tools-7467d8bf8-kwzk9 ContainerName:rook-ceph-tools Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Nov  3 14:16:26.931: INFO: >>> kubeConfig: /root/.kube/config
Nov  3 14:16:29.121: INFO: Waiting up to &PersistentVolumeClaim{ObjectMeta:{rbd-pvc  rbd-8081    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []  []},Spec:PersistentVolumeClaimSpec{AccessModes:[ReadWriteOnce],Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{storage: {{1073741824 0} {<nil>} 1Gi BinarySI},},},VolumeName:,Selector:nil,StorageClassName:*csi-rbd-sc,VolumeMode:nil,DataSource:nil,DataSourceRef:nil,},Status:PersistentVolumeClaimStatus{Phase:,AccessModes:[],Capacity:ResourceList{},Conditions:[]PersistentVolumeClaimCondition{},},} to be in Bound state
Nov  3 14:16:29.121: INFO: waiting for PVC rbd-pvc (0 seconds elapsed)
Nov  3 14:16:31.126: INFO: waiting for PVC rbd-pvc (2 seconds elapsed)
Nov  3 14:16:31.132: INFO: Waiting for PV pvc-16daf702-5452-4cc7-8895-f7a7569af183 to bind to PVC rbd-pvc
Nov  3 14:16:31.132: INFO: Waiting up to timeout=10m0s for PersistentVolumeClaims [rbd-pvc] to have phase Bound
Nov  3 14:16:31.134: INFO: PersistentVolumeClaim rbd-pvc found and phase=Bound (2.154951ms)
Nov  3 14:16:31.134: INFO: Waiting up to 10m0s for PersistentVolume pvc-16daf702-5452-4cc7-8895-f7a7569af183 to have phase Bound
Nov  3 14:16:31.136: INFO: PersistentVolume pvc-16daf702-5452-4cc7-8895-f7a7569af183 found and phase=Bound (2.154977ms)
Nov  3 14:16:31.150: INFO: Waiting up to csi-rbd-demo-pod to be in Running state
Nov  3 14:26:31.162: FAIL: failed to validate encrypted pvc with error timed out waiting for the condition

https://jenkins-ceph-csi.apps.ocp.ci.centos.org/blue/organizations/jenkins/mini-e2e_k8s-1.21/detail/mini-e2e_k8s-1.21/1670/pipeline

STEP: create a PVC and Bind it to an app with journaling/exclusive-lock image-features and rbd-nbd mounter
Nov  3 14:19:26.167: INFO: waiting for kubectl (delete -f args []) to finish
Nov  3 14:19:26.167: INFO: Running '/usr/bin/kubectl --server=https://192.168.39.178:8443 --kubeconfig=/root/.kube/config --namespace=cephcsi-e2e-11d03781 delete -f -'
Nov  3 14:19:26.308: INFO: stderr: "warning: deleting cluster-scoped resources, not scoped to the provided namespace\n"
Nov  3 14:19:26.308: INFO: stdout: "storageclass.storage.k8s.io \"csi-rbd-sc\" deleted\n"
Nov  3 14:19:26.314: INFO: ExecWithOptions {Command:[/bin/sh -c ceph fsid] Namespace:rook-ceph PodName:rook-ceph-tools-7467d8bf8-x24sn ContainerName:rook-ceph-tools Stdin:<nil> CaptureStdout:true CaptureStderr:true PreserveWhitespace:true Quiet:false}
Nov  3 14:19:26.314: INFO: >>> kubeConfig: /root/.kube/config
Nov  3 14:19:28.514: INFO: Waiting up to &PersistentVolumeClaim{ObjectMeta:{rbd-pvc  rbd-8081    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []  []},Spec:PersistentVolumeClaimSpec{AccessModes:[ReadWriteOnce],Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{storage: {{1073741824 0} {<nil>} 1Gi BinarySI},},},VolumeName:,Selector:nil,StorageClassName:*csi-rbd-sc,VolumeMode:nil,DataSource:nil,DataSourceRef:nil,},Status:PersistentVolumeClaimStatus{Phase:,AccessModes:[],Capacity:ResourceList{},Conditions:[]PersistentVolumeClaimCondition{},},} to be in Bound state
Nov  3 14:19:28.514: INFO: waiting for PVC rbd-pvc (0 seconds elapsed)
Nov  3 14:19:30.517: INFO: waiting for PVC rbd-pvc (2 seconds elapsed)
Nov  3 14:19:30.523: INFO: Waiting for PV pvc-9a363347-4b34-408b-b354-618e5fb8a158 to bind to PVC rbd-pvc
Nov  3 14:19:30.523: INFO: Waiting up to timeout=10m0s for PersistentVolumeClaims [rbd-pvc] to have phase Bound
Nov  3 14:19:30.528: INFO: PersistentVolumeClaim rbd-pvc found and phase=Bound (4.704971ms)
Nov  3 14:19:30.528: INFO: Waiting up to 10m0s for PersistentVolume pvc-9a363347-4b34-408b-b354-618e5fb8a158 to have phase Bound
Nov  3 14:19:30.530: INFO: PersistentVolume pvc-9a363347-4b34-408b-b354-618e5fb8a158 found and phase=Bound (2.335935ms)
Nov  3 14:19:30.541: INFO: Waiting up to csi-rbd-demo-pod to be in Running state
Nov  3 14:29:30.548: FAIL: failed to validate pvc and application binding with error timed out waiting for the condition

nixpanic commented Nov 9, 2021

This does not seem to fix #2610, but maybe it is still an improvement?

nixpanic requested a review from a team November 9, 2021 08:14
@nixpanic

@Mergifyio rebase

mergify bot commented Nov 16, 2021

rebase

✅ Branch has been successfully rebased

pkalever left a comment

LGTM

@pkalever

Fails ci/centos/mini-e2e/k8s-1.20

STEP: create rbd clones in different pool
[...]
Nov 16 13:24:42.415: INFO: Waiting up to rbd-32745 to be in Running state
Nov 16 13:34:42.428: INFO: failed to create PVC and application (rbd-32740): timed out waiting for the condition
Nov 16 13:34:42.428: INFO: failed to create PVC and application (rbd-32741): timed out waiting for the condition
Nov 16 13:34:42.428: INFO: failed to create PVC and application (rbd-32742): timed out waiting for the condition
Nov 16 13:34:42.428: INFO: failed to create PVC and application (rbd-32743): timed out waiting for the condition
Nov 16 13:34:42.428: INFO: failed to create PVC and application (rbd-32744): timed out waiting for the condition
Nov 16 13:34:42.428: INFO: failed to create PVC and application (rbd-32745): timed out waiting for the condition
Nov 16 13:34:42.428: INFO: failed to create PVC and application (rbd-32746): timed out waiting for the condition
Nov 16 13:34:42.428: INFO: failed to create PVC and application (rbd-32747): timed out waiting for the condition
Nov 16 13:34:42.428: INFO: failed to create PVC and application (rbd-32748): timed out waiting for the condition
Nov 16 13:34:42.428: INFO: failed to create PVC and application (rbd-32749): timed out waiting for the condition
Nov 16 13:34:42.428: FAIL: failed to validate clones in different pool with error creating PVCs and applications failed, 10 errors were logged

https://jenkins-ceph-csi.apps.ocp.ci.centos.org/blue/organizations/jenkins/mini-e2e_k8s-1.20/detail/mini-e2e_k8s-1.20/3138/pipeline

@nixpanic

ci/centos/mini-e2e/k8s-1.20 failed with logs:

    Nov 16 13:34:42.428: failed to validate clones in different pool with error creating PVCs and applications failed, 10 errors were logged

nixpanic added the ci/retry/e2e label Nov 17, 2021
@nixpanic

@Mergifyio rebase

mergify bot commented Nov 17, 2021

rebase

✅ Branch has been successfully rebased

@nixpanic

@Mergifyio rebase

mergify bot commented Nov 17, 2021

rebase

✅ Branch has been successfully rebased

@github-actions

/retest ci/centos/mini-e2e/k8s-1.21

@github-actions

@nixpanic "ci/centos/mini-e2e/k8s-1.21" test failed. Logs are available at location for debugging

@github-actions

/retest ci/centos/mini-e2e/k8s-1.21

@github-actions

@nixpanic "ci/centos/mini-e2e/k8s-1.21" test failed. Logs are available at location for debugging

@github-actions

/retest ci/centos/mini-e2e/k8s-1.21

@github-actions

@nixpanic "ci/centos/mini-e2e/k8s-1.21" test failed. Logs are available at location for debugging

@github-actions

/retest ci/centos/mini-e2e/k8s-1.21

@github-actions

@nixpanic "ci/centos/mini-e2e/k8s-1.21" test failed. Logs are available at location for debugging

@github-actions

/retest ci/centos/mini-e2e/k8s-1.21

@github-actions

@nixpanic "ci/centos/mini-e2e/k8s-1.21" test failed. Logs are available at location for debugging

On line 341 a `transaction` is created. This is passed to the deferred
`undoStagingTransaction()` function when an error in the
`NodeStageVolume` procedure is detected. So far, so good.

However, on line 356 a new `transaction` is returned. This new
`transaction` is not used for the defer call.

By removing the empty `transaction` that is used in the defer call, and
calling `undoStagingTransaction()` on an error of `stageTransaction()`,
the code is a little simpler, and the cleanup of the transaction should
be done correctly now.

Updates: ceph#2610
Signed-off-by: Niels de Vos <ndevos@redhat.com>
mergify bot merged commit 7e22180 into ceph:devel Nov 17, 2021