rbd-nbd: failed to validate encrypted pvc with error timed out waiting for the condition #2610
Comments
This is really being hit a lot, not constantly, but still very often. Maybe we should skip the test for now, so that PRs do not need the frequent |
@nixpanic this would be the first thing that I'm going to look at once I'm back from holidays i.e. 08th Nov. |
I do not know why we start to hit this so frequently, not sure what changed. It could be that there is a new Ceph base image with updated rbd-nbd or other components that handle failures differently. My current suspicion is that there is a problem when |
True, I did see this hit on recent PRs too. Not sure what changed recently, though. For example: https://jenkins-ceph-csi.apps.ocp.ci.centos.org/blue/rest/organizations/jenkins/pipelines/mini-e2e-helm_k8s-1.22/runs/931/nodes/97/steps/100/log/?start=0
Thanks for sharing your understanding @nixpanic, I will take a detailed look later as mentioned. If we are not able to solve this (say, by next weekend?), we will skip this test.
I have probably spotted the issue: ceph-csi/internal/rbd/nodeserver.go Lines 341 to 359 in b95f3cd
On line 341 a However, on line 356 a new So, either a pointer to the |
@nixpanic yes, that makes sense. Thanks!
On line 341 a `transaction` is created. This is passed to the deferred `undoStagingTransaction()` function when an error in the `NodeStageVolume` procedure is detected. So far, so good. However, on line 356 a new `transaction` is returned. This new `transaction` is not used for the defer call. By removing the empty `transaction` that is used in the defer call, and calling `undoStagingTransaction()` on an error of `stageTransaction()`, the code is a little simpler, and the cleanup of the transaction should be done correctly now. Fixes: ceph#2610 Signed-off-by: Niels de Vos <ndevos@redhat.com>
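The stale-value hazard described in this commit message can be illustrated with a minimal Go sketch. It is not the ceph-csi code: the `transaction` struct and both functions are hypothetical, and the sketch makes no claim about which form `nodeserver.go` at b95f3cd actually used. It only shows that a value passed as an argument to a deferred call is copied when the `defer` statement executes, while a deferred closure reads the variable when it finally runs.

```go
package main

import "fmt"

// transaction is a hypothetical, simplified stand-in for the staging
// transaction; the real struct in ceph-csi has different fields.
type transaction struct {
	devicePath string
}

// deferWithArgument passes tr as an argument to the deferred function.
// The argument is evaluated at the defer statement, while tr is still
// empty, so the cleanup sees none of the later changes.
func deferWithArgument() (seen string) {
	tr := transaction{}
	defer func(t transaction) { seen = t.devicePath }(tr)

	// Simulates the staging step returning a new, filled-in transaction.
	tr = transaction{devicePath: "/dev/nbd0"}
	return ""
}

// deferWithClosure lets the deferred closure capture the variable tr.
// The closure reads tr when it runs, so it sees the later assignment.
func deferWithClosure() (seen string) {
	tr := transaction{}
	defer func() { seen = tr.devicePath }()

	tr = transaction{devicePath: "/dev/nbd0"}
	return ""
}

func main() {
	fmt.Printf("argument: %q\n", deferWithArgument())
	fmt.Printf("closure:  %q\n", deferWithClosure())
}
```

In the first function the cleanup would operate on an empty transaction and leave the mapped device behind; in the second it would see `/dev/nbd0`. Calling the undo function directly on the error path, as the commit does, sidesteps the question entirely.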
@nixpanic @pkalever #2618 is not the fix for this issue, is it? The defer is a safer way to unstage in the normal case, but what if the plugin is restarted before hitting the defer?
We are hitting this even with #2618, as mentioned in the PR comments. Not sure why the cryptsetup format of the device is failing; that was not the case before.
Any behavioural code changes with |
If we hit any error while running the cryptsetup commands, we log only the error message. With only the error message it is difficult to analyze the problem; logging the stdError will help us check what the problem is. updates: ceph#2610 Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
On line 341 a `transaction` is created. This is passed to the deferred `undoStagingTransaction()` function when an error in the `NodeStageVolume` procedure is detected. So far, so good. However, on line 356 a new `transaction` is returned. This new `transaction` is not used for the defer call. By removing the empty `transaction` that is used in the defer call, and calling `undoStagingTransaction()` on an error of `stageTransaction()`, the code is a little simpler, and the cleanup of the transaction should be done correctly now. Updates: ceph#2610 Signed-off-by: Niels de Vos <ndevos@redhat.com>
Frequently hitting ceph#2610 Also see: ceph#2629 (comment) updates: ceph#2610 Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Describe the bug
e2e testing failed with
Actual results
The following errors are repeatedly reported while doing a `NodeStageVolume` call:
Expected behavior
Encrypting the `/dev/nbd0` device should not fail, and `NodeStageVolume` should succeed.
Logs
The logs of the failed job are marked for keeping and can be found at mini-e2e_k8s-1.20/2974.
minikube logs from log system status: