-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
e2e: cannot delete remaining Elemental cluster after uninstallation of operator #515
Comments
Tested this morning: as a workaround the operator can be reinstalled (crds+operator) and the deletion is finished. Operator can be uninstalled then. Even if it's better to remove all Elemental resources before uninstalling the operator I think it's good to be able to remove remaining resources after the uninstallation, And it worked before. |
Does it still happen on Rancher HEAD (aka 2.8.0) ? Then we might need to open an issue on rancher/rancher 🤔 |
The educated guess is that the A manual workaround would be to either reinstall the We have at least 2 ways to fix this:
Option n.2 would be better since it does not rely on Helm, however consider this operation may take some long time. |
After more tests I can confirm that on Rancher Manager HEAD version the issue mainly happens because the |
I have been thinking about it and I struggle to find a good solution. Generally speaking I consider not a good practice to delete CRs on a I would expect resources to fully disappear with the second call The other problem of the OnShutdown strategy is that it would still require some sort of external signal for uninstall shutdown (having the option to flag cleanup or not) so we can be sure it is only executed for uninstalls and not on pod restarts (some spurious unwanted deletion would be dramatic). So my suggestion would be to actually have a cleanup command and apply it as a pre-uninstall step in crds chart. I think it is absolutely safe to state that if one uninstalls the crds chart the expectation is that any elemental resources including resource definitions are deleted. |
@ldevulder note witht he change from #553 the work around you implemented is no longer a workaround and it should be the way to go. Now if trying to reinstall with pending deletions due to machineinventory leftovers are there it will just fail. I wonder if it would make sense testing the sequence:
I think is almost the current case, just that we are not validating the reinstall failure and the finalizers removal is done as a parallel thread of the tests, while probably it should be part of the test sequence. What you think? does it make sense? |
@davidcassany yes it could be implemented to validate that the re-installation is failing "as expected". I opened issue rancher/elemental#1075 to track this in CI. |
It happens on Elemental CI, for example: https://github.com/rancher/elemental/actions/runs/6068163183/job/16460773463.
How to reproduce:
elemental-operator
Dev version$ kubectl -n fleet-default delete cluster cluster-k3s cluster.provisioning.cattle.io "cluster-k3s" deleted [blocked...]
I saw that

MachineInventories
are still present but inRemoving
state forever:Please note that it ONLY HAPPEN ON RANCHER MANAGER HEAD VERSION (2.7.7)!. I don't have this issue in Rancher Manager Stable (2.7.6). I know that 2.7.7-dev includes some new stuff for CAPI (but I don't know what exactly).
The text was updated successfully, but these errors were encountered: