Elemental process is OOM killed during OS upgrade #865
Comments
I have done a couple of tests: a downgrade from staging to stable, and then an upgrade from stable to staging. Both worked smoothly with 2GiB of memory. However, when trying to upgrade from staging to dev I saw the issue. Hence I'd say this is a problem only happening on [...]. In any case, I'd say the good news is that this is not affecting staging and stable. It seems to be a regression on Dev, and given the changes we made, I'd say this is a regression in elemental-cli.
This needs a final confirmation, but it should already be fixed in [...]. @ldevulder, was this consistently (or nearly consistently) failing in Dev tests? Can we rely on automated tests to verify this is fixed?
The current automated tests work around the issue by using 8GB of RAM. But I can easily test it in my lab with 2 or 3GB as soon as the fix is integrated into the Dev image.
This should already be integrated in Dev. I'd say that if a manual test with 2GB passes, we should revert the workaround and close this bug.
OK, I will quickly check it manually after lunch.
After manual tests I can confirm that the PR mentioned fixes the issue.
What steps did you take and what happened:
[...] `Active` state [...] `elemental` process)
What did you expect to happen:
The upgrade should complete without any issue. As nothing is deployed on the cluster, it's very strange that 3GB is not enough to perform an OS upgrade.
Anything else you would like to add:
I tried directly with the `elemental upgrade` command but I wasn't able to reproduce the issue. It happens only when the OS upgrade is triggered with Rancher Manager.
With K3s v1.25.7+k3s1 the behavior is a little bit different: I don't always get an OOM kill, but some processes are core dumped.
EDIT: after re-configuring my lab from scratch I can confirm that I'm able to get an OOM kill even with v1.25.7+k3s1.
The Elemental docs say nothing about a minimum RAM value. For K3s I found 512MB as the minimum and 1GB as the recommended value. For SLE Micro I found 1GB as the minimum. So it seems that 1GB should be enough, but it clearly is not.
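As a rough aid for comparing a node against the figures quoted above, the following sketch parses the `MemTotal` line of `/proc/meminfo` and checks it against those values. The function names and the threshold table are hypothetical and only restate the numbers mentioned in this report; they are not part of any Elemental, K3s, or SLE Micro tooling.

```python
import re

# Values as quoted in this report, in MiB (illustrative table, not an official source).
QUOTED_MINIMUMS_MIB = {
    "K3s minimum": 512,
    "K3s recommended": 1024,
    "SLE Micro minimum": 1024,
}


def mem_total_mib(meminfo_text):
    """Return MemTotal in MiB, given the text content of /proc/meminfo."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    # /proc/meminfo reports kB (really KiB); convert to MiB.
    return int(match.group(1)) // 1024


def check_minimums(total_mib):
    """Map each quoted threshold to True if total_mib meets or exceeds it."""
    return {name: total_mib >= need for name, need in QUOTED_MINIMUMS_MIB.items()}
```

On a node, this could be fed with `open("/proc/meminfo").read()`. Passing these checks says nothing about whether an upgrade will succeed, since the OOM kills described here occurred well above the quoted minimums.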
The CI used 3GB without any issue until recently. During my manual tests I was able to run an OS upgrade on a node with 4GB without any issue, but if I create a cluster of 3 nodes with 4GB of RAM each, I still see sporadic OOM kills (the same happens with 6GB of RAM, although less often).
So, even if the minimal values are, in my opinion, too small, it seems that we have some weird issues with the memory.
Environment:
`cat /etc/os-release`: Dev for operator and Stable for ISO
`kubectl version`: v1.24.10+k3s1 and v1.25.7+k3s1
Please find attached some logs I was able to catch (not easy after an OOM kill).
elemental_oom_kill_2gb_ram.log
elemental_oom_kill_3gb_ram.log
node-55392308-8b8f-4a04-b3b9-6dc1eed55689-2023-06-08T084512Z.tar.gz
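For anyone going through logs like these, a minimal sketch for spotting OOM kills in kernel log text is shown below. It assumes the common kernel message form `Killed process <pid> (<name>) ... anon-rss:<n>kB`; the exact wording varies between kernel versions, and the contents of the attached files are not reproduced here. The `OomKill` type and `find_oom_kills` function are hypothetical names used for illustration.

```python
import re
from typing import List, NamedTuple


class OomKill(NamedTuple):
    pid: int
    name: str
    anon_rss_kib: int


# Assumed message shape; adjust if the kernel on the node logs it differently.
OOM_RE = re.compile(r"Killed process (\d+) \(([^)]+)\).*?anon-rss:(\d+)kB")


def find_oom_kills(log_text: str) -> List[OomKill]:
    """Return one OomKill per kernel log line that reports a killed process."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_RE.search(line)
        if match:
            kills.append(
                OomKill(int(match.group(1)), match.group(2), int(match.group(3)))
            )
    return kills
```

The input could be the output of `dmesg` or `journalctl -k` saved to a file. Filtering the result on `name == "elemental"` would show whether the `elemental` process was the one being killed.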