KubernetesPipelineTest.errorPod flake revealed bug in Reaper #1417

Merged
merged 1 commit into from
Aug 15, 2023

jglick
Member

@jglick jglick commented Aug 15, 2023

While checking why #1415 did not automatically deploy, I saw that the trunk commit failed with a flake in KubernetesPipelineTest.errorPod:

Expected: a string containing "jnlp -- terminated (1)"
     but: was "Started
[Pipeline] Start of Pipeline
[Pipeline] podTemplate
[Pipeline] {
[Pipeline] node
Created Pod: kubernetes kubernetes-plugin-test/error-pod-1-rpsj1-z7xj2-7nnm8
kubernetes-plugin-test/error-pod-1-rpsj1-z7xj2-7nnm8 Container jnlp was terminated (Exit Code: 1, Reason: Error)
[Pipeline] // node
[Pipeline] }
[Pipeline] // podTemplate
[Pipeline] End of Pipeline
Queue task was cancelled
org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId: 65b59291-4c40-4b7c-925a-ea0448f6dbcc
Finished: ABORTED
"

I found that the same failure could be reproduced easily:

diff --git src/test/java/org/csanchez/jenkins/plugins/kubernetes/pipeline/KubernetesPipelineTest.java src/test/java/org/csanchez/jenkins/plugins/kubernetes/pipeline/KubernetesPipelineTest.java
index f9ffdace..3df21eb8 100644
--- src/test/java/org/csanchez/jenkins/plugins/kubernetes/pipeline/KubernetesPipelineTest.java
+++ src/test/java/org/csanchez/jenkins/plugins/kubernetes/pipeline/KubernetesPipelineTest.java
@@ -505,6 +505,7 @@ public class KubernetesPipelineTest extends AbstractKubernetesPipelineTest {
     public void errorPod() throws Exception {
         r.waitForMessage("jnlp -- terminated (1)", b);
         r.waitForMessage("Foo", b);
+        Thread.sleep(5_000);
         b.doKill();
     }
 

Turns out the eager hard-kill was preventing the normal event sequence from being observed:

[Pipeline] // node
[Pipeline] }

- jnlp -- terminated (1)
-----Logs-------------
Foo

[Pipeline] // podTemplate
[Pipeline] End of Pipeline
Hard kill!
Finished: ABORTED

The log lines being asserted were introduced in #1050. It appears that #1118 prevented them from being shown in some cases, but because of the odd structure of the test, this regression was not caught. This PR fixes the production code to print the failing pod's log lines before cancelling the queue item (which aborts the node step and thus also podTemplate), and improves the test to wait for the build to terminate naturally.
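
For readers unfamiliar with the ordering issue, here is a minimal, self-contained sketch of why printing the diagnostics before cancelling matters. It is a toy model only, not the plugin's actual Reaper code: `BuildLog`, `QueueItem`, and the helper methods are hypothetical names, and the real race is simplified into a deterministic rule that a finished build no longer accepts log lines.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the ordering problem; none of these types are the plugin's real classes. */
public class ReaperOrderingSketch {

    /** Hypothetical build log that silently drops lines once the build has finished. */
    static final class BuildLog {
        private final List<String> lines = new ArrayList<>();
        private boolean finished;

        void println(String line) {
            if (!finished) {
                lines.add(line);
            }
        }

        void finish(String result) {
            println("Finished: " + result);
            finished = true;
        }

        List<String> lines() {
            return new ArrayList<>(lines);
        }
    }

    /** Hypothetical queue item; cancelling it aborts the build right away in this model. */
    static final class QueueItem {
        private final BuildLog log;

        QueueItem(BuildLog log) {
            this.log = log;
        }

        void cancel() {
            log.println("Queue task was cancelled");
            log.finish("ABORTED");
        }
    }

    /** Writes the container status and its log output, in the shape the test asserts on. */
    static void printContainerDiagnostics(BuildLog log, String container, int exitCode, String containerLog) {
        log.println("- " + container + " -- terminated (" + exitCode + ")");
        log.println("-----Logs-------------");
        log.println(containerLog);
    }

    /** Problematic order: the build is already finished when the diagnostics arrive. */
    static List<String> cancelThenPrint() {
        BuildLog log = new BuildLog();
        new QueueItem(log).cancel();
        printContainerDiagnostics(log, "jnlp", 1, "Foo");
        return log.lines();
    }

    /** Order described in this PR: diagnostics first, then cancel the queue item. */
    static List<String> printThenCancel() {
        BuildLog log = new BuildLog();
        printContainerDiagnostics(log, "jnlp", 1, "Foo");
        new QueueItem(log).cancel();
        return log.lines();
    }

    public static void main(String[] args) {
        System.out.println("cancel first: " + cancelThenPrint());
        System.out.println("print first:  " + printThenCancel());
    }
}
```

In this model, only `printThenCancel()` leaves `- jnlp -- terminated (1)` and `Foo` in the log. In the real plugin the outcome of the old ordering depended on timing, which would be consistent with the failure showing up as a flake rather than on every run.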

@jglick jglick requested a review from a team as a code owner August 15, 2023 14:22
@jglick jglick added the bug Bug Fixes label Aug 15, 2023
@jglick jglick merged commit 227c16b into jenkinsci:master Aug 15, 2023
@jglick jglick deleted the errorPod branch August 15, 2023 17:10
Member

@Vlatombe Vlatombe left a comment

LGTM
