-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JENKINS-73224] Do not try to create a pod which already exists #1561
Conversation
.build()); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without the fix this test fails with the log in the linked ticket.
...java/org/csanchez/jenkins/plugins/kubernetes/pipeline/AbstractKubernetesPipelineRJRTest.java
Show resolved
Hide resolved
// the pod might already exist and the creating logic must be skipped. | ||
Pod existingPod = | ||
client.pods().inNamespace(namespace).withName(podName).get(); | ||
if (existingPod == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything inside this if
branch is untouched (except formatting, which I was forced to do by Spotless)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
listener.getLogger().printf("Created Pod: %s %s/%s%n", cloudName, namespace, podName); | ||
Metrics.metricRegistry().counter(MetricNames.PODS_CREATED).inc(); | ||
|
||
node.getRunListener().getLogger().printf("Created Pod: %s %s/%s%n", cloudName, namespace, podName); | ||
kubernetesComputer.setLaunching(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we fix this field please, so it is not persisting an AtomicBoolean
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it need to be an AtomicBoolean
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see #1479 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 3b78a0b
(disabling automerge while I review too) |
@@ -68,7 +67,7 @@ public class KubernetesLauncher extends JNLPLauncher { | |||
|
|||
private static final Logger LOGGER = Logger.getLogger(KubernetesLauncher.class.getName()); | |||
|
|||
private final AtomicBoolean launched = new AtomicBoolean(false); | |||
private volatile boolean launched = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR: this is not directly related or needed for the goal of this PR, but it was requested by @jglick in a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(but consider #1561 (comment))
LOGGER.log(INFO, () -> "Pod already exists: " + cloudName + " " + namespace + "/" + podName); | ||
listener.getLogger().printf("Pod already exists: %s %s/%s%n", cloudName, namespace, podName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So how does this actually differ from
kubernetes-plugin/src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesLauncher.java
Lines 106 to 110 in 3b78a0b
if (launched) { | |
LOGGER.log(INFO, "Agent has already been launched, activating: {0}", node.getNodeName()); | |
computer.setAcceptingTasks(true); | |
return; | |
} |
launching
field if we have this logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is another face of the same behaviour, just under a slightly different timing (a lot narrower than the one being fixed in this PR).
This PR catches the case there the controller is restarted between the pod creating call and the end of the waiting for loop. While the existing launched
check would only catch a restart right after the node save and the end of the method, which is quite unlikely to happen, but possible I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we even still need the launching field if we have this logic?
Not sure. https://github.com/amuniz/kubernetes-plugin/blob/f10083a42e708a6bff990c6ccb306ce618b8bccd/src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesComputer.java#L201 is never used from this plugin sources, but it's public and it could be used somewhere else, so I would stay on the safe side and keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is never used from this plugin sources, but it's public and it could be used somewhere else
Well, if it is not used it can be deleted. (I usually just do a GH org-wide search.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then there is stuff like https://github.com/amuniz/kubernetes-plugin/blob/3b78a0ba311af4c091b94c070b9a8c8b6ab6768b/src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesLauncher.java#L199-L200 which it would not make sense to do twice, but there is no current defense against race conditions. It seems there are deeper bugs waiting here.
// restart | ||
rjr.stopJenkins(); | ||
rjr.startJenkins(); | ||
// update k8s to make a node suitable to schedule (add disktype=special to the node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little scary to have the test even temporarily patch the Node
! Would be safer to do something to the namespace like a quota 🤔 Fine enough for Kind/CI, could just become a problem if we try to run tests on a permanent cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the easiest way I found, and with the finally
it's granted to be back to the original state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So long as the JVM terminates normally, yes.
import org.jvnet.hudson.test.JenkinsRule; | ||
import org.jvnet.hudson.test.RealJenkinsRule; | ||
|
||
public class AssertBuildLogMessage implements RealJenkinsRule.Step { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW you can also make these things static
methods in some util class; a bit less boilerplate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but less reusable then...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is that less reusable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When writing a new test it seems natural to go to the steps
package and look for generic steps to use in the new test, however inspecting other test classes looking for static methods seems less discoverable to me at least. That will likely end up with re-writing the same (or very similar) static methods in multiple test classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inspecting other test classes looking for static methods
I meant reusable static methods in a utility class in the steps
package. Or just a single class in the main test package called Steps
. Etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks to fix the stated bug. I am pretty sure there are other things wrong here but perhaps they rarely matter.
See JENKINS-73224 for details.
Easier to review hiding whitespace changes: https://github.com/jenkinsci/kubernetes-plugin/pull/1561/files?w=1
Testing done
Automated test added.
Submitter checklist