Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-73224] Do not try to create a pod which already exists #1561

Merged
merged 5 commits into from
May 29, 2024

Conversation

amuniz
Copy link
Member

@amuniz amuniz commented May 28, 2024

See JENKINS-73224 for details.

Easier to review hiding whitespace changes: https://github.com/jenkinsci/kubernetes-plugin/pull/1561/files?w=1

Testing done

Automated test added.

Submitter checklist

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests - that demonstrates feature works or fixes the issue

@amuniz amuniz requested a review from a team as a code owner May 28, 2024 13:36
.build());
}
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the fix this test fails with the log in the linked ticket.

// the pod might already exist and the creating logic must be skipped.
Pod existingPod =
client.pods().inNamespace(namespace).withName(podName).get();
if (existingPod == null) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything inside this if branch is untouched (except formatting, which I was forced to do by Spotless)

Copy link
Member

@Vlatombe Vlatombe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Vlatombe Vlatombe added the bug Bug Fixes label May 28, 2024
@Vlatombe Vlatombe enabled auto-merge (squash) May 28, 2024 15:28
listener.getLogger().printf("Created Pod: %s %s/%s%n", cloudName, namespace, podName);
Metrics.metricRegistry().counter(MetricNames.PODS_CREATED).inc();

node.getRunListener().getLogger().printf("Created Pod: %s %s/%s%n", cloudName, namespace, podName);
kubernetesComputer.setLaunching(true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we fix this field please, so it is not persisting an AtomicBoolean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it need to be an AtomicBoolean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see #1479 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3b78a0b

@jglick jglick disabled auto-merge May 28, 2024 15:33
@jglick
Copy link
Member

jglick commented May 28, 2024

(disabling automerge while I review too)

@@ -68,7 +67,7 @@ public class KubernetesLauncher extends JNLPLauncher {

private static final Logger LOGGER = Logger.getLogger(KubernetesLauncher.class.getName());

private final AtomicBoolean launched = new AtomicBoolean(false);
private volatile boolean launched = false;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR: this is not directly related or needed for the goal of this PR, but it was requested by @jglick in a review.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(but consider #1561 (comment))

Comment on lines +193 to +194
LOGGER.log(INFO, () -> "Pod already exists: " + cloudName + " " + namespace + "/" + podName);
listener.getLogger().printf("Pod already exists: %s %s/%s%n", cloudName, namespace, podName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So how does this actually differ from

if (launched) {
LOGGER.log(INFO, "Agent has already been launched, activating: {0}", node.getNodeName());
computer.setAcceptingTasks(true);
return;
}
and do we even still need the launching field if we have this logic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is another face of the same behaviour, just under a slightly different timing (a lot narrower than the one being fixed in this PR).

This PR catches the case there the controller is restarted between the pod creating call and the end of the waiting for loop. While the existing launched check would only catch a restart right after the node save and the end of the method, which is quite unlikely to happen, but possible I guess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we even still need the launching field if we have this logic?

Not sure. https://github.com/amuniz/kubernetes-plugin/blob/f10083a42e708a6bff990c6ccb306ce618b8bccd/src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesComputer.java#L201 is never used from this plugin sources, but it's public and it could be used somewhere else, so I would stay on the safe side and keep it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is never used from this plugin sources, but it's public and it could be used somewhere else

Well, if it is not used it can be deleted. (I usually just do a GH org-wide search.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then there is stuff like https://github.com/amuniz/kubernetes-plugin/blob/3b78a0ba311af4c091b94c070b9a8c8b6ab6768b/src/main/java/org/csanchez/jenkins/plugins/kubernetes/KubernetesLauncher.java#L199-L200 which it would not make sense to do twice, but there is no current defense against race conditions. It seems there are deeper bugs waiting here.

// restart
rjr.stopJenkins();
rjr.startJenkins();
// update k8s to make a node suitable to schedule (add disktype=special to the node)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little scary to have the test even temporarily patch the Node! Would be safer to do something to the namespace like a quota 🤔 Fine enough for Kind/CI, could just become a problem if we try to run tests on a permanent cluster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the easiest way I found, and with the finally it's granted to be back to the original state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So long as the JVM terminates normally, yes.

import org.jvnet.hudson.test.JenkinsRule;
import org.jvnet.hudson.test.RealJenkinsRule;

public class AssertBuildLogMessage implements RealJenkinsRule.Step {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW you can also make these things static methods in some util class; a bit less boilerplate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but less reusable then...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is that less reusable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When writing a new test it seems natural to go to the steps package and look for generic steps to use in the new test, however inspecting other test classes looking for static methods seems less discoverable to me at least. That will likely end up with re-writing the same (or very similar) static methods in multiple test classes.

Copy link
Member

@jglick jglick May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inspecting other test classes looking for static methods

I meant reusable static methods in a utility class in the steps package. Or just a single class in the main test package called Steps. Etc.

Copy link
Member

@jglick jglick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to fix the stated bug. I am pretty sure there are other things wrong here but perhaps they rarely matter.

@Vlatombe Vlatombe merged commit c06f753 into jenkinsci:master May 29, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants