Handle the case when watch.exitCode may return null #1366

Merged
merged 12 commits into from
Oct 10, 2023

Contributor

@kuzaxak kuzaxak commented Apr 26, 2023

After the changes introduced in #1260, we frequently hit a NullPointerException. According to the documentation for the watch.exitCode method, watch.exitCode() can return null if close is received before the exit code. In that case, the null value can be handled by providing an appropriate default or error value.

java.lang.NullPointerException at
org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecProc.join(ContainerExecProc.java:89)
at hudson.Proc.joinWithTimeout(Proc.java:174)
at com.cloudbees.jenkins.plugins.sshagent.exec.ExecRemoteAgent.stop(ExecRemoteAgent.java:129)

Fix for JENKINS-71135
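
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the plugin's code: it assumes the NullPointerException comes from auto-unboxing a null Integer in a method declared to return int, and the names NullExitCodeSketch, naiveJoin and joinWithDefault are hypothetical.

```java
import java.util.concurrent.CompletableFuture;

class NullExitCodeSketch {
    // Mirrors a method declared to return a primitive int:
    // a future completed with null makes the implicit unboxing throw.
    static int naiveJoin(CompletableFuture<Integer> future) {
        return future.join();
    }

    // Keeps the boxed value, checks it, and falls back to a caller-chosen default.
    static int joinWithDefault(CompletableFuture<Integer> future, int fallback) {
        Integer code = future.join();
        return code != null ? code : fallback;
    }

    public static void main(String[] args) {
        // Assumption: "close received before the exit code" is modelled as a null result.
        CompletableFuture<Integer> closedEarly = CompletableFuture.completedFuture(null);
        try {
            naiveJoin(closedEarly);
        } catch (NullPointerException e) {
            System.out.println("naiveJoin threw NullPointerException");
        }
        System.out.println("joinWithDefault returned " + joinWithDefault(closedEarly, -1));
    }
}
```

The fallback value of -1 is only an illustrative sentinel for "exit code unknown".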

  • Make sure you are opening from a topic/feature/bugfix branch (right side) and not your main branch!
  • Ensure that the pull request title represents the desired changelog entry
  • Please describe what you did
  • Link to relevant issues in GitHub or Jira
  • Link to relevant pull requests, esp. upstream and downstream changes
  • Ensure you have provided tests that demonstrate the feature works or the issue is fixed

@kuzaxak kuzaxak requested a review from a team as a code owner April 26, 2023 17:08

CompletableFuture<Integer> exitCodeFuture = watch.exitCode();
if (exitCodeFuture == null) {
LOGGER.log(Level.FINEST, "Watcher return 'null' instead of exitCode");
Member

Maybe we should extend

doExec(in, !launcher.isUnix(), printStream, masks, commands);
LOGGER.log(Level.INFO, "Created process inside pod: [" + getPodName() + "], container: ["
+ containerName + "]" + "[" + TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startMethod) + " ms]");
ContainerExecProc proc = new ContainerExecProc(watch, alive, finished, stdin);
to pass the printStream to proc so unusual issues can be printed in the build log, rather than the system log where they would only be visible to admins.
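
One way to picture this suggestion is a process wrapper that receives the build's PrintStream as an extra constructor argument. This is only a sketch under assumptions: BuildLogProcSketch is a hypothetical stand-in for ContainerExecProc, whose real constructor takes different arguments, and the message text and the -1 result are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for ContainerExecProc; not the plugin's actual class.
class BuildLogProcSketch {
    private final CompletableFuture<Integer> exitCode;
    private final PrintStream buildLog; // the extra argument discussed in the review

    BuildLogProcSketch(CompletableFuture<Integer> exitCode, PrintStream buildLog) {
        this.exitCode = exitCode;
        this.buildLog = buildLog;
    }

    int join() {
        Integer code = exitCode.join();
        if (code == null) {
            // Written to the build log, so it is visible to users and not only to admins.
            buildLog.println("kubernetes: exit code unavailable, reporting -1");
            return -1;
        }
        return code;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream log = new PrintStream(sink, true);
        int result = new BuildLogProcSketch(CompletableFuture.completedFuture(null), log).join();
        System.out.println(result + " / " + sink.toString().trim());
    }
}
```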

Contributor Author

Good idea, but I don't know the best way to do it. Can I add an additional arg for ContainerExecProc to pass printStream and then use it to print a log?

Contributor Author

I think it can be implemented as a separate PR to unblock the issue fix. I can prepare a PR for that.

Member

> Can I add an additional arg

That is what I was thinking.

> I think it can be implemented as a separate PR

Agreed, it was just an idea.

Contributor

annaagaf commented May 5, 2023

We have been running this build and it's still throwing NPEs at exitCodeFuture.join():

10:31:47  java.lang.NullPointerException
10:31:47  	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecProc.join(ContainerExecProc.java:97)
10:31:47  	at hudson.Proc.joinWithTimeout(Proc.java:174)
10:31:47  	at com.cloudbees.jenkins.plugins.sshagent.exec.ExecRemoteAgent.stop(ExecRemoteAgent.java:129)

@annaagaf
Contributor

@jglick Added a commit to address another NullPointerException issue. We will also be testing this build on our Jenkins.

@annaagaf
Contributor

The updated version is still throwing NPEs; back to the drawing board.

10:21:32  java.lang.NullPointerException
10:21:32  	at org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecProc.join(ContainerExecProc.java:99)
10:21:32  	at hudson.Proc.joinWithTimeout(Proc.java:174)

@annaagaf
Contributor

Updated this PR to the current state we are running. This solves the NPEs and improves logging; however, it now exposes the original issue, which lies in the ssh-agent plugin (ssh-agent -k failing at the end of the pipeline). We will be patching that plugin to validate.

@jglick jglick dismissed their stale review May 24, 2023 11:42

Addressed, I guess (but please refrain from force-pushing since it makes it much harder for reviewers to track what you changed and when)

pom.xml Outdated
@@ -3,7 +3,7 @@
<parent>
<groupId>org.jenkins-ci.plugins</groupId>
<artifactId>plugin</artifactId>
-<version>4.61</version>
+<version>4.62</version>
Member

@Vlatombe Vlatombe May 26, 2023

Please, for next time, avoid introducing unrelated changes, as it makes the review work harder and increases the likelihood of merge conflicts.

Version bumps like this are usually handled via automated PRs filed by Dependabot, e.g. #1382

Contributor

Thanks for the note! The branch is now rebased onto the latest upstream, and this bump is no longer included in the PR.

@Vlatombe Vlatombe added the bug Bug Fixes label May 26, 2023

mbecca commented Sep 1, 2023

Hello, does anyone know when this bug fix will be included in a release?

kuzaxak and others added 9 commits September 6, 2023 09:30
After the changes introduced in [PR][1], we frequently hit a
`NullPointerException`. According to the [documentation][2] for the
`watch.exitCode` method, watch.exitCode() can return null if close is
received before the exit code. In that case, the null value can be
handled by providing an appropriate default or error value.

```
java.lang.NullPointerException at
org.csanchez.jenkins.plugins.kubernetes.pipeline.ContainerExecProc.join(ContainerExecProc.java:89)
at hudson.Proc.joinWithTimeout(Proc.java:174)
at com.cloudbees.jenkins.plugins.sshagent.exec.ExecRemoteAgent.stop(ExecRemoteAgent.java:129)
```

Fix for JENKINS-71135

[1]: #1260
[2]: https://github.com/fabric8io/kubernetes-client/blob/8a879128d893627a40d64cf2a9c7d5da2e7f47d2/kubernetes-client-api/src/main/java/io/fabric8/kubernetes/client/dsl/ExecWatch.java#L68-L82
A CompletableFuture can be completed exceptionally with a
NullPointerException, in which case join() throws an unchecked
CompletionException that wraps it. Replace join() with get(), which
throws the checked ExecutionException wrapping the original exception
instead. This exception is caught and handled appropriately.
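
For comparison, the JDK documents that both join() and get() report exceptional completion through a wrapper exception whose cause is the original one: join() uses the unchecked CompletionException and get() the checked ExecutionException. The sketch below only illustrates that difference; JoinVersusGetSketch and its methods are hypothetical names, not code from this PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

class JoinVersusGetSketch {
    // Describes what join() throws for the given future, or "none".
    static String thrownByJoin(CompletableFuture<Integer> future) {
        try {
            future.join();
            return "none";
        } catch (CompletionException e) {
            return "CompletionException caused by " + e.getCause().getClass().getSimpleName();
        }
    }

    // Describes what get() throws for the given future, or "none".
    static String thrownByGet(CompletableFuture<Integer> future) {
        try {
            future.get();
            return "none";
        } catch (ExecutionException e) {
            return "ExecutionException caused by " + e.getCause().getClass().getSimpleName();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> failed = new CompletableFuture<>();
        failed.completeExceptionally(new NullPointerException("simulated failure"));
        System.out.println("join(): " + thrownByJoin(failed));
        System.out.println("get():  " + thrownByGet(failed));
    }
}
```

A future that completes normally with a null value throws neither exception; the null only becomes a problem when it is unboxed.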
Adding logging of errors in build log for
improved visibility.
…/ContainerExecProc.java

Co-authored-by: Vincent Latombe <vincent@latombe.net>
…/ContainerExecProc.java

Co-authored-by: Vincent Latombe <vincent@latombe.net>
@anna-agafonova
Contributor

@Vlatombe @jglick addressed the PR comments, please re-review

- return watch.exitCode().join();
+ CompletableFuture<Integer> exitCodeFuture = watch.exitCode();
+ Integer exitCode = exitCodeFuture.get();
Member

Does this not contradict the PR title, that it is possible exitCodeFuture == null? Either the code or the PR title is incorrect.

Member

(Docs are ambiguous whether null refers to CompletableFuture<Integer>, or the value of the CompletableFuture.)

Contributor

The goal here is to handle all possible outcomes (null exitCode is handled in this block, null exitCodeFuture in catchall block, checked ExecutionException that can be thrown by get() in catch block).
The PR title refers to the original issue that's causing NPEs on current master when exitCode is null, but what would you suggest as a better naming/solution here?

Member

> null exitCodeFuture in catchall block

If you mean in the catch (Exception x) block, then yes, but code should never knowingly be catching NullPointerException; it should check for null. Conversely, if you were relying on arbitrary unknown exceptions, you might as well keep it simple:

try {
    return watch.exitCode().join();
} catch (Exception x) {
    LOGGER.log(Level.FINEST, "Exception occurred while waiting for exit code", x);
    return -1;
}
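
The alternative mentioned in this comment, checking for null instead of catching NullPointerException, could look roughly like the sketch below. It is not the merged code: ExplicitNullCheckSketch is a hypothetical class, the Supplier stands in for watch::exitCode, and both readings of the ambiguous documentation (a null future and a null value) are checked separately.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

class ExplicitNullCheckSketch {
    private static final Logger LOGGER = Logger.getLogger(ExplicitNullCheckSketch.class.getName());

    // exitCodeSource is a hypothetical stand-in for watch::exitCode.
    static int waitForExitCode(Supplier<CompletableFuture<Integer>> exitCodeSource) {
        CompletableFuture<Integer> future = exitCodeSource.get();
        if (future == null) { // reading 1: the future itself is null
            LOGGER.log(Level.FINEST, "No exit code future available");
            return -1;
        }
        try {
            Integer code = future.join();
            if (code == null) { // reading 2: the future completed with a null value
                LOGGER.log(Level.FINEST, "Exit code future completed with null");
                return -1;
            }
            return code;
        } catch (CompletionException | CancellationException x) {
            // Exceptional completion or cancellation, as reported by join().
            LOGGER.log(Level.FINEST, "Exception occurred while waiting for exit code", x);
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(waitForExitCode(() -> null));
        System.out.println(waitForExitCode(() -> CompletableFuture.completedFuture(null)));
        System.out.println(waitForExitCode(() -> CompletableFuture.completedFuture(0)));
    }
}
```

The trade-off is the one raised in the review: explicit checks document which null cases are expected, while the shorter catch-all version treats every failure the same way.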

Contributor

Alright, agreed - how about now?

Contributor

@jglick could you please re-review?

…/ContainerExecProc.java

Co-authored-by: Jesse Glick <jglick@cloudbees.com>
@jglick jglick requested a review from Vlatombe October 4, 2023 20:45
@Vlatombe Vlatombe merged commit 2da8e27 into jenkinsci:master Oct 10, 2023