[SPARK-11655] [core] Fix deadlock in handling of launcher stop(). #9633
The stop() callback was trying to close the launcher connection in the same thread that handles connection data, which ended up causing a deadlock. So avoid that by dispatching the stop() request in its own thread. On top of that, add some exception safety to a few parts of the code, and use "destroyForcibly" from Java 8 if it's available, to force kill the child process. The flip side is that "kill()" may not actually work if running Java 7.
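To illustrate the idea of dispatching the stop() request in its own thread, here is a minimal sketch. It is not the code from this PR: `StopDispatchSketch`, `StopListener`, `handleStopMessage()` and `runDemo()` are hypothetical names chosen for the example. The only point it makes is that the thread reading connection data hands the callback to a separate thread, so the callback can close the connection without waiting on the thread that invoked it.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not the Spark launcher implementation.
final class StopDispatchSketch {

  /** Stand-in for the application-side stop() callback (assumed name). */
  interface StopListener {
    void stop();
  }

  private final StopListener listener;

  StopDispatchSketch(StopListener listener) {
    this.listener = listener;
  }

  /**
   * Meant to be called on the connection-handling thread when a stop request
   * arrives. The callback runs on a new thread, so this method returns
   * immediately and the caller never blocks inside stop().
   */
  Thread handleStopMessage() {
    Thread t = new Thread(listener::stop, "stop-dispatch");
    t.setDaemon(true);
    t.start();
    return t;
  }

  /** Runs the callback once and reports the name of the thread it ran on. */
  static String runDemo() {
    AtomicReference<String> ranOn = new AtomicReference<>();
    StopDispatchSketch sketch =
        new StopDispatchSketch(() -> ranOn.set(Thread.currentThread().getName()));
    Thread worker = sketch.handleStopMessage();
    try {
      worker.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return ranOn.get();
  }

  public static void main(String[] args) {
    System.out.println("stop() ran on thread: " + runDemo());
  }
}
```

In this sketch the callback always runs on the thread named `stop-dispatch`, never on the caller's thread, which is the property that avoids a callback blocking the thread it depends on.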
Test build #45656 has finished for PR 9633 at commit
I can confirm that this seems to fix the problem when running locally.
Based on http://bugs.java.com/view_bug.do?bug_id=4073195, it sounds like many *nix implementations of Line 1580 in b8ff688
I commented out the |
@@ -102,8 +103,20 @@ public synchronized void kill() {
    disconnect();
I was initially worried that this needs to be in a try block, but it doesn't look like disconnect() is capable of throwing any exceptions.
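For context on the kill() change discussed here, the "use destroyForcibly from Java 8 if it's available" approach can be sketched roughly as below. This is an assumption-laden illustration, not the PR's code: `ForceKillSketch` and its `kill(Process)` helper are hypothetical names. The sketch looks up `Process.destroyForcibly()` by reflection so that it still compiles against Java 7, and falls back to `Process.destroy()` when the lookup or the call fails.

```java
import java.lang.reflect.Method;

// Hypothetical sketch, not the Spark launcher implementation.
final class ForceKillSketch {

  /**
   * Tries Process.destroyForcibly() (added in Java 8) through reflection and
   * falls back to Process.destroy() if it is unavailable or the call fails.
   *
   * @return true if destroyForcibly() was invoked, false if destroy() was used
   */
  static boolean kill(Process child) {
    try {
      // Looked up on Process itself (a public class) rather than on the
      // runtime class of the child, which may not be accessible.
      Method destroyForcibly = Process.class.getMethod("destroyForcibly");
      destroyForcibly.invoke(child);
      return true;
    } catch (Exception e) {
      // Java 7, or the reflective call failed: use the older API.
      child.destroy();
      return false;
    }
  }
}
```

On a Java 7 runtime the fallback path is taken, which matches the caveat in the description that kill() may not actually terminate the child there.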
Note that the fix is NOT about whether With the deadlock out of the way, calling |
@JoshRosen do you have any extra feedback here? I'll push the change otherwise.
Merging to master / 1.6, we can do post-review later if needed.
The stop() callback was trying to close the launcher connection in the same thread that handles connection data, which ended up causing a deadlock. So avoid that by dispatching the stop() request in its own thread.

On top of that, add some exception safety to a few parts of the code, and use "destroyForcibly" from Java 8 if it's available, to force kill the child process. The flip side is that "kill()" may not actually work if running Java 7.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #9633 from vanzin/SPARK-11655.

(cherry picked from commit 767d288)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
Sorry for the late / flaky review replies; I've been home sick with strep throat and spent most of the day asleep. This seems fine to me.
Maybe I'm overlooking something really obvious, but I think it's pretty hard to spot the circular wait condition which led to the deadlock. For posterity, could you post a brief description of the participants in that cycle?