[SPARK-3741] Make ConnectionManager propagate errors properly and add mo... #2593

Closed
wants to merge 3 commits into from

zsxwing
Member

@zsxwing zsxwing commented Sep 30, 2014

...re logs to avoid Executors swallowing errors

This PR made the following changes:

  • Register a callback on Connection so that errors are propagated properly.
  • Add more logging so that errors are not silently swallowed by Executors.
  • Use trySuccess/tryFailure instead of success/failure, because a Promise can be completed only once and a second call to success/failure throws an IllegalStateException.
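
To illustrate the last point, here is a minimal sketch using only `scala.concurrent.Promise` from the standard library. It is not code from this PR; the object and method names are hypothetical and chosen for illustration. It shows that a second `success` call throws, while `trySuccess`/`tryFailure` return `false` and leave the first result in place.

```scala
import scala.concurrent.Promise
import scala.util.Try

// Hypothetical names for illustration; not part of Spark.
object PromiseCompletionSketch {

  // Strict API: completing an already-completed promise throws
  // IllegalStateException, which Try captures here as a Failure.
  def strictSecondCompletion(): Try[Unit] = {
    val p = Promise[String]()
    p.success("first")
    Try { p.success("second"); () }
  }

  // Lenient API: each call reports whether it was the one that completed
  // the promise, so racing success and error paths cannot throw.
  def lenientSecondCompletion(): (Boolean, Boolean, Option[Try[String]]) = {
    val p = Promise[String]()
    val first = p.trySuccess("first")
    val second = p.tryFailure(new java.io.IOException("connection closed"))
    (first, second, p.future.value)
  }
}
```

This matters when a normal completion path and an error callback can both try to complete the same Promise: whichever runs first wins, and the loser becomes a no-op.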

… more logs to avoid Executors swallowing errors
@SparkQA

SparkQA commented Sep 30, 2014

QA tests have started for PR 2593 at commit 764aec5.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Sep 30, 2014

QA tests have finished for PR 2593 at commit 764aec5.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21035/

@zsxwing
Member Author

zsxwing commented Sep 30, 2014

retest this please. This test passes on my machine.

@SparkQA

SparkQA commented Sep 30, 2014

QA tests have started for PR 2593 at commit 764aec5.

  • This patch merges cleanly.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21036/

    callback(this, e)
  } catch {
    case NonFatal(e) => {
      logWarning("Ignore error", e)
Contributor

Could you change this to be more descriptive? How about something like "Ignored error in onExceptionCallback"?
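
For context, a rough sketch of the pattern under review: each registered exception callback is invoked inside its own try/catch, so one faulty callback cannot stop the others, and the log message names where the error was ignored. `SketchConnection`, `onException`, `callOnExceptionCallbacks`, and the `log` parameter are hypothetical stand-ins for illustration, not Spark's actual Connection API.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

object CallbackSketch {

  // Hypothetical stand-in for a connection; `log` replaces logWarning
  // so the sketch needs no logging framework.
  class SketchConnection(log: String => Unit) {
    private val onExceptionCallbacks =
      ArrayBuffer.empty[(SketchConnection, Throwable) => Unit]

    // Register a callback to be told about errors on this connection.
    def onException(callback: (SketchConnection, Throwable) => Unit): Unit =
      onExceptionCallbacks += callback

    // Notify every callback; a non-fatal error thrown by one callback is
    // logged with a descriptive message and the remaining ones still run.
    def callOnExceptionCallbacks(e: Throwable): Unit = {
      onExceptionCallbacks.foreach { callback =>
        try {
          callback(this, e)
        } catch {
          case NonFatal(cause) =>
            log(s"Ignored error in onExceptionCallback: $cause")
        }
      }
    }
  }
}
```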

@JoshRosen
Contributor

Hi @zsxwing,

Thanks for submitting this PR (and sorry for the delayed review)! These changes will be very helpful in debugging certain types of connection manager issues that we've encountered. I like the careful attention to error-handling cases that we missed before, especially the use of afterExecute to detect unhandled errors.

I left a few comments, mostly regarding naming. If you fix up the merge conflicts, I think this looks ready to merge. Thanks!

Conflicts:
	core/src/main/scala/org/apache/spark/network/nio/Connection.scala
	core/src/main/scala/org/apache/spark/network/nio/ConnectionManager.scala

override def afterExecute(r: Runnable, t: Throwable): Unit = {
  super.afterExecute(r, t)
  if (t != null && NonFatal(t)) {
Member Author

I added NonFatal(t) to avoid logging fatal exceptions. They are expected to go unhandled here.

Contributor

Nice change.
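
A self-contained sketch of the `afterExecute` technique discussed in this thread, assuming a plain `java.util.concurrent.ThreadPoolExecutor`. `ReportingPool`, `report`, and `runFailingTask` are hypothetical names; the PR itself logs the error rather than passing it to a function. Note that `afterExecute` only receives the Throwable for tasks started with `execute`; tasks passed to `submit` are wrapped in a Future that captures the exception instead.

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}
import java.util.concurrent.atomic.AtomicReference
import scala.util.control.NonFatal

object AfterExecuteSketch {

  // Hypothetical pool, not Spark's: `report` stands in for a logging call.
  class ReportingPool(report: Throwable => Unit)
    extends ThreadPoolExecutor(
      1, 1, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue[Runnable]()) {

    override def afterExecute(r: Runnable, t: Throwable): Unit = {
      super.afterExecute(r, t)
      // t is null when the task finished normally. Fatal errors
      // (for example OutOfMemoryError) are deliberately not reported.
      if (t != null && NonFatal(t)) {
        report(t)
      }
    }
  }

  // Runs one task that throws and returns what afterExecute observed.
  // The worker thread's default handler may also print the stack trace.
  def runFailingTask(): Option[Throwable] = {
    val seen = new AtomicReference[Throwable](null)
    val pool = new ReportingPool(t => seen.set(t))
    pool.execute(new Runnable {
      override def run(): Unit = throw new IllegalStateException("boom")
    })
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)
    Option(seen.get())
  }
}
```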

@zsxwing
Member Author

zsxwing commented Oct 9, 2014

Merged and updated the naming.

@SparkQA

SparkQA commented Oct 9, 2014

QA tests have started for PR 2593 at commit 1d5aed5.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 9, 2014

QA tests have finished for PR 2593 at commit 1d5aed5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21518/

@JoshRosen
Contributor

This looks good to me, so I'm going to merge it. Thanks a bunch; this will really help with debugging!

@asfgit asfgit closed this in 73bf3f2 Oct 9, 2014
@zsxwing zsxwing deleted the SPARK-3741 branch October 10, 2014 11:08
asfgit pushed a commit that referenced this pull request Oct 17, 2014
Sorry, I forgot to add `afterExecute` for `handleConnectExecutor` in #2593.

Author: zsxwing <zsxwing@gmail.com>

Closes #2794 from zsxwing/SPARK-3741 and squashes the following commits:

a0bc4dd [zsxwing] Add afterExecute for handleConnectExecutor