[SPARK-18406][CORE][Backport-2.0] Race between end-of-task and completion iterator read lock release #18096

Closed

Contributor

jiangxb1987 commented May 24, 2017

This is a backport PR of #18076 to 2.0 and 2.1.

What changes were proposed in this pull request?

When a TaskContext is not propagated properly to all child threads of a task, as in the cases reported in this issue, we fail to get the TID from the TaskContext. As a result, the read lock cannot be released and assertion failures follow. To resolve this, we explicitly pass the TID value to the unlock method.
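
To illustrate the failure mode described above, here is a minimal sketch, not the actual Spark code. It assumes the task ID lives in a thread-local, so a child thread that never received the context cannot look it up. The class `ReadLockSketch`, its methods, the task ID `42` and the block ID `rdd_0_0` are all hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a per-task read-lock registry (not Spark's implementation).
final class ReadLockSketch {
    // Assumption: the current task ID is stored per thread, like a task context.
    static final ThreadLocal<Long> CURRENT_TID = new ThreadLocal<>();

    // Block IDs that each task ID currently holds a read lock on.
    private final Map<Long, Set<String>> readLocksByTask = new HashMap<>();

    synchronized void lockForReading(long tid, String blockId) {
        readLocksByTask.computeIfAbsent(tid, k -> new HashSet<>()).add(blockId);
    }

    // Problematic shape: the task ID is inferred from the calling thread.
    synchronized boolean unlock(String blockId) {
        Long tid = CURRENT_TID.get(); // null on a thread the context never reached
        return tid != null && unlock(blockId, tid.longValue());
    }

    // Shape after the fix: the caller passes the TID it captured on the task thread.
    synchronized boolean unlock(String blockId, long tid) {
        Set<String> held = readLocksByTask.get(tid);
        return held != null && held.remove(blockId);
    }

    // Takes a read lock on the "task" thread, then tries to release it from a
    // child thread. Returns true if the child thread released the lock.
    static boolean releaseFromChildThread(boolean passTidExplicitly) {
        final ReadLockSketch locks = new ReadLockSketch();
        final long tid = 42L;
        final boolean[] released = new boolean[1];
        CURRENT_TID.set(tid);
        try {
            locks.lockForReading(tid, "rdd_0_0");
            Thread child = new Thread(() -> {
                // CURRENT_TID is unset here: thread-locals are not inherited.
                released[0] = passTidExplicitly
                        ? locks.unlock("rdd_0_0", tid)
                        : locks.unlock("rdd_0_0");
            });
            child.start();
            child.join();
            return released[0];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            CURRENT_TID.remove();
        }
    }
}
```

In this sketch, `releaseFromChildThread(false)` leaves the lock held because the child thread sees no task ID, while `releaseFromChildThread(true)` releases it because the TID was captured up front and passed along.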

How was this patch tested?

Added a new regression test case in RDDSuite that fails without this fix.

…read lock release

When a TaskContext is not propagated properly to all child threads of a task, as in the cases reported in this issue, we fail to get the TID from the TaskContext. As a result, the read lock cannot be released and assertion failures follow. To resolve this, we explicitly pass the TID value to the `unlock` method.

Added a new regression test case in `RDDSuite` that fails without this fix.

Author: Xingbo Jiang <xingbo.jiang@databricks.com>

Closes apache#18076 from jiangxb1987/completion-iterator.
@jiangxb1987
Contributor Author

cc @gatorsmile @cloud-fan


SparkQA commented May 24, 2017

Test build #77308 has finished for PR 18096 at commit c85afb2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

gatorsmile left a comment

LGTM

asfgit pushed a commit that referenced this pull request May 24, 2017
…tion iterator read lock release

This is a backport PR of #18076 to 2.0 and 2.1.

## What changes were proposed in this pull request?

When a TaskContext is not propagated properly to all child threads of a task, as in the cases reported in this issue, we fail to get the TID from the TaskContext. As a result, the read lock cannot be released and assertion failures follow. To resolve this, we explicitly pass the TID value to the `unlock` method.

## How was this patch tested?

Added a new regression test case in `RDDSuite` that fails without this fix.

Author: Xingbo Jiang <xingbo.jiang@databricks.com>

Closes #18096 from jiangxb1987/completion-iterator-2.0.
@gatorsmile
Member

Thanks! Merging to 2.0. Could you close it?

3 participants