Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-49852][SQL] Fix deadlock caused by explain string generation #48375

Closed
wants to merge 1 commit into from

Conversation

liuzqt
Copy link
Contributor

@liuzqt liuzqt commented Oct 7, 2024

What changes were proposed in this pull request?

Replace TreeNode.allChildren with identity map (which was Set previously) to avoid hashcode lazy evalution.

Why are the changes needed?

We hit a deadlock between shuffle dependency initialization and explain string generation, in which both code paths trigger lazy variable instantiation while visiting some tree nodes in different orders and thus acquiring object locks in a reversed order.

The hashcode of plan node is implemented as lazy val. So the fix is to remove hash code computation from explain string generation so to break the chain of lazy variable instantiation.

TreeNode.allChildren is only used in explain string generation and only require identity equality. This should be also a small perforamance improvement BTW.

Does this PR introduce any user-facing change?

NO

How was this patch tested?

Current UTs

Was this patch authored or co-authored using generative AI tooling?

NO

@github-actions github-actions bot added the SQL label Oct 7, 2024
@cloud-fan
Copy link
Contributor

the pyspark failure is unrelated, thanks, merging to master!

@cloud-fan cloud-fan closed this in d8aca18 Oct 8, 2024
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
### What changes were proposed in this pull request?

Replace `TreeNode.allChildren` with identity map (which was `Set` previously) to avoid hashcode lazy evalution.

### Why are the changes needed?

We hit a deadlock between shuffle dependency initialization and explain string generation, in which both code paths trigger lazy variable instantiation while visiting some tree nodes in different orders and thus acquiring object locks in a reversed order.

The hashcode of plan node is implemented as lazy val. So the fix is to remove hash code computation from explain string generation so to break the chain of lazy variable instantiation.

`TreeNode.allChildren` is only used in explain string generation and only require identity equality. This should be also a small perforamance improvement BTW.

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
Current UTs

### Was this patch authored or co-authored using generative AI tooling?
NO

Closes apache#48375 from liuzqt/SPARK-49852.

Authored-by: Ziqi Liu <ziqi.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants