[SPARK-28699][SQL] Disable using radix sort for ShuffleExchangeExec in repartition case #640
What changes were proposed in this pull request?
Disable radix sort in ShuffleExchangeExec for the repartition case.
In apache#20393, we fixed the nondeterministic result in the shuffle repartition case by performing a local sort before repartitioning.
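As a rough illustration of why the local sort helps, the sketch below assumes rows are assigned to partitions round-robin in arrival order. The function names and sample data are hypothetical and are not Spark code; they only model the idea.

```python
def round_robin(rows, num_partitions):
    """Assign rows to partitions in arrival order (hypothetical model, not Spark's code)."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def sorted_round_robin(rows, num_partitions):
    """Sort locally first, so the assignment no longer depends on arrival order."""
    return round_robin(sorted(rows), num_partitions)


# The same rows arriving in two different orders, e.g. on a first attempt and on a rerun.
first_attempt = [3, 1, 2, 4]
retry_attempt = [1, 4, 3, 2]

unsorted_first = round_robin(first_attempt, 2)
unsorted_retry = round_robin(retry_attempt, 2)
sorted_first = sorted_round_robin(first_attempt, 2)
sorted_retry = sorted_round_robin(retry_attempt, 2)
```

Without the sort, the two attempts place rows in different partitions; with a sort that totally orders the rows, both attempts produce the same assignment.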
However, the newly added sort used radix sort, which is wrong here because binary data can't be compared by the prefix alone. Rows that share a prefix are not ordered deterministically, so the sort is unstable across reruns and fails to solve the indeterminate shuffle output problem.
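The prefix problem can be sketched in a few lines. This is a simplified model rather than Spark's sorter: the 8-byte prefix width, the helper names, and the sample rows are assumptions for illustration, and Python's stable `sorted` stands in for a sort that looks only at the prefix.

```python
PREFIX_WIDTH = 8  # assumed prefix width, for illustration only


def prefix(row: bytes) -> bytes:
    """Return the fixed-width prefix that a prefix-only sort would compare."""
    return row[:PREFIX_WIDTH].ljust(PREFIX_WIDTH, b"\x00")


def sort_by_prefix_only(rows):
    """Order rows by prefix alone; rows with equal prefixes keep their arrival order."""
    return sorted(rows, key=prefix)


def sort_by_full_bytes(rows):
    """Order rows by comparing every byte, which gives a total order."""
    return sorted(rows)


# Two binary rows that agree on the first 8 bytes and differ only afterwards.
row_a = b"AAAAAAAA" + b"\x01"
row_b = b"AAAAAAAA" + b"\x02"

first_attempt = [row_b, row_a]
retry_attempt = [row_a, row_b]

prefix_first = sort_by_prefix_only(first_attempt)
prefix_retry = sort_by_prefix_only(retry_attempt)
full_first = sort_by_full_bytes(first_attempt)
full_retry = sort_by_full_bytes(retry_attempt)
```

In this model the prefix-only sort returns the two rows in whatever order they arrived, so a rerun can see a different order, while the full comparison yields the same order for both attempts.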
Why are the changes needed?
Fix the correctness bug caused by repartition after a shuffle.
Does this PR introduce any user-facing change?
Yes. Users will get the correct result when a repartition stage is rerun.
How was this patch tested?
Tested with local-cluster[5, 2, 5120] using the integrated test below; it returns the correct answer, 100000000.
Closes apache#25491 from xuanyuanking/SPARK-28699-fix.
Authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
This cherry-picks apache#25491