[SPARK-4105][CORE] regenerate the shuffle file when it is corrupted #12700
Can one of the admins verify this patch?
I get that it's just a band-aid, but it isn't solving the underlying problem, right?
Can you explain more about the problem you encountered?
Yeah, I haven't found the root cause yet, and I have been troubled by this problem for a long time. Any idea about this problem, @srowen?
I found that some tasks are recomputed before FAILED_TO_UNCOMPRESS happens, and I think something like #9610 caused this problem. @jerryshao
What do you mean by this? From the code you changed, it looks like the corrupted file shows up during shuffle fetch, so what are you referring to by "task recompute": the map task or the reduce task? Also, it would be better to have a simple reproducible case to narrow down the problem and fix it. Otherwise I don't think the current fix is solid.
@jerryshao @srowen
Since I haven't run into this problem recently, I can't tell exactly what causes it; maybe a race condition, maybe a flush problem. Since you already have a reproducible case, why not dig into the details?
Right now, I only know that a corrupted shuffle file can cause this problem, but I don't know why the shuffle file gets corrupted. @jerryshao @viper-kun
I found that some tasks are recomputed before FAILED_TO_UNCOMPRESS happens, and I think the retry operation corrupted the shuffle file, which caused this problem. I debugged the code and corrupted the shuffle file before it was read, and this problem happened every time. Maybe we can regenerate the shuffle file when it is corrupted.
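A minimal, language-agnostic sketch of the "corrupt, then regenerate" idea described above, under stated assumptions: zlib stands in for the shuffle compression codec (FAILED_TO_UNCOMPRESS is a Snappy error code), and write_shuffle_file, read_shuffle_file, read_with_regeneration and corrupt are hypothetical helpers, not Spark APIs. It reproduces the failure by flipping one byte of a compressed file before it is read, then recovers by rewriting the file once and retrying.

```python
import os
import tempfile
import zlib


def write_shuffle_file(path, records):
    """Hypothetical map side: serialize records and write them zlib-compressed."""
    payload = "\n".join(records).encode("utf-8")
    with open(path, "wb") as f:
        f.write(zlib.compress(payload))


def read_shuffle_file(path):
    """Hypothetical reduce side: read and decompress; raises zlib.error if corrupted."""
    with open(path, "rb") as f:
        data = f.read()
    return zlib.decompress(data).decode("utf-8").split("\n")


def read_with_regeneration(path, regenerate, max_retries=1):
    """Read the file; on a decompression failure, regenerate it and retry (bounded)."""
    for attempt in range(max_retries + 1):
        try:
            return read_shuffle_file(path)
        except zlib.error:
            if attempt == max_retries:
                raise  # give up: regeneration did not fix the file
            regenerate(path)


def corrupt(path, offset=0):
    """Flip every bit of one byte (default: the zlib header) to simulate corruption."""
    with open(path, "r+b") as f:
        f.seek(offset)
        b = f.read(1)
        f.seek(offset)
        f.write(bytes([b[0] ^ 0xFF]))


# Usage: write, corrupt before reading, then recover by rewriting the file.
path = os.path.join(tempfile.mkdtemp(), "shuffle.data")
records = ["a,1", "b,2"]
write_shuffle_file(path, records)
corrupt(path)
result = read_with_regeneration(path, lambda p: write_shuffle_file(p, records))
print(result)  # ['a,1', 'b,2']
```

The retry is bounded on purpose: if the file keeps coming back corrupted, the error is re-raised instead of looping forever. As noted earlier in the thread, this only masks the symptom; it says nothing about why the file was corrupted in the first place.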