-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-4467] fix elements read count for ExtrenalSorter #3302
Conversation
Can one of the admins verify this patch? |
As far as I can tell, |
Also, mind filing a JIRA for this, or, if one already exists, including the name in the title here? |
I think it's used for controlling the frequency of checking to spilling. It's used in Spillable.scala. |
I guess, the original intention of this variable is to make sure there is at least 1K records before each spilling. We saw a too many files open exception due to this variable is not being updated correctly. Of course , this is not the root cause of the issue, I currently have another working branch trying to tackle the deeper cause of this, as mentioned in https://issues.apache.org/jira/browse/SPARK-4452. But at the same time, I'm sending this PR to fix the updating the elementsRead to alleviate the problem |
That makes sense, my IDE for some reason didn't show me the usage in Spillable.scala. In that case, this change makes sense. Spilling is also based on the amount of memory taken up. Do you know what the thinking is for basing it on the number of records as well? |
From my understanding, this variable is used as a lower bound of the number of records when spilling. It's useful when the memory is really low. |
I don't entirely understand that line of argument. Why would we want to place a lower bound if the data structure is pushing the memory threshold? I filed https://issues.apache.org/jira/browse/SPARK-4456 to figure this out and document it better. This patch LGTM though. |
Yeah, I guess we need that lower bound when memory threshold is not "pushable", meaning when memory is too small and you can not acquire memory... I agree this behavior is kinda weird. Maybe it's used as "the last defense"? Please refer to https://issues.apache.org/jira/browse/SPARK-4452 why this last defense saved my svd job. I'm also working on another branch documented on SPARK-4452 to make a deeper fix for memory allocation for spillable objects |
add to whitelist |
I see, after a while we unconditionally try to spill every 32 elements regardless of whether the in-memory buffer has exceeded the spill threshold. This is a serious problem and it seems that this is just an omission in the original code since we don't ever update I think this is the first step towards fixing the too many files open issue that many are seeing. We still need to hunt down the root cause for why the lower bound for how much memory a data structure can have is not being accounted for properly. |
Test build #23515 has started for PR 3302 at commit
|
@tsdeng this patch looks good but wouldn't it be better to set it to 0 in Spillable, after it calls spill()? Then it will fix this problem in all subclasses, and you can remove the code that sets it to 0 in ExternalAppendOnlyMap. |
The code still verifies currentMemory >= myMemoryThreshold as well, right? |
@sryza yeah it does, the problem is just that myMemoryThreshold turns to 0 after you spill. The idea was to wait for at least 1000 more elements before requesting memory, but it currently doesn't, and it gets a 0 returned. |
Test build #23515 has finished for PR 3302 at commit
|
Test PASSed. |
@mateiz |
Oh, weird, it is. What about changing it to a var, any problems with that? |
Basically it is weird to have a var shared with subclasses, but I think this will be more obvious. Probably the very best way is to make elementsRead be modified only in Spillable, and add a method called addElementRead or something like that subclasses can call to tell it to increment the count. |
Ha, my bad, elementsRead is a var, but also like to keep it private or protected and make addElementRead a method to manipulate it |
Sounds good. |
Test build #23533 has started for PR 3302 at commit
|
Test build #23533 has finished for PR 3302 at commit
|
Test PASSed. |
|
||
// subclass calls this method to notify reading an element | ||
// it's used to check spilling frequency | ||
protected def addElementsRead = elementsRead += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add parentheses to this and call it with addElementsRead()
since it has side effects.
Test build #23553 has started for PR 3302 at commit
|
Test build #23553 has finished for PR 3302 at commit
|
Test PASSed. |
@@ -132,7 +130,7 @@ class ExternalAppendOnlyMap[K, V, C]( | |||
currentMap = new SizeTrackingAppendOnlyMap[K, C] | |||
} | |||
currentMap.changeValue(curEntry._1, update) | |||
elementsRead += 1 | |||
addElementsRead |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you still need the ()
here and everywhere this is used
A few more minor comments. This LGTM otherwise |
Test build #23564 has started for PR 3302 at commit
|
Test build #23564 has finished for PR 3302 at commit
|
Test PASSed. |
@mateiz any other comments? |
@andrewor14 nope, it looks good. |
the elementsRead variable should be reset to 0 after each spilling Author: Tianshuo Deng <tdeng@twitter.com> Closes #3302 from tsdeng/fix_external_sorter_record_count and squashes the following commits: 7b56ca0 [Tianshuo Deng] fix method signature 782c7de [Tianshuo Deng] make elementsRead private, fix comment bb7ff28 [Tianshuo Deng] update elemetsRead through addElementsRead method 74ca246 [Tianshuo Deng] fix elements read count (cherry picked from commit d75579d) Signed-off-by: Andrew Or <andrew@databricks.com>
the elementsRead variable should be reset to 0 after each spilling