[SPARK-11805] free the array in UnsafeExternalSorter during spilling #9793
cc @JoshRosen
Test build #46178 has finished for PR 9793 at commit
released += inMemSorter.getMemoryUsage();
inMemSorter.free();
inMemSorter = null;
logger.warn("released {} from {}", released, this); |
`warn` seems a little strong here. This should be `info`.
Test build #46547 has finished for PR 9793 at commit
@@ -489,10 +495,6 @@ public void loadNext() throws IOException {
}
upstream = nextUpstream;
nextUpstream = null;
assert(inMemSorter != null); |
What happens if we don't spill? Will we still free the sorter? I'm wondering whether we should also try to free it here in that case, maybe by changing the `assert` into an `if`.
If we don't spill, we still need the array to iterate over all the records, so we can't free the sorter at this point.
What was the `free()` call at this line freeing in the old code?
What I meant to say is that my understanding of the old `free()` call here was that it was performed when we hit the final record of the iterator, so I thought we'd still need / want to free the iterator at that point even in the new code.
If there is no spilling, the `inMemSorter` will be freed in `cleanupResources()` as usual.
Ah, I see that `getSortedIterator()`'s contract specifies that the caller should call `cleanupResources()` after consuming the iterator:
/**
* Returns a sorted iterator. It is the caller's responsibility to call `cleanupResources()`
* after consuming this iterator.
*/
Even if we're merging an in-memory iterator with a bunch of on-disk spills, there isn't an advantage to trying to free the in-memory iterator's array as soon as we hit the end of that in-memory iterator, since in expectation I think that we would hit the end of that iterator at about the same time that we hit the end of the other iterators / the merged iterator as a whole.
Therefore, LGTM.
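To make the lifecycle discussed above concrete, here is a minimal, self-contained sketch of the two paths by which the in-memory array gets released: eagerly during a spill, or later in `cleanupResources()` when no spill happened. This is not the Spark code. `ToyInMemSorter`, `ToyExternalSorter`, `holdsArray()`, and the 8-bytes-per-slot accounting are hypothetical stand-ins, and the real change performs the spill on the sorted iterator rather than directly on the sorter.

```java
import java.util.ArrayList;
import java.util.List;

public class SpillSketch {

    /** Hypothetical stand-in for the in-memory sorter: it owns a single long[] array. */
    static final class ToyInMemSorter {
        private long[] array;

        ToyInMemSorter(int capacity) {
            this.array = new long[capacity];
        }

        /** Bytes held by the array (8 bytes per long slot), or 0 once freed. */
        long getMemoryUsage() {
            return array == null ? 0L : array.length * 8L;
        }

        void free() {
            array = null;
        }
    }

    /** Hypothetical stand-in for the external sorter that owns the in-memory sorter. */
    static final class ToyExternalSorter {
        private ToyInMemSorter inMemSorter;
        private final List<String> spillFiles = new ArrayList<>();

        ToyExternalSorter(int capacity) {
            this.inMemSorter = new ToyInMemSorter(capacity);
        }

        /** Spill path: once records are on disk the array is not needed, so free it now. */
        long spill() {
            if (inMemSorter == null) {
                return 0L; // already spilled, nothing left to release
            }
            spillFiles.add("spill-" + spillFiles.size()); // pretend the records were written out
            long released = inMemSorter.getMemoryUsage();
            inMemSorter.free();
            inMemSorter = null;
            return released;
        }

        /** No-spill path: the caller releases the array after consuming the iterator. */
        long cleanupResources() {
            long released = 0L;
            if (inMemSorter != null) {
                released = inMemSorter.getMemoryUsage();
                inMemSorter.free();
                inMemSorter = null;
            }
            spillFiles.clear();
            return released;
        }

        boolean holdsArray() {
            return inMemSorter != null;
        }
    }

    public static void main(String[] args) {
        ToyExternalSorter spilled = new ToyExternalSorter(4);
        System.out.println("released by spill: " + spilled.spill());
        System.out.println("released by cleanup after spill: " + spilled.cleanupResources());

        ToyExternalSorter notSpilled = new ToyExternalSorter(4);
        System.out.println("released by cleanup without spill: " + notSpilled.cleanupResources());
    }
}
```

In this sketch the array is released exactly once on either path, which mirrors the point made in the thread: spilling frees it early, and otherwise `cleanupResources()` frees it as usual.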
Thanks for clarifying my question upthread. This looks good to me, so I'm going to merge to master and branch-1.6.
After calling spill() on SortedIterator, the array inside InMemorySorter is not needed, it should be freed during spilling, this could help to join multiple tables with limited memory.

Author: Davies Liu <davies@databricks.com>

Closes #9793 from davies/free_array.

(cherry picked from commit 58d9b26)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>