[SPARK-11805] free the array in UnsafeExternalSorter during spilling #9793

Closed
wants to merge 3 commits into from

davies
Contributor

@davies davies commented Nov 18, 2015

After spill() is called on SortedIterator, the array inside InMemorySorter is no longer needed, so it should be freed during spilling. This can help when joining multiple tables with limited memory.
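
To make the idea concrete, here is a minimal sketch of an iterator-side spill that drops its in-memory array once the unread records have been moved elsewhere. It is not Spark's code: `ToySpillableSorter`, its fields, and the in-memory `spilled` list (standing in for a spill file on disk) are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy stand-in for a sorter whose iterator can spill; not Spark's actual classes. */
final class ToySpillableSorter {
    private long[] array;                                  // sorted in-memory records (hypothetical)
    private int position = 0;                              // next in-memory record to return
    private final List<Long> spilled = new ArrayList<>();  // stand-in for a spill file on disk
    private int spilledPosition = 0;                       // next spilled record to return

    ToySpillableSorter(long[] records) {
        this.array = records.clone();
        Arrays.sort(this.array);
    }

    boolean hasNext() {
        return array != null ? position < array.length : spilledPosition < spilled.size();
    }

    long next() {
        if (array != null) {
            return array[position++];
        }
        return spilled.get(spilledPosition++);
    }

    /** Moves unread records to the "spill" and frees the array; returns bytes released. */
    long spill() {
        if (array == null) {
            return 0L;  // already spilled, nothing left to release
        }
        long released = (long) array.length * Long.BYTES;
        for (int i = position; i < array.length; i++) {
            spilled.add(array[i]);
        }
        array = null;  // the array is no longer needed once its records are spilled
        return released;
    }

    boolean arrayFreed() {
        return array == null;
    }
}
```

After `spill()` returns, iteration continues from the spilled copy, so the caller sees the same sorted sequence while the array's memory becomes reclaimable.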

@davies
Contributor Author

davies commented Nov 18, 2015

cc @JoshRosen

@SparkQA

SparkQA commented Nov 18, 2015

Test build #46178 has finished for PR 9793 at commit b69c1ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * public final class SortedIterator extends UnsafeSorterIterator
      * public final class UTF8String implements Comparable<UTF8String>, Externalizable, KryoSerializable

released += inMemSorter.getMemoryUsage();
inMemSorter.free();
inMemSorter = null;
logger.warn("released {} from {}", released, this);
Contributor

logger.warn seems a little strong here; this should be logged at info level.

@SparkQA

SparkQA commented Nov 23, 2015

Test build #46547 has finished for PR 9793 at commit 996233f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      * public final class SortedIterator extends UnsafeSorterIterator

@@ -489,10 +495,6 @@ public void loadNext() throws IOException {
}
upstream = nextUpstream;
nextUpstream = null;

assert(inMemSorter != null);
Contributor

What happens if we don't spill? Will we still free the sorter? I'm wondering whether we should also try to free it here in that case, maybe by changing assert into if.

Contributor Author

If we don't spill, we need the array to iterate over all the records, so we can't free the sorter.

Contributor

What was the free() call at this line freeing in the old code?

Contributor

What I meant to say is that my understanding of the old free() call here was that it was performed when we hit the final record of the iterator, so I thought we'd still need / want to free the iterator at that point even in the new code.

Contributor Author

If there is no spilling, the inMemSorter will be freed in cleanupResources() as usual.

Contributor

Ah, I see that getSortedIterator()'s contract specifies that the caller should call cleanupResources() after consuming the iterator:

 /**
   * Returns a sorted iterator. It is the caller's responsibility to call `cleanupResources()`
   * after consuming this iterator.
   */

Even if we're merging an in-memory iterator with a bunch of on-disk spills, there isn't an advantage to trying to free the in-memory iterator's array as soon as we hit the end of that in-memory iterator, since in expectation I think that we would hit the end of that iterator at about the same time that we hit the end of the other iterators / the merged iterator as a whole.

Therefore, LGTM.
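
As a rough illustration of the contract discussed above (the caller consumes the sorted iterator, then calls `cleanupResources()`), the sketch below uses a try/finally so cleanup runs even if iteration fails. `ToyExternalSorter`, `CleanupDemo`, and `isFreed()` are hypothetical names for this example only; they are not Spark's classes, and only the method names `getSortedIterator()` and `cleanupResources()` are taken from the thread.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sorter illustrating the "caller must clean up" contract. */
final class ToyExternalSorter {
    private long[] array;  // in-memory records; null once freed

    ToyExternalSorter(long[] records) {
        array = records.clone();
        Arrays.sort(array);
    }

    /** Returns a sorted iterator; the caller must call cleanupResources() afterwards. */
    Iterator<Long> getSortedIterator() {
        final long[] records = array;
        return new Iterator<Long>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < records.length;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return records[i++];
            }
        };
    }

    /** Idempotent: safe to call twice, and safe if the array was already freed. */
    void cleanupResources() {
        array = null;
    }

    boolean isFreed() {
        return array == null;
    }
}

final class CleanupDemo {
    /** Consumes the whole iterator, then releases the sorter's memory. */
    static long sumAll(ToyExternalSorter sorter) {
        long sum = 0;
        try {
            Iterator<Long> it = sorter.getSortedIterator();
            while (it.hasNext()) {
                sum += it.next();
            }
        } finally {
            sorter.cleanupResources();  // the no-spill path frees the array here
        }
        return sum;
    }
}
```

Making the cleanup idempotent is one way to let the spill path free the array early without the later `cleanupResources()` call needing to know whether a spill happened.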

@JoshRosen
Contributor

Thanks for clarifying my question upthread. This looks good to me, so I'm going to merge to master and branch-1.6.

asfgit pushed a commit that referenced this pull request Nov 24, 2015
After spill() is called on SortedIterator, the array inside InMemorySorter is no longer needed, so it should be freed during spilling. This can help when joining multiple tables with limited memory.

Author: Davies Liu <davies@databricks.com>

Closes #9793 from davies/free_array.

(cherry picked from commit 58d9b26)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
@asfgit asfgit closed this in 58d9b26 Nov 24, 2015