-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-22143][SQL] Fix memory leak in OffHeapColumnVector #19367
Conversation
@@ -85,6 +85,7 @@ public long nullsNativeAddress() { | |||
|
|||
@Override | |||
public void close() { | |||
super.close(); | |||
Platform.freeMemory(nulls); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not related to this fix directly. However, is it better to check whether each field is not 0
before calling Platform.freeMemory()
? For example, data
or lengthData/offsetData may be
0`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is a bit weird. The doc in the JDK8's Unsafe. freeMemory()
states the following: The address passed to this method may be null, in which case no action is taken.
See: http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/687fd7c7986d/src/share/classes/sun/misc/Unsafe.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for some minor comments.
dt: DataType)( | ||
block: WritableColumnVector => Unit): Unit = { | ||
test(name) { | ||
val c1 = new OnHeapColumnVector(size, dt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this c1
for?
} | ||
|
||
test("toArray for primitive types") { | ||
// (MemoryMode.ON_HEAP :: MemoryMode.OFF_HEAP :: Nil).foreach { memMode => { | ||
(MemoryMode.ON_HEAP :: Nil).foreach { memMode => { | ||
(MemoryMode.ON_HEAP :: MemoryMode.OFF_HEAP :: Nil).foreach { memMode => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove a pair of brace like .foreach { memMode => ...
.
// (MemoryMode.ON_HEAP :: MemoryMode.OFF_HEAP :: Nil).foreach { memMode => { | ||
(MemoryMode.ON_HEAP :: Nil).foreach { memMode => { | ||
(MemoryMode.ON_HEAP :: MemoryMode.OFF_HEAP :: Nil).foreach { memMode => { | ||
// (MemoryMode.ON_HEAP :: Nil).foreach { memMode => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just some nitpicks.
dt: DataType)( | ||
block: WritableColumnVector => Unit): Unit = { | ||
test(name) { | ||
val c1 = new OnHeapColumnVector(size, dt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is c1
for?
test(name) { | ||
modes.foreach { mode => | ||
val vector = allocate(size, dt, mode) | ||
try block(vector, mode) finally { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way you use {}
here is weird.
} | ||
|
||
test("Float APIs") { | ||
(MemoryMode.ON_HEAP :: MemoryMode.OFF_HEAP :: Nil).foreach { memMode => { | ||
testVector("Float Apis", 1024, FloatType, MemoryMode.ON_HEAP, MemoryMode.OFF_HEAP) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think changing Apis
to APIs
would be better than the other way around.
testVector( | ||
"Nest Struct in Array", | ||
10, | ||
new ArrayType(new StructType().add("int", IntegerType).add("long", LongType), true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean new ArrayType(structType)
?
test("struct") { | ||
val schema = new StructType().add("int", IntegerType).add("double", DoubleType) | ||
testVector = allocate(10, schema) | ||
val structType: StructType = new StructType().add("int", IntegerType).add("double", DoubleType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Do we need : StructType
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a class level field now, so scalastyle wants me type it...
name: String, | ||
size: Int, | ||
dt: DataType, | ||
modes: MemoryMode*)( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do modes: Seq[MemoryMode] = Seq(MemoryMode.ON_HEAP, MemoryMode.ON_HEAP))(
? Then, we can avoid repeat testVector(..., MemoryMode.ON_HEAP, MemoryMode.OFF_HEAP)
at each suite.
Test build #82243 has finished for PR 19367 at commit
|
Test build #82247 has finished for PR 19367 at commit
|
Test build #82248 has finished for PR 19367 at commit
|
retest this please |
Test build #82251 has finished for PR 19367 at commit
|
Merging to master. |
## What changes were proposed in this pull request? `WriteableColumnVector` does not close its child column vectors. This can create memory leaks for `OffHeapColumnVector` where we do not clean up the memory allocated by a vectors children. This can be especially bad for string columns (which uses a child byte column vector). ## How was this patch tested? I have updated the existing tests to always use both on-heap and off-heap vectors. Testing and diagnoses was done locally. Author: Herman van Hovell <hvanhovell@databricks.com> Closes apache#19367 from hvanhovell/SPARK-22143. # Conflicts: # sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java # sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala # sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
What changes were proposed in this pull request?
WriteableColumnVector
does not close its child column vectors. This can create memory leaks forOffHeapColumnVector
where we do not clean up the memory allocated by a vectors children. This can be especially bad for string columns (which uses a child byte column vector).How was this patch tested?
I have updated the existing tests to always use both on-heap and off-heap vectors. Testing and diagnoses was done locally.