[SPARK-22003][SQL] support array column in vectorized reader with UDF #19230
Conversation
Add a test for it?
} else if (dt instanceof StringType) {
  for (int i = 0; i < length; i++) {
    if (!data.isNullAt(offset + i)) {
      list[i] = getUTF8String(i).toString();
This looks suspicious. Why do we get `String` here? Seems we should get `UTF8String`.
This looks like a bug.
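To illustrate the distinction being flagged, here is a minimal sketch (not PR code; it only assumes the `UTF8String` class from `org.apache.spark.unsafe.types`):

// Minimal illustration of the String vs. UTF8String distinction: Spark's
// internal rows store strings as UTF8String, so converting to
// java.lang.String inside the vectorized path hands back the wrong
// representation to downstream expressions.
import org.apache.spark.unsafe.types.UTF8String;

public class Utf8StringDemo {
  public static void main(String[] args) {
    UTF8String internal = UTF8String.fromString("spark");
    String external = internal.toString();  // the conversion flagged above
    System.out.println(internal.numBytes() + " bytes internally; external=" + external);
  }
}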
 for (int i = 0; i < length; i++) {
   if (!data.isNullAt(offset + i)) {
-    list[i] = data.getDouble(offset + i);
+    list[i] = getAtMethod.call(i);
Can we just call `get(i + offset, dt)`? The `getAtMethod` seems not very useful, as we still need to go through the if-else branches in `get` every time.
It should be `get(i, dt)`? I updated it anyway.
Yea, it should be `get(i, dt)`.
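For readers following along, a hypothetical sketch of the type-dispatching `get` the reviewers mean (not the actual `ColumnVector` code; the class and method names here are illustrative):

// Hypothetical dispatch sketch: get(...) branches on the DataType on every
// call, so invoking it per element keeps the if-else cost inside the loop,
// which is why hoisting it into a getAtMethod closure bought little.
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DoubleType;
import org.apache.spark.sql.types.IntegerType;
import org.apache.spark.sql.types.StringType;

abstract class VectorGetSketch {
  abstract int getInt(int ordinal);
  abstract double getDouble(int ordinal);
  abstract Object getUTF8String(int ordinal);

  Object get(int ordinal, DataType dt) {
    if (dt instanceof IntegerType) {
      return getInt(ordinal);
    } else if (dt instanceof DoubleType) {
      return getDouble(ordinal);
    } else if (dt instanceof StringType) {
      return getUTF8String(ordinal);
    }
    throw new UnsupportedOperationException("Unsupported type: " + dt);
  }
}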
Test build #81759 has finished for PR 19230 at commit
Yea, we should add a unit test for it.
@viirya @cloud-fan unit test updated.
@@ -16,6 +16,7 @@
  */
 package org.apache.spark.sql.execution.vectorized;

+import org.apache.spark.api.java.function.Function;
We don't use this now.
@liufengdb I think we don't need to import this now?
Test build #81835 has finished for PR 19230 at commit
// Populate it with arrays [0], [1, 2], [], [3, 4, 5]
testVector.putArray(0, 0, 1)
testVector.putArray(1, 1, 2)
testVector.putArray(2, 2, 0)
I think it doesn't affect the result, but it looks like the third array should be `testVector.putArray(2, 3, 0)`?
+1
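For reference, a plain-Java sketch of the (offset, length) bookkeeping behind `putArray`, using the corrected values (hypothetical; `child` stands in for the vector's child data):

// Each row's array is a slice of shared child data: putArray(rowId, offset,
// length) records where the slice starts and how long it is. For
// [0], [1, 2], [], [3, 4, 5] the corrected pairs are (0,1), (1,2), (3,0),
// (3,3); a zero-length slice reads the same for any offset, which is why the
// original (2, 0) "doesn't affect the result".
public class PutArrayLayout {
  public static void main(String[] args) {
    int[] child = {0, 1, 2, 3, 4, 5};                 // flattened elements
    int[][] meta = {{0, 1}, {1, 2}, {3, 0}, {3, 3}};  // (offset, length) per row
    for (int[] m : meta) {
      StringBuilder sb = new StringBuilder("[");
      for (int i = 0; i < m[1]; i++) {
        if (i > 0) sb.append(", ");
        sb.append(child[m[0] + i]);
      }
      System.out.println(sb.append(']'));             // [0] [1, 2] [] [3, 4, 5]
    }
  }
}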
@liufengdb The PR description looks like an end-to-end failure. I'm curious: are you facing the failure in an end-to-end case?
@@ -158,7 +158,7 @@ private static void appendValue(WritableColumnVector dst, DataType t, Object o)
   dst.getChildColumn(0).appendInt(c.months);
   dst.getChildColumn(1).appendLong(c.microseconds);
 } else if (t instanceof DateType) {
-  dst.appendInt(DateTimeUtils.fromJavaDate((Date)o));
+  dst.appendInt((int) DateTimeUtils.fromJavaDate((Date)o));
Is it necessary?
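For background on the question, a small hedged sketch of the conversion (assuming the 2.x-era `DateTimeUtils` API, where `fromJavaDate` already returns days since the epoch as an `int`, which would make the explicit cast redundant):

// fromJavaDate converts java.sql.Date to days since the Unix epoch; in the
// Spark line this PR targets, the result is already an int.
import java.sql.Date;
import org.apache.spark.sql.catalyst.util.DateTimeUtils;

public class FromJavaDateDemo {
  public static void main(String[] args) {
    int days = DateTimeUtils.fromJavaDate(Date.valueOf("2017-09-14"));
    System.out.println("days since epoch: " + days);
  }
}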
LGTM except some minor comments.
LGTM too.
Test build #81850 has finished for PR 19230 at commit
@@ -16,6 +16,7 @@
  */
 package org.apache.spark.sql.execution.vectorized;

+import org.apache.spark.api.java.function.Function;
Please revert it back.
oops, reverted it.
    assert(array.get(1, schema).asInstanceOf[ColumnarBatch.Row].get(0, IntegerType) === 456)
    assert(array.get(1, schema).asInstanceOf[ColumnarBatch.Row].get(1, DoubleType) === 5.67)
  }
}
Is it better to add a test for `map`, too?
`MapType` is not supported in `ColumnVector`: https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java#L235
I see. Does your change expect that this call finally throws an exception for a `Map` element in an array?
retest this please
Test build #81861 has finished for PR 19230 at commit
retest this please

1 similar comment

retest this please
Can we add test code for
Test build #81872 has finished for PR 19230 at commit
retest this please
Test build #81877 has finished for PR 19230 at commit
Thanks! Merged to master.
What changes were proposed in this pull request?

The UDF needs to deserialize the `UnsafeRow`. When the column type is Array, the `get` method from the `ColumnVector`, which is used by the vectorized reader, is called, but this method is not implemented.

How was this patch tested?
Unit tests; see the test code discussed in the review above.
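Since the description points at an end-to-end failure, here is a hypothetical repro sketch (not from the PR; the path, UDF name, and config flag are illustrative) of a UDF hitting the unimplemented array `get` path through the vectorized Parquet reader:

// Hypothetical end-to-end repro: write an array column to Parquet, read it
// back with the vectorized reader, and apply a UDF. Before this patch the
// UDF's deserialization called the unimplemented array get(...) on
// ColumnVector and failed.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import scala.collection.Seq;

public class ArrayUdfRepro {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[2]")
        .appName("array-udf-repro")
        .config("spark.sql.parquet.enableVectorizedReader", "true")
        .getOrCreate();

    spark.sql("SELECT array(1, 2, 3) AS a").write().mode("overwrite").parquet("/tmp/arr_repro");

    // The UDF forces per-row deserialization of the array column.
    spark.udf().register("arr_len",
        (UDF1<Seq<Integer>, Integer>) Seq::size, DataTypes.IntegerType);

    Dataset<Row> df = spark.read().parquet("/tmp/arr_repro");
    df.selectExpr("arr_len(a)").show();  // threw before SPARK-22003
  }
}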