-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Arrow convertor to honor dictionary encoding inside complex types. #7171
Conversation
✅ Deploy Preview for meta-velox canceled.
|
This pull request was exported from Phabricator. Differential Revision: D50515953 |
velox/vector/arrow/Bridge.cpp
Outdated
@@ -1114,7 +1115,13 @@ VectorPtr importFromArrowImpl( | |||
|
|||
if (arrowSchema.dictionary) { | |||
return createDictionaryVector( | |||
pool, type, nulls, arrowSchema, arrowArray, isViewer, wrapInBufferView); | |||
pool, | |||
INTEGER(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to do some conversion if type
is not INTEGER
, cannot silently replace it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does does this function even takes type as a parameter if it only accepts INTEGER?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be expanded to support different widths of indices from arrow. In Velox we have only INTEGER
, so some transformation needs to be done on the buffer depending on the type
passed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure I am on same page:
- In arrow dictionaries can have different index types apart from integers ?
- Since velox only supports integer indexes I will also make a change to createDictionary where we do the type conversion from whatever type arrow has to integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI this diff doesnt add support for non integer index's - that should be in another diff.
@@ -726,7 +726,8 @@ void exportToArrow(const VectorPtr& vec, ArrowSchema& arrowSchema) { | |||
} | |||
|
|||
TypePtr importFromArrow(const ArrowSchema& arrowSchema) { | |||
const char* format = arrowSchema.format; | |||
const char* format = arrowSchema.dictionary ? arrowSchema.dictionary->format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have unit tests for this change. This should go under ArrowBridgeSchemaTest.cpp
velox/vector/arrow/Bridge.cpp
Outdated
@@ -1114,7 +1115,13 @@ VectorPtr importFromArrowImpl( | |||
|
|||
if (arrowSchema.dictionary) { | |||
return createDictionaryVector( | |||
pool, type, nulls, arrowSchema, arrowArray, isViewer, wrapInBufferView); | |||
pool, | |||
INTEGER(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does does this function even takes type as a parameter if it only accepts INTEGER?
ed0815b
to
5182aa9
Compare
facebookincubator#7171) Summary: Arrow converter currently doesnt check for dictionary encoding etc when inside a complex type. Differential Revision: D50515953
This pull request was exported from Phabricator. Differential Revision: D50515953 |
facebookincubator#7171) Summary: Arrow converter currently doesnt check for dictionary encoding etc when inside a complex type. Differential Revision: D50515953
5182aa9
to
d1b2d1e
Compare
This pull request was exported from Phabricator. Differential Revision: D50515953 |
facebookincubator#7171) Summary: Arrow converter currently doesnt check for dictionary encoding etc when inside a complex type. Differential Revision: D50515953
d1b2d1e
to
78b752e
Compare
This pull request was exported from Phabricator. Differential Revision: D50515953 |
facebookincubator#7171) Summary: Arrow converter currently doesnt check for dictionary encoding etc when inside a complex type. Differential Revision: D50515953
78b752e
to
db47886
Compare
This pull request was exported from Phabricator. Differential Revision: D50515953 |
facebookincubator#7171) Summary: Arrow converter currently doesnt check for dictionary encoding etc when inside a complex type. Differential Revision: D50515953
db47886
to
c7bb3a2
Compare
This pull request was exported from Phabricator. Differential Revision: D50515953 |
velox/vector/arrow/Bridge.cpp
Outdated
} | ||
VELOX_USER_FAIL( | ||
"Unable to convert '{}' ArrowSchema format type to Velox.", format); | ||
// As per |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps more clearly:
"For dictionaries, format
encodes the index type, while the dictionary value is encoded in the dictionary
member, as per: ..."
*testSchemaDictionaryImport( | ||
"i", | ||
makeComplexArrowSchema( | ||
schemas, schemaPtrs, "+s", {"s", "f"}, {"col1", "col2"}))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have nested dictionary types (dict(dict(integer)))? If so, do we have tests for them?
facebookincubator#7171) Summary: Arrow converter currently doesnt check for dictionary encoding etc when inside a complex type. Reviewed By: pedroerp Differential Revision: D50515953
c7bb3a2
to
b97c527
Compare
This pull request was exported from Phabricator. Differential Revision: D50515953 |
facebookincubator#7171) Summary: Arrow converter currently doesnt check for dictionary encoding etc when inside a complex type. Reviewed By: pedroerp Differential Revision: D50515953
b97c527
to
4c32261
Compare
This pull request was exported from Phabricator. Differential Revision: D50515953 |
facebookincubator#7171) Summary: Arrow converter currently doesnt check for dictionary encoding etc when inside a complex type. Reviewed By: pedroerp Differential Revision: D50515953
4c32261
to
a675126
Compare
This pull request was exported from Phabricator. Differential Revision: D50515953 |
facebookincubator#7171) Summary: Arrow converter currently doesnt check for dictionary encoding etc when inside a complex type. Reviewed By: pedroerp Differential Revision: D50515953
a675126
to
2f63a97
Compare
This pull request was exported from Phabricator. Differential Revision: D50515953 |
facebookincubator#7171) Summary: Arrow converter currently doesnt check for dictionary encoding etc when inside a complex type. Reviewed By: pedroerp Differential Revision: D50515953
2f63a97
to
7182324
Compare
This pull request was exported from Phabricator. Differential Revision: D50515953 |
facebookincubator#7171) Summary: Arrow converter currently doesnt check for dictionary encoding etc when inside a complex type. Reviewed By: pedroerp Differential Revision: D50515953
7182324
to
ca8f1e2
Compare
This pull request was exported from Phabricator. Differential Revision: D50515953 |
facebookincubator#7171) Summary: Arrow converter currently doesnt check for dictionary encoding etc when inside a complex type. Reviewed By: pedroerp Differential Revision: D50515953
This pull request was exported from Phabricator. Differential Revision: D50515953 |
ca8f1e2
to
fb2dcfe
Compare
This pull request was exported from Phabricator. Differential Revision: D50515953 |
This pull request has been merged in a86cd7b. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary: Arrow converter currently doesnt check for dictionary encoding etc when inside a complex type.
Reviewed By: panditsurabhi
Differential Revision: D50515953