Fix Arrow convertor to honor dictionary encoding inside complex types. #7171

Closed
@kgpai kgpai commented Oct 20, 2023

Summary: The Arrow converter currently doesn't check for dictionary encoding, etc., when inside a complex type.

Reviewed By: panditsurabhi

Differential Revision: D50515953

@facebook-github-bot facebook-github-bot added the CLA Signed label Oct 20, 2023
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D50515953

@@ -1114,7 +1115,13 @@ VectorPtr importFromArrowImpl(

   if (arrowSchema.dictionary) {
     return createDictionaryVector(
-        pool, type, nulls, arrowSchema, arrowArray, isViewer, wrapInBufferView);
+        pool,
+        INTEGER(),

We need to do some conversion if the type is not INTEGER; we cannot silently replace it.

Does this function even need to take type as a parameter if it only accepts INTEGER?

@Yuhta Yuhta Nov 7, 2023

It can be expanded to support different widths of indices from Arrow. In Velox we have only INTEGER, so some transformation needs to be done on the buffer depending on the type passed in.

To make sure I am on the same page:

  1. In Arrow, can dictionaries have index types other than integers?
  2. Since Velox only supports integer indices, I will also change createDictionary to convert from whatever index type Arrow has to integer.

FYI, this diff doesn't add support for non-integer indices; that should go in another diff.
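
For reference, the buffer transformation discussed in this thread could look roughly like the sketch below. It is not code from this PR or from Velox: `widen` and `toInt32Indices` are hypothetical names, and the only external fact relied on is the Arrow C data interface format strings for integer types ("c"/"C" for 8-bit, "s"/"S" for 16-bit, "i" for 32-bit). Wider index types (unsigned 32-bit, 64-bit) are rejected here because they would need range checks.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not a Velox API): copies `length` indices of type T
// from a raw buffer into a freshly allocated vector of 32-bit indices.
template <typename T>
std::vector<int32_t> widen(const void* buffer, size_t length) {
  const T* src = static_cast<const T*>(buffer);
  std::vector<int32_t> out(length);
  for (size_t i = 0; i < length; ++i) {
    out[i] = static_cast<int32_t>(src[i]);
  }
  return out;
}

// Hypothetical helper (not a Velox API): dispatches on the Arrow format
// string of the index type and returns the indices as int32_t.
std::vector<int32_t> toInt32Indices(
    const std::string& format, const void* buffer, size_t length) {
  if (format == "c") return widen<int8_t>(buffer, length);
  if (format == "C") return widen<uint8_t>(buffer, length);
  if (format == "s") return widen<int16_t>(buffer, length);
  if (format == "S") return widen<uint16_t>(buffer, length);
  if (format == "i") return widen<int32_t>(buffer, length);
  // "I", "l" and "L" may not fit in int32_t, so this sketch refuses them.
  throw std::invalid_argument(
      "Unsupported dictionary index format: " + format);
}
```

A copy like this gives up the zero-copy import that is possible when the indices are already 32-bit, which is one reason to keep it in a separate change.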

@@ -726,7 +726,8 @@ void exportToArrow(const VectorPtr& vec, ArrowSchema& arrowSchema) {
 }

 TypePtr importFromArrow(const ArrowSchema& arrowSchema) {
-  const char* format = arrowSchema.format;
+  const char* format = arrowSchema.dictionary ? arrowSchema.dictionary->format

Can we have unit tests for this change? They should go under ArrowBridgeSchemaTest.cpp.

kgpai added a commit to kgpai/velox-1 that referenced this pull request Nov 13, 2023
facebookincubator#7171)

Summary:

The Arrow converter currently doesn't check for dictionary encoding, etc., when inside a complex type.

Differential Revision: D50515953

kgpai commented Nov 13, 2023

cc: @pedroerp @Yuhta

}
VELOX_USER_FAIL(
"Unable to convert '{}' ArrowSchema format type to Velox.", format);
// As per

Perhaps more clearly:

"For dictionaries, format encodes the index type, while the dictionary value is encoded in the dictionary member, as per: ..."
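
The rule in the suggested comment can be illustrated with a small standalone sketch. The `ArrowSchema` struct below is copied from the Arrow C data interface so that the example compiles on its own; `indexFormat` and `valueFormat` are hypothetical helper names, not functions from this PR.

```cpp
#include <cstdint>
#include <string>

// Layout as defined by the Arrow C data interface.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

// Hypothetical helper: for a dictionary-encoded schema, the top-level
// format describes the indices. Returns an empty string otherwise.
std::string indexFormat(const ArrowSchema& schema) {
  return schema.dictionary != nullptr ? std::string(schema.format)
                                      : std::string();
}

// Hypothetical helper: the logical value type lives in the `dictionary`
// member when one is present, and in `format` itself otherwise.
std::string valueFormat(const ArrowSchema& schema) {
  return schema.dictionary != nullptr ? std::string(schema.dictionary->format)
                                      : std::string(schema.format);
}
```

For a column of strings encoded with 32-bit indices, `indexFormat` would yield "i" and `valueFormat` would yield "u".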

*testSchemaDictionaryImport(
"i",
makeComplexArrowSchema(
schemas, schemaPtrs, "+s", {"s", "f"}, {"col1", "col2"})));

Can we have nested dictionary types (dict(dict(integer)))? If so, do we have tests for them?
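
One way to reason about the nested case is sketched below. Two points are assumptions rather than facts from this thread: that a producer would emit a dictionary whose values are themselves dictionary-encoded, and that the importer should unwrap every layer. `describeType` is a hypothetical helper; the `ArrowSchema` layout follows the Arrow C data interface.

```cpp
#include <cstdint>
#include <string>

// Layout as defined by the Arrow C data interface.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

// Hypothetical helper: renders the logical type of a schema as a string,
// looking through dictionary encoding at every nesting level.
std::string describeType(const ArrowSchema& schema) {
  // Follow the dictionary chain until a non-encoded value type is reached.
  const ArrowSchema* current = &schema;
  while (current->dictionary != nullptr) {
    current = current->dictionary;
  }
  std::string out = current->format;
  // Complex types (for example "+s" structs) carry their fields as children,
  // and each child may be dictionary-encoded on its own.
  if (current->n_children > 0) {
    out += "<";
    for (int64_t i = 0; i < current->n_children; ++i) {
      if (i > 0) {
        out += ",";
      }
      out += describeType(*current->children[i]);
    }
    out += ">";
  }
  return out;
}
```

With this sketch, a struct whose first field is a dictionary-encoded string and whose second field is a float renders as `+s<u,f>`, and a two-level dictionary over 32-bit integers renders as `i`.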

kgpai added a commit to kgpai/velox-1 that referenced this pull request Nov 16, 2023
facebookincubator#7171)

Summary:

The Arrow converter currently doesn't check for dictionary encoding, etc., when inside a complex type.

Reviewed By: pedroerp

Differential Revision: D50515953

@facebook-github-bot
This pull request has been merged in a86cd7b.

Conbench analyzed the 1 benchmark run on commit a86cd7be.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.
