Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINOR [Go] Add nested struct test for pqarrow #35851

Closed

Conversation

candiduslynx
Copy link
Contributor

@zeroshade I've found a disturbing behavior with structs (namely, nested ones) in pqarrow.

This blocks cloudquery/filetypes#172

Could you please take a look?

@candiduslynx candiduslynx requested a review from zeroshade as a code owner May 31, 2023 17:08
@github-actions
Copy link

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels May 31, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 31, 2023
@candiduslynx
Copy link
Contributor Author

@zeroshade I found an issue with the test & it should now pass, but I'm really confused about array.NewStructData behavior in conjunction with pqarrow.
Could you please take a look at the code in cloudquery/filetypes#172?
Specifically, the issue I see is that the offsets/null bitmaps for the nested field are propagated strangely.
It panics while trying to create Struct array from data, so I don't know where to look.

@candiduslynx candiduslynx requested a review from zeroshade May 31, 2023 17:52
@candiduslynx candiduslynx changed the title [Go] Repro for nested null struct panic MINOR [Go] Add nested struct test for pqarrow May 31, 2023
@candiduslynx
Copy link
Contributor Author

I guess the issue can be connected to https://github.com/apache/arrow/blob/main/go/parquet/pqarrow/encode_arrow_test.go#L1040-L1042

// current impl of ArrayEquals for structs doesn't correctly handle nulls in the parent
// with a non-nullable child when comparing. Since after the round trip, the data in the
// child will have the nulls, not the original data.

kodiakhq bot pushed a commit to cloudquery/filetypes that referenced this pull request Jun 1, 2023
#172)

Extracted from #171

Structs reading is slow because of apache/arrow#35851
Fields copy is because of apache/arrow#35867
@candiduslynx
Copy link
Contributor Author

I'm closing this for now, however, it would've been nice to be able to use array.NewStructData on the read values

zeroshade pushed a commit that referenced this pull request Jun 1, 2023
…al` (#35872)

### Rationale for this change

When comparing `array.Struct` values with `array.ApproxEqual` the validity bitmap of the struct itself should take precedence:
> When reading the struct array the parent validity bitmap takes priority.

This follows a brief discussion from #35851.

### What changes are included in this PR?

`array.arrayApproxEqualStruct` will check the fields data only if the struct elem is valid.

### Are these changes tested?

`pqarrow` tests are updated accordingly (no special treatment for structs, just `array.ApproxEqual`

### Are there any user-facing changes?

`array.ApproxEqual` behavior changed to match the docs about validity bitmap precedence.

* Closes: #35871

Authored-by: candiduslynx <candiduslynx@gmail.com>
Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants