Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-38575: [Python] Include metadata when creating pa.schema from PyCapsule #41538

Merged
merged 3 commits into from
May 17, 2024

Conversation

JacobHayes
Copy link
Contributor

@JacobHayes JacobHayes commented May 4, 2024

Rationale for this change

Fixes the dropped pa.schema metadata reported in #38575, which was introduced in #37797.

What changes are included in this PR?

Passes through the metadata to the short-circuited Schema created with _import_from_c_capsule.

Are these changes tested?

Yes - added metadata to the existing test.

Are there any user-facing changes?

I'm not sure this quite rises to the (b) a bug that caused incorrect or invalid data to be produced, condition, but I added that note to be safe since the resulting schema is "incorrect" (and broke some round-trip tests on my end after a pyarrow update):

This PR contains a "Critical Fix".

Copy link

github-actions bot commented May 4, 2024

⚠️ GitHub issue #38575 has been automatically assigned in GitHub to PR creator.

Copy link

github-actions bot commented May 4, 2024

⚠️ GitHub issue #38575 has no components, please add labels for components.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels May 7, 2024
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

@@ -1331,7 +1331,7 @@ def __init__(self, schema):
def __arrow_c_schema__(self):
return self.schema.__arrow_c_schema__()

schema = pa.schema([pa.field("field_name", pa.int32())])
schema = pa.schema([pa.field("field_name", pa.int32())], metadata={"a", "b"})
wrapped_schema = Wrapper(schema)

assert pa.schema(wrapped_schema) == schema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you additionally need to test is pa.schema(wrapped_schema, metadata=..), i.e. passing metadata to the pa.schema(..) constructor function when passing a schema object to it (the test as you edited it now will already pass, I think, and so not cover the regression)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks. I tried pushing just the test first hoping CI would run them but got lazy when it didn't. :)

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 7, 2024
@JacobHayes JacobHayes force-pushed the py-fix-schema-metadata branch from 5b99daf to 5556af6 Compare May 7, 2024 21:35
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update!

wrapped_schema = Wrapper(schema)

assert pa.schema(wrapped_schema) == schema
assert pa.schema(wrapped_schema).metadata == {b"a": b"b"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this additional line, because assert pa.schema(wrapped_schema) == schema (the line above) in itself does not check the metadata (that gets ignored in basic equality testing)

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels May 16, 2024
@JacobHayes
Copy link
Contributor Author

Thanks for the help @jorisvandenbossche - I need to spend a bit more time getting my dev pyarrow builds working so I'm not just shooting in the dark 🙈

@jorisvandenbossche jorisvandenbossche merged commit 6a9e2d5 into apache:main May 17, 2024
14 checks passed
@jorisvandenbossche jorisvandenbossche removed the awaiting merge Awaiting merge label May 17, 2024
Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 6a9e2d5.

There were 5 benchmark results indicating a performance regression:

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

@JacobHayes JacobHayes deleted the py-fix-schema-metadata branch May 17, 2024 16:27
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…m PyCapsule (apache#41538)

### Rationale for this change

Fixes the dropped `pa.schema` metadata reported in apache#38575, which was introduced in apache#37797.

### What changes are included in this PR?

Passes through the `metadata` to the short-circuited `Schema` created with `_import_from_c_capsule`.

### Are these changes tested?

Yes - added `metadata` to the existing test.

### Are there any user-facing changes?

I'm not sure this quite rises to the `(b) a bug that caused incorrect or invalid data to be produced,` condition, but I added that note to be safe since the resulting schema is "incorrect" (and broke some round-trip tests on my end after a pyarrow update):

**This PR contains a "Critical Fix".**

* GitHub Issue: apache#38575

Lead-authored-by: Jacob Hayes <jacob.r.hayes@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants