-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-38575: [Python] Include metadata when creating pa.schema from PyCapsule #41538
GH-38575: [Python] Include metadata when creating pa.schema from PyCapsule #41538
Conversation
|
|
7c98632
to
d7d54ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution!
@@ -1331,7 +1331,7 @@ def __init__(self, schema): | |||
def __arrow_c_schema__(self): | |||
return self.schema.__arrow_c_schema__() | |||
|
|||
schema = pa.schema([pa.field("field_name", pa.int32())]) | |||
schema = pa.schema([pa.field("field_name", pa.int32())], metadata={"a", "b"}) | |||
wrapped_schema = Wrapper(schema) | |||
|
|||
assert pa.schema(wrapped_schema) == schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you additionally need to test is pa.schema(wrapped_schema, metadata=..)
, i.e. passing metadata to the pa.schema(..)
constructor function when passing a schema object to it (the test as you edited it now will already pass, I think, and so not cover the regression)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks. I tried pushing just the test first hoping CI would run them but got lazy when it didn't. :)
5b99daf
to
5556af6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update!
wrapped_schema = Wrapper(schema) | ||
|
||
assert pa.schema(wrapped_schema) == schema | ||
assert pa.schema(wrapped_schema).metadata == {b"a": b"b"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this additional line, because assert pa.schema(wrapped_schema) == schema
(the line above) in itself does not check the metadata (that gets ignored in basic equality testing)
Thanks for the help @jorisvandenbossche - I need to spend a bit more time getting my dev pyarrow builds working so I'm not just shooting in the dark 🙈 |
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 6a9e2d5. There were 5 benchmark results indicating a performance regression:
The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…m PyCapsule (apache#41538) ### Rationale for this change Fixes the dropped `pa.schema` metadata reported in apache#38575, which was introduced in apache#37797. ### What changes are included in this PR? Passes through the `metadata` to the short-circuited `Schema` created with `_import_from_c_capsule`. ### Are these changes tested? Yes - added `metadata` to the existing test. ### Are there any user-facing changes? I'm not sure this quite rises to the `(b) a bug that caused incorrect or invalid data to be produced,` condition, but I added that note to be safe since the resulting schema is "incorrect" (and broke some round-trip tests on my end after a pyarrow update): **This PR contains a "Critical Fix".** * GitHub Issue: apache#38575 Lead-authored-by: Jacob Hayes <jacob.r.hayes@gmail.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Rationale for this change
Fixes the dropped
pa.schema
metadata reported in #38575, which was introduced in #37797.What changes are included in this PR?
Passes through the
metadata
to the short-circuitedSchema
created with_import_from_c_capsule
.Are these changes tested?
Yes - added
metadata
to the existing test.Are there any user-facing changes?
I'm not sure this quite rises to the
(b) a bug that caused incorrect or invalid data to be produced,
condition, but I added that note to be safe since the resulting schema is "incorrect" (and broke some round-trip tests on my end after a pyarrow update):This PR contains a "Critical Fix".