ARROW-86: [Python] Implement zero-copy Arrow-to-Pandas conversion #52
Performance benchmarks confirm that we do not allocate any additional RAM, and the time to convert Arrow->Pandas drops by 50%, but I am still seeing a dependency between size and runtime. So the conversion is zero-copy, but its overhead is still not constant.
Benchmark results for
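The "no additional RAM" part of a claim like this can be checked with an allocation tracer. The following is only a minimal sketch of such a check, not the benchmark used in this PR: it uses the standard-library `array` and `memoryview` as stand-ins for an Arrow buffer and a NumPy view, and `peak_bytes` is a hypothetical helper name. It does not measure the size-dependent runtime mentioned above.

```python
import tracemalloc
from array import array


def peak_bytes(make):
    """Run make() and return the peak number of bytes allocated while it ran."""
    tracemalloc.start()
    try:
        result = make()  # keep the result alive until the peak is read
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Stand-in for a column buffer: one million float64 values, about 8 MB.
source = array("d", range(1_000_000))

view_peak = peak_bytes(lambda: memoryview(source))  # shares the existing buffer
copy_peak = peak_bytes(lambda: array("d", source))  # allocates a new buffer

print(f"view: {view_peak} bytes, copy: {copy_peak} bytes")
```

The view should cost only a small, size-independent amount of bookkeeping memory, while the copy allocates roughly as many bytes as the source buffer holds.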
After apache#52 is merged, I'd like to split Column and Table into separate .pyx files, array.pyx seems a bit overcrowded.

Author: Uwe L. Korn <uwelk@xhochy.com>

Closes apache#53 from xhochy/arrow-49 and squashes the following commits:

b01b201 [Uwe L. Korn] Use correct number of chunks
e422faf [Uwe L. Korn] Incoportate PR feedback, Add ChunkedArray interface
e8f84a9 [Uwe L. Korn] ARROW-49: [Python] Add Column and Table wrapper interface
Can you rebase? I'll review this soon
}

// Arrow data is immutable.
PyArray_CLEARFLAGS(out_, NPY_ARRAY_WRITEABLE);
I'm not sure this contract should be enforced here (it is basically a caveat emptor for the caller of this to-Arrow conversion), but I don't feel extremely strongly about it.
We can create zero-copy NumPy arrays for floats and ints if we have no nulls. Each numpy-arrow-view has a reference to the underlying column to ensure that the Arrow structure lives at least as long as the newly created NumPy array.
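The rules described here (zero-copy only without nulls, a read-only result, and a reference that keeps the source alive) can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the code from this PR: `Column` and `to_view` are hypothetical names, `array.array` stands in for an Arrow buffer, and a read-only `memoryview` stands in for the NumPy array with its writeable flag cleared. Unlike the PR, where the view references the column, the `memoryview` here references the underlying buffer directly.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical stand-in for an Arrow column: a typed buffer plus a null count."""
    values: array
    null_count: int = 0


def to_view(column: Column):
    """Return a read-only, zero-copy view when there are no nulls, else None."""
    if column.null_count > 0:
        return None  # nulls would need a sentinel such as NaN, which forces a copy
    # The memoryview shares the buffer and holds a reference to column.values,
    # so the buffer lives at least as long as the view does.
    return memoryview(column.values).toreadonly()


col = Column(array("d", [1.0, 2.0, 3.0]))
view = to_view(col)

col.values[0] = 42.0  # mutate the source buffer in place...
first = view[0]       # ...and the view sees the change, so nothing was copied

del col               # dropping the column does not free the buffer
remaining = view.tolist()
```

Writing through `view` raises `TypeError`, which mirrors the effect of clearing the writeable flag on the NumPy side.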
lgtm, thank you
- added target "make stylecheck" to check style
- added target "make stylefix" to check style
- fixed README.md
- fixed ci script
- used stylefix to fix all existing style violations
…ypes and test end to end

Includes tests for end to end plain encoding and decoding of all data types.

Author: Deepak Majeti <deepak.majeti@hp.com>

Closes apache#52 from majetideepak/PARQUET-499 and squashes the following commits:

897859b [Deepak Majeti] minor edits
2067ef5 [Deepak Majeti] renamed a test
dfb19f8 [Deepak Majeti] templated all types
059967a [Deepak Majeti] templated int and real tests
da86d4d [Deepak Majeti] minor fix
4976bec [Deepak Majeti] include pruning
d0f8ab9 [Deepak Majeti] addressed comments
07257c0 [Deepak Majeti] minor format edits
6ca0b30 [Deepak Majeti] fixed formatting and casting issues
9815062 [Deepak Majeti] PARQUET-499
…ypes and test end to end

Includes tests for end to end plain encoding and decoding of all data types.

Author: Deepak Majeti <deepak.majeti@hp.com>

Closes apache#52 from majetideepak/PARQUET-499 and squashes the following commits:

897859b [Deepak Majeti] minor edits
2067ef5 [Deepak Majeti] renamed a test
dfb19f8 [Deepak Majeti] templated all types
059967a [Deepak Majeti] templated int and real tests
da86d4d [Deepak Majeti] minor fix
4976bec [Deepak Majeti] include pruning
d0f8ab9 [Deepak Majeti] addressed comments
07257c0 [Deepak Majeti] minor format edits
6ca0b30 [Deepak Majeti] fixed formatting and casting issues
9815062 [Deepak Majeti] PARQUET-499

Change-Id: I45b2811e9abc8cad1277a533280d7fc3727d13e7