ARROW-86: [Python] Implement zero-copy Arrow-to-Pandas conversion #52

Closed
wants to merge 4 commits into from

xhochy (Member) commented Mar 30, 2016

We can create zero-copy NumPy arrays for floats and ints if we have no
nulls. Each numpy-arrow-view has a reference to the underlying column to
ensure that the Arrow structure lives at least as long as the newly
created NumPy array.

xhochy (Member, Author) commented Mar 30, 2016

Performance benchmarks confirm that we do not allocate any additional RAM, and the Arrow->Pandas conversion time drops by 50%. However, runtime still grows with the column size, so the conversion is zero-copy but its overhead is not yet constant.

xhochy (Member, Author) commented Mar 30, 2016

Benchmark results for to_pandas. The int64 and float64 columns are converted zero-copy; the other two (float64_nans and str) are copied.

               ========== ========= ========== ============== ==========
               --                             dtype                     
               ---------- ----------------------------------------------
                  size      int64    float64    float64_nans     str    
               ========== ========= ========== ============== ==========
                   1        1.02ms   953.53μs     843.33μs      1.05ms  
                 100000     1.01ms    1.02ms       3.33ms      22.15ms  
                1000000     3.90ms    3.76ms      26.29ms      179.65ms 
                10000000   71.82ms   71.40ms      287.06ms      1.67s   
               ========== ========= ========== ============== ==========
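One reason a float64 column with nulls cannot take the zero-copy path: Arrow records nulls in a separate validity bitmap (one bit per slot, least-significant bit first, 1 meaning valid), while pandas marks missing floats as NaN inside the values themselves. Since Arrow buffers are immutable, the NaNs have to be written into a new array. The sketch below illustrates this with plain NumPy arrays standing in for the Arrow buffers; float_column_to_numpy is a hypothetical helper, not pyarrow's conversion routine. (String columns are copied for a different reason: each value has to become a Python object.)

```python
import numpy as np


def float_column_to_numpy(values, validity, length):
    """Sketch of converting a float64 Arrow-style column to NumPy.

    values:   float64 array standing in for the Arrow data buffer
    validity: uint8 array standing in for the validity bitmap
              (LSB-first, 1 = valid), or None when there are no nulls
    """
    if validity is None:
        # No nulls: expose the data buffer as-is (a view, no copy).
        return values[:length]
    # Unpack the LSB-first bitmap into one boolean per slot.
    valid = np.unpackbits(validity, bitorder="little")[:length].astype(bool)
    # pandas represents missing floats as NaN, but the source buffer is
    # immutable, so the NaNs must be written into a fresh copy.
    out = values[:length].copy()
    out[~valid] = np.nan
    return out


vals = np.array([1.0, 2.0, 3.0, 4.0])
bitmap = np.array([0b00001011], dtype=np.uint8)  # slot 2 is null
with_nulls = float_column_to_numpy(vals, bitmap, 4)
no_nulls = float_column_to_numpy(vals, None, 4)
```

The copy, plus the extra pass over the bitmap, is consistent with float64_nans scaling like a full copy in the table while plain float64 does not.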

wesm pushed a commit to wesm/arrow that referenced this pull request Apr 1, 2016
After apache#52 is merged, I'd like to split Column and Table into separate .pyx files; array.pyx seems a bit overcrowded.

Author: Uwe L. Korn <uwelk@xhochy.com>

Closes apache#53 from xhochy/arrow-49 and squashes the following commits:

b01b201 [Uwe L. Korn] Use correct number of chunks
e422faf [Uwe L. Korn] Incorporate PR feedback, Add ChunkedArray interface
e8f84a9 [Uwe L. Korn] ARROW-49: [Python] Add Column and Table wrapper interface
wesm (Member) commented Apr 1, 2016

Can you rebase? I'll review this soon

Review comment on:

```cpp
}

// Arrow data is immutable.
PyArray_CLEARFLAGS(out_, NPY_ARRAY_WRITEABLE);
```

Member commented:

I'm not sure this contract should be enforced here (it is basically caveat emptor for the caller of this Arrow-to-NumPy conversion), but I don't feel extremely strongly about it.
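The review question is whether the conversion itself should mark its output read-only or leave that to the caller. A small NumPy sketch of both sides: readonly_view and can_write are hypothetical helpers, and setting flags.writeable = False from Python is used here as the counterpart of the C-level PyArray_CLEARFLAGS(out_, NPY_ARRAY_WRITEABLE) call. Writes through the read-only view raise ValueError, and a caller that needs a mutable array has to opt into a copy.

```python
import numpy as np


def readonly_view(owner):
    """Return a view of `owner` that NumPy refuses to write through."""
    view = owner.view()
    view.flags.writeable = False  # Python-level analogue of clearing NPY_ARRAY_WRITEABLE
    return view


def can_write(arr):
    """Report whether an in-place write to `arr` is allowed.

    Assumes a non-empty 1-D array.
    """
    try:
        arr[0] = arr[0]
    except ValueError:  # raised for read-only destinations
        return False
    return True


buf = np.arange(5, dtype=np.int64)  # stands in for Arrow-owned memory
ro = readonly_view(buf)
mutable = ro.copy()  # the caller explicitly opts into a writable copy
```

Clearing the flag protects only that particular view; the owning buffer keeps its own flags, so the read-only bit documents the immutability contract to NumPy users rather than locking the underlying memory.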

xhochy added 2 commits April 3, 2016 17:46
wesm (Member) commented Apr 3, 2016

lgtm, thank you

asfgit closed this in 9d88a50 on Apr 3, 2016
xhochy deleted the arrow-86 branch on March 7, 2017, 16:16
praveenbingo pushed a commit to praveenbingo/arrow that referenced this pull request Aug 30, 2018
- added target "make stylecheck" to check style
- added target "make stylefix" to fix style violations
- fixed README.md
- fixed ci script
- used stylefix to fix all existing style violations
wesm pushed a commit to wesm/arrow that referenced this pull request Sep 2, 2018
…ypes and test end to end

Includes tests for end to end plain encoding and decoding of all data types.

Author: Deepak Majeti <deepak.majeti@hp.com>

Closes apache#52 from majetideepak/PARQUET-499 and squashes the following commits:

897859b [Deepak Majeti] minor edits
2067ef5 [Deepak Majeti] renamed a test
dfb19f8 [Deepak Majeti] templated all types
059967a [Deepak Majeti] templated int and real tests
da86d4d [Deepak Majeti] minor fix
4976bec [Deepak Majeti] include pruning
d0f8ab9 [Deepak Majeti] addressed comments
07257c0 [Deepak Majeti] minor format edits
6ca0b30 [Deepak Majeti] fixed formatting and casting issues
9815062 [Deepak Majeti] PARQUET-499
jikunshang added a commit to jikunshang/arrow that referenced this pull request May 29, 2020