[ArrayManager] Enable read_parquet to not create 2D blocks when using ArrayManager #40303
looks pretty reasonable. cc @jbrockmendel
 # setup engines & skips
 @pytest.fixture(
     params=[
         pytest.param(
             "fastparquet",
             marks=pytest.mark.skipif(
-                not _HAVE_FASTPARQUET, reason="fastparquet is not installed"
+                not _HAVE_FASTPARQUET or get_option("mode.data_manager") == "array",
+                reason="fastparquet is not installed or ArrayManager is used",
is this a "for now" or an "ever"?
If your question is about "will ArrayManager be supported with fastparquet engine", that's probably a question for the fastparquet package (and since this is only optional for now, there is still time to discuss that with them)
ok, so not actionable on our end, thanks
small question, LGTM
xref #39146
I was exploring the Parquet IO, and pyarrow has an option to not create consolidated blocks (`split_blocks=True` in `Table.to_pandas`). If we use it when an ArrayManager is requested, we avoid building consolidated 2D blocks that would then have to be split up again, which reduces memory usage. It's a bit slower, though, because pyarrow still has the overhead of creating more blocks (that's something that would need to be changed in pyarrow).
Would still need to add a test that checks the option is honored.
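
A rough sketch of the idea and of the kind of check such a test could make. The helper names `to_pandas_kwargs` and `convert` are hypothetical and are not the code in this PR; the only real API assumed is pyarrow's `Table.to_pandas(split_blocks=...)`. A `unittest.mock.Mock` stands in for the pyarrow table so the snippet runs without pyarrow installed.

```python
from unittest import mock


def to_pandas_kwargs(data_manager: str) -> dict:
    """Hypothetical helper: choose keyword arguments for Table.to_pandas."""
    kwargs = {}
    if data_manager == "array":
        # Ask pyarrow for one block per column instead of consolidated 2D blocks.
        kwargs["split_blocks"] = True
    return kwargs


def convert(table, data_manager: str):
    """Convert a pyarrow-like table, forwarding the manager-dependent options."""
    return table.to_pandas(**to_pandas_kwargs(data_manager))


# Stand-in for a pyarrow.Table (assumption: only .to_pandas is called on it).
fake_table = mock.Mock()

convert(fake_table, "array")
fake_table.to_pandas.assert_called_once_with(split_blocks=True)

fake_table.reset_mock()
convert(fake_table, "block")
fake_table.to_pandas.assert_called_once_with()
```

A real test would more likely read a small Parquet file with the `mode.data_manager` option set to `"array"` and inspect the resulting DataFrame; the mock only shows that the option reaches `to_pandas`.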