Support configuring types_mapper in read_gbq #45

Closed
bnaul opened this issue May 4, 2023 · 3 comments · Fixed by #46
Comments

@bnaul
Contributor

bnaul commented May 4, 2023

googleapis/python-bigquery#1529 and googleapis/python-bigquery#1547 have recently added arguments for overriding the default type conversions performed by record_batch.to_pandas(); this allows, for example, loading string data directly into dtype string[pyarrow] (which can be quite a bit more efficient) without doing any expensive conversions after the fact.

I think we could basically just copy the implementation from the above PRs, same kwarg names and everything. Anyone see any potential issues @jrbourbeau @j-bennet @ncclementi?

@j-bennet
Contributor

j-bennet commented May 4, 2023

@bnaul You can already provide a custom mapper in read_gbq as part of arrow_options:

https://github.com/coiled/dask-snowflake/blob/42cb99e4e35aeb18f7ede95badba8f20a4113378/dask_snowflake/core.py#L215-L217

@bnaul
Contributor Author

bnaul commented May 4, 2023

Ha well this looks great, but unfortunately that's dask-snowflake and this is dask-bigquery 😅 but yeah that's basically what I had in mind! Do you think the same arrow_options approach makes sense here? As opposed to copying the many many kwargs that they expose in google.cloud.bigquery.QueryJob.to_dataframe().

@j-bennet
Contributor

j-bennet commented May 4, 2023

:) :) :) scratch that, I'm contributing to dask-snowflake and dask-bigquery right now, and completely confused which one we're talking about.

Yes, I think arrow_kwargs would make sense in read_gbq. It would not take much: we would need to pass those kwargs through to the point where we call to_pandas on a pyarrow record batch:

pyarrow.ipc.read_record_batch(
    pyarrow.py_buffer(message.arrow_record_batch.serialized_record_batch),
    schema,
).to_pandas()

@j-bennet j-bennet changed the title Support configuring types_mapper in read_gbq Support configuring types_mapper in read_gbq May 4, 2023
@bnaul bnaul closed this as completed in #46 May 8, 2023