Convert df to Spark DataFrame if it is a pandas or pandas-on-Spark DataFrame before writing #317

Merged
dbeatty10 merged 3 commits into main from dbeatty/pandas-df-pandas-on-spark-df on Sep 21, 2022

@dbeatty10 dbeatty10 commented Sep 20, 2022

resolves #316

Description

Enable pandas and pandas-on-Spark DataFrames for dbt Python models. Convert either of those types to Spark DataFrame before writing.

Checklist

@cla-bot cla-bot bot added the cla:yes label Sep 20, 2022
@github-actions

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the dbt-bigquery contributing guide.

@dbeatty10 dbeatty10 marked this pull request as ready for review September 20, 2022 21:15
@dbeatty10 dbeatty10 added the triage:ready-for-review label (Externally contributed PR has functional approval, ready for code review from Core engineering) Sep 20, 2022

@ChenyuLInx ChenyuLInx left a comment

LGTM

@dbeatty10

@ChenyuLInx I'm going to try adding support for importing Koalas (either in this PR or a separate PR) since:

@dbeatty10 dbeatty10 merged commit 1e65581 into main Sep 21, 2022
@dbeatty10 dbeatty10 deleted the dbeatty/pandas-df-pandas-on-spark-df branch September 21, 2022 18:13
@dbeatty10

I'm going to try adding support for importing Koalas (either in this PR or a separate PR)

Opted for "separate PR":

  • so that we could get the bulk of the logic merged now to achieve parity with dbt-spark, and
  • since we'll want to make the same change in dbt-spark, we might as well do both at the same time

Successfully merging this pull request may close these issues.

[CT-1214] [Feature] Enable pandas-on-Spark DataFrames for BigQuery