Convert df to pyspark DataFrame if it is pandas before writing #301

chamini2 · 2022-09-08T17:40:04Z

resolves #312

Description

This adds a check on the df returned from the model function: if it is a pandas DataFrame, it is converted into a PySpark DataFrame before being written.
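A minimal sketch of the kind of check described above, not the actual diff from this PR. The helper name `to_spark_if_pandas` is hypothetical, and `spark` is assumed to be a `pyspark.sql.SparkSession`. pandas is imported lazily on the assumption that it may not be installed where the model runs.

```python
def to_spark_if_pandas(df, spark):
    """Return df as a PySpark DataFrame if it is a pandas DataFrame.

    Hypothetical helper for illustration only; `spark` is assumed to be
    a pyspark.sql.SparkSession (or anything exposing createDataFrame).
    """
    try:
        import pandas  # optional dependency: may be absent on the cluster
    except ImportError:
        # Without pandas installed, df cannot be a pandas DataFrame.
        return df

    if isinstance(df, pandas.DataFrame):
        # SparkSession.createDataFrame accepts a pandas DataFrame directly.
        return spark.createDataFrame(df)

    # Anything else (e.g. an existing PySpark DataFrame) passes through unchanged.
    return df
```

The idea is to call this on the model's return value just before the write step, so a model that returns either DataFrame type follows the same write path.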

Checklist

I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
I have run changie new to create a changelog entry

cla-bot · 2022-09-08T17:40:07Z

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin.

CLA has not been signed by users: @chamini2

jtcohen6 · 2022-09-16T11:56:55Z

Original community Slack thread

@lostmygithubaccount I believe this change would resolve the bug that @b-per ran into during yesterday's hackathon, by checking to see if the user is returning a Pandas dataframe and converting it back to a PySpark dataframe before writing it back to the database.

We should make the same change over in dbt-spark as well. (Another argument for finding a place to store PySpark-specific code, so it doesn't need to be copy-pasted between these two.)

chamini2 · 2022-09-16T19:19:41Z

~~Should I open a ticket @jtcohen6 ?~~

I see someone did! Let me know if you need anything else for the PR to be good to go!

lostmygithubaccount · 2022-09-16T20:17:46Z

@chamini2 nothing on your end, appreciate the contribution! we're doing some manual testing before final review/merge

dbeatty10 · 2022-09-17T17:12:21Z

I manually verified that the following didn't work before. Confirmed that it works using @chamini2's fix 👍

import pandas as pd

def model(dbt, session):
    dbt.config(
        materialized="table",
        packages=["pandas"]
    )

    df = pd.DataFrame(
        {
            'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
            'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
            'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
            'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86],
        }
    )

    return df

ChenyuLInx · 2022-09-19T16:48:04Z

@chamini2 Thanks for contributing this!!!

@dbeatty10 thanks for confirming it! I will add this as a basic test in core!

Convert df to pyspark DataFrame if it is pandas before writing

a557bc8

cla-bot bot added the cla:yes label Sep 8, 2022

jtcohen6 requested a review from ChenyuLInx September 16, 2022 11:53

jtcohen6 added the triage:ready-for-review label Sep 16, 2022

Merge branch 'main' into chamini2-patch-1

1e2238f

This was referenced Sep 16, 2022

[CT-1198] [Feature] support python model return pandas dataframe dbt-labs/dbt-spark#468

Closed

Convert df to pyspark DataFrame if it is pandas before writing dbt-labs/dbt-spark#469

Merged

Merge branch 'main' into chamini2-patch-1

0bfb533

dbeatty10 mentioned this pull request Sep 17, 2022

Convert df to pyspark DataFrame if it is pandas before writing databricks/dbt-databricks#180

Closed

ChenyuLInx approved these changes Sep 19, 2022

ChenyuLInx merged commit 9699a48 into dbt-labs:main Sep 19, 2022

ChenyuLInx mentioned this pull request Sep 19, 2022

[CT-1206] [Enhancement] test for saving pandas dataframe in python models dbt-labs/dbt-core#5881

Closed

chamini2 deleted the chamini2-patch-1 branch September 19, 2022 17:12
