Convert df to pyspark DataFrame if it is pandas before writing #301
Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA. In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin. CLA has not been signed by users: @chamini2
Original community Slack thread
@lostmygithubaccount I believe this change would resolve the bug that @b-per ran into during yesterday's hackathon, by checking to see if the user is returning a Pandas dataframe and converting it back to a PySpark dataframe before writing it back to the database. We should make the same change over in
I see someone did! Let me know if you need anything else for the PR to be good to go!
@chamini2 nothing on your end, appreciate the contribution! we're doing some manual testing before final review/merge
I manually verified that the following didn't work before. Confirmed that it works using @chamini2's fix 👍

    import pandas as pd

    def model(dbt, session):
        dbt.config(
            materialized="table",
            packages=["pandas"]
        )
        df = pd.DataFrame(
            {'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
             'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
             'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
             'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86]}
        )
        return df
@chamini2 Thanks for contributing this!!! @dbeatty10 thanks for confirming it! I will add this as a basic test in core!
resolves #312
Description
This adds a check for the df returned from the model function, converting it into a pyspark DataFrame if it's pandas.
Checklist
changie new to create a changelog entry
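For readers following along, the check described in this PR can be sketched roughly as below. This is not the code from the diff: the helper names `is_pandas_dataframe` and `ensure_spark_dataframe` are hypothetical, and the sketch assumes `session` is a PySpark `SparkSession` (whose `createDataFrame` accepts a pandas DataFrame).

```python
import importlib.util


def is_pandas_dataframe(obj) -> bool:
    """Return True only if pandas is installed and obj is a pandas DataFrame."""
    # Hypothetical helper: skip the import entirely when pandas is absent.
    if importlib.util.find_spec("pandas") is None:
        return False
    import pandas as pd

    return isinstance(obj, pd.DataFrame)


def ensure_spark_dataframe(df, session):
    """Convert a pandas DataFrame to a PySpark one; pass anything else through.

    Hypothetical helper. Assumes `session` exposes `createDataFrame`, as a
    PySpark SparkSession does.
    """
    if is_pandas_dataframe(df):
        return session.createDataFrame(df)
    return df
```

With something like this in place, the value returned from `model` would go through `ensure_spark_dataframe(df, session)` before the write, so a model that returns a pandas DataFrame (as in the manual test above) and one that returns a PySpark DataFrame would both reach the writer in the same form.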