MySQLtoGCSOperator unable to save parquet files with date/datetime fields #17538
Labels: area:providers, kind:bug, pending-response, provider:google, stale
Apache Airflow version: 2.1.2
Apache Airflow provider versions:
apache-airflow-providers-celery==2.0.0
apache-airflow-providers-ftp==2.0.0
apache-airflow-providers-google==4.0.0
apache-airflow-providers-imap==2.0.0
apache-airflow-providers-mysql==2.0.0
apache-airflow-providers-postgres==2.0.0
apache-airflow-providers-sqlite==2.0.0
What happened:
When trying to export a parquet file to GCS from a MySQL table containing dates/datetimes, the following error occurs:
This is a result of the following code, which casts the data to strings before storing the row:
This is fine for string-based formats such as CSV/JSON, but pyarrow should receive the unconverted date/datetime object to be able to store the row correctly.
I think the easiest fix is just to not convert the row before passing it to pyarrow, like this:
I'm not sure whether this would cause any regressions, though.
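To make the idea concrete, here is a minimal standard-library sketch of format-aware conversion: values are stringified only for text formats and passed through untouched for parquet. The names `convert_row` and `STRING_FORMATS` are hypothetical and are not part of the Airflow provider; the real operator's conversion logic may look quite different.

```python
import datetime

# Hypothetical set of export formats that need text-friendly values.
STRING_FORMATS = {"csv", "json"}


def convert_row(row, export_format):
    """Stringify date/datetime values for text formats only.

    For any other format (e.g. parquet) the row is returned unconverted,
    so a typed writer still sees real date/datetime objects.
    """
    if export_format not in STRING_FORMATS:
        return list(row)
    return [
        # datetime.datetime is a subclass of datetime.date, so one check covers both.
        value.isoformat() if isinstance(value, datetime.date) else value
        for value in row
    ]


row = (1, datetime.date(2021, 8, 10), datetime.datetime(2021, 8, 10, 12, 30))
csv_row = convert_row(row, "csv")          # dates become ISO strings
parquet_row = convert_row(row, "parquet")  # dates stay as Python objects
```

Keeping the branch on the export format would leave CSV/JSON output unchanged, which is where a regression would be most likely.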
What you expected to happen:
The parquet file should be written to disk and uploaded to GCS
How to reproduce it:
The following code reproduces it 100% of the time; it is a simplified version of the code in the actual operator:
This fails for the exact same reason the Operator fails:
And returns the following error:
While this works: