PostgresToGCSOperator doesn't convert DATE correctly #8387

Closed
KevinKobi opened this issue Apr 15, 2020 · 8 comments
Labels: contributors-workshop, good first issue, kind:bug, provider:google

Comments

@KevinKobi

What happened:

Date and time datatypes are all treated as datetime in the operator. This creates problems when dumping to CSV (so the data can be loaded into BigQuery), because the data will be of date type, yet BigQuery will expect datetime format.

How to reproduce it:
Dump a DATE column to CSV using the operator and create a table from it in BigQuery. The column will not be of DATE type.

Anything else we need to know:

This probably needs a fix at https://github.com/apache/airflow/blob/master/airflow/providers/google/cloud/operators/postgres_to_gcs.py#L46, changing the mapping from TIMESTAMP to DATE, since the type codes are:

  Date                  -> 1082
  Time                  -> 1083
  Timestamp             -> 1114
  TimestampWithTimeZone -> 1184

https://github.com/psycopg/psycopg2/blob/master/psycopg/pgtypes.h#L41

and make some adjustments in the convert_type function.
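For illustration, the proposed change could look roughly like the sketch below. It is not the operator's actual code: the names `TYPE_MAP`, `bigquery_type`, and `convert_value` are hypothetical, and the choice of BigQuery types for the Time and Timestamp codes is an assumption. Only the four type codes come from the list above.

```python
import datetime

# Hypothetical mapping from PostgreSQL type codes (listed above) to
# BigQuery column types. The target types other than DATE are assumptions.
TYPE_MAP = {
    1082: "DATE",       # Date
    1083: "TIME",       # Time
    1114: "DATETIME",   # Timestamp (no time zone)
    1184: "TIMESTAMP",  # TimestampWithTimeZone
}


def bigquery_type(type_code, default="STRING"):
    """Return the BigQuery type for a PostgreSQL type code."""
    return TYPE_MAP.get(type_code, default)


def convert_value(value):
    """Serialize date/time values as text; pass other values through."""
    # datetime.datetime is a subclass of datetime.date, so check it first.
    if isinstance(value, datetime.datetime):
        return value.isoformat(sep=" ")
    if isinstance(value, (datetime.date, datetime.time)):
        return value.isoformat()
    return value
```

With a mapping like this, a DATE column is declared as DATE in the generated schema and its values are written as `YYYY-MM-DD` strings instead of being forced into a timestamp representation.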

KevinKobi added the kind:bug label Apr 15, 2020
mik-laj added the provider:google label Apr 15, 2020
@mik-laj
Member

mik-laj commented Apr 15, 2020

The proposed change looks good. I think it's worth testing this problem with system tests.

A system test is a type of test that uses an example DAG and communicates with a real service. Example DAGs for operators for GCP are available:
https://github.com/apache/airflow/tree/master/airflow/providers/google/cloud/example_dags
The system tests are available:
https://github.com/apache/airflow/tree/master/tests/providers/google/cloud/operators
Files with system tests have the suffix "_system.py".
Here is a good example of system tests for transfer operator:
https://github.com/apache/airflow/blob/master/tests/providers/google/cloud/operators/test_presto_to_gcs_system.py

Would you like to work on a PR? Apache Airflow is an open-source project without paid technical support. Each problem is solved by community members, that is, other Airflow users. If this issue is important to you, it is best to take action yourself.
If you are interested, I invite you to read the following guides:
Contributor Guide: https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst
Development Environment Guide: https://github.com/apache/airflow/blob/master/BREEZE.rst
Testing Guide: https://github.com/apache/airflow/blob/master/TESTING.rst

I hope this information will be helpful for you and I look forward to your PR.

potiuk added the contributors-workshop label Jul 6, 2021
@RobGoretsky

I will take a look!

@vijaya-lakshmi-venkatraman

Hello,
Is this still available?

@eladkal
Contributor

eladkal commented Feb 2, 2022

@vijaya-lakshmi-venkatraman yes - assigned to you

@RobGoretsky

@vijaya-lakshmi-venkatraman - Thanks for taking this one over! I had started looking at it, but was unclear on proceeding given that it seems the same code in sql_to_gcs.py is used to export CSV, JSON, and Parquet formats. I wasn't sure if changing the way dates are exported for CSV would break export for those other formats.
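One way to think about the concern above is to make the date handling depend on the export format, so that a change for CSV leaves the other formats alone. The sketch below is hypothetical and is not the code in sql_to_gcs.py: the names `stringify` and `export_rows` are invented for illustration, and the Parquet branch only returns native values rather than calling a real Parquet writer.

```python
import csv
import datetime
import io
import json


def stringify(value):
    """Render date/time values as ISO 8601 text; leave other values as-is."""
    if isinstance(value, (datetime.date, datetime.time)):
        return value.isoformat()
    return value


def export_rows(rows, export_format):
    """Serialize rows for one export format (illustrative only)."""
    if export_format == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerows([stringify(v) for v in row] for row in rows)
        return buf.getvalue()
    if export_format == "json":
        # Newline-delimited JSON, one array per row.
        return "".join(json.dumps([stringify(v) for v in row]) + "\n" for row in rows)
    if export_format == "parquet":
        # Assumption: a Parquet writer would receive native date objects.
        return [list(row) for row in rows]
    raise ValueError(f"unsupported format: {export_format}")
```

Because each branch decides for itself how to render a date, tests can pin the expected output of every format separately before the CSV behaviour is changed.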

@eladkal
Contributor

eladkal commented Feb 8, 2022

Recently #20807 was merged.
It added a generic SqlToS3Operator and deprecated MySqlToS3Operator.

I would say that probably the best thing to do is to implement a similar generic operator for GCS.
cc @mariotaddeucci

@eladkal
Contributor

eladkal commented Mar 27, 2022

#22536 might have solved this.
@pierrejeambrun can you confirm?

@pierrejeambrun
Member

pierrejeambrun commented Mar 27, 2022

It should indeed solve this issue. In my tests I could export each of these different field types and then load them into BigQuery.

eladkal closed this as completed Mar 27, 2022
vijaya-lakshmi-venkatraman removed their assignment Sep 5, 2022
7 participants