Ensure `mysql_to_gcs` fully compatible with MySQL and BigQuery for `datetime`-related values #15026
Compatibility issues for `datetime`-related values:

Valid data: `timedelta.total_seconds() > 86399.99999(9)`? This PR implicitly raises an `OverflowError: date value out of range` when `timedelta.total_seconds() < 0`.

Supported output format:

When no `schema` is provided, the current version of the operator assumes that MySQL's DATETIME and DATE values become BigQuery's TIMESTAMP, which doesn't accept Unix epoch integers unless the downstream `gcs_to_bigquery` actually takes the output of `_write_local_schema_file()`.

When a `schema` is provided, the current version of the operator sends integers and floats to BigQuery's DATETIME/TIMESTAMP and TIME, respectively. The issues are: `calendar.timegm()` will still cause trouble (unless the downstream `gcs_to_bigquery` uses a connection that switches to BigQuery's legacy SQL); as for DATE, if the `schema` is indeed sent to the downstream `gcs_to_bigquery`, then the current version will work. Otherwise, it will still run into BigQuery's integer-to-TIMESTAMP rejection and negative integers from `timegm()` in edge cases.

A side note: I suppose some usages might have been designed for BigQuery's legacy SQL dialect. However, since `gcs_to_bigquery` doesn't explicitly support it (only one line detects it, for `escaped_table_name`), and the standard SQL dialect is the default for the Python library, it might be safe to follow the standard SQL dialect's requirements.
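To make the edge cases above concrete, here is a minimal sketch of a string-based conversion using only the Python standard library. The helper name `to_bigquery_value` is hypothetical and is not the operator's actual code. It assumes the goal is to emit strings rather than epoch numbers, and it shows where the `OverflowError` for negative `timedelta` values comes from.

```python
from datetime import date, datetime, timedelta


def to_bigquery_value(value):
    """Hypothetical helper: render MySQL temporal values as strings.

    Illustration only; the name and behavior are assumptions for this sketch.
    """
    # datetime must be checked before date, because datetime subclasses date.
    if isinstance(value, datetime):
        return value.isoformat(sep=" ")
    if isinstance(value, date):
        return value.isoformat()
    if isinstance(value, timedelta):
        # Adding a negative timedelta to datetime.min raises
        # "OverflowError: date value out of range".
        # A timedelta of 24 hours or more does not raise: the date part
        # is dropped by .time(), so the value wraps around silently.
        return (datetime.min + value).time().isoformat()
    return value


print(to_bigquery_value(datetime(2021, 3, 26, 12, 30, 0)))
print(to_bigquery_value(timedelta(hours=1, minutes=2, seconds=3)))
```

Note that in this sketch a `timedelta` above 86399.999999 seconds is not rejected; `timedelta(hours=25)` comes out as `01:00:00`. Whether such values should raise, be clamped, or be passed through is the open question raised under "Valid data".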