[ADAP-551] [Bug] Datetime incremental tables scanning more of table than expected #717
Comments
Thanks for reaching out @pcreux! I'm not seeing a difference between the config you reported in the "Steps To Reproduce" section and the workaround in the "Additional Context" section. Is it as simple as this, or is there more involved?
    - "data_type": "datetime"
    + "data_type": "timestamp"
Could you take a peek and update as needed? |
Hey @dbeatty10 ! You are right, it's indeed |
@pcreux do you have a link to this bug report on the Google side of things? It will be good to hear what Google says about it. If their guidance is to just use
dbt-bigquery/dbt/adapters/bigquery/impl.py, lines 123 to 126 in 495566a
Probably something like:
    if self.data_type_should_be_truncated() and self.data_type == "datetime":
        return f"timestamp_trunc({column}, {self.granularity})"
    elif self.data_type_should_be_truncated():
        return f"{self.data_type}_trunc({column}, {self.granularity})"
    else:
        return column |
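To make the suggestion above concrete, here is a minimal, self-contained sketch of the same branching logic. `PartitionSketch` and `render_trunc` are hypothetical names, not the actual dbt-bigquery API, and the body of `data_type_should_be_truncated` is an assumption about which types skip truncation.

```python
from dataclasses import dataclass


@dataclass
class PartitionSketch:
    """Hypothetical stand-in for the adapter's partition config."""

    field: str
    data_type: str = "date"
    granularity: str = "day"

    def data_type_should_be_truncated(self) -> bool:
        # Assumption: int64 ranges and day-granularity date columns are
        # used as-is; every other combination is wrapped in a *_trunc call.
        return not (
            self.data_type == "int64"
            or (self.data_type == "date" and self.granularity == "day")
        )

    def render_trunc(self, column: str) -> str:
        if not self.data_type_should_be_truncated():
            return column
        # Proposed change: route datetime columns through timestamp_trunc.
        func = "timestamp" if self.data_type == "datetime" else self.data_type
        return f"{func}_trunc({column}, {self.granularity})"


datetime_cfg = PartitionSketch(field="created_at", data_type="datetime")
print(datetime_cfg.render_trunc("created_at"))  # timestamp_trunc(created_at, day)
```

Folding the special case into the choice of function name keeps a single return path for every truncated type, so other data types keep their existing `<type>_trunc` output.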
I didn't find one on https://issuetracker.google.com/savedsearches/559654?pli=1&q=(componentid:187149%2B%20status:open)%20OR%20(componentid:187065%2B%20customfield82940:%22BigQuery%22)%20datetime_trunc but I might not be looking in the right place. I saw mentions of this potential bug in dbt-labs/dbt-utils#393 (comment). |
I had a few changes to solve issues with trunc functions in 7c21644 |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
Is this a new bug in dbt-bigquery?
Current Behavior
I have a table partitioned by day on a datetime column. During an incremental run, BigQuery performs a full table scan instead of an optimized partition scan.
This is likely due to a BigQuery bug where the function `datetime_trunc` ignores partitions. Example:
Expected Behavior
An optimized partition scan with 100 MB usage. Instead, I'm seeing a full table scan with 650 GB usage.
Steps To Reproduce
The `events` table is partitioned by day on a datetime column with the following config:
Running an incremental sync will use `datetime_trunc`, resulting in a full table scan. Here is an extract from the `run` code:
Relevant log output
No response
Environment
Additional Context
We can work around this issue by setting the `partition_by.data_type` to `timestamp`. `dbt` will happily use `timestamp_trunc` and BigQuery will only look up the necessary partitions. (Note that this was working in v1.2, it stopped working in v1.4, and it's now working again in v1.5.)
A similar issue with `date` partitioning, but it doesn't seem to have the same cause: dbt-labs/dbt-adapters#592
A mention of the bug where `datetime_trunc` makes BigQuery perform full table scans: dbt-labs/dbt-utils#393
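As an illustration of the workaround, the sketch below rewrites a `partition_by` mapping so that `data_type` is `timestamp` instead of `datetime`. The `field`, `data_type`, and `granularity` keys follow the dbt-bigquery `partition_by` config. The column name `created_at` and the helper `with_timestamp_workaround` are hypothetical, since the original config is not preserved in this report.

```python
def with_timestamp_workaround(partition_by: dict) -> dict:
    """Return a copy of partition_by with datetime swapped for timestamp."""
    patched = dict(partition_by)  # shallow copy; the input is left untouched
    if patched.get("data_type") == "datetime":
        patched["data_type"] = "timestamp"
    return patched


# Hypothetical config resembling the one described in this report.
original = {"field": "created_at", "data_type": "datetime", "granularity": "day"}
patched = with_timestamp_workaround(original)
print(patched)
```

Only the `data_type` value changes; the partition field and granularity stay the same.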