Describe the bug
I have a table partitioned by day on a datetime column. During an incremental run, BigQuery performs a full table scan instead of a pruned partition scan.
This is likely due to a BigQuery bug where filtering with the function datetime_trunc does not trigger partition pruning.
Example:
-- `events` is partitioned by day on the datetime column `created_at`
select * from events where datetime_trunc(created_at, DAY) in (current_date) -- 650GB
select * from events where timestamp_trunc(created_at, DAY) in (current_date) -- 100MB
select * from events where date(created_at) in (current_date) -- 100MB
Steps to reproduce
The events table is partitioned by day on a datetime column with the following config:
Running an incremental sync uses datetime_trunc in the merge predicate, resulting in a full table scan. Here is an extract from the run code:
merge into `zipline-datawarehouse`.`analytics`.`events` as DBT_INTERNAL_DEST
using (
    --- SQL MODEL CODE
) as DBT_INTERNAL_SOURCE
on FALSE
when not matched by source
    and datetime_trunc(DBT_INTERNAL_DEST.created_at, day) in (
        current_date, date_sub(current_date, interval 1 day)
    )
    then delete
when not matched then insert
    (COLUMNS)
values
    (COLUMNS)
Expected results
Pruned partition scan with about 100 MB processed.
Actual results
Full table scan with about 650 GB processed.
System information
Which database are you using dbt with?
bigquery
The output of dbt --version:
dbt --version
Core:
- installed: 1.5.0
- latest: 1.5.0 - Up to date!
Plugins:
- bigquery: 1.5.0 - Up to date!
- postgres: 1.5.0 - Up to date!
Additional context
We can work around this issue by setting partition_by.data_type to timestamp. dbt will then use timestamp_trunc, and BigQuery will scan only the necessary partitions. (Note that this was working in v1.2, stopped working in v1.4, and is working again in v1.5.)
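For anyone wanting to try the workaround, a minimal sketch of the model config might look like the following. Only the `data_type` value comes from this report. The `insert_overwrite` strategy and the `partitions` list are assumptions inferred from the merge statement above, and the file path is hypothetical.

```sql
{# models/events.sql (hypothetical path) #}
{# Assumed settings: strategy and partitions are inferred from the merge extract #}
{{
  config(
    materialized = 'incremental',
    incremental_strategy = 'insert_overwrite',
    partition_by = {
      'field': 'created_at',
      'data_type': 'timestamp',
      'granularity': 'day'
    },
    partitions = [
      'current_date',
      'date_sub(current_date, interval 1 day)'
    ]
  )
}}

--- SQL MODEL CODE
```

With `data_type` set to `timestamp`, the generated merge predicate should use timestamp_trunc rather than datetime_trunc, which is the form that scanned about 100 MB in the example queries.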
Related issues
dbt-labs/dbt-core#3386 (comment)
#393
Are you interested in contributing the fix?
I'm using the workaround at the moment.