I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
For a timestamp-partitioned incremental model, the merge SQL generated by dbt wraps the condition on the destination table's timestamp column in an extra layer of `timestamp` (it looks like `timestamp(timestamp_trunc(DBT_INTERNAL_DEST.partition_column, day))`, where the outer `timestamp` function is unnecessary). I have tested in BigQuery that this causes it to scan the whole destination table, increasing data usage. In some extreme cases, this merge step scans more data than a full refresh.
This does not cause an error - it just costs extra data quota in BigQuery.
Looking at older issues like dbt-labs/dbt-core#3284, the run target did not contain this extra layer of timestamp, but I cannot be sure when it first appeared.
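The underlying problem is the general one of applying a function to a partition/index column in a filter. A minimal SQLite sketch (my own illustration, not dbt or BigQuery code) shows the same pattern: filtering on the bare column lets the planner use the index, while wrapping the column in a function forces a full scan, analogous to the extra `timestamp()` layer defeating partition pruning.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table events (session_start text, payload text)")
con.execute("create index idx_session_start on events (session_start)")

# Filtering on the bare column: the planner can use the index
# (analogous to BigQuery pruning partitions on the raw partition column).
bare = " ".join(row[3] for row in con.execute(
    "explain query plan "
    "select payload from events where session_start = '2023-04-11'"))

# Wrapping the column in a function: the planner falls back to a table scan
# (analogous to the extra timestamp() layer defeating partition pruning).
wrapped = " ".join(row[3] for row in con.execute(
    "explain query plan "
    "select payload from events where upper(session_start) = '2023-04-11'"))

print(bare)     # plan uses the index
print(wrapped)  # plan scans the table
```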
Expected Behavior
I hope to remove this extra timestamp layer so that BigQuery can actually use partition pruning during the merge operation.
merge into my_table_name as DBT_INTERNAL_DEST
using (
    select current_timestamp() as session_start
) as DBT_INTERNAL_SOURCE
on FALSE
when not matched by source
    and **timestamp**(timestamp_trunc(DBT_INTERNAL_DEST.session_start, day)) in (
        timestamp(current_date), timestamp(date_sub(current_date, interval 1 day))
    )
    then delete
when not matched then insert
    (`session_start`)
values
    (`session_start`)
The highlighted timestamp is the extra layer causing the cost issue.
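For comparison, a predicate BigQuery should be able to prune on would drop only the outer `timestamp()` call. This is a sketch of the output I would expect dbt to generate, not SQL produced by a patched dbt:

```sql
when not matched by source
    and timestamp_trunc(DBT_INTERNAL_DEST.session_start, day) in (
        timestamp(current_date), timestamp(date_sub(current_date, interval 1 day))
    )
    then delete
```

This matches the shape of the merge condition shown in the older issue dbt-labs/dbt-core#3284, which did not have the extra layer.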
Relevant log output
No response
Environment
- OS: MacOS 12.4
- Python: 3.9.16
- dbt: 1.4.4
Which database adapter are you using with dbt?
bigquery
Additional Context
No response
Steps To Reproduce
For testing purposes, I created a model copying the exact static partition config from the v1.4 docs: https://docs.getdbt.com/reference/resource-configs/bigquery-configs#the-insert_overwrite-strategy
The dbt run target is the merge statement shown above.
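That docs example configures static `insert_overwrite` partitions roughly as follows. I am reproducing it from memory, so treat the exact keys as approximate; `session_start` and the partition expressions match the merge statement above:

```sql
{% set partitions_to_replace = [
    'timestamp(current_date)',
    'timestamp(date_sub(current_date, interval 1 day))'
] %}

{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'insert_overwrite',
        partition_by = {'field': 'session_start', 'data_type': 'timestamp'},
        partitions = partitions_to_replace
    )
}}

select current_timestamp() as session_start
```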