BQ partition configs support #3386
Conversation
```sql
{% set predicates = [] %}
{% if is_partition_filter_required %}
{%- set partition_filter -%}
    (DBT_INTERNAL_DEST.{{ partition_by.field }} is not null or DBT_INTERNAL_DEST.{{ partition_by.field }} is null)
```
I wasn't sure if there was a better way to get the `DBT_INTERNAL_DEST` alias in there than just typing it out. I tried `{{ partition_by.render(alias='DBT_INTERNAL_DEST') }}`, but that renders to `timestamp_trunc(my_partition_col, date)` when partitioning by date on a timestamp column, and BigQuery doesn't like that as a partition filter.
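For concreteness, here is a sketch of the two forms side by side (`my_partition_col` is the hypothetical column name from the comment above):

```sql
-- what {{ partition_by.render(alias='DBT_INTERNAL_DEST') }} produced, per the comment above;
-- BigQuery did not accept the truncated expression as a partition filter here:
timestamp_trunc(my_partition_col, date)

-- the hand-written predicate in this diff references the raw column instead,
-- which BigQuery does recognize as a partition filter:
(DBT_INTERNAL_DEST.my_partition_col is not null or DBT_INTERNAL_DEST.my_partition_col is null)
```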
> and BigQuery doesn't like that as a partition filter

Really? It works as a `partition by` expression, but not as a partition-pruning filter? That's... too bad. In any case, the approach you've taken here seems fine by me.
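To illustrate the distinction (a hedged sketch; the table and column names are made up):

```sql
-- a truncated expression is valid as a partitioning expression in DDL:
create table `my_project.my_dataset.events` (ts timestamp)
partition by timestamp_trunc(ts, day);

-- but, per the discussion above, the same kind of truncation wrapped around
-- the column in a predicate was not being recognized for partition pruning:
--   ... where timestamp_trunc(ts, date) = '2020-01-01' ...
```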
Okay, a few notes: …
@prratek Thanks so much for taking this on! As far as testing this functionality, the right place might just be an extension of:

```python
@property
def project_config(self):
    return {
        'config-version': 2,
        'seeds': {
            '+quote_columns': False,
        },
        'models': {
            '+require_partition_filter': True
        },
    }
```

If every model / strategy / partition combo runs with that config turned on, we'll know that this change has been successful. (Conversely, without the change in this PR, some of those models should fail.)
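One way to read that suggestion: every existing incremental test model would then be created with the filter requirement, so any generated statement that omits a partition filter should error. A hypothetical minimal model along those lines (names are illustrative):

```sql
-- models/require_filter_model.sql (hypothetical)
{{
  config(
    materialized="incremental",
    incremental_strategy="merge",
    unique_key="id",
    partition_by={"field": "updated_at", "data_type": "timestamp"}
  )
}}

select 1 as id, current_timestamp() as updated_at
```

With `+require_partition_filter: True` set in the project config, the second (incremental) run's merge statement must include a partition filter on `updated_at`, or BigQuery rejects the query.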
That makes sense! A couple of thoughts: …

```python
if config.get('require_partition_filter') and not temporary:
```
I can't see what line you linked to in item 1; I think we're on the same page, but just to be safe, I'll say what I'm thinking. The operative logic is going to be in the BigQuery plugin, rather than the default, since BigQuery implements its own:

```python
if config.get('require_partition_filter') and not temporary:
    opts['require_partition_filter'] = config.get(
        'require_partition_filter')
```

There's something funny about the way we've implemented temp tables on BigQuery, because (a) our implementation predated "true" temp tables in BQ, and (b) "true" temp tables are only supported in scripting-style queries, which the …

For the integration test, rather than turning on …

Last but not least, as far as that colon in Jinja: I hear you! It works either way, though:

```sql
{% if True: %} select 1 as fun {% endif %}
{% if True %} select 1 as fun {% endif %}
```

I figure some folks prefer the colon because it looks more like a python conditional:

```python
if True:
    return "select 1 as fun"
```
Sorry it took me a while to get back to this, but I think I'm almost there: …
Nice work on this! Looks similar to what we've done on other projects/packages for BigQuery. In the "dynamic" version of insert_overwrite … I think you've got it covered!
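For reference, the "dynamic" insert_overwrite flow mentioned here computes the partitions to replace at runtime from the temp table, roughly like this sketch (identifiers are illustrative):

```sql
declare dbt_partitions_for_replacement array<date>;

-- dbt first materializes the new rows into a temp table, then collects
-- the distinct partitions those rows touch:
set (dbt_partitions_for_replacement) = (
  select as struct array_agg(distinct date(date_time))
  from `my_project.my_dataset.my_model__dbt_tmp`
);

-- ... which then prune the delete side of the merge, e.g.:
-- when not matched by source
--   and date(DBT_INTERNAL_DEST.date_time) in unnest(dbt_partitions_for_replacement)
--   then delete
```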
@prratek I looked a bit more into this, and I've found something pretty wacky: wrapping a datetime-type partition column in `datetime_trunc` does not seem to register as a valid partition filter:

```sql
-- models/my_model.sql
{{
    config(
        materialized="incremental",
        incremental_strategy="insert_overwrite",
        partition_by={
            "field": "date_time",
            "data_type": "datetime"
        },
        require_partition_filter = True
    )
}}

select 1 as id, cast('2020-01-01' as datetime) as date_time
```

```sql
-- excerpted from insert_overwrite script, which I copy-pasted into BigQuery console to confirm
merge into `my_model` as DBT_INTERNAL_DEST
using ( ... ) as DBT_INTERNAL_SOURCE
on FALSE

when not matched by source
    and datetime_trunc(DBT_INTERNAL_DEST.date_time, day) in ('2020-01-01')
    then delete

when not matched then insert
    (`id`, `date_time`)
values
    (`id`, `date_time`)
```
This sure looks like a BigQuery bug, doesn't it? In the meantime, we could work around it by adding another filter (the exact filter was elided here; a sketch follows below). This enables the query to succeed, but it will still process more bytes than it should, since the `datetime_trunc` predicate still isn't being used to prune partitions.
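A hedged guess at what that extra filter looks like, based on the always-true predicate at the top of this diff, spliced into the merge excerpt above:

```sql
when not matched by source
    -- always-true filter on the raw column, so BigQuery sees a partition filter:
    and (DBT_INTERNAL_DEST.date_time is not null or DBT_INTERNAL_DEST.date_time is null)
    and datetime_trunc(DBT_INTERNAL_DEST.date_time, day) in ('2020-01-01')
    then delete
```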
@prratek Is this something you'd still be interested in contributing? I'd be willing to find a way to work around the failing test re: filtering with `datetime_trunc`.
Closing in favor of dbt-labs/dbt-bigquery#65
resolves #3016
Description
#2928 added support for `require_partition_filter` and `partition_expiration_days` in the BigQuery adapter. As outlined in #3016, this PR ensures that the new config options work with all incremental strategies available for BigQuery today.
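For reference, a model might opt into both configs like this (a minimal sketch; field names and values are illustrative):

```sql
{{
  config(
    materialized="incremental",
    incremental_strategy="insert_overwrite",
    partition_by={"field": "created_at", "data_type": "date"},
    require_partition_filter = True,
    partition_expiration_days = 7
  )
}}

select current_date() as created_at
```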
Checklist

I have updated CHANGELOG.md and added information about my change to the "dbt next" section.