require_partition_filter=true with the insert_overwrite strategy doesn't work #64

Closed
yu-iskw opened this issue Nov 11, 2021 · 2 comments · Fixed by #65
Labels
type:bug Something isn't working

Comments

@yu-iskw (Contributor) commented Nov 11, 2021

Describe the bug

When running an incremental model with the insert_overwrite strategy and the require_partition_filter=true option, it seems that the require_partition_filter option is also applied to the temporary table, so the queries that read from it fail with:

  Query error: Cannot query over table 'sandbox-project.logs.event__dbt_tmp' without a filter over column(s) 'event_time' that can be used for partition elimination at [75:46]

Steps To Reproduce

  1. Create an incremental model with insert_overwrite
# test_insert_overwrite.sql
{{
  config(
    materialized="incremental",
    incremental_strategy='insert_overwrite',
    require_partition_filter=true,
    alias="test_insert_overwrite",
    partition_by={
      "field": "event_time",
      "data_type": "timestamp",
      "granularity": "day",
    },
  )
}}

SELECT
  CURRENT_TIMESTAMP() AS event_time

  2. Execute dbt run -s test_insert_overwrite twice
Completed with 1 error and 0 warnings:

Database Error in model test_insert_overwrite (models/test_insert_overwrite.sql)
  Query error: Cannot query over table 'sandbox-project.jaffle_shop.test_insert_overwrite__dbt_tmp' without a filter over column(s) 'event_time' that can be used for partition elimination at [35:46]
  compiled SQL at target/run/jaffle_shop/models/test_insert_overwrite.sql

Expected behavior

The second dbt run should complete successfully: require_partition_filter should not be enforced against the temporary table (or the generated queries should include a suitable partition filter), while the target table keeps the option enabled.

Screenshots and log output

The script below was generated on the second execution of the incremental model. As it shows, the statement that creates the temporary table test_insert_overwrite__dbt_tmp includes require_partition_filter=True, because it is produced by the create_table_as macro, which applies all of the model's options to the temporary table as well. (A sketch of how the option could be skipped for temporary tables is included under Additional context below.)

      -- generated script to merge partitions into `sandbox-project`.`jaffle_shop`.`test_insert_overwrite`
      declare dbt_partitions_for_replacement array<timestamp>;
      declare _dbt_max_partition timestamp default (
          select max(event_time) from `sandbox-project`.`jaffle_shop`.`test_insert_overwrite`
          where event_time is not null
      );

      
      
        -- 1. create a temp table
        

  create or replace table `sandbox-project`.`jaffle_shop`.`test_insert_overwrite__dbt_tmp`
  partition by timestamp_trunc(event_time, day)
  
  OPTIONS(
      expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour),
    
      require_partition_filter=True
    )
  as (
    

SELECT
  CURRENT_TIMESTAMP() AS event_time
  );
      

      -- 2. define partitions to update
      set (dbt_partitions_for_replacement) = (
          select as struct
              array_agg(distinct timestamp_trunc(event_time, day))
          from `sandbox-project`.`jaffle_shop`.`test_insert_overwrite__dbt_tmp`
      );

      
      -- 3. run the merge statement
      

    merge into `sandbox-project`.`jaffle_shop`.`test_insert_overwrite` as DBT_INTERNAL_DEST
        using (
        select * from `sandbox-project`.`jaffle_shop`.`test_insert_overwrite__dbt_tmp`
      ) as DBT_INTERNAL_SOURCE
        on FALSE

    when not matched by source
         and timestamp_trunc(DBT_INTERNAL_DEST.event_time, day) in unnest(dbt_partitions_for_replacement) 
        then delete

    when not matched then insert
        (`event_time`)
    values
        (`event_time`)

;

      -- 4. clean up the temp table
      drop table if exists `sandbox-project`.`jaffle_shop`.`test_insert_overwrite__dbt_tmp`

System information

The output of dbt --version:

installed version: 0.21.0
   latest version: 0.21.0

Up to date!

Plugins:
  - bigquery: 0.21.0
  - snowflake: 0.21.0
  - redshift: 0.21.0
  - postgres: 0.21.0

The operating system you're using:

  • Mac OS 11.6 (20G165)

The output of python --version:

  • Python 3.8.8

Additional context
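
One way the adapter could avoid this would be to skip require_partition_filter (and keep only the expiration) whenever the relation being created is a temporary table. The macro below is only a minimal sketch of that idea; the macro name bigquery_table_options and its temporary argument are assumptions about the adapter's internals, not the actual fix.

{# Sketch only: assumes the OPTIONS(...) values for create_table_as are assembled
   by a macro that knows whether it is building a temporary (__dbt_tmp) relation. #}
{% macro bigquery_table_options(config, node, temporary) %}
  {% set opts = {} %}

  {% if temporary %}
    {# Temporary tables are fully scanned by the incremental merge and dropped
       right after, so the partition-filter requirement is not applied here. #}
    {% do opts.update({'expiration_timestamp': 'TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)'}) %}
  {% elif config.get('require_partition_filter') is not none %}
    {% do opts.update({'require_partition_filter': config.get('require_partition_filter')}) %}
  {% endif %}

  {{ return(opts) }}
{% endmacro %}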

@yu-iskw (Contributor, Author) commented Nov 11, 2021

Workaround to turn on require_partition_filter

Here is a workaround to enable require_partition_filter=true on an incremental model with the insert_overwrite strategy: instead of setting the option in the model config, apply it with an ALTER TABLE statement in a post_hook.

{{
  config(
    materialized="incremental",
    incremental_strategy='insert_overwrite',
    alias="test_insert_overwrite",
    post_hook="ALTER TABLE IF EXISTS {{ this }} SET OPTIONS (require_partition_filter = true)",
    partition_by={
      "field": "event_time",
      "data_type": "timestamp",
      "granularity": "day",
    },
  )
}}

SELECT
  CURRENT_TIMESTAMP() AS event_time
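
After the run, you can confirm that the option actually landed on the target table by checking its metadata. The query below is only an illustration using BigQuery's INFORMATION_SCHEMA.TABLE_OPTIONS view, with the project and dataset names from the example above.

SELECT option_name, option_value
FROM `sandbox-project`.`jaffle_shop`.INFORMATION_SCHEMA.TABLE_OPTIONS
WHERE table_name = 'test_insert_overwrite'
  AND option_name = 'require_partition_filter';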

@jtcohen6 (Contributor)
Thanks @yu-iskw! This issue is actually a duplicate of one we raised a long time ago: dbt-labs/dbt-adapters#604. We almost had a fix for it way back in May (dbt-labs/dbt-core#3386). We got hung up on a failing test that actually reflects an undocumented BigQuery bug: datetime-type partitions cannot be pruned using datetime_trunc! (dbt-labs/dbt-core#3386 (comment))

I agree that the solution is to:

  • insert_overwrite: avoid applying the require_partition_filter config to temporary tables
  • merge: add a partition filter to the merge statement's ON clause, if the require_partition_filter config is enabled (a rough sketch of what that could look like is at the end of this comment)

I appreciate you taking the initiative to open a new PR. I'll take a look over there and leave comments.
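
For the merge point, the snippet below is only a rough illustration of how the generated merge from the log above could carry a filter on the destination's partition column; it is not necessarily the predicate the linked PR implements, and the "and false" conjunct is there solely to preserve the original "on FALSE" matching behavior.

    merge into `sandbox-project`.`jaffle_shop`.`test_insert_overwrite` as DBT_INTERNAL_DEST
        using (
        select * from `sandbox-project`.`jaffle_shop`.`test_insert_overwrite__dbt_tmp`
      ) as DBT_INTERNAL_SOURCE
        -- hypothetical: a filter on the destination's partition column that BigQuery
        -- can use for partition elimination; "and false" keeps the join unmatched,
        -- so the not-matched branches below behave exactly as with "on FALSE"
        on timestamp_trunc(DBT_INTERNAL_DEST.event_time, day) in unnest(dbt_partitions_for_replacement)
           and false

    when not matched by source
         and timestamp_trunc(DBT_INTERNAL_DEST.event_time, day) in unnest(dbt_partitions_for_replacement)
        then delete

    when not matched then insert
        (`event_time`)
    values
        (`event_time`)
;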
