Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAP-499] [Bug] UDF don't work with time_ingestion_partitioning #684

Closed
2 tasks done
Kayrnt opened this issue Apr 27, 2023 · 2 comments · Fixed by #683
Closed
2 tasks done

[ADAP-499] [Bug] UDF don't work with time_ingestion_partitioning #684

Kayrnt opened this issue Apr 27, 2023 · 2 comments · Fixed by #683
Labels
feature:partitioning Related to creating, replacing, or pruning partitions to avoid full table scans type:bug Something isn't working

Comments

@Kayrnt
Copy link
Contributor

Kayrnt commented Apr 27, 2023

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Any model using time_ingestion_partitioning and an UDF is failing

Expected Behavior

The model should work as intended like without time_ingestion_partitioning

Steps To Reproduce

Using dbt 1.4+ (for the time ingestion support)

{{
    config(
        materialized="incremental",
        incremental_strategy='insert_overwrite',
        cluster_by="id",
        partition_by={
            "field": "date_time",
            "data_type": "datetime",
            "time_ingestion_partitioning": true
        },
        require_partition_filter=true
    )
}}

{%- call set_sql_header(config) %}
 CREATE TEMP FUNCTION asDateTime(date STRING) AS (
   cast(date as datetime)
 );
{%- endcall %}

with data as (

    {% if not is_incremental() %}

        select 1 as id, asDateTime('2020-01-01') as date_time union all
        select 2 as id, asDateTime('2020-01-01') as date_time union all
        select 3 as id, asDateTime('2020-01-01') as date_time union all
        select 4 as id, asDateTime('2020-01-01') as date_time

    {% else %}

        -- we want to overwrite the 4 records in the 2020-01-01 partition
        -- with the 2 records below, but add two more in the 2020-01-02 partition
        select 10 as id, asDateTime('2020-01-01') as date_time union all
        select 20 as id, asDateTime('2020-01-01') as date_time union all
        select 30 as id, cast('2020-01-02' as datetime) as date_time union all
        select 40 as id, cast('2020-01-02' as datetime) as date_time

    {% endif %}

)

select * from data

{% if is_incremental() %}
where date_time > '2020-01-01'
{% endif %}

Relevant log output

No response

Environment

- OS: MacOs
- Python: 3.11.3
- dbt-core: 1.5.0-b5
- dbt-bigquery: 1.5.0-b5

Additional Context

Current workaround (same as the fix)

Add a macro override in your dbt project

{#
    Temporary fix for UDF usage in dbt with time ingestion tables
#}
{% macro get_columns_with_types_in_query_sql(select_sql) %}
  {% set sql %}
    {%- set sql_header = config.get('sql_header', none) -%}
    {{ sql_header if sql_header is not none }}
    select * from (
      {{ select_sql }}
    ) as __dbt_sbq
    where false
    limit 0
  {% endset %}
  {{ return(adapter.get_columns_in_select_sql(sql)) }}
{% endmacro %}

{% macro bq_insert_into_ingestion_time_partitioned_table_sql(target_relation, sql) -%}
  {%- set sql_header = config.get('sql_header', none) -%}
  {{ sql_header if sql_header is not none }}
  {%- set raw_partition_by = config.get('partition_by', none) -%}
  {%- set partition_by = adapter.parse_partition_by(raw_partition_by) -%}
  {% set dest_columns = adapter.get_columns_in_relation(target_relation) %}
  {%- set dest_columns_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}

  insert into {{ target_relation }} ({{ partition_by.insertable_time_partitioning_field() }}, {{ dest_columns_csv }})
    {{ wrap_with_time_ingestion_partitioning_sql(partition_by, sql, False) }}

{%- endmacro -%}
@Kayrnt Kayrnt added type:bug Something isn't working triage:product labels Apr 27, 2023
@github-actions github-actions bot changed the title [Bug] UDF don't work with time_ingestion_partitioning [ADAP-499] [Bug] UDF don't work with time_ingestion_partitioning Apr 27, 2023
@dbeatty10
Copy link
Contributor

Thanks for the report, the workaround, and opening a PR @Kayrnt ! 🥇 🥈 🥉

@github-christophe-oudar
Copy link
Contributor

It just requires some Github flag to test the ITs, I updated a test (and it worked locally 🤞 )
Invoking the great @VersusFacit ✨ , could you add the flags? Thanks!

@dbeatty10 dbeatty10 added the feature:partitioning Related to creating, replacing, or pruning partitions to avoid full table scans label May 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature:partitioning Related to creating, replacing, or pruning partitions to avoid full table scans type:bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants