Optimize deduplicate() implementations #549

Closed

@judahrand judahrand commented Apr 14, 2022

This is a:

  • bug fix PR with no breaking changes — please ensure the base branch is main
  • new functionality — please ensure the base branch is the latest dev/ branch
  • a breaking change — please ensure the base branch is the latest dev/ branch

Description & motivation

  • "Improve Deduplicate Macro to use QUALIFY" (#543) made me aware of a way to optimize the Snowflake implementation of this macro. It also prompted me to check whether Postgres has a better approach, and it turns out it does.
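
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of rewrite being described. The table `orders` and the columns `order_id` and `updated_at` are hypothetical, and the description does not name the Postgres technique; `DISTINCT ON` is shown only as one plausible candidate, not as what this PR necessarily implements.

```sql
-- Subquery form: portable, but adds a helper column (rn) and an outer filter.
select *
from (
    select
        orders.*,
        row_number() over (
            partition by order_id
            order by updated_at desc
        ) as rn
    from orders
) as ranked
where rn = 1;

-- Snowflake: QUALIFY filters on the window function directly,
-- so no wrapping subquery or helper column is needed.
select *
from orders
qualify
    row_number() over (
        partition by order_id
        order by updated_at desc
    ) = 1;

-- Postgres (assumed technique): DISTINCT ON keeps the first row per key,
-- where "first" is decided by the ORDER BY clause.
select distinct on (order_id) *
from orders
order by order_id, updated_at desc;
```

In the `DISTINCT ON` form, the leading `ORDER BY` expressions must match the `DISTINCT ON` expressions, so the partition key comes first and the tie-breaking column follows it.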

Checklist

  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • I followed guidelines to ensure that my changes will work on "non-core" adapters by:
    • dispatching any new macro(s) so non-core adapters can also use them (e.g. the star() source)
    • using the limit_zero() macro in place of the literal string: limit 0
    • using dbt_utils.type_* macros instead of explicit datatypes (e.g. dbt_utils.type_timestamp() instead of TIMESTAMP)
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

@judahrand judahrand changed the base branch from next/minor to next/patch April 14, 2022 11:26
@judahrand judahrand changed the title Refactor arguments to deduplicate() Optimize deduplicate() implementations Apr 14, 2022
@judahrand judahrand mentioned this pull request Apr 14, 2022
@judahrand judahrand force-pushed the feature/dedupe-non-breaking branch 2 times, most recently from b23bba4 to f0a556d Compare April 14, 2022 11:32
@judahrand judahrand force-pushed the feature/dedupe-non-breaking branch from f0a556d to 8ec7158 Compare April 14, 2022 11:41
qualify
    row_number() over (
        partition by {{ group_by }}
        {% if order_by is not none -%}
Contributor

I understand the desire to not make this update a breaking change by requiring a parameter that was previously optional, but if order_by is not specified here, the query will fail on Snowflake.

Contributor Author

#548 deals with that half
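
To make the concern in this thread concrete, here is a sketch of what the template might render to when order_by is none. The table and column names are hypothetical, and the trailing `= 1` is an assumption, since the quoted diff is cut off before the end of the expression.

```sql
-- Assumed rendering with group_by = 'order_id' and order_by = none.
-- Per the review comment, Snowflake rejects this because the
-- row_number() window has no ORDER BY clause.
select *
from orders
qualify
    row_number() over (
        partition by order_id
    ) = 1;
```

Even on warehouses that accept a window without an ORDER BY, the surviving row in each group would be arbitrary, which is a second reason to supply an ordering.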

@judahrand judahrand closed this May 16, 2022