Optimize `deduplicate()` implementations #549

judahrand · 2022-04-14T11:26:46Z

This is a:

bug fix PR with no breaking changes — please ensure the base branch is main
new functionality — please ensure the base branch is the latest dev/ branch
a breaking change — please ensure the base branch is the latest dev/ branch

Description & motivation

#543 (Improve Deduplicate Macro to use QUALIFY) made me aware of a way to optimize the Snowflake implementation of this macro. It also prompted me to investigate whether there is a better way of doing this in Postgres, and it turns out there is!
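
For context, here is a minimal sketch of the row-numbering pattern that `QUALIFY` shortens; it is not the macro's actual code. It runs against SQLite from Python, and since SQLite has no `QUALIFY`, the filter sits in an outer query. The `events` table and its columns are hypothetical. On Snowflake, the same predicate could sit in a `qualify row_number() over (...) = 1` clause without the wrapping subquery.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table events (user_id integer, event text, ts integer);
    insert into events values
        (1, 'signup', 10),
        (1, 'login', 20),
        (2, 'signup', 15),
        (2, 'signup', 15);
""")

# Subquery form: number the rows within each group, then keep row 1.
# Engines with QUALIFY can drop the outer select and filter on
# row_number() directly.
dedup_sql = """
    select user_id, event, ts
    from (
        select
            *,
            row_number() over (
                partition by user_id
                order by ts desc
            ) as rn
        from events
    ) as ranked
    where rn = 1
    order by user_id
"""

rows = conn.execute(dedup_sql).fetchall()
print(rows)
```

Each `user_id` keeps only its latest row, and exact duplicates collapse to a single row.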

Checklist

codigo-ergo-sum · 2022-04-14T15:38:30Z

macros/sql/deduplicate.sql

+    qualify
+        row_number() over (
+            partition by {{ group_by }}
+            {% if order_by is not none -%}


I understand the desire to avoid a breaking change by not requiring a parameter that was previously optional, but if `order_by` is not specified here, the query will fail on Snowflake.

#548 deals with that half

judahrand added 4 commits April 14, 2022 10:02

Add Postgres specific deduplicate implementation

20408a5

Add Snowflake specific deduplicate implementation

3d3eb6e

Remove subquery to improve readability

5829c7e

Remove use of dbt_utils.star in deduplicate

ed9c966

judahrand changed the base branch from next/minor to next/patch April 14, 2022 11:26

judahrand changed the title ~~Refactor arguments to deduplicate()~~ Optimize deduplicate() implementations Apr 14, 2022

judahrand mentioned this pull request Apr 14, 2022

Refactor deduplicate() arguments #548

Merged

15 tasks

judahrand force-pushed the feature/dedupe-non-breaking branch 2 times, most recently from b23bba4 to f0a556d Compare April 14, 2022 11:32

Make sure Redshift uses default implementation

04ddbd6

judahrand force-pushed the feature/dedupe-non-breaking branch from f0a556d to 8ec7158 Compare April 14, 2022 11:41

judahrand added 3 commits April 14, 2022 12:44

Improve deduplicate documentation

03bafa4

Use natural join to avoid having to parse expressions

71fb5ce

Update CHANGELOG.md

a576e54

judahrand force-pushed the feature/dedupe-non-breaking branch from 8ec7158 to a576e54 Compare April 14, 2022 11:44

judahrand mentioned this pull request Apr 14, 2022

Improve Deduplicate Macro to use QUALIFY #543

Closed

codigo-ergo-sum reviewed Apr 14, 2022

judahrand closed this May 16, 2022
