Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let there be all time #99

Merged
merged 7 commits into from
Sep 9, 2022
Merged

Let there be all time #99

merged 7 commits into from
Sep 9, 2022

Conversation

callum-mcdata
Copy link
Contributor

@callum-mcdata callum-mcdata commented Sep 6, 2022

What is this PR?

This is a:

  • documentation update
  • bug fix with no breaking changes
  • new functionality
  • a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

Right now, the metrics package will always return a date spined result set that shows the metric over a length of time. But what if the user just wants a single number? For that, we've added all_time as an acceptable input for all metric types. This produces a dataset that is not date spined but instead corresponds to the lifetime of value included in the macro call.

IE - if a user designates start_date and end_date and then uses all_time as the grain, it will return the value of the metric in that time period.

For cleanliness and ease of understanding, we've included metric_start_date and metric_end_date in the returned dataset.

Example output - no dimensions
base_average_metric metric_start_date metric_end_date
0.9 2022-01-06 2022-02-15
Example output - with dimensions
had_discount base_average_metric metric_start_date metric_end_date
true .66667 2022-01-06 2022-02-15
false 1 2022-01-08 2022-02-13

Important Changes

  • Added secondary_calculations to validate_grain. This checks to make sure that there are no secondary_calculations provided if the grain is all_time, as those are not supported.
  • Add a conditional check in build_metric_sql to not run gen_dimensions_cte or gen_spine_time_cte if the grain is all_time
  • Add a conditional check in gen_aggregate_cte to not add any date fields (such as date_day) to the grouping if the grain is all_time
  • Add a conditional check in gen_aggregate_cte to include metric_start_date and metric_end_date if the grain is all_time
  • Changed gen_group_by to support the date field now being an optional input.
  • Altering all downstream gen macros to not include the date field.
  • Changing the logic used to generate the using statement in gen_joined_metrics_cte now that there are conditions in which there would be no fields provided. For those ones, we switch from using a left outer join to a cross join.
  • Casting the input in gen_metric_cte to parent_metric_cte to make it easier to reference fields

Checklist

  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

Tenets to keep in mind

  • A metric value should be consistent everywhere that it is referenced
  • We prefer generalized metrics with many dimensions over specific metrics with few dimensions
  • It should be easier to use dbt’s metrics than it is to avoid them
  • Organization and discoverability are as important as precision
  • One-off models built to power metrics are an anti-pattern

@cla-bot cla-bot bot added the cla:yes The CLA has been signed label Sep 6, 2022
@callum-mcdata callum-mcdata marked this pull request as ready for review September 6, 2022 14:45
@callum-mcdata callum-mcdata marked this pull request as draft September 6, 2022 14:45
@dave-connors-3
Copy link
Contributor

This looks OK to me so far -- only real comment is that for an all_time grain, I think I would expect to just skip the calendar CTE all together:

select * from 
-- depends on: development.dbt_dconnors.dbt_metrics_default_calendar
    

(
-- this CTE isn't used at all
with calendar as (

    select 
        * 
    from development.dbt_dconnors.dbt_metrics_default_calendar
    where date_day >= cast('2022-05-01' as date)
            and date_day <= cast('2022-06-01' as date) 
)



, total_value__aggregate as (
    select
        sum(property_to_aggregate) as total_value,
        min(metric_date_day) as metric_start_date,
        max(metric_date_day) as metric_end_date
        

    from (

...

I imagine this was to simplify the comma situation for where the metric aggregate CTE shows up and given that it's such a simple small table i'm guessing there's not really any performance concern at all

@callum-mcdata
Copy link
Contributor Author

I imagine this was to simplify the comma situation for where the metric aggregate CTE shows up and given that it's such a simple small table i'm guessing there's not really any performance concern at all

100% this is the main reason - adding conditional logic to handle the first cte with all the options would be a massive pain. But additionally, we also want to support the use of calendar_dimensions provided as a dimension in the macro call. Such as is_weekend. To get those fields we need to have the calendar table because we join in gen_base_query.

@callum-mcdata callum-mcdata marked this pull request as ready for review September 6, 2022 16:03
@callum-mcdata
Copy link
Contributor Author

@joellabes would love to get your 2c here. Generally this keeps to the same framework but it definitely adds some more conditional logic into the macros to account for different behavior. That being said, I think it does so in a way that keeps the macros from trying to do too many things (such as two different paths). By and large it just alters the columns/joins

@callum-mcdata callum-mcdata merged commit b3996e5 into main Sep 9, 2022
@callum-mcdata callum-mcdata deleted the let_there_be_all_time branch September 9, 2022 14:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes The CLA has been signed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants