Add dbt metrics submodule #28
Merged
6 commits
864831a Add dbt metrics submodule (damian3031)
ceb805c Bump Trino and SEP versions in docker-compose files (damian3031)
4b16bb3 Add integration tests for dbt-metrics (damian3031)
57ffa04 Add trino__fact_orders_source seed, adjust metrics and models to it. (damian3031)
87ae9d7 Add trino shim for dbt-metrics gen_calendar_join macro (RobbertDM)
1efeddd Update docker testing images and scripts (hovaesco)
Submodule dbt_metrics added at 5897ce
@@ -0,0 +1,15 @@
#!/bin/bash

# move to wherever we are so docker things work
cd "$(dirname "${BASH_SOURCE[0]}")"

set -exo pipefail
docker run \
    --network="dbt-net" \
    -v $PWD/dbt:/root/.dbt \
    dbt-trino-utils \
    "cd /opt/dbt_trino_utils/integration_tests/dbt_metrics \
    && dbt deps \
    && dbt seed \
    && dbt run \
    && dbt test"
@@ -0,0 +1,81 @@
name: "trino_utils_dbt_metrics_integration_tests"
version: "1.0.0"
config-version: 2

profile: "integration_tests"

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"
clean-targets:
  - "target"
  - "dbt_packages"
  - "logs"

dispatch:
  - macro_namespace: metrics
    search_order: ['trino_utils_dbt_metrics_integration_tests', 'trino_utils', 'metrics']

models:

  trino_utils_dbt_metrics_integration_tests:
    metric_testing_models:
      +materialized: table

  dbt_metrics_integration_tests:

    # Overridden by trino__custom_calendar
    custom_calendar:
      +enabled: false

    metric_testing_models:
      +materialized: table

      # no median function in Trino
      base_median_metric:
        +enabled: false
      base_median_metric_no_time_grain:
        +enabled: false

      # no 'is true' predicate in trino
      base_count_distinct_metric:
        +enabled: false
      derived_metric:
        +enabled: false

      # Overridden by trino__develop_metric
      develop_metric:
        +enabled: false
      # Overridden by trino__simple_develop_metric
      simple_develop_metric:
        +enabled: false

      # TODO: Fix and enable

      base_count_metric__secondary_calculations:
        +enabled: false
      base_sum_metric__prior:
        +enabled: false
      multiple_metrics__period_over_period:
        +enabled: false
      multiple_metrics__period_to_date:
        +enabled: false
      multiple_metrics__rolling:
        +enabled: false
      # issue with base_sum_metric.yml
      # with config: restrict_no_time_grain
      base_sum_metric:
        +enabled: false
      ratio_metric:
        +enabled: false

    materialized_models:
      +materialized: table

vars:
  dbt_metrics_calendar_model: trino__custom_calendar
  custom_calendar_dimension_list: ["is_weekend"]
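The dispatch entry in this project file decides which package supplies the implementation of a macro in the metrics namespace. As a rough illustration only, here is a simplified model of that lookup, not dbt's actual code: it walks the packages in search_order and, within each package, tries the adapter-prefixed macro name before the default__ one. The function resolve_macro, the available mapping, and the macro name some_other_macro are hypothetical; gen_calendar_join is taken from the commit list of this pull request.

```python
from typing import Dict, List, Optional, Set

# Search order copied from the dispatch block of the project file.
SEARCH_ORDER: List[str] = [
    "trino_utils_dbt_metrics_integration_tests",
    "trino_utils",
    "metrics",
]


def resolve_macro(
    macro_name: str,
    available: Dict[str, Set[str]],
    search_order: List[str] = SEARCH_ORDER,
    adapter: str = "trino",
) -> Optional[str]:
    """Return 'package.macro' for the first match, or None if nothing matches.

    Simplified assumption: packages are tried in search order, and inside each
    package the adapter-specific name is tried before the default one.
    """
    for package in search_order:
        for prefix in (adapter, "default"):
            candidate = f"{prefix}__{macro_name}"
            if candidate in available.get(package, set()):
                return f"{package}.{candidate}"
    return None


# Hypothetical inventory of macros per package, for illustration only.
available = {
    "trino_utils": {"trino__gen_calendar_join"},
    "metrics": {"default__gen_calendar_join", "default__some_other_macro"},
}

print(resolve_macro("gen_calendar_join", available))
print(resolve_macro("some_other_macro", available))
```

Under these assumptions the Trino shim in trino_utils is found before the default implementation in metrics, while a macro that has no shim falls through to the metrics package.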
4 changes: 4 additions & 0 deletions
integration_tests/dbt_metrics/models/materialized_models/trino__fact_orders.sql
@@ -0,0 +1,4 @@
select
    *
    ,round(order_total - (order_total/2)) as discount_total
from {{ref('trino__fact_orders_source')}}
12 changes: 12 additions & 0 deletions
integration_tests/dbt_metrics/models/metric_definitions/base_average_metric.yml
@@ -0,0 +1,12 @@
version: 2
metrics:
  - name: base_average_metric
    model: ref('trino__fact_orders')
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month, test]
    calculation_method: average
    expression: discount_total
    dimensions:
      - had_discount
      - order_country
23 changes: 23 additions & 0 deletions
integration_tests/dbt_metrics/models/metric_definitions/base_count_distinct_metric.yml
@@ -0,0 +1,23 @@
version: 2

metrics:
  - name: base_count_distinct_metric
    model: ref('trino__fact_orders')
    label: Count Distinct
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: count_distinct
    expression: customer_id
    dimensions:
      - had_discount
      - order_country
    window:
      count: 14
      period: month
    filters:
      - field: had_discount
        operator: 'is'
        value: 'true'
      - field: order_country
        operator: '='
        value: "'CA'"
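The filters in this metric are why the project file disables base_count_distinct_metric with the note "no 'is true' predicate in trino". The sketch below shows the predicate such a filter list would produce, assuming each filter is rendered as "field operator value" and the parts are joined with "and". That rendering rule is an assumption made for illustration, not something this diff shows, and render_filters is a hypothetical helper.

```python
from typing import Dict, List


def render_filters(filters: List[Dict[str, str]]) -> str:
    """Join filters into a SQL predicate (assumed 'field operator value' form)."""
    return " and ".join(
        f"{f['field']} {f['operator']} {f['value']}" for f in filters
    )


# Filter values copied from base_count_distinct_metric.yml.
filters = [
    {"field": "had_discount", "operator": "is", "value": "true"},
    {"field": "order_country", "operator": "=", "value": "'CA'"},
]

print(render_filters(filters))
# had_discount is true and order_country = 'CA'
```

If the rendering works roughly like this, the first filter yields the "is true" form that the project file says Trino does not accept, whereas a filter with operator '=' and value 'true' would yield an ordinary equality comparison.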
12 changes: 12 additions & 0 deletions
integration_tests/dbt_metrics/models/metric_definitions/base_count_metric.yml
@@ -0,0 +1,12 @@
version: 2
metrics:
  - name: base_count_metric
    model: ref('trino__fact_orders')
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: count
    expression: order_total
    dimensions:
      - had_discount
      - order_country
12 changes: 12 additions & 0 deletions
integration_tests/dbt_metrics/models/metric_definitions/base_median_metric.yml
@@ -0,0 +1,12 @@
version: 2
metrics:
  - name: base_median_metric
    model: ref('trino__fact_orders')
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month, all_time]
    calculation_method: median
    expression: discount_total
    dimensions:
      - had_discount
      - order_country
59 changes: 59 additions & 0 deletions
integration_tests/dbt_metrics/models/metric_definitions/base_sum_metric.yml
@@ -0,0 +1,59 @@
version: 2
metrics:
  - name: base_sum_metric
    model: ref('trino__fact_orders')
    label: Order Total ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: sum
    expression: order_total
    dimensions:
      - had_discount
      - order_country
    config:
      restrict_no_time_grain: True

  - name: base_sum_metric_duplicate
    model: ref('fact_orders_duplicate')
    label: Order Total ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: sum
    expression: order_total
    dimensions:
      - had_discount
      - order_country

  - name: base_sum_metric__14_day_window
    model: ref('trino__fact_orders')
    label: Order Total ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: sum
    expression: order_total
    window:
      count: 14
      period: month
    dimensions:
      - had_discount
      - order_country

  - name: base_test_metric
    model: ref('fact_orders')
    label: Order Total ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: sum
    expression: order_total
    dimensions:
      - had_discount
      - order_country

  - name: base_sum_metric__no_timestamp
    model: ref('fact_orders')
    label: Order Total ($)
    calculation_method: sum
    expression: order_total
    dimensions:
      - had_discount
      - order_country
11 changes: 11 additions & 0 deletions
integration_tests/dbt_metrics/models/metric_definitions/case_when_metric.yml
@@ -0,0 +1,11 @@
version: 2
metrics:
  - name: case_when_metric
    model: ref('trino__fact_orders')
    label: Order Total ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: sum
    expression: case when had_discount = true then 1 else 0 end
    dimensions:
      - order_country
45 changes: 45 additions & 0 deletions
integration_tests/dbt_metrics/models/metric_testing_models/trino__develop_metric.sql
@@ -0,0 +1,45 @@
{% set my_metric_yml -%}
{% raw %}

metrics:
  - name: develop_metric
    model: ref('trino__fact_orders')
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: average
    expression: discount_total
    dimensions:
      - had_discount
      - order_country

  - name: derived_metric
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: derived
    expression: "{{ metric('develop_metric') }} - 1 "
    dimensions:
      - had_discount
      - order_country

  - name: some_other_metric_not_using
    label: Total Discount ($)
    timestamp: order_date
    time_grains: [day, week, month]
    calculation_method: derived
    expression: "{{ metric('derived_metric') }} - 1 "
    dimensions:
      - had_discount
      - order_country

{% endraw %}
{%- endset %}

select *
from {{ metrics.develop(
        develop_yml=my_metric_yml,
        metric_list=['derived_metric'],
        grain='month'
        )
    }}
Perhaps we could use another function to calculate median. WDYT?

We could use the approx_percentile function. But to calculate the median correctly for both odd and even numbers of values in the column, we need a workaround, because approx_percentile(col_name, 0.50) on its own does not work for an even number of values (workaround source):
Create a sample table with an even number of rows:
Calculate the median:
So it takes a bit of code to calculate the median in Trino. The best way would certainly be to implement MEDIAN in Trino itself. There is an issue for that, trinodb/trino#6309, but it is very stale and has gained no attention. WDYT?

Just create a follow-up issue and include this code.

Is anything wrong with getting the median by value_at_quantile(tdigest_agg(col_name), 0.5)?

cc @damian3031

@SeungHuLee your solution has the same downside as using approx_percentile alone: it doesn't work if the number of rows is even and the two middle values are different numbers. In that case, the function should return the arithmetic mean of the two middle values. Your solution returns the larger value.
So, running this query on the table which I defined in my previous comment:
returns 2.0, but the correct result is 1.5. That's the reason why the above-mentioned workaround is needed.

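The even-row case described in this thread can be sketched outside of Trino. The queries and the sample table from the comments are not preserved here, so the values below are a hypothetical stand-in chosen to match the reported numbers (2.0 returned, 1.5 expected). The helper names exact_median and upper_middle are also hypothetical, and upper_middle only mimics the behaviour described for the quantile-based approach; it is not how Trino computes it.

```python
import statistics
from typing import Sequence


def exact_median(values: Sequence[float]) -> float:
    """Median that averages the two middle values when the count is even."""
    if not values:
        raise ValueError("median of an empty sequence is undefined")
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return float(ordered[mid])
    return (ordered[mid - 1] + ordered[mid]) / 2


def upper_middle(values: Sequence[float]) -> float:
    """Pick the larger of the two middle values (the behaviour reported above)."""
    ordered = sorted(values)
    return float(ordered[len(ordered) // 2])


sample = [1, 2]  # hypothetical column with an even number of rows

print(upper_middle(sample))       # 2.0
print(exact_median(sample))       # 1.5
print(statistics.median(sample))  # 1.5
```

For an odd number of values the two helpers agree, which is why the discrepancy only appears on tables with an even row count.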
@damian3031 Thanks for the detailed answer! I'm not really a statistics expert, so I was just using the tdigest methods according to the following:
trinodb/trino#5158
trinodb/trino#4975
https://arxiv.org/pdf/1902.04023.pdf
By the way, when I use the workaround suggested above, the median result changes slightly every time I execute the same query.
Is there a more stable or accurate solution?