forked from malloydata/learn
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmalloy_cardio_answers.malloynb
164 lines (147 loc) · 6.24 KB
/
malloy_cardio_answers.malloynb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
>>>markdown
# Malloy Cardio
This set of exercises is sure to get your heartrate up and your brain firing on all cylinders.
Fire up your favorite workout playlist and dive in 👇
>>>markdown
## Step 0: Get Ready
The first step in any Malloy data modeling exercise is to declare your sources. In the following code cell, declare a source for each of the tables in the ecommerce dataset. Refer to the [documentation for the precise syntax](https://malloydata.github.io/documentation/language/source).
>>>malloy
// users table at data/users.parquet
source: users is duckdb.table('data/ecommerce/users.parquet')
// Create a source for the rest of the tables in this dataset:
// products
source: products is duckdb.table('data/ecommerce/products.parquet')
// order_items
source: order_items is duckdb.table('data/ecommerce/order_items.parquet') {
join_one: products on product_id = products.product_id
}
// orders
source: orders is duckdb.table('data/ecommerce/orders.parquet') {
join_many: order_items on order_id = order_items.order_id
join_one: users on user_id = users.user_id
measure:
total_sales is order_items.sum(order_items.sale_price)
query: rev_by_brand is {
group_by: order_items.products.brand
aggregate: total_sales
top: 5
}
}
>>>markdown
## Step 1: Warmup
Simple queries are easy to write in Malloy. The following query looks at the count of users by gender:
>>>malloy
run: users -> {
group_by: gender
aggregate: user_count is count()
}
>>>markdown
Now your turn to try. Write a query that calculates the count of orders by status.
>>>malloy
// Calculate the count of orders by status:
run: orders -> {
group_by: status
aggregate: order_count is count()
}
>>>markdown
## Step 2: Joins
In this data model, `order_items` and `orders` are related fact tables, and the `users` and `products` tables are dimensions that join in. Update the sources above to include these [joins](https://malloydata.github.io/documentation/language/join):
- `orders` has a one-to-many join to `order_items` via `order_id`
- `orders` joins to `users` via `user_id`
- `order_items` joins to `products` via `product_id`
>>>markdown
## Step 3: Using Joins
Now that we have the joins modeled, let's run a query that takes advantage of them. Count the number of orders by user's country.
>>>malloy
// Count the number of orders per user's country:
run: orders -> {
group_by: users.country
aggregate: order_count is count()
}
>>>markdown
## Step 4a: Aggregate Locality
Malloy makes the promise that regardless of what joins you have defined, your aggregate calculations will always be correct. This means you no longer need to worry about SQL "gotchas" like join fan-outs. In Malloy, this concept is called [aggregate locality](https://malloydata.github.io/documentation/language/aggregates#aggregate-locality).
To see this in action, write a Malloy query that calculates:
- The total sales for each product category
- The average age of users who bought products in that product category
After you write the Malloy query, think about how you would accomplish this same task in SQL, and why it might be difficult.
>>>malloy
// Write one query to compute the total sales and average age of purchasing users for each product category
run: orders -> {
group_by: order_items.products.category
aggregate:
total_sales
avg_age is users.avg(users.age)
}
>>>markdown
Now that you have some useful aggregations, it might be helpful to define them as `measure`s in the model, so they can be easily reused.
>>>markdown
## Step 4b: Calculating Percent of Total
Malloy makes "percent of total" calculations very easy. For each product category, what is the total revenue, and calculate the percent of overall revenue.
>>>malloy
// Compute revenue for each product category, as well as percent of revenue relative to overall
run: orders -> {
group_by: order_items.products.category
aggregate:
total_sales
# percent
pct_of_total is total_sales / all(total_sales)
}
>>>markdown
## Step 4c: Number Formatting
[Add a tag](https://malloydata.github.io/documentation/visualizations/numbers#-percent) to the percent of total field so that the values render as percentages.
>>>markdown
## Step 5a: Nested Query
The above query shows us revenue numbers per product category. Let's drill in deeper with a nested query. Can you add a `nest` to this query to show the top 5 brands by revenue for each category?
>>>malloy
// For each product category, show top 5 brands by revenue
run: orders -> {
group_by: order_items.products.category
aggregate:
total_sales
# percent
pct_of_total is total_sales / all(total_sales)
# bar_chart
nest: by_brand is {
group_by: order_items.products.brand
aggregate: total_sales
top: 5
}
top: 5
}
>>>markdown
## Step 5b: Charting
Add a tag to the nested query to [render it as a bar chart](https://malloydata.github.io/documentation/visualizations/bar_charts).
>>>markdown
## Step 6: Saving this query
Update the data model to save the "top brands" query directly onto the source. Now modify what you previously wrote to use this saved query.
>>>malloy
// Update what you previously wrote to use a saved query from the Source
run: orders -> {
group_by: order_items.products.category
aggregate:
total_sales
# percent
pct_of_total is total_sales / all(total_sales)
# bar_chart
nest: rev_by_brand
top: 5
}
>>>markdown
## Step 7: Refining a Saved Query
Try using a query refinement to take the analysis further. Perhaps you can filter to only look at top brands where the `department` is Men or Women? Or perhaps you want to only look at brands with products above a certain `retail_price` threshold?
>>>malloy
// Update this query again to include a refinement
run: orders -> {
group_by: order_items.products.category
aggregate:
total_sales
# percent
pct_of_total is total_sales / all(total_sales)
# bar_chart
nest: rev_by_brand + { where: order_items.products.department = 'Men' }
top: 5
}
>>>markdown
## Step 8: Keep Exploring!
There are a handful of other datasets in this repository that are ready for modeling, exploration, and analysis. Take what you've learned here and try to apply it to another dataset in the `data` subfolder. Or better yet, find a dataset relevant to your day-to-day work and use Malloy to model and analyze it!