User single site #355

Merged
merged 4 commits into from
Jan 29, 2025
Changes from 1 commit
add tests
dgitis committed Dec 26, 2024
commit bf3457308790db3342d0dfb169db65d768f9724b
25 changes: 21 additions & 4 deletions README.md
@@ -20,6 +20,8 @@ Features include:
| stg_ga4__event_* | 1 model per event (ex: page_view, purchase) which flattens event parameters specific to that event |
| stg_ga4__event_items | Contains item data associated with e-commerce events (Purchase, add to cart, etc) |
| stg_ga4__event_to_query_string_params | Mapping between each event and any query parameters & values that were contained in the event's `page_location` field |
| stg_ga4__users | User ID table built from the GA4 User export table. Flattens user properties and audiences using the `user_properties` and `audiences` variables in your `dbt_project.yml` file. Disabled by default. |
| stg_ga4__client_keys | Client key table built from the GA4 User export pseudonymous users table. Flattens user properties and audiences using the `user_properties` and `audiences` variables in your `dbt_project.yml` file. Disabled by default. |
| stg_ga4__user_properties | Finds the most recent occurrence of specified user_properties for each user |
| stg_ga4__derived_user_properties | Finds the most recent occurrence of specific event_params values and assigns them to a client_key. Derived user properties are specified as variables (see documentation below) |
| stg_ga4__derived_session_properties | Finds the most recent occurrence of specific event_params or user_properties values and assigns them to a session's session_key. Derived session properties are specified as variables (see documentation below) |
@@ -254,17 +256,21 @@ vars:

This package contains two sets of user tables: the original set, implemented when the package was first launched, and a newer set designed to use the GA4 BigQuery user export tables, which were released after this package was first launched.

The original user tables build one-row-per-user tables and include data like first and last device, first and last geo, user properties, and derived user properties. To build them, they need to process all-time data. Large sites might want to consider disabling these tables to save costs.
The original user tables build one-row-per-user tables and include data like first and last device, first and last geo, user properties, and derived user properties. They need to process all-time data to build these tables. Large sites might want to consider disabling these tables to save costs.

The newer user tables leverage the GA4 user export setting. Because they are partitioned tables, they are more appropriate for high-traffic sites. They lack the first and last device and geo columns and the derived user properties, but include user properties, audiences, user LTV, and predictive data.

The GA4 user export tables do not currently support multi-site. There is a multi-site branch that needs testing. If you have a multi-site implementation and wish to use the GA4 user export tables, then please install the [user branch](https://github.com/Velir/dbt-ga4/tree/user) in your development environment, configure the various user-specific settings, run dbt, and report any issues or successes on this [draft PR](https://github.com/Velir/dbt-ga4/pull/317). Reach out on the draft PR if you need help with any of this.

## Settings Common to Both Sets of User Tables

The `user_properties` fields in the `events_*` and `events_intraday_*` tables use a different format from those in the `users_*` and `pseudonymous_users_*` tables, so no settings are shared between the two sets of user tables.

## dbt-GA4 Original User Table Settings

### User Properties

User properties are provided by GA4 in the `user_properties` repeated field. The most recent user property for each user will be extracted and included in the `dim_ga4__users` model by configuring the `user_properties` variable in your project as follows:
User properties are provided by GA4 in the `user_properties` repeated field at the event level in the `events_*` and `events_intraday_*` tables. The most recent value of each configured user property is extracted for each user and included in the `dim_ga4__users` model when you configure the `user_properties` variable in your project as follows:

```
vars:
@@ -276,8 +282,6 @@ vars:
value_type: "string_value"
```
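To make "most recent user property for each user" concrete, here is a minimal Python sketch that keeps the latest value of one event-level user property per user. It assumes simplified event records with `user_pseudo_id`, `event_timestamp`, and a `user_properties` list of `{key, value}` entries, and it assumes "most recent" means the largest `event_timestamp`. The property name `membership_level` is hypothetical, and the package's own SQL may resolve ties or types differently.

```python
from typing import Any, Dict, List, Optional, Tuple


def latest_user_property(
    events: List[Dict[str, Any]],
    property_key: str,
    value_type: str = "string_value",
) -> Dict[str, Optional[Any]]:
    """Return {user_pseudo_id: latest value} for one event-level user property.

    Event-level properties are matched on `key`, as in the events_* tables.
    """
    latest: Dict[str, Tuple[int, Any]] = {}  # user -> (event_timestamp, value)
    for event in events:
        for prop in event.get("user_properties", []):
            if prop["key"] != property_key:
                continue
            value = prop["value"].get(value_type)
            if value is None:
                continue  # property present, but not of the configured value_type
            user = event["user_pseudo_id"]
            ts = event["event_timestamp"]
            if user not in latest or ts > latest[user][0]:
                latest[user] = (ts, value)
    return {user: value for user, (_, value) in latest.items()}


# Hypothetical events: the same user sets the property twice.
events = [
    {"user_pseudo_id": "u1", "event_timestamp": 100,
     "user_properties": [{"key": "membership_level", "value": {"string_value": "free"}}]},
    {"user_pseudo_id": "u1", "event_timestamp": 200,
     "user_properties": [{"key": "membership_level", "value": {"string_value": "gold"}}]},
]
print(latest_user_property(events, "membership_level"))  # {'u1': 'gold'}
```

Scanning every event for every user is what makes the original tables depend on all-time data.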

## dbt-GA4 Original User Table Settings

### Derived User Properties

Derived user properties are different from "User Properties" in that they are derived from event parameters. This provides additional flexibility in allowing users to turn any event parameter into a user property.
@@ -326,6 +330,19 @@ models:
+enabled: true
```

### User Properties

The GA4 User Export includes a user properties repeated record that stores the user property details. User properties are enabled by adding a list of user property names that match values in the `user_properties.value.user_property_name` fields of your `pseudonymous_users_` and `users_` tables as shown below.

```
vars:
  ga4:
    user_export_user_properties: ['All Users', 'Purchasers']
```

Unlike the event-level `event_params` and `user_properties` fields, the user-level user properties are keyed on `user_properties.value.user_property_name` rather than `user_properties.key`. In the user tables, `user_properties.key` holds the slot that GA4 uses (for example, `slot_01`) rather than the property name. As a result, `user_properties.value.user_property_name` in the user tables should match `user_properties.key` in the event tables.
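A minimal Python sketch of the difference, using the `user_properties` record from the `stg_ga4__client_keys` unit test in this PR (`key` is `slot_01`, the name is `First Prop Name`). The helper names are hypothetical; they mirror the old `where key = '{{up}}'` and new `where value.user_property_name = '{{up}}'` predicates, with `None` standing in for NULL.

```python
from typing import Any, Dict, List, Optional

# user_properties record as stored in the GA4 user export tables
# (values copied from the unit-test fixture in this PR).
USER_PROPERTIES: List[Dict[str, Any]] = [
    {
        "key": "slot_01",
        "value": {
            "string_value": "first_prop_val",
            "set_timestamp_micros": 1695183380000000,
            "user_property_name": "First Prop Name",
        },
    }
]


def lookup_by_key(props: List[Dict[str, Any]], name: str) -> Optional[str]:
    """Old predicate: where key = name (correct for event-level properties)."""
    for prop in props:
        if prop["key"] == name:
            return prop["value"]["string_value"]  # first match only (simplification)
    return None


def lookup_by_property_name(props: List[Dict[str, Any]], name: str) -> Optional[str]:
    """New predicate: where value.user_property_name = name."""
    for prop in props:
        if prop["value"].get("user_property_name") == name:
            return prop["value"]["string_value"]
    return None


# The configured name never equals the slot, so the old lookup yields None (NULL).
print(lookup_by_key(USER_PROPERTIES, "First Prop Name"))            # None
print(lookup_by_property_name(USER_PROPERTIES, "First Prop Name"))  # first_prop_val
```

In BigQuery a scalar subquery that matches more than one row raises an error, whereas this sketch simply returns the first match.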


### Audiences

The GA4 User Export includes an Audiences repeated record that stores the audience membership details. Audiences are enabled by adding a list of audience names that match values in the `audiences.name` fields of your `pseudonymous_users_` and `users_` tables as shown below.
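Because audiences are matched on `audiences.name`, a user who is not in a configured audience gets NULL in every audience column. A minimal Python sketch of that lookup, with `None` standing in for NULL. The five field names follow the columns expected by this PR's unit test, the audience record is copied from its fixture, and `"Purchasers"` is used only as an example of an audience the user does not belong to.

```python
from typing import Any, Dict, List, Optional

AUDIENCE_FIELDS = (
    "id",
    "name",
    "membership_start_timestamp_micros",
    "membership_expiry_timestamp_micros",
    "npa",
)


def audience_fields(audiences: List[Dict[str, Any]], audience_name: str) -> Dict[str, Optional[Any]]:
    """Return the audience fields for one audience, matched on `name`.

    Mirrors scalar subqueries such as
    (select id from unnest(audiences) where name = '<audience_name>');
    every field is None when the user is not a member.
    """
    for audience in audiences:
        if audience["name"] == audience_name:
            return {field: audience[field] for field in AUDIENCE_FIELDS}
    return {field: None for field in AUDIENCE_FIELDS}


audiences = [
    {
        "id": 2366216494,
        "name": "All Users",
        "membership_start_timestamp_micros": 1695183380000000,
        "membership_expiry_timestamp_micros": 1715183380000000,
        "npa": False,
    }
]

print(audience_fields(audiences, "All Users")["id"])   # 2366216494
print(audience_fields(audiences, "Purchasers")["id"])  # None
```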
6 changes: 3 additions & 3 deletions models/staging/stg_ga4__client_keys.sql
@@ -9,9 +9,9 @@ select
*
, to_base64(md5(concat(pseudo_user_id, stream_id))) as client_key
{% for up in var('user_properties', []) %}
, (select value.string_value from unnest(user_properties) where key = '{{up}}') as {{up | lower | replace(" ", "_")}}_string_value
, (select value.set_timestamp_micros from unnest(user_properties) where key = '{{up}}') as {{up | lower | replace(" ", "_")}}_set_timestamp_micros
, (select value.user_property_name from unnest(user_properties) where key = '{{up}}') as {{up | lower | replace(" ", "_")}}_user_property_name
, (select value.string_value from unnest(user_properties) where value.user_property_name = '{{up}}') as {{up | lower | replace(" ", "_")}}_string_value
, (select value.set_timestamp_micros from unnest(user_properties) where value.user_property_name = '{{up}}') as {{up | lower | replace(" ", "_")}}_set_timestamp_micros
, (select value.user_property_name from unnest(user_properties) where value.user_property_name = '{{up}}') as {{up | lower | replace(" ", "_")}}_user_property_name
{% endfor %}
{% for aud in var('audiences', []) %}
, (select id from unnest(audiences) where name = '{{aud}}') as audience_{{aud | lower | replace(" ", "_")}}_id
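`client_key` is computed as `to_base64(md5(concat(pseudo_user_id, stream_id)))`. The Python sketch below mirrors that expression with `hashlib` and `base64`, assuming BigQuery's `MD5` hashes the UTF-8 bytes of the concatenated string and `TO_BASE64` emits standard, padded base64. The function name is hypothetical. For the fixture row in the unit test (`'1664444444.1694444444'` and `'1234567890'`), the test expects `'hhcn7XB3QFPLFh3tf5sZzQ=='`.

```python
import base64
import hashlib


def client_key(pseudo_user_id: str, stream_id: str) -> str:
    """Sketch of to_base64(md5(concat(pseudo_user_id, stream_id))).

    Assumes MD5 over the UTF-8 bytes and standard, padded base64 output.
    """
    digest = hashlib.md5((pseudo_user_id + stream_id).encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


key = client_key("1664444444.1694444444", "1234567890")
print(key)  # 24 characters: a 16-byte digest in padded base64
```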
127 changes: 123 additions & 4 deletions models/staging/stg_ga4__client_keys.yml
@@ -22,12 +22,131 @@ unit_tests:
- input: ref('base_ga4__pseudonymous_users')
rows:
- audiences: ['struct(111111111 as id, "my_test_audience" as name, 1731573754000000 as membership_start_timestamp_micros, 1731998727000000
as membership_expiry_timestamp_micros, false as npa)', 'struct(222222222 as id, "my_second_audience" as name, 1731573754000000 as membership_start_timestamp_micros, 1731998727000000
as membership_expiry_timestamp_micros, true as npa)']
as membership_expiry_timestamp_micros, false as npa)', 'struct(222222222 as id, "my_second_audience" as name, 1731573754000000 as membership_start_timestamp_micros, 1731998727000000
as membership_expiry_timestamp_micros, true as npa)']
overrides:
vars:
audiences: ['my_test_audience', 'my_second_audience']
expect:
rows:
- {audience_my_test_audience_id: 111111111, audience_my_test_audience_name: 'my_test_audience', audience_my_test_audience_membership_start_timestamp_micros: 1731573754000000, audience_my_test_audience_membership_expiry_timestamp_micros: 1731998727000000, audience_my_test_audience_npa: False}
- {audience_my_second_audience_id: 222222222, audience_my_second_audience_name: 'my_second_audience', audience_my_second_audience_membership_start_timestamp_micros: 1731573754000000, audience_my_second_audience_membership_expiry_timestamp_micros: 1731998727000000, audience_my_second_audience_npa: True}
- {audience_my_test_audience_id: 111111111, audience_my_test_audience_name: 'my_test_audience', audience_my_test_audience_membership_start_timestamp_micros: 1731573754000000, audience_my_test_audience_membership_expiry_timestamp_micros: 1731998727000000, audience_my_test_audience_npa: False, audience_my_second_audience_id: 222222222, audience_my_second_audience_name: 'my_second_audience', audience_my_second_audience_membership_start_timestamp_micros: 1731573754000000, audience_my_second_audience_membership_expiry_timestamp_micros: 1731998727000000, audience_my_second_audience_npa: True}
- name: test_base_to_stg_ga4__client_keys
description: >
Testing that a given row of base_ga4__pseudonymous_users produces the expected output in stg_ga4__client_keys.
model: stg_ga4__client_keys
given:
- input: ref('base_ga4__pseudonymous_users')
format: sql
rows: |
select
'1664444444.1694444444' as pseudo_user_id
, '1234567890' as stream_id
, 1694444444444444 as user_info_last_active_timestamp_micros
, 1664444444444444 as user_info_user_first_touch_timestamp_micros
, 20241201 as user_info_first_purchase_date
, 'web' as device_operating_system
, 'mobile' as device_category
, 'Samsung' as device_mobile_brand_name
, 'SM-J337V' as device_mobile_model_name
, 'My page title' as device_unified_screen_name
, 'Vancouver' as geo_city
, 'Canada' as geo_country
, 'Americas' as geo_continent
, 'British Columbia' as geo_region
, 200.0 as user_ltv_revenue_in_usd
, 3 as user_ltv_sessions
, 346517 as user_ltv_engagement_time_millis
, 1 as user_ltv_purchases
, 3 as user_ltv_engaged_sessions
, 6582608513 as user_ltv_session_duration_micros
, cast(null as float64) as predictions_in_app_purchase_score_7d
, 0.4 as predictions_purchase_score_7d
, 0.08 as predictions_churn_score_7d
, 321.0 as predictions_revenue_28d_in_usd
, false as privacy_info_is_limited_ad_tracking
, false as privacy_info_is_ads_personalization_allowed
, date('2024-12-10') as occurence_date
, date('2024-12-12') as last_updated_date
, array[
struct(
'slot_01' as key
, struct(
'first_prop_val' as string_value
, 1695183380000000 as set_timestamp_micros
, 'First Prop Name' as user_property_name
) as value
)
] as user_properties
, array[
struct(
2366216494 as id
, 'All Users' as name
, 1695183380000000 as membership_start_timestamp_micros
, 1715183380000000 as membership_expiry_timestamp_micros
, false as npa
)
] as audiences
overrides:
vars:
user_properties: ['First Prop Name']
audiences: ['All Users']
expect:
format: sql
rows: |
select
'1664444444.1694444444' as pseudo_user_id
, '1234567890' as stream_id
, 1694444444444444 as user_info_last_active_timestamp_micros
, 1664444444444444 as user_info_user_first_touch_timestamp_micros
, 20241201 as user_info_first_purchase_date
, 'web' as device_operating_system
, 'mobile' as device_category
, 'Samsung' as device_mobile_brand_name
, 'SM-J337V' as device_mobile_model_name
, 'My page title' as device_unified_screen_name
, 'Vancouver' as geo_city
, 'Canada' as geo_country
, 'Americas' as geo_continent
, 'British Columbia' as geo_region
, 200.0 as user_ltv_revenue_in_usd
, 3 as user_ltv_sessions
, 346517 as user_ltv_engagement_time_millis
, 1 as user_ltv_purchases
, 3 as user_ltv_engaged_sessions
, 6582608513 as user_ltv_session_duration_micros
, cast(null as float64) as predictions_in_app_purchase_score_7d
, 0.4 as predictions_purchase_score_7d
, 0.08 as predictions_churn_score_7d
, 321.0 as predictions_revenue_28d_in_usd
, false as privacy_info_is_limited_ad_tracking
, false as privacy_info_is_ads_personalization_allowed
, date('2024-12-10') as occurence_date
, date('2024-12-12') as last_updated_date
, array[
struct(
'slot_01' as key
, struct(
'first_prop_val' as string_value
, 1695183380000000 as set_timestamp_micros
, 'First Prop Name' as user_property_name
) as value
)
] as user_properties
, array[
struct(
2366216494 as id
, 'All Users' as name
, 1695183380000000 as membership_start_timestamp_micros
, 1715183380000000 as membership_expiry_timestamp_micros
, false as npa
)
] as audiences
, 'hhcn7XB3QFPLFh3tf5sZzQ==' as client_key
, 'first_prop_val' as first_prop_name_string_value
, 1695183380000000 as first_prop_name_set_timestamp_micros
, 'First Prop Name' as first_prop_name_user_property_name
, 2366216494 as audience_all_users_id
, 'All Users' as audience_all_users_name
, 1695183380000000 as audience_all_users_membership_start_timestamp_micros
, 1715183380000000 as audience_all_users_membership_expiry_timestamp_micros
, false as audience_all_users_npa
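The extra columns in the expected output come from the Jinja loops in `stg_ga4__client_keys.sql`, which slugify each configured name with `lower | replace(" ", "_")`. A minimal Python sketch that renders the same column names for this test's `overrides` (`user_properties: ['First Prop Name']`, `audiences: ['All Users']`). The helper names are hypothetical, and the audience suffixes after `_id` are taken from the expected columns in this test rather than from the truncated SQL.

```python
from typing import List

USER_PROPERTY_SUFFIXES = ("string_value", "set_timestamp_micros", "user_property_name")
AUDIENCE_SUFFIXES = (
    "id",
    "name",
    "membership_start_timestamp_micros",
    "membership_expiry_timestamp_micros",
    "npa",
)


def slugify(name: str) -> str:
    """Python equivalent of the Jinja filters `lower | replace(" ", "_")`."""
    return name.lower().replace(" ", "_")


def generated_columns(user_properties: List[str], audiences: List[str]) -> List[str]:
    """Column names added by the two Jinja loops, in the order they appear."""
    columns: List[str] = []
    for up in user_properties:
        columns += [f"{slugify(up)}_{suffix}" for suffix in USER_PROPERTY_SUFFIXES]
    for aud in audiences:
        columns += [f"audience_{slugify(aud)}_{suffix}" for suffix in AUDIENCE_SUFFIXES]
    return columns


for column in generated_columns(["First Prop Name"], ["All Users"]):
    print(column)
```

The printed names should match the columns listed after `client_key` in the expected rows. Only spaces are replaced, so names containing other characters, such as hyphens, may not produce valid unquoted BigQuery column names.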
6 changes: 3 additions & 3 deletions models/staging/stg_ga4__users.sql
@@ -7,9 +7,9 @@
select
*
{% for up in var('user_properties', []) %}
, (select value.string_value from unnest(user_properties) where key = '{{up}}') as {{up | lower | replace(" ", "_")}}_string_value
, (select value.set_timestamp_micros from unnest(user_properties) where key = '{{up}}') as {{up | lower | replace(" ", "_")}}_set_timestamp_micros
, (select value.user_property_name from unnest(user_properties) where key = '{{up}}') as {{up | lower | replace(" ", "_")}}_user_property_name
, (select value.string_value from unnest(user_properties) where value.user_property_name = '{{up}}') as {{up | lower | replace(" ", "_")}}_string_value
, (select value.set_timestamp_micros from unnest(user_properties) where value.user_property_name = '{{up}}') as {{up | lower | replace(" ", "_")}}_set_timestamp_micros
, (select value.user_property_name from unnest(user_properties) where value.user_property_name = '{{up}}') as {{up | lower | replace(" ", "_")}}_user_property_name
{% endfor %}
{% for aud in var('audiences', []) %}
, (select id from unnest(audiences) where name = '{{aud}}') as audience_{{aud | lower | replace(" ", "_")}}_id