User single site #355

Merged
merged 4 commits into from
Jan 29, 2025
159 changes: 118 additions & 41 deletions README.md
@@ -20,6 +20,8 @@ Features include:
| stg_ga4__event_* | 1 model per event (ex: page_view, purchase) which flattens event parameters specific to that event |
| stg_ga4__event_items | Contains item data associated with e-commerce events (Purchase, add to cart, etc) |
| stg_ga4__event_to_query_string_params | Mapping between each event and any query parameters & values that were contained in the event's `page_location` field |
| stg_ga4__users | User ID table built from the GA4 User export table. Flattens user properties and audiences using the `user_properties` and `audiences` variables in your `dbt_project.yml` file. Disabled by default. |
| stg_ga4__client_keys | Client key table built from the GA4 User export pseudonymous users table. Flattens user properties and audiences using the `user_properties` and `audiences` variables in your `dbt_project.yml` file. Disabled by default. |
| stg_ga4__user_properties | Finds the most recent occurrence of specified user_properties for each user |
| stg_ga4__derived_user_properties | Finds the most recent occurrence of specific event_params values and assigns them to a client_key. Derived user properties are specified as variables (see documentation below) |
| stg_ga4__derived_session_properties | Finds the most recent occurrence of specific event_params or user_properties values and assigns them to a session's session_key. Derived session properties are specified as variables (see documentation below) |
@@ -173,47 +175,6 @@ vars:
value_type: "int_value"
```

### User Properties

User properties are provided by GA4 in the `user_properties` repeated field. The most recent user property for each user will be extracted and included in the `dim_ga4__users` model by configuring the `user_properties` variable in your project as follows:

```
vars:
  ga4:
    user_properties:
      - user_property_name: "membership_level"
        value_type: "int_value"
      - user_property_name: "account_status"
        value_type: "string_value"
```

### Derived User Properties

Derived user properties are different from "User Properties" in that they are derived from event parameters. This provides additional flexibility in allowing users to turn any event parameter into a user property.

Derived User Properties are included in the `dim_ga4__users` model and contain the latest event parameter value per user.

```
derived_user_properties:
  - event_parameter: "[your event parameter]"
    user_property_name: "[a unique name for the derived user property]"
    value_type: "[string_value|int_value|float_value|double_value]"
```

For example:

```
vars:
  ga4:
    derived_user_properties:
      - event_parameter: "page_location"
        user_property_name: "most_recent_page_location"
        value_type: "string_value"
      - event_parameter: "another_event_param"
        user_property_name: "most_recent_param"
        value_type: "string_value"
```

### Derived Session Properties

Derived session properties are similar to derived user properties, but on a per-session basis, for properties that change slowly over time. This provides additional flexibility in allowing users to turn any event parameter into a session property.
@@ -290,6 +251,122 @@ vars:
- name: "some_other_parameter"
value_type: "string_value"
```

# User Tables

This package contains two sets of user tables: an original set of user tables implemented from the inception of this package and a new set of user tables designed to use the GA4 BigQuery user export tables that were released after this package was first launched.

The original user tables build one-row-per-user tables and include data like first and last device, first and last geo, user properties, and derived user properties. They need to process all-time data to build these tables. Large sites might want to consider disabling these tables to save costs.

The newer user tables leverage the GA4 user export setting. They are partitioned tables, so they are better suited to high-traffic sites. They do not include the first and last device and geo columns or the derived user properties, but they do include user properties, audiences, user LTV, and predictive data.

The GA4 user export tables do not currently support multi-site. There is a multi-site branch that needs testing. If you have a multi-site implementation and wish to use the GA4 user export tables, then please install the [user branch](https://github.com/Velir/dbt-ga4/tree/user) in your development environment, configure the various user-specific settings, run dbt, and report any issues or successes on this [draft PR](https://github.com/Velir/dbt-ga4/pull/317). Reach out on the draft PR if you need help with any of this.

## Settings Common to Both Sets of User Tables

The `user_properties` fields in the event tables (`events_*` and `events_intraday_*`) and in the user export tables (`users_*` and `pseudonymous_users_*`) use different formats, so no settings are shared between the two sets of user tables.

## dbt-GA4 Original User Table Settings

### User Properties

User properties are provided by GA4 in the `user_properties` repeated field at the event-level in the `events_*` and `events_intraday_*` tables. The most recent user property for each user will be extracted and included in the `dim_ga4__users` model by configuring the `user_properties` variable in your project as follows:

```
vars:
  ga4:
    user_properties:
      - user_property_name: "membership_level"
        value_type: "int_value"
      - user_property_name: "account_status"
        value_type: "string_value"
```

### Derived User Properties

Derived user properties are different from "User Properties" in that they are derived from event parameters. This provides additional flexibility in allowing users to turn any event parameter into a user property.

Derived User Properties are included in the `dim_ga4__users` model and contain the latest event parameter value per user.

```
derived_user_properties:
  - event_parameter: "[your event parameter]"
    user_property_name: "[a unique name for the derived user property]"
    value_type: "[string_value|int_value|float_value|double_value]"
```

For example:

```
vars:
  ga4:
    derived_user_properties:
      - event_parameter: "page_location"
        user_property_name: "most_recent_page_location"
        value_type: "string_value"
      - event_parameter: "another_event_param"
        user_property_name: "most_recent_param"
        value_type: "string_value"
```
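
The "latest event parameter value per user" behaviour can be sketched in plain Python. This is only an illustration of the idea, not the package's SQL: the shape of the `events` rows, the `client_key` and `event_timestamp` field names, and the `latest_param_per_user` helper are all assumptions made for this example.

```python
from typing import Any


def latest_param_per_user(
    events: list[dict[str, Any]],
    event_parameter: str,
) -> dict[str, Any]:
    """Return the most recent value of `event_parameter` for each client_key.

    Hypothetical helper: each event is a dict with `client_key`,
    `event_timestamp`, and an `event_params` dict. Events that do not
    carry the parameter are ignored.
    """
    latest: dict[str, tuple[int, Any]] = {}
    for event in events:
        params = event["event_params"]
        if event_parameter not in params:
            continue
        key = event["client_key"]
        ts = event["event_timestamp"]
        # Keep the value only if this event is newer than the stored one.
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, params[event_parameter])
    return {key: value for key, (_, value) in latest.items()}


# Made-up sample events for two users.
events = [
    {"client_key": "a", "event_timestamp": 1, "event_params": {"page_location": "/home"}},
    {"client_key": "a", "event_timestamp": 3, "event_params": {"page_location": "/cart"}},
    {"client_key": "a", "event_timestamp": 4, "event_params": {"other": "x"}},
    {"client_key": "b", "event_timestamp": 2, "event_params": {"page_location": "/blog"}},
]

most_recent_page_location = latest_param_per_user(events, "page_location")
```

In this sketch, user `a` ends up with `/cart` because the later event at timestamp 4 does not carry `page_location` and is therefore skipped.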

## GA4 User Export Settings

The GA4 user export models are disabled by default.

Enable them by adding the following model configs:

```
models:
  ga4:
    staging:
      base:
        base_ga4__pseudonymous_users:
          +enabled: true
        base_ga4__users:
          +enabled: true
      stg_ga4__client_keys:
        +enabled: true
      stg_ga4__users:
        +enabled: true
```

### User Properties

The GA4 User Export includes a user properties repeated record that stores the user property details. User properties are enabled by adding a list of user property names that match values in the `user_properties.value.user_property_name` fields of your `pseudonymous_users_*` and `users_*` tables as shown below.

```
vars:
  ga4:
    user_export_user_properties: ['All Users', 'Purchasers']
```

Unlike the `event_params` and `user_properties` event-level fields, the user-level user properties are keyed off of `user_properties.value.user_property_name` rather than `user_properties.key`. The `user_properties.key` in the user tables is the slot that GA4 uses (`slot_01`, for example) rather than the name. As a result, `user_properties.value.user_property_name` in the user tables should match `user_properties.key` in the event tables.
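
The difference in keying can be sketched with two small Python structures. The rows below are made-up sample data shaped after the description above. The `string_value` field on the user export side and both helper functions are assumptions for illustration, not part of the package.

```python
from typing import Any, Optional

# Event-level record: the property name is stored in `key`.
event_user_properties = [
    {"key": "membership_level", "value": {"string_value": "gold"}},
]

# User-export record: `key` holds the GA4 slot, and the property name
# is stored in `value.user_property_name`.
user_export_user_properties = [
    {"key": "slot_01",
     "value": {"user_property_name": "membership_level", "string_value": "gold"}},
    {"key": "slot_02",
     "value": {"user_property_name": "account_status", "string_value": "active"}},
]


def export_property_names(props: list[dict[str, Any]]) -> set[str]:
    """Collect property names from user-export style records."""
    return {p["value"]["user_property_name"] for p in props}


def flatten_export_properties(
    props: list[dict[str, Any]],
    wanted: list[str],
) -> dict[str, Optional[str]]:
    """Pivot the requested properties into one entry per property name.

    Names that are not present in the record map to None.
    """
    by_name = {
        p["value"]["user_property_name"]: p["value"].get("string_value")
        for p in props
    }
    return {name: by_name.get(name) for name in wanted}


flattened = flatten_export_properties(
    user_export_user_properties, ["membership_level", "account_status"]
)
```

Looking up by `key` in the user export record would only return slot names such as `slot_01`, which is why the configured names have to match `value.user_property_name`.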


### Audiences

The GA4 User Export includes an Audiences repeated record that stores the audience membership details. Audiences are enabled by adding a list of audience names that match values in the `audiences.name` fields of your `pseudonymous_users_*` and `users_*` tables as shown below.

```
vars:
  ga4:
    audiences: ['Purchases', 'All Users']
```

This example will add the following columns to the relevant dbt-GA4 models:

- purchases_id
- purchases_name
- purchases_membership_start_timestamp_micros
- purchases_membership_expiry_timestamp_micros
- purchases_npa
- all_users_id
- all_users_name
- all_users_membership_start_timestamp_micros
- all_users_membership_expiry_timestamp_micros
- all_users_npa
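
The column names above appear to follow a simple pattern: the audience name is lowercased, spaces become underscores, and each audience field is appended as a suffix. The Python sketch below reproduces that pattern for the example. The `audience_columns` helper and the exact normalization rule are assumptions inferred from the listed columns, and the package's own macro may handle other characters differently.

```python
# Suffixes taken from the column list above.
AUDIENCE_FIELDS = [
    "id",
    "name",
    "membership_start_timestamp_micros",
    "membership_expiry_timestamp_micros",
    "npa",
]


def audience_columns(audiences: list[str]) -> list[str]:
    """Build the flattened column names for a list of audience names.

    Assumed rule: lowercase the audience name and replace spaces with
    underscores, then append each audience field.
    """
    columns = []
    for audience in audiences:
        prefix = audience.lower().replace(" ", "_")
        columns.extend(f"{prefix}_{field}" for field in AUDIENCE_FIELDS)
    return columns


columns = audience_columns(["Purchases", "All Users"])
```

Each configured audience therefore contributes five columns, so long audience lists widen the user models quickly.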


# Connecting to BigQuery

This package assumes that BigQuery is the source of your GA4 data. Full instructions for connecting DBT to BigQuery are here: https://docs.getdbt.com/reference/warehouse-profiles/bigquery-profile
38 changes: 37 additions & 1 deletion macros/base_select.sql
@@ -176,4 +176,40 @@
WHEN event_name = 'purchase' THEN 1
ELSE 0
END AS is_purchase
{% endmacro %}
{% endmacro %}


{% macro base_select_usr_source() %}
{{ return(adapter.dispatch('base_select_usr_source', 'ga4')()) }}
{% endmacro %}

{% macro default__base_select_usr_source() %}
, user_info.last_active_timestamp_micros as user_info_last_active_timestamp_micros
, user_info.user_first_touch_timestamp_micros as user_info_user_first_touch_timestamp_micros
, user_info.first_purchase_date as user_info_first_purchase_date
, device.operating_system as device_operating_system
, device.category as device_category
, device.mobile_brand_name as device_mobile_brand_name
, device.mobile_model_name as device_mobile_model_name
, device.unified_screen_name as device_unified_sceen_name
, geo.city as geo_city
, geo.country as geo_country
, geo.continent as geo_continent
, geo.region as geo_region
, user_ltv.revenue_in_usd as user_ltv_revenue_in_usd
, user_ltv.sessions as user_ltv_sessions
, user_ltv.engagement_time_millis as user_ltv_engagement_time_millis
, user_ltv.purchases as user_ltv_purchases
, user_ltv.engaged_sessions as user_ltv_engaged_sessions
, user_ltv.session_duration_micros as user_ltv_session_duration_micros
, predictions.in_app_purchase_score_7d as predictions_in_app_purchase_score_7d
, predictions.purchase_score_7d as predictions_purchase_score_7d
, predictions.churn_score_7d as predictions_churn_score_7d
, predictions.revenue_28d_in_usd as predictions_revenue_28d_in_usd
, privacy_info.is_limited_ad_tracking as privacy_info_is_limited_ad_tracking
, privacy_info.is_ads_personalization_allowed as privacy_info_is_ads_personalization_allowed
, parse_date('%Y%m%d' , occurrence_date) as occurrence_date
, parse_date('%Y%m%d' , last_updated_date) as last_updated_date
, user_properties
, audiences
{% endmacro %}
29 changes: 29 additions & 0 deletions models/staging/base/base_ga4__pseudonymous_users.sql
@@ -0,0 +1,29 @@
{% set partitions_to_replace = ['current_date'] %}
{% for i in range(env_var('GA4_INCREMENTAL_DAYS')|int if env_var('GA4_INCREMENTAL_DAYS', false) else var('static_incremental_days')) %}
{% set partitions_to_replace = partitions_to_replace.append('date_sub(current_date, interval ' + (i+1)|string + ' day)') %}
{% endfor %}
{{
config(
materialized = 'incremental',
incremental_strategy = 'insert_overwrite',
enabled=false,
partition_by={
"field": "occurrence_date",
"data_type": "date",
},
partitions = partitions_to_replace,
)
}}

with source as (
select
pseudo_user_id
, stream_id
{{ ga4.base_select_usr_source() }}
from {{ source('ga4', 'pseudonymous_users') }}
{% if is_incremental() %}
where parse_date('%Y%m%d', left(_table_suffix, 8)) in ({{ partitions_to_replace | join(',') }})
{% endif %}
)

select * from source
85 changes: 85 additions & 0 deletions models/staging/base/base_ga4__pseudonymous_users.yml
@@ -0,0 +1,85 @@
version: 2

models:
- name: base_ga4__pseudonymous_users
description: >
Base pseudo-user (client) model that pulls all fields from the pseudonymous user table of the user export. The pseudonymous user table is keyed on
the user_pseudo_id, which is the cid parameter in Gtag calls and is the main parameter from which the dbt-GA4 client_id is
created. The table is partitioned by occurrence_date. This model also flattens some fields.
columns:
- name: pseudo_user_id
description: >
The user_pseudo_id is a unique identifier for a user that is not tied to any personal information. This is the main identifier
used in the GA4 property. This is the cid parameter in Gtag calls and is the main parameter from which the dbt-GA4 client_id is
created.
- name: stream_id
description: The numeric ID of the data stream from which the event originated.
- name: user_info_last_active_timestamp_micros
description: Date of the user's last activity (timestamp in microseconds). Flattened version of user_info.last_active_timestamp_micros.
- name: user_info_user_first_touch_timestamp_micros
description: Date of the user's first_open or first_visit event, whichever is earlier (timestamp in microseconds). Flattened version of user_info.user_first_touch_timestamp_micros.
- name: user_info_first_purchase_date
description: Date of the user's first purchase (YYYYMMDD). Flattened version of user_info.first_purchase_date.
- name: device_operating_system
description: Flattened version of device.operating_system.
- name: device_category
description: Category of the device (mobile, tablet, desktop). Flattened version of device.category.
- name: device_mobile_brand_name
description: Flattened version of device.mobile_brand_name.
- name: device_mobile_model_name
description: Flattened version of device.mobile_model_name.
- name: device_unified_sceen_name
description: Flattened version of device.unified_screen_name.
- name: geo_city
description: Flattened version of geo.city.
- name: geo_country
description: Flattened version of geo.country.
- name: geo_continent
description: Flattened version of geo.continent.
- name: geo_region
description: Flattened version of geo.region.
- name: user_ltv_revenue_in_usd
description: Flattened version of user_ltv.revenue_in_usd.
- name: user_ltv_sessions
description: Flattened version of user_ltv.sessions
- name: user_ltv_engagement_time_millis
description: Flattened version of user_ltv.engagement_time_millis
- name: user_ltv_purchases
description: Flattened version of user_ltv.purchases
- name: user_ltv_engaged_sessions
description: Flattened version of user_ltv.engaged_sessions
- name: user_ltv_session_duration_micros
description: Flattened version of user_ltv.session_duration_micros
- name: predictions_in_app_purchase_score_7d
description: >
Probability that a user who was active in the last 28 days will log an in_app_purchase event within the next 7 days.
Flattened version of predictions.in_app_purchase_score_7d.
- name: predictions_purchase_score_7d
description: >
Probability that a user who was active in the last 28 days will log a purchase event within the next 7 days.
Flattened version of predictions.purchase_score_7d.
- name: predictions_churn_score_7d
description: >
Probability that a user who was active on your app or site within the last 7 days will not be active within the next 7 days.
Flattened version of predictions.churn_score_7d.
- name: predictions_revenue_28d_in_usd
description: >
Revenue expected (in USD) from all purchase events within the next 28 days from a user who was active in the last 28 days.
Flattened version of predictions.revenue_28d_in_usd.
- name: privacy_info_is_limited_ad_tracking
description: >
The device's Limit Ad Tracking setting. Possible values include: 'true', 'false', and '(not set)'. isLimitedAdTracking returns '(not set)' if Google Analytics is not
currently able to return this device's Limit Ad Tracking setting. Flattened version of privacy_info.is_limited_ad_tracking.
- name: privacy_info_is_ads_personalization_allowed
description: >
If a user is eligible for ads personalization, isAdsPersonalizationAllowed returns 'true'. If a user is not eligible for ads personalization,
isAdsPersonalizationAllowed returns 'false'. isAdsPersonalizationAllowed returns '(not set)' if Google Analytics is not currently able to
return whether this user is eligible for ads personalization; users where isAdsPersonalizationAllowed returns '(not set)' may or may not be
eligible for personalized ads. For personalized ads, you should treat users where isAdsPersonalizationAllowed = '(not set)' as isAdsPersonalizationAllowed = 'false'
because, in the most general case, some of the '(not set)' rows will include users that are not eligible for ads personalization. Users where
isAdsPersonalizationAllowed = 'false' may still be used for non-advertising use cases like A/B testing & data explorations. Flattened version of
privacy_info.is_ads_personalization_allowed.
- name: occurrence_date
description: Date when the record change was triggered. This is the partitioning column.
- name: last_updated_date
description: Date when the record was updated in the table.
29 changes: 29 additions & 0 deletions models/staging/base/base_ga4__users.sql
@@ -0,0 +1,29 @@
{% set partitions_to_replace = ['current_date'] %}
{% for i in range(env_var('GA4_INCREMENTAL_DAYS')|int if env_var('GA4_INCREMENTAL_DAYS', false) else var('static_incremental_days')) %}
{% set partitions_to_replace = partitions_to_replace.append('date_sub(current_date, interval ' + (i+1)|string + ' day)') %}
{% endfor %}
{{
config(
pre_hook="{{ ga4.combine_property_data() }}" if var('combined_dataset', false) else "",
materialized = 'incremental',
incremental_strategy = 'insert_overwrite',
enabled=false,
partition_by={
"field": "occurrence_date",
"data_type": "date",
},
partitions = partitions_to_replace,
)
}}

with source as (
select
user_id
{{ ga4.base_select_usr_source() }}
from {{ source('ga4', 'users') }}
{% if is_incremental() %}
where parse_date('%Y%m%d', left(_table_suffix, 8)) in ({{ partitions_to_replace | join(',') }})
{% endif %}
)

select * from source