Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internal module development #3

Merged
merged 1 commit into from
Jan 27, 2022

Conversation

emielver
Copy link
Contributor

No description provided.

@emielver emielver requested a review from bill-warner January 18, 2022 10:57
@github-pages github-pages bot temporarily deployed to github-pages January 18, 2022 13:29 Inactive
@emielver emielver force-pushed the internal-module-development branch from 19b1b4e to 248caf2 Compare January 18, 2022 14:23
@github-pages github-pages bot temporarily deployed to github-pages January 18, 2022 14:23 Inactive
@github-pages github-pages bot temporarily deployed to github-pages January 19, 2022 00:03 Inactive
@emielver emielver force-pushed the internal-module-development branch from 5a3874a to d03d51f Compare January 19, 2022 00:05
@github-pages github-pages bot temporarily deployed to github-pages January 19, 2022 00:05 Inactive
Copy link
Contributor

@bill-warner bill-warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done. Overall looks really good. There are quite a few files throughout without a new line char at the end. I havent left a comment on each but I think you should be able to do a REGEX search for them and add them in.

I think the biggest change is updating the app errors module to be disabled by default (I don't think snowplow__enable_app_error_context currently disables it)

Does the GH pages that point to the current branch? If so, once released we should update it to be based on the main branch so the doc site reflects the latest release.

On this note of the docs, do you have plans to write a similar docsite to the web package. There is likely a lot of overlap here so I am open to ideas as to how we best go about this. I think at the very least we should have a quickstart guide for the package.

Also I apologies for all the formatting comments! Me being neurotic.

.github/workflows/pr_tests.yml Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
macros/get_session_id_path_sql.sql Outdated Show resolved Hide resolved
dbt_project.yml Show resolved Hide resolved
dbt_project.yml Outdated Show resolved Hide resolved
dbt_project.yml Outdated Show resolved Hide resolved
@github-pages github-pages bot temporarily deployed to github-pages January 20, 2022 12:57 Inactive
@github-pages github-pages bot temporarily deployed to github-pages January 20, 2022 14:18 Inactive
@github-pages github-pages bot temporarily deployed to github-pages January 21, 2022 10:32 Inactive
@github-pages github-pages bot temporarily deployed to github-pages January 21, 2022 10:54 Inactive
@github-pages github-pages bot temporarily deployed to github-pages January 21, 2022 14:17 Inactive
@github-pages github-pages bot temporarily deployed to github-pages January 21, 2022 14:20 Inactive
@github-pages github-pages bot temporarily deployed to github-pages January 24, 2022 13:42 Inactive
@github-pages github-pages bot temporarily deployed to github-pages January 24, 2022 13:47 Inactive
@emielver
Copy link
Contributor Author

@bill-warner can you give it another look? Should be updated now with everything discussed!

@bill-warner
Copy link
Contributor

Sure will have a look ASAP

Copy link
Contributor

@bill-warner bill-warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all those changes. It doesn't appear however to be working correctly for RS, I believe due to issues in the sessions aggs table. I have suggested a fix which I think might resolve it for RS. Not sure what affect it will have on Postgres as I don't have a db to test it against. Would be good to work out why we are having to cast values for Postgres.

Also I notice with app errors disables, you get warnings about having tests on a disabled model. Do you know if there is a way to disable these tests as well as the model?

# dbt_project.yml
...
vars:
snowplow_mobile:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth including a note somewhere highlighting the importance of scoping variables to the package rather than globally if you are also using the web package.

https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-variables#variable-precedence

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the dbt_project.yml file, wouldn't we just get rid of all confusion/issues by scoping everything on a package level? E.g.

vars:
  snowplow_mobile:
    snowplow_events: "{{....}}"
    ...
    snowplow_allow_refresh: false

At least that should be the case based on the order of operations specified by dbt in the link you provided

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give it a try but I have a feeling that even if you do that, if you set global vars in the root project they will override both the web and mobile packages vars (where we have common vars)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, as far as I understand the order the local project namespace is used before globals (even within the same project), but I'll let you know if this helps resolve conflicts while having both projects working at the same time:

  1. The variables defined on the command line with --vars.
  2. The package-scoped variable declaration in the dbt_project.yml file
  3. The global variable declaration in the dbt_project.yml file.
  4. The variable's default argument (if one is provided).

docs/markdown/snowplow_mobile_overview.md Outdated Show resolved Hide resolved
docs/markdown/snowplow_mobile_overview.md Show resolved Hide resolved
MAX(case when e.event_index_in_session = e.events_in_session then e.version end) as last_version,
MAX(case when e.event_index_in_session = e.events_in_session then e.event_name end) as last_event_name,
{% if target.type == 'postgres' %}
MAX(case when e.event_index_in_session = e.events_in_session then e.event_id::varchar end) as session_last_event_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of interest what data type is the event_id before casting to varchar? Same with session_id. It feels like if they are the wrong data type then we should be correcting this upstream i.e. in event_this_run. That said, we join on event_id and session_id in events_this_run without needing to cast so I am curious as to at what point the data type changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

event_id and session_id are both uuids and for some reason Postgres can't deal with that. So in the join it can't implictly cast a uuid to a varchar to compare to another varchar, and it also doesn't know a max function for a uuid.
For app_errors, it sees that column as text unless I explicitly cast it. I've been able to cast app_errors upstream in the snowplow_mobile_sessions_this_run model successfully and have it trickle down, I've done the same for session_id in snowplow_mobile_base_session_context and snowplow_mobile_base_sessions_lifecycle_manifest but for event_id I can either do it in snowplow_mobile_sessions_aggs or I'd have to unstar in snowplow_mobile_base_events_this_run and do it there. I've done the former for now but let me know if you prefer the latter

models/sessions/scratch/snowplow_mobile_sessions_aggs.sql Outdated Show resolved Hide resolved
models/users/scratch/snowplow_mobile_users_aggs.sql Outdated Show resolved Hide resolved
selectors.yml Outdated Show resolved Hide resolved
docs/markdown/snowplow_mobile_overview.md Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@bill-warner
Copy link
Contributor

Also have you tested how this functions if you also have the snowplow-web package installed? Might be worth creating a local dbt project and importing both and making sure they play friendly together

@github-pages github-pages bot temporarily deployed to github-pages January 25, 2022 21:19 Inactive
@emielver
Copy link
Contributor Author

@bill-warner We can discuss it in the next meeting but I created a dbt project with both the mobile and web packages installed and ran it with no errors and all tests passing so I guess that's all good? The custom example runs for both Postgres and Redshift but I redid the documentation so it would be great if you could take a look at that as well. Other than that, again I think everything should be covered that you mentioned in the last couple of reviews. Thanks so much for all the detailed feedback, it's been super helpful!!!

Copy link
Contributor

@bill-warner bill-warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Final few tweaks to make I think. Sorry to go back on my remark around casting values for Postgres. Wasn't aware of that data type. Fully open to other ideas on how to approach that issue, just my 2c.

custom_example/dbt_project.yml Outdated Show resolved Hide resolved
custom_example/dbt_project.yml Outdated Show resolved Hide resolved
- not_null
- not_null:
config:
enabled: '{{var("snowplow__enable_application_context")}}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think you will run into the same bug as I did when using a similar approach with the web package. Worth double checking but I think currently if you have enabled snowplow__enable_application_context within your root project, the model will execute but the tests on it will be disabled still. The work around I chose was to set default values for all these vars inside the actual var rather than in the packages dbt_project.yml file. The user can then override these from their root project. That way I believe both the model and tests should be enabled/disabled together.

https://github.com/snowplow/dbt-snowplow-web/blob/c0adfb1a275bf7dc2a428345b7d87b438cf56718/dbt_project.yml#L37-L42

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why but they seem to be coming through for me without relegating the var definition to the schema.yml files. Here you can see what happens when I enable all of the contexts except the geo one:
image
and here when they are all disabled:
image
Again using only the definition in the dbt_project.yml

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which version of dbt are you using?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also are you importing the package into a dbt project or running directly from within the package?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yep sorry, I was running directly from within the package. I guess it's expected behaviour from dbt's perspective, right? According to this page (https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-variables#variable-precedence), the package variables have precedence over global variables. But I've now changed them to be defined throughout the project with default value of false, thanks for asking!

Copy link
Contributor

@bill-warner bill-warner Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite sure what behaviour you are referring to but having the same var return different values within the package is not expected :)

models/sessions/scratch/snowplow_mobile_sessions_aggs.sql Outdated Show resolved Hide resolved
@github-pages github-pages bot temporarily deployed to github-pages January 26, 2022 11:35 Inactive
@github-pages github-pages bot temporarily deployed to github-pages January 26, 2022 14:32 Inactive
Copy link
Contributor

@bill-warner bill-warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One small thing I noticed, the sessions context table in the base module should only be enabled for Postgres and RS. Doesn't really matter at this point given those are the only 2 supported warehouses atm but worth updating.

I haven't had time to run the package + tests post these changes on RS but assuming they have passed then I think we can merge. The same goes for Postgres as I don't have access.

Also once everything is merged to update the Github pages to refer to master no this branch.

@bill-warner
Copy link
Contributor

Great work on this @emielver & @rahul-snowplow . Thanks for all the effort

@emielver emielver force-pushed the internal-module-development branch from 801018f to 2a5bbb9 Compare January 27, 2022 10:32
@github-pages github-pages bot temporarily deployed to github-pages January 27, 2022 10:32 Inactive
@emielver emielver merged commit c8e7567 into release/snowplow-mobile/0.1.0 Jan 27, 2022
@emielver emielver deleted the internal-module-development branch February 7, 2022 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants