-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Internal module development #3
Internal module development #3
Conversation
19b1b4e
to
248caf2
Compare
5a3874a
to
d03d51f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done. Overall looks really good. There are quite a few files throughout without a new line char at the end. I havent left a comment on each but I think you should be able to do a REGEX search for them and add them in.
I think the biggest change is updating the app errors module to be disabled by default (I don't think snowplow__enable_app_error_context
currently disables it)
Does the GH pages that point to the current branch? If so, once released we should update it to be based on the main branch so the doc site reflects the latest release.
On this note of the docs, do you have plans to write a similar docsite to the web package. There is likely a lot of overlap here so I am open to ideas as to how we best go about this. I think at the very least we should have a quickstart guide for the package.
Also I apologies for all the formatting comments! Me being neurotic.
models/base/manifest/redshift_postgres/snowplow_mobile_base_sessions_lifecycle_manifest.sql
Outdated
Show resolved
Hide resolved
@bill-warner can you give it another look? Should be updated now with everything discussed! |
Sure will have a look ASAP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all those changes. It doesn't appear however to be working correctly for RS, I believe due to issues in the sessions aggs table. I have suggested a fix which I think might resolve it for RS. Not sure what affect it will have on Postgres as I don't have a db to test it against. Would be good to work out why we are having to cast values for Postgres.
Also I notice with app errors disables, you get warnings about having tests on a disabled model. Do you know if there is a way to disable these tests as well as the model?
# dbt_project.yml | ||
... | ||
vars: | ||
snowplow_mobile: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth including a note somewhere highlighting the importance of scoping variables to the package rather than globally if you are also using the web package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in the dbt_project.yml
file, wouldn't we just get rid of all confusion/issues by scoping everything on a package level? E.g.
vars:
snowplow_mobile:
snowplow_events: "{{....}}"
...
snowplow_allow_refresh: false
At least that should be the case based on the order of operations specified by dbt in the link you provided
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give it a try but I have a feeling that even if you do that, if you set global vars in the root project they will override both the web and mobile packages vars (where we have common vars)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, as far as I understand the order the local project namespace is used before globals (even within the same project), but I'll let you know if this helps resolve conflicts while having both projects working at the same time:
- The variables defined on the command line with --vars.
- The package-scoped variable declaration in the dbt_project.yml file
- The global variable declaration in the dbt_project.yml file.
- The variable's default argument (if one is provided).
MAX(case when e.event_index_in_session = e.events_in_session then e.version end) as last_version, | ||
MAX(case when e.event_index_in_session = e.events_in_session then e.event_name end) as last_event_name, | ||
{% if target.type == 'postgres' %} | ||
MAX(case when e.event_index_in_session = e.events_in_session then e.event_id::varchar end) as session_last_event_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of interest what data type is the event_id
before casting to varchar
? Same with session_id
. It feels like if they are the wrong data type then we should be correcting this upstream i.e. in event_this_run. That said, we join on event_id
and session_id
in events_this_run
without needing to cast so I am curious as to at what point the data type changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
event_id
and session_id
are both uuid
s and for some reason Postgres can't deal with that. So in the join it can't implictly cast a uuid
to a varchar
to compare to another varchar
, and it also doesn't know a max
function for a uuid
.
For app_errors
, it sees that column as text
unless I explicitly cast it. I've been able to cast app_errors
upstream in the snowplow_mobile_sessions_this_run
model successfully and have it trickle down, I've done the same for session_id
in snowplow_mobile_base_session_context
and snowplow_mobile_base_sessions_lifecycle_manifest
but for event_id
I can either do it in snowplow_mobile_sessions_aggs
or I'd have to unstar in snowplow_mobile_base_events_this_run
and do it there. I've done the former for now but let me know if you prefer the latter
models/base/scratch/default/contexts/snowplow_mobile_base_app_context.sql
Outdated
Show resolved
Hide resolved
Also have you tested how this functions if you also have the snowplow-web package installed? Might be worth creating a local dbt project and importing both and making sure they play friendly together |
@bill-warner We can discuss it in the next meeting but I created a dbt project with both the mobile and web packages installed and ran it with no errors and all tests passing so I guess that's all good? The custom example runs for both Postgres and Redshift but I redid the documentation so it would be great if you could take a look at that as well. Other than that, again I think everything should be covered that you mentioned in the last couple of reviews. Thanks so much for all the detailed feedback, it's been super helpful!!! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Final few tweaks to make I think. Sorry to go back on my remark around casting values for Postgres. Wasn't aware of that data type. Fully open to other ideas on how to approach that issue, just my 2c.
...wplow_mobile_custom_modules/session_goals/scratch/snowplow_mobile_session_goals_this_run.sql
Outdated
Show resolved
Hide resolved
...xample/models/snowplow_mobile_custom_modules/session_goals/snowplow_mobile_session_goals.sql
Show resolved
Hide resolved
models/base/scratch/base_scratch.yml
Outdated
- not_null | ||
- not_null: | ||
config: | ||
enabled: '{{var("snowplow__enable_application_context")}}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think you will run into the same bug as I did when using a similar approach with the web package. Worth double checking but I think currently if you have enabled snowplow__enable_application_context
within your root project, the model will execute but the tests on it will be disabled still. The work around I chose was to set default values for all these vars inside the actual var
rather than in the packages dbt_project.yml
file. The user can then override these from their root project. That way I believe both the model and tests should be enabled/disabled together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which version of dbt are you using?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also are you importing the package into a dbt project or running directly from within the package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yep sorry, I was running directly from within the package. I guess it's expected behaviour from dbt's perspective, right? According to this page (https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-variables#variable-precedence), the package variables have precedence over global variables. But I've now changed them to be defined throughout the project with default value of false
, thanks for asking!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure what behaviour you are referring to but having the same var return different values within the package is not expected :)
models/base/scratch/default/contexts/snowplow_mobile_base_app_context.sql
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. One small thing I noticed, the sessions context table in the base module should only be enabled for Postgres and RS. Doesn't really matter at this point given those are the only 2 supported warehouses atm but worth updating.
I haven't had time to run the package + tests post these changes on RS but assuming they have passed then I think we can merge. The same goes for Postgres as I don't have access.
Also once everything is merged to update the Github pages to refer to master no this branch.
Great work on this @emielver & @rahul-snowplow . Thanks for all the effort |
801018f
to
2a5bbb9
Compare
No description provided.