Jinja functions remain static in YAML files on RPC until modified #2330
Comments
In order for
This becomes one level more complex in the RPC server, where partial-parsing is only triggered by startup + sighup, not by submitting a
If a volatile Jinja variable is needed at parse time (input to Reproduction case: gist I believe our options here are:
I'm leaning toward the non-solution for this one, by documenting the limitation.
What we did for the 'doc' calls was to add code in the context 'doc' method to save the node referring to the doc in the doc source_file object. We could do something like that for these calls. It seems like we might need to save the current value of the variables somewhere, then check to see if the variable value has changed and mark all of the referencing nodes for reparsing. So something like: { "var_name": { "value": something, "referenced_by": [ {file_id, yaml_key, name}, or {file_id, unique_id}, ... ] } }. The variables could be internal things like 'run_started_at', CLI variables, and env vars. Statically parsing models would get more complicated, of course.
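The structure proposed above could be sketched roughly as follows. This is only an illustration of the idea, not dbt's implementation: the class names (Reference, TrackedVariable, VariableRegistry) and the record method are hypothetical, and only the keys value, referenced_by, file_id, yaml_key, name and unique_id come from the comment.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Reference:
    """One place where a variable was rendered (hypothetical shape)."""
    file_id: str
    unique_id: Optional[str] = None  # set for nodes such as models
    yaml_key: Optional[str] = None   # set for YAML entries, e.g. "sources"
    name: Optional[str] = None       # name of the YAML entry


@dataclass
class TrackedVariable:
    """The value seen at parse time plus everything that used it."""
    value: Any
    referenced_by: List[Reference] = field(default_factory=list)


class VariableRegistry:
    """Maps a variable name to its saved value and its references."""

    def __init__(self) -> None:
        self.variables: Dict[str, TrackedVariable] = {}

    def record(self, var_name: str, value: Any, ref: Reference) -> None:
        # Create the entry on first use, then add the reference once.
        tracked = self.variables.setdefault(var_name, TrackedVariable(value))
        if ref not in tracked.referenced_by:
            tracked.referenced_by.append(ref)

    def to_dict(self) -> Dict[str, Dict[str, Any]]:
        # Serialize into the {"var_name": {"value": ..., "referenced_by": [...]}} form.
        return {
            var_name: {
                "value": tracked.value,
                "referenced_by": [asdict(ref) for ref in tracked.referenced_by],
            }
            for var_name, tracked in self.variables.items()
        }
```

In this sketch, the context method that resolves a variable would call record each time the variable is rendered, so the saved state can later be compared against fresh values.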
@gshank That feels like a fruitful way to approach this: rather than store the reference in the node itself, store the value and a child map of all nodes that will need updating when the value changes. I bet we could do something similar for env vars (#3885). Then, an additional step in partial parsing looks like: compare these values; if any value has changed, schedule dependent nodes for re-parse. Outside of exceptional circumstances (the RPC server! ugh!), The simple version could be: all nodes that call one of these variables anywhere should be considered dependent on that variable. The cleverer, more precise version: all nodes that depend on one of these variables for a parse-time configuration, i.e. ref/source/config/yaml property. We don't need to re-parse nodes that include What do you think is the difficulty of an approach along those lines? Is it worth pursuing now, or punting to after v1.0 (and clearly documenting the limitations in the meantime)?
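A minimal sketch of that comparison step, covering both the simple and the more precise variant. Everything here is an assumption made for illustration: the function name nodes_to_reparse, the parse_time flag on each reference, and the plain-dict layout are hypothetical and do not describe dbt's partial parsing code.

```python
from typing import Any, Dict, Mapping, Set

_MISSING = object()  # marks a variable that no longer has a current value


def nodes_to_reparse(
    saved: Mapping[str, Dict[str, Any]],
    current: Mapping[str, Any],
    precise: bool = False,
) -> Set[str]:
    """Return the ids of nodes to schedule for re-parsing.

    saved maps a variable name to
        {"value": ..., "referenced_by": [{"node": id, "parse_time": bool}, ...]}
    current maps a variable name to the value for this invocation.
    With precise=False, every node that uses a changed variable is returned.
    With precise=True, only nodes that use it in a parse-time configuration
    (ref/source/config/yaml property) are returned.
    """
    stale: Set[str] = set()
    for var_name, entry in saved.items():
        if current.get(var_name, _MISSING) == entry["value"]:
            continue  # value unchanged, nothing to schedule
        for ref in entry["referenced_by"]:
            if precise and not ref["parse_time"]:
                continue  # use affects only the compiled body, not parsing
            stale.add(ref["node"])
    return stale
```

The precise variant trades extra bookkeeping (knowing where in a node the variable was used) for fewer re-parses, which matters for a variable like run_started_at that changes on every invocation.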
At its heart, this issue has more to do with partial parsing than with the RPC server. I'm going to leave it in this repo
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.
Describe the bug
We leverage various Jinja functions heavily in our dbt project (e.g. run_started_at, invocation_id, etc.). In addition to using them inside of dbt models, we also use them inside of YAML files, such as schema.yaml. A real world example for us is as follows:
Using {{ run_started_at }} in tandem with Snowflake's time travel in the identifier (shown above) is a workaround we have been using to circumvent the non-atomic nature of dbt snapshots (#1884). It effectively prevents the source data from changing out from under us while a snapshot is being taken, which happens incredibly often and leads to snapshot duplication bugs.
We recently migrated much of our production dbt workload to the RPC server and noticed that this {{ run_started_at }} Jinja function was not changing between runs in our compiled source definitions. We dug deeper and noticed that restarting the server and even manually re-compiling the project on the server did not solve the issue. However, what did make the function update was modifying the underlying schema.yaml file. What makes this even more curious is that when using Jinja functions in dbt models, they do seem to update on every single invocation without needing to modify files, restart the server, or recompile. It is also worth noting that this behavior is identical whether partial parsing is turned on or off.
I'm curious whether this is inherent in the way RPC functions and is thus intended by design, or whether this is a bug.
Steps To Reproduce
(see above)
Expected behavior
I would expect these functions to behave the same way they do when using the CLI, or at the very least to be updatable with a manual re-compile.
System information
Which database are you using dbt with?
The output of dbt --version:
The operating system you're using:
macOS Mojave 10.14.6
The output of python --version: