[CT-1556] Smarter handling of --vars in partial parsing #6323

Open
jtcohen6 opened this issue Nov 28, 2022 · 8 comments

Comments

@jtcohen6
Contributor

Just like #3885, but for CLI --vars.

This would require us to capture, at parse time, which files depend on which --vars, via calls to the Jinja {{ var() }} function. That would also include macros that call var(), and are then called by models / other macros in turn.

For Python models, if we introduce a built-in dbt.var() function, we'd want to do the same. We're already doing something similar for configs, to power config.get() at runtime.

Whenever the --vars change, instead of triggering a full re-parse, we'd schedule just the files that depend on the var for re-parsing. Of course, if the var is used for a configuration within dbt_project.yml, that could still affect many many nodes.
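
A minimal sketch of the idea above, not dbt's actual implementation: index which files call `var()` on which names, then diff the old and new `--vars` to pick the files to re-parse. The function names (`index_var_usage`, `files_to_reparse`) and the regex-based detection are assumptions for illustration only. A real version would have to follow macro calls transitively and handle `dbt_project.yml` configs, which this does not do.

```python
import re
from collections import defaultdict

# Matches var('name') or var("name"), with or without a default argument.
# The leading \b keeps env_var(...) from matching.
VAR_CALL = re.compile(r"""\bvar\(\s*['"]([^'"]+)['"]""")

_MISSING = object()


def index_var_usage(files):
    """Map each var name to the set of file paths whose raw text calls var() on it."""
    usage = defaultdict(set)
    for path, raw in files.items():
        for name in VAR_CALL.findall(raw):
            usage[name].add(path)
    return usage


def files_to_reparse(usage, old_vars, new_vars):
    """Return the files that reference any var whose value was added, removed, or changed."""
    changed = {
        name
        for name in old_vars.keys() | new_vars.keys()
        if old_vars.get(name, _MISSING) != new_vars.get(name, _MISSING)
    }
    affected = set()
    for name in changed:
        affected |= usage.get(name, set())
    return affected


# Hypothetical project contents, keyed by file path.
files = {
    "models/a.sql": "select * from t where d = '{{ var(\"run_date\") }}'",
    "models/b.sql": "select * from t limit {{ var('row_limit', 10) }}",
    "models/c.sql": "select '{{ env_var(\"DBT_TARGET\") }}' as target",
}
usage = index_var_usage(files)
print(sorted(files_to_reparse(usage, {"run_date": "2024-01-01"}, {"run_date": "2024-01-02"})))
# → ['models/a.sql']
```

Only `models/a.sql` is scheduled, because it is the only file that references the var that changed.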

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label May 28, 2023
@github-actions
Contributor

github-actions bot commented Jun 4, 2023

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 4, 2023
@ChenyuLInx
Contributor

If possible, can we separate this handling from parsing? What I am thinking is that parsing goes from everything in the project files -> a representation; then, at actual runtime, we apply configuration to the representation and do the execution.

@ChenyuLInx ChenyuLInx reopened this Aug 6, 2023
@github-actions github-actions bot removed the stale Issues that have gone stale label Aug 6, 2023
@jtcohen6
Contributor Author

jtcohen6 commented Aug 6, 2023

@ChenyuLInx Supportive of this line of thinking! The biggest caveat here is that vars can be used to dynamically disable/enable models, or to conditionally affect relationships between models, so it is necessary to resolve some vars during parsing in order to know the shape of the DAG, and to support node selection.

During parsing, we could store pointers to those variables, and then conditionally reevaluate them just before each execution. That feels similar to the approach described in this issue (partial parsing), though with some subtle differences in implementation.
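
One way to picture "store pointers, then reevaluate before execution" is sketched below. Everything here is hypothetical: `VarRef` and `ParsedNode` are illustrative names, not dbt classes. The parsed node keeps a reference to each var (name plus optional default) rather than the rendered value, and resolution happens against whatever `--vars` are supplied for a given run.

```python
from dataclasses import dataclass, field

_MISSING = object()


@dataclass(frozen=True)
class VarRef:
    """A pointer to a var recorded at parse time (hypothetical structure)."""
    name: str
    default: object = _MISSING

    def resolve(self, cli_vars):
        # Prefer the value supplied for this run, then the default from the var() call.
        if self.name in cli_vars:
            return cli_vars[self.name]
        if self.default is not _MISSING:
            return self.default
        raise KeyError(f"Required var '{self.name}' was not provided")


@dataclass
class ParsedNode:
    """A parsed node that remembers which vars it references, not their values."""
    name: str
    var_refs: list = field(default_factory=list)

    def resolve_vars(self, cli_vars):
        # Called just before execution, once per invocation.
        return {ref.name: ref.resolve(cli_vars) for ref in self.var_refs}


node = ParsedNode("orders", [VarRef("run_date"), VarRef("row_limit", 100)])
print(node.resolve_vars({"run_date": "2024-01-01"}))
# → {'run_date': '2024-01-01', 'row_limit': 100}
print(node.resolve_vars({"run_date": "2024-01-02", "row_limit": 5}))
# → {'run_date': '2024-01-02', 'row_limit': 5}
```

This only covers vars that do not change the shape of the DAG. Vars that enable/disable models or alter relationships would still need to be resolved during parsing, as noted above.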

@gshank
Contributor

gshank commented Aug 7, 2023

Yeah, there's a difference between vars that are needed at parse time and vars that can be resolved at compilation/execution time. Maybe we need some use cases to help think through the different situations. Vars in configs have to be resolved at parse time. Vars in plain SQL could be delayed. I'm not sure how we could distinguish between them.

@jtcohen6
Contributor Author

@gshank Do you know if the partial parsing manifest (target/partial_parse.msgpack) contains enough information (raw file contents and unrendered YAML configurations/attributes) that we could support a re-parse when CLI --vars are supplied, without needing to go back to the actual file system?

I'm thinking:

  • we can skip the need to re-read files (sooner)
  • we can also add support for "partial" re-parsing of only those files/nodes which are actually affected by the --vars override (the original scope of this issue)

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Feb 18, 2024
@jtcohen6 jtcohen6 removed the stale Issues that have gone stale label Feb 19, 2024
@ChenyuLInx ChenyuLInx self-assigned this Apr 3, 2024
@kp-tom-sc

We use the CLI --vars to pass in Airflow datetime variables. These change on each run, so we can't partial parse.
Is there a better way of handling datetime variables?
Could we have an ignore list of variable names, so that they don't trigger a full re-parse? Or, similar to the secret env var ignore rule, some var prefix like VAR_NO_PARSE_my_datetime?
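
A rough sketch of what the prefix-based ignore rule could look like, assuming partial parsing decides whether vars changed by comparing a hash of them between runs. The `VAR_NO_PARSE_` prefix is the one proposed in this comment and `vars_fingerprint` is a made-up helper; neither is an existing dbt feature.

```python
import hashlib
import json

# Hypothetical prefix from the proposal above; not an existing dbt convention.
NO_PARSE_PREFIX = "VAR_NO_PARSE_"


def vars_fingerprint(cli_vars, ignore_prefix=NO_PARSE_PREFIX):
    """Hash the CLI vars, leaving out any whose name starts with the ignore prefix."""
    relevant = {k: v for k, v in cli_vars.items() if not k.startswith(ignore_prefix)}
    # sort_keys makes the hash independent of the order the vars were passed in.
    payload = json.dumps(relevant, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


monday = {"VAR_NO_PARSE_my_datetime": "2024-04-01T00:00:00", "region": "eu"}
tuesday = {"VAR_NO_PARSE_my_datetime": "2024-04-02T00:00:00", "region": "eu"}
other_region = {"VAR_NO_PARSE_my_datetime": "2024-04-02T00:00:00", "region": "us"}

print(vars_fingerprint(monday) == vars_fingerprint(tuesday))        # → True
print(vars_fingerprint(tuesday) == vars_fingerprint(other_region))  # → False
```

The trade-off is that an ignored var must not be used anywhere that affects parsing (configs, enabled/disabled flags, relationships between models); otherwise the saved manifest would be stale without dbt noticing.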


4 participants