arbitrary configuration #146
Comments
@drewbanin this issue is lit. i spent a lot more time thinking about it and think it is insanely useful, beyond what we specified. I think there are a lot of ways that you could add value with model configuration even within the model itself--it allows the publisher of a model to leave certain choices up to the user of the model, giving both sides quite a lot of control over how these common models behave in different environments. here's one example: it's totally legitimate to define MRR with or without discounts applied. some people measure it one way, some measure it another. it's very possible to, with a single flag, change the behavior of a sql model with a simple if then statement triggered from that. this is amazing, and will unlock a TON of options.
hahahahahah
I'm super into this idea and I think it's something we should queue up soon. Do you think these new eg.
    # ./config/snowplow.yml
    "Snowplow Dependency Name":
      events:
        exclude_ip: 192.168.1.1
Also, what's the recommended way of structuring these config files? Do you cram a bunch of dependency configs into one file? Or do you have one file per dependency? Is the name of the config file significant? What happens if two different config files both define the same configuration? Is that a compilation error?
Suppose your project depends on project A which in turn depends on project B. Can you specify config options for project B in your config file? I don't know how likely this is, but what if two different dependencies both require project B? Can you configure each sub-dependency differently?
@jthandy you don't have to answer all of these questions, but this is the kind of feature which has a simple version and a "correct" version.... I think the former is acceptable for right this very moment. We should just be careful to structure this in such a way that we can make it the "correct" version when the time comes without starting over from scratch.
Yeah. I agree. This is non-trivial, and I don't pretend to have great answers to these questions. They're all the right ones. I like your suggestion about configuration files mapping to specific model contexts; that definitely increases the utility. I had imagined they would all be globally accessible.
I think the biggest issue with dependencies that we haven't figured out is scope/namespacing, which is a lot of what you are bringing up. How do we address objects in an arbitrarily nested tree of dependencies? Right now we have a very naive answer to that question, and I think that all of your questions stem from that.
I agree with your instinct to continue punting on this hard question and implement a useful version of this function that breaks around the edges. Let's see what those failure modes look like, feel the utility of the core feature, and then sit down and have a think about what the right way is. You know, in like several months :)
let's simplify this for V1 and just pop configs right into the Ex:
    models:
      Snowplow:
        events:
          base_table: atomic.events
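One possible reading of the V1 layout above, as a rough sketch only: treat nested mappings under models as scopes (package, then model) and scalar entries as config values, so a value set at the package level is inherited by every model beneath it. The function name `scoped_config`, the `enabled` key in the sample data, and the rule that mappings are scopes while scalars are config are all assumptions made for illustration; nothing in this thread settles how dbt should actually scope these values.

```python
def scoped_config(models_tree, path):
    """Collect config for the node at `path`, letting deeper scopes win.

    Assumption: nested dicts are scopes (package -> model) and any
    non-dict value is a config entry for the scope that holds it.
    """
    config = {}
    node = models_tree
    # Visit the root scope first, then each scope named in `path`.
    for name in (None, *path):
        if name is not None:
            node = node.get(name)
            if not isinstance(node, dict):
                break  # path does not exist; keep what was inherited
        config.update(
            {k: v for k, v in node.items() if not isinstance(v, dict)}
        )
    return config


# Hypothetical parsed contents of the `models:` block.
models = {
    "Snowplow": {
        "enabled": True,  # hypothetical package-level setting
        "events": {"base_table": "atomic.events"},
    }
}

print(scoped_config(models, ["Snowplow", "events"]))
```

A model that sets nothing itself would still see the package-level values, which is the "scoped at the appropriate level in the tree" behaviour the issue asks for.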
configuration in dbt is incredibly powerful: it is what allows models to change their behavior without changing their code. currently all configuration is done in parameters that we have specified, but that actually limits the user in the power of configuration. we should allow for arbitrary configuration values.
there are multiple locations where config is currently specified. arbitrary config should be able to be specified at each of these locations:
dbt_project.yml, any key provided that isn't a model or a config value we've defined should be saved as arbitrary config and scoped at the appropriate level in the tree.
additionally, we should invent one additional new way of specifying configuration. there should be a new folder in a dbt project at ./config (configurable within dbt_project.yml). this folder should contain yml files that have arbitrary key/value stores in them, defined by the user. these key/value pairs can be called in models like {{key}}, or like {{nested.key}}, arbitrarily deeply (matching the nesting present in the yml file) within models. these config values should be accessible by all models.
there will likely end up being numerous uses for this, but a primary one will be to allow for configuration of base model schema and table names within dependencies. projects that depend on them can override the default config to point base models to the appropriate schema and table names.
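The {{key}} and {{nested.key}} lookups described above could be resolved roughly like this. It is a minimal sketch, assuming the yml files under ./config have already been parsed into plain nested dicts; the `lookup` helper and the sample values are hypothetical and are not part of dbt.

```python
def lookup(config, dotted_key):
    """Resolve a key such as 'nested.key' against nested dicts."""
    value = config
    for part in dotted_key.split("."):
        # Stop with a clear error if the path runs out before the key does.
        if not isinstance(value, dict) or part not in value:
            raise KeyError(f"config key not found: {dotted_key!r}")
        value = value[part]
    return value


# Hypothetical contents of a yml file under ./config, already parsed.
user_config = {"key": "value", "nested": {"key": "atomic.events"}}

print(lookup(user_config, "key"))
print(lookup(user_config, "nested.key"))
```

Raising on a missing key, rather than returning an empty string, is a design choice in this sketch: a model that references config nobody defined probably should fail at compile time.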
the order of configuration key/value resolution in the case of collisions should be an extension of what it already is today:
./config folder
dbt_project.yml
in each instance, the lower levels should overwrite the higher levels. this is a feature, not a bug.
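The overwrite rule could be sketched as a deep merge in which later layers win. This assumes "lower levels" means entries further down the list above, so values from dbt_project.yml would replace values from the ./config folder; that reading, the `merge_layers` helper, and the sample values (including `analytics.events`) are assumptions for illustration only.

```python
def merge_layers(*layers):
    """Deep-merge config dicts; later layers overwrite earlier ones."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are mappings: merge them key by key.
                merged[key] = merge_layers(merged[key], value)
            else:
                # Scalar, or a type mismatch: the later layer simply wins.
                merged[key] = value
    return merged


# Hypothetical parsed config, listed from higher level to lower level.
config_folder = {
    "snowplow": {"base_table": "atomic.events", "exclude_ip": "192.168.1.1"}
}
project_file = {"snowplow": {"base_table": "analytics.events"}}

resolved = merge_layers(config_folder, project_file)
print(resolved)
```

Only the colliding key is replaced; sibling keys from the higher level survive, which is what lets a downstream project override a single table name without restating the rest of a dependency's config.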
as a part of this issue, we should change the syntax in dbt_project.yml to be model-config, not models. we're not declaring models, we're declaring config for them.