status | date | deciders | consulted | informed |
---|---|---|---|---|
accepted | 2022-03-01 | Joel Labes and Jeremy Cohen | dbt community | dbt community |

`dbt_utils` is the most-used package in the dbt Hub by a wide margin, and it is installed in one-third of weekly active projects (as of early 2022). The functionality in this package can be categorized into different use cases (each with its own rate of iteration):
- Cross-database macros serve as a foundation to enable compatibility in other packages
- Macros to abstract away complex work
- Useful tests which aren't built into dbt Core
- A catchall for experiments
The `dbt_utils` package is doing a lot, and it could be split up into more logical chunks. If we pull out each category into a stand-alone package, they can each do their own thing without interfering with one another.
How would this affect users, package maintainers, and adapter maintainers?
For each category of functionality, there are four main options for its future home:
- stay in `dbt_utils`
- move to its own stand-alone package or another existing repository (e.g. dbt-expectations in the case of tests, and dbt-labs-experimental-features in the case of experiments)
- move to definition in Core, implementation in adapters
- complete abandonment / deprecation
Since there are four categories and four possible destinations for each, that gives 4^4 = 256 unique options. Rather than enumerate all of them, we'll restrict discussion to a shorter list:
- Migrate cross-db functions from `dbt_utils` to definition in Core, implementation in adapters
- Split `dbt_utils` into multiple stand-alone packages
- Keep `dbt_utils` as-is
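
The figure of 256 can be checked by enumerating every assignment of a destination to a category. The sketch below is only a sanity check of that arithmetic; the short labels are paraphrases of the lists above, not identifiers used anywhere in dbt.

```python
from itertools import product

# The four categories of functionality (labels paraphrased for illustration).
categories = [
    "cross-database macros",
    "macros that abstract complex work",
    "tests not built into dbt Core",
    "experiments",
]

# The four possible destinations for each category (also paraphrased).
destinations = [
    "stay in dbt_utils",
    "stand-alone package or existing repository",
    "definition in Core, implementation in adapters",
    "deprecate",
]

# One option = one destination chosen independently for every category.
options = list(product(destinations, repeat=len(categories)))
print(len(options))  # 256
```

The first option on the shortlist corresponds to moving only the cross-database macros and leaving the other three categories where they are.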
Chosen option: "Migrate cross-db functions from `dbt_utils` to definition in Core, implementation in adapters", because that was the consensus that emerged from the discussion in dbt-utils #487.
Passthroughs will be left behind for migrated macros (so that calls to `dbt_utils.hash` don't suddenly start failing). New cross-database macros can be added in minor and major releases of dbt Core (but not patch releases). End users will retain the ability to use `dispatch` to shim/extend packages to adapters that don't yet support a particular macro.
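
To make the passthrough and shim ideas concrete, here is a toy model in Python. It is not how dbt implements macros (those are written in Jinja), and every name in it is an assumption made for illustration: `REGISTRY`, the `exampledb` adapter, the `my_project` package, and the SQL strings are all hypothetical. It only sketches the lookup idea, where packages are searched in a configured order and an adapter-specific implementation is preferred over a default one.

```python
from typing import Callable, Dict, List

Macro = Callable[[str], str]

# Hypothetical registry: package name -> {macro name -> implementation}.
# "dbt" stands in for macros shipped with dbt-core; the SQL is illustrative only.
REGISTRY: Dict[str, Dict[str, Macro]] = {
    "dbt": {"default__hash": lambda field: f"md5(cast({field} as varchar))"},
    "my_project": {},
}


def dispatch(name: str, adapter: str, search_order: List[str]) -> Macro:
    """Return the first implementation found, searching package by package.

    Within each package, the adapter-specific name is tried before the default.
    """
    for package in search_order:
        for prefix in (adapter, "default"):
            macro = REGISTRY.get(package, {}).get(f"{prefix}__{name}")
            if macro is not None:
                return macro
    raise LookupError(f"no implementation of {name!r} for adapter {adapter!r}")


def core_hash(field: str, adapter: str, search_order: List[str]) -> str:
    """Stand-in for the macro once it is defined in Core."""
    return dispatch("hash", adapter, search_order)(field)


def dbt_utils_hash(field: str, adapter: str, search_order: List[str]) -> str:
    """Passthrough left behind in the package: it only forwards to Core."""
    return core_hash(field, adapter, search_order)


order = ["my_project", "dbt"]

# Existing calls through the package keep working and match the Core result.
assert dbt_utils_hash("id", "exampledb", order) == core_hash("id", "exampledb", order)

# An end user shims an adapter that has no implementation of its own yet.
REGISTRY["my_project"]["exampledb__hash"] = lambda field: f"sha1_hex({field})"
print(dbt_utils_hash("id", "exampledb", order))  # sha1_hex(id)
```

Because the passthrough contains no logic of its own, a shim registered for the Core macro also takes effect for callers that still use the package-qualified name.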
Each moved macro will be validated by leaving a definition in `dbt_utils` and dispatching it to `dbt-core`. Independent continuous integration (CI) testing will exist within `dbt-core`, adapters, and `dbt_utils` using the new pytest framework.
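Pros and cons of migrating cross-db functions from `dbt_utils` to definition in Core, implementation in adapters:
- Good, because common, reusable functionality that differs across databases will work "out of the box"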
- Good, because functionality can be subjected to more rigorous testing
- Good, because we hope that many package vendors could drop their dependencies on `dbt_utils` altogether, which makes version resolution easier
- Good, because it's slightly more convenient to reference the macro as `dateadd` instead of `dbt_utils.dateadd`
- Good, because overriding global macros is simpler than overriding package macros
- Good, because changes to macros are more clearly tied to `dbt-core` versions, rather than needing to worry about breaking changes in the matrix of `dbt-core` + `dbt_utils` minor versions
- Good, because it establishes a precedent and pathway for battle-testing and maturing functionality before being promoted to Core
- Neutral, because new cross-database macros will need to wait for the next minor (or major) version of `dbt-core`; patch versions aren't an option
  - End users can use `dispatch`, or the macro can be added to a release of `dbt_utils` until it is promoted to `dbt-core`
- Bad, because higher barrier to contribution
  - To contribute to `dbt_utils` today, you just need to be a fairly skilled user of dbt. Even the integration tests are "just" a dbt project. To contribute to `dbt-core` or adapter plugins, you also need to know enough to set up a local development environment and to feel comfortable writing/updating Pythonic integration tests.
- Bad, because unknown maturity
  - Adding these macros into `dbt-core` "locks" them in. Changes to any macros may result in uglier code due to our commitment to backwards compatibility (e.g. addition of new arguments)
- Bad, because less macro discoverability
  - Arguably, the macros in `dbt-core` are less discoverable than the ones in `dbt_utils`. This can be mitigated somewhat via significant manual effort over at docs.getdbt.com
- Bad, because less opportunity to teach users about macros/packages
  - The fact that so many projects install `dbt_utils` feels like a good thing: in the process, users are prompted to learn about packages (an essential dbt feature), explore other available packages, and realize that anything written in `dbt_utils` is something they fully have the power to write themselves, in their own projects. (That's not the case for most code in `dbt-core` + adapter plugins.) In particular, users can write their own generic tests. We want to empower users to realize that they can write their own and not feel constrained by what's available out of the box.
Pros and cons of splitting `dbt_utils` into multiple stand-alone packages:
- Good, because all the tests could be in one package, which would make the purpose of each package clearer and more logically separated.
- Bad, because it is easier to install a single package and then discover more functionality within it. Searching the whole hub for more packages is non-trivial, which is a higher barrier than looking within a single `dbt_utils` package curated by dbt Labs.
Pros and cons of keeping `dbt_utils` as-is:
- Good, because we wouldn't have to do anything.
- Good, because the user only has to install one package and gets a ton of functionality.
- Bad, because it feels like the `dbt_utils` package is trying to do too much.
- Bad, because each category of macros can't target its own users and dictate its own rate of iteration.
The initial public discussion is in dbt-utils #487, and dbt-core #4813 captures the main story.