-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reorganize global project (macros) #4154
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @jtcohen6! I can tell you put a lot of thought into this.
I definitely support this change and that it ships as part of 1.0. I'd like to believe these macros will be more stable in 1.0 so the git history is not as important.
Random thought, but with this new macro taxonomy, I'm imagining just looking at an adapters include/
file tree, would give you the ways that an adapter differs from the core/postgres implementation.
questions
- would now (or at least pre-1.0) be a good time to rename adapter macros based on whether they return sql strings or actually execute SQL (
get_x_sql
vsalter_x_do_y
) (previous discussion in dbt Slack) - what's the ideal ordering of macros?
- alphabetically?
- most used to least used?
- those without
default__
versions first?
- related to above, anyway we can sandbox the macros that are required to be implemented for a custom adapter? or maybe we update the scaffolding to give the filetree and files for macros they have to implement? I guess I'd be interested to see what an adapter's filetree would look like in this new model...
@@ -38,6 +38,6 @@ Note that you can also right-click on models to interactively filter and explore | |||
- [What is dbt](https://docs.getdbt.com/docs/overview)? | |||
- Read the [dbt viewpoint](https://docs.getdbt.com/docs/viewpoint) | |||
- [Installation](https://docs.getdbt.com/docs/installation) | |||
- Join the [chat](https://community.getdbt.com/) on Slack for live questions and support. | |||
- Join the [dbt Community](https://www.getdbt.com/community/) for questions and discussion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side note: do we need these auto-generated docs here even?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what's included by default in the dbt-docs site Overview
, until/unless users provide a custom overview. It could be simpler / have less info in it — is that what you're thinking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. but does the global_project
need it? global_project
doesn't show up in anyones dbt docs, nor are any of the macros documented? I guess this file is just present in all scaffolded dbt projects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right - this is how it gets into users' project, by default.
It's an opinionated choice to exclude internal/global macros from the dbt-docs site; they're included in manifest.json
, just like every other macro. I'd like to document global macros (perhaps a community-sourced effort??), which we can use to power other cool metadata-y things down the line
{% macro get_columns_in_relation(relation) -%} | ||
{{ return(adapter.dispatch('get_columns_in_relation', 'dbt')(relation)) }} | ||
{% endmacro %} | ||
|
||
{% macro sql_convert_columns_in_relation(table) -%} | ||
{% set columns = [] %} | ||
{% for row in table %} | ||
{% do columns.append(api.Column(*row)) %} | ||
{% endfor %} | ||
{{ return(columns) }} | ||
{% endmacro %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest stronger conformity to the pattern of the default__
macro directly following the parent macro
{% macro get_columns_in_relation(relation) -%} | |
{{ return(adapter.dispatch('get_columns_in_relation', 'dbt')(relation)) }} | |
{% endmacro %} | |
{% macro sql_convert_columns_in_relation(table) -%} | |
{% set columns = [] %} | |
{% for row in table %} | |
{% do columns.append(api.Column(*row)) %} | |
{% endfor %} | |
{{ return(columns) }} | |
{% endmacro %} | |
{% macro sql_convert_columns_in_relation(table) -%} | |
{% set columns = [] %} | |
{% for row in table %} | |
{% do columns.append(api.Column(*row)) %} | |
{% endfor %} | |
{{ return(columns) }} | |
{% endmacro %} | |
{% macro get_columns_in_relation(relation) -%} | |
{{ return(adapter.dispatch('get_columns_in_relation', 'dbt')(relation)) }} | |
{% endmacro %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree!
{# here for backwards compatibility #} | ||
{% macro drop_relation_if_exists(relation) %} | ||
{% if relation is not none %} | ||
{{ adapter.drop_relation(relation) }} | ||
{% endif %} | ||
{% endmacro %} | ||
|
||
{# here for backwards compatibility #} | ||
{% macro load_relation(relation) %} | ||
{% do return(adapter.get_relation( | ||
database=relation.database, | ||
schema=relation.schema, | ||
identifier=relation.identifier | ||
)) -%} | ||
{% endmacro %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in a post 1.0 world, do we need to keep these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are definitely used in other plugins today. My preference would be to:
- call out that they're just pass-throughs for other macros (via these comments)
- avoid calling them in any global materializations/macros
- update our (Labs-maintained) plugins to avoid using them
- not to remove them entirely — I don't see a ton of harm in keeping them around, and it's one more bump in the road for plugins/projects upgrading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In retrospect, load_relation
actually offers a real convenience. I'm going to add it back
{% macro column_list(columns) %} | ||
{%- for col in columns %} | ||
{{ col.name }} {% if not loop.last %},{% endif %} | ||
{% endfor -%} | ||
{% endmacro %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps justifying this PR further, but this is the first time I've seen this macro before... ah! /global_project/macros/materializations/helpers.sql
! guess i never looked there before... now I'm racking my brain for instances of this macro not being used where it should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, we've probably reinvented this wheel a dozen+ times. I didn't realize it existed until I did the work for this PR either. We have some utilities in core that we've used for various materializations, but some of them aren't even being used anymore. At this point, we could delete them entirely, or to keep them around forever—either feels like a valid approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see column_list
or column_list_for_create_table
being used in any of our global or plugin macros/materializations, so I think I'm just going to snip these out
{% macro should_store_failures() %} | ||
{% set config_store_failures = config.get('store_failures') %} | ||
{% if config_store_failures is none %} | ||
{% set config_store_failures = flags.STORE_FAILURES %} | ||
{% endif %} | ||
{% do return(config_store_failures) %} | ||
{% endmacro %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used outside of the test materialization? for me this configs.sql
is the once file i feel iffy about existing, but I see where you're coming from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, should_store_failures()
feels very closely related to should_full_refresh()
. They have almost identical logic. The should_full_refresh()
macro is used by both the incremental model materialization + seeds. Do you think it would make more sense to move configs.sql
to macros/materializations/
instead of macros/etc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've decided, yes, it would make more sense to have these in macros/materializations/configs.sql
{% macro diff_columns(source_columns, target_columns) %} | ||
|
||
{% set result = [] %} | ||
{% set source_names = source_columns | map(attribute = 'column') | list %} | ||
{% set target_names = target_columns | map(attribute = 'column') | list %} | ||
|
||
{# --check whether the name attribute exists in the target - this does not perform a data type check #} | ||
{% for sc in source_columns %} | ||
{% if sc.name not in target_names %} | ||
{{ result.append(sc) }} | ||
{% endif %} | ||
{% endfor %} | ||
|
||
{{ return(result) }} | ||
|
||
{% endmacro %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious (but no strong opinion), on the home for helper macros that are only currently used for a specific materialization... I like the categorization but maybe unnecessary complexity to go to another file when no other macros are using it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good thought. As above, the question for me is:
- Should we consider these "private" methods that exist for internal (global materialization) use only, not to be documented, and used at users' own risk
- Should dbt-core's global project become more of a global repository of handy, nice-to-have helper macros like this one
- Somewhere in between: these are mostly internal methods, they're useful for adapter plugin maintainers when extending
dbt-core
to new databases, but they shouldn't be called / relied on any old project
When I put it like that, I actually find myself leaning toward 2 or 3, rather than 1. Which is to say, they should probably live in a file like materializations/internal_helpers
{% macro create_table_as(temporary, relation, sql) -%} | ||
{{ adapter.dispatch('create_table_as', 'dbt')(temporary, relation, sql) }} | ||
{%- endmacro %} | ||
|
||
{% macro default__create_table_as(temporary, relation, sql) -%} | ||
{%- set sql_header = config.get('sql_header', none) -%} | ||
|
||
{{ sql_header if sql_header is not none }} | ||
|
||
create {% if temporary: -%}temporary{%- endif %} table | ||
{{ relation.include(database=(not temporary), schema=(not temporary)) }} | ||
as ( | ||
{{ sql }} | ||
); | ||
|
||
{% endmacro %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love that these macros be broken into their own files! nit: can we make it create_table_as.sql
? and do the same thing for create_view_as
macro (create_view_as.sql
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, absolutely!
@@ -574,7 +574,7 @@ def _verify_generic_macro_structure(self, manifest): | |||
if k not in {'macro_sql'} | |||
} | |||
# Windows means we can't hard-code these. | |||
helpers_path = Normalized('macros/materializations/helpers.sql') | |||
helpers_path = Normalized('macros/etc/column_helpers.sql') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this scares me. should I be scared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No cause for fear. The comment a few lines up:
# just test a known global macro to avoid having to update this every
# time they change.
We want to verify that global/internal macros are included in manifest.json
, and to do that we just test that one of them exists, with all of its properties, including its path.
@swanderz Thank you for the detailed + thoughtful feedback!! Questions:
Yes. I think
Good question! My informal guidelines have been:
Yes! All of these macros are sandboxed in Right now, I have the macros grouped by function ( Of course, some adapters will need to reimplement the core materializations in addition to the macros in the To your earlier comment, all of these tweaks together will enable someone to quickly glance at an adapter's |
9d9cfaf
to
ec98984
Compare
* Use the dbt Events Module * Upgrade dbt-core to 1.0.0 * Add a profile_template.yml file * Reorganize macros following dbt-labs/dbt-core#4154 * Use MANIFEST.in instead of package_data see: https://stackoverflow.com/questions/7522250/how-to-include-package-data-with-setuptools-distutils * Do not quote seed columns see dbt-labs/dbt-core@64fc3a3 * Upgrade to 1.0.1
* Add integration tests * Reorganize + dispatch more global macros * Reorg materializations subdir * Move around + document generic tests * Fix failing tests * Fix merge conflict * Grab fix from #4148 * PR feedback * Fixup * Add load_relation back, it was nice * Last few test fixes * Rm incremental_upsert, now unused * Add changelog entry automatic commit by git-black, original commits: 8442fb6
Picking up from #3858. I got the itch for some autumnal cleaning. It's just too hard today to find the macros I'm looking for, or to even know where they might be. I'd really like this to be better for v1.
Dream scenario: We also document (
description
+args
) the most important global macros, especially ones that users are likely to override/reimplement. That can come after rc1, and it can be a team effort :)Trade-off:
git blame
to understand when + why they've changed in the past (but they don't change so so often, and usually not one critical line at a time, save for pesky syntax errors)Also:
test
part) intotests/generic
, following Er/3250 generic tests #4052, with a comment explaining why most of the actual SQL lives separately — do we like this?No intention for functional change in this PR. Very open to feedback!
Checklist
This PR includes tests, or tests are not required/relevant for this PR— existing tests all passing (🤞)CHANGELOG.md
and added information about my change