Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganize global project (macros) #4154

Merged
merged 13 commits into from
Nov 8, 2021
Merged

Reorganize global project (macros) #4154

merged 13 commits into from
Nov 8, 2021

Conversation

jtcohen6
Copy link
Contributor

@jtcohen6 jtcohen6 commented Oct 28, 2021

Picking up from #3858. I got the itch for some autumnal cleaning. It's just too hard today to find the macros I'm looking for, or to even know where they might be. I'd really like this to be better for v1.

Dream scenario: We also document (description + args) the most important global macros, especially ones that users are likely to override/reimplement. That can come after rc1, and it can be a team effort :)

Trade-off:

  • Much easier to find these macros in the future
  • Much harder to use git blame to understand when + why they've changed in the past (but they don't change so so often, and usually not one critical line at a time, save for pesky syntax errors)

Also:

  • dispatch more macros
  • move generic tests (just the test part) into tests/generic, following Er/3250 generic tests #4052, with a comment explaining why most of the actual SQL lives separately — do we like this?

No intention for functional change in this PR. Very open to feedback!

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR — existing tests all passing (🤞)
  • I have updated the CHANGELOG.md and added information about my change

@cla-bot cla-bot bot added the cla:yes label Oct 28, 2021
@jtcohen6 jtcohen6 marked this pull request as ready for review October 28, 2021 13:04
Copy link
Contributor

@dataders dataders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @jtcohen6! I can tell you put a lot of thought into this.

I definitely support this change and that it ships as part of 1.0. I'd like to believe these macros will be more stable in 1.0 so the git history is not as important.

Random thought, but with this new macro taxonomy, I'm imagining just looking at an adapters include/ file tree, would give you the ways that an adapter differs from the core/postgres implementation.

questions

  • would now (or at least pre-1.0) be a good time to rename adapter macros based on whether they return sql strings or actually execute SQL (get_x_sql vs alter_x_do_y) (previous discussion in dbt Slack)
  • what's the ideal ordering of macros?
    • alphabetically?
    • most used to least used?
    • those without default__ versions first?
  • related to above, anyway we can sandbox the macros that are required to be implemented for a custom adapter? or maybe we update the scaffolding to give the filetree and files for macros they have to implement? I guess I'd be interested to see what an adapter's filetree would look like in this new model...

@@ -38,6 +38,6 @@ Note that you can also right-click on models to interactively filter and explore
- [What is dbt](https://docs.getdbt.com/docs/overview)?
- Read the [dbt viewpoint](https://docs.getdbt.com/docs/viewpoint)
- [Installation](https://docs.getdbt.com/docs/installation)
- Join the [chat](https://community.getdbt.com/) on Slack for live questions and support.
- Join the [dbt Community](https://www.getdbt.com/community/) for questions and discussion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note: do we need these auto-generated docs here even?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what's included by default in the dbt-docs site Overview, until/unless users provide a custom overview. It could be simpler / have less info in it — is that what you're thinking?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. but does the global_project need it? global_project doesn't show up in anyones dbt docs, nor are any of the macros documented? I guess this file is just present in all scaffolded dbt projects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right - this is how it gets into users' project, by default.

It's an opinionated choice to exclude internal/global macros from the dbt-docs site; they're included in manifest.json, just like every other macro. I'd like to document global macros (perhaps a community-sourced effort??), which we can use to power other cool metadata-y things down the line

Comment on lines 1 to 17
{% macro get_columns_in_relation(relation) -%}
{{ return(adapter.dispatch('get_columns_in_relation', 'dbt')(relation)) }}
{% endmacro %}

{% macro sql_convert_columns_in_relation(table) -%}
{% set columns = [] %}
{% for row in table %}
{% do columns.append(api.Column(*row)) %}
{% endfor %}
{{ return(columns) }}
{% endmacro %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest stronger conformity to the pattern of the default__ macro directly following the parent macro

Suggested change
{% macro get_columns_in_relation(relation) -%}
{{ return(adapter.dispatch('get_columns_in_relation', 'dbt')(relation)) }}
{% endmacro %}
{% macro sql_convert_columns_in_relation(table) -%}
{% set columns = [] %}
{% for row in table %}
{% do columns.append(api.Column(*row)) %}
{% endfor %}
{{ return(columns) }}
{% endmacro %}
{% macro sql_convert_columns_in_relation(table) -%}
{% set columns = [] %}
{% for row in table %}
{% do columns.append(api.Column(*row)) %}
{% endfor %}
{{ return(columns) }}
{% endmacro %}
{% macro get_columns_in_relation(relation) -%}
{{ return(adapter.dispatch('get_columns_in_relation', 'dbt')(relation)) }}
{% endmacro %}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree!

Comment on lines 69 to 76
{# here for backwards compatibility #}
{% macro drop_relation_if_exists(relation) %}
{% if relation is not none %}
{{ adapter.drop_relation(relation) }}
{% endif %}
{% endmacro %}

{# here for backwards compatibility #}
{% macro load_relation(relation) %}
{% do return(adapter.get_relation(
database=relation.database,
schema=relation.schema,
identifier=relation.identifier
)) -%}
{% endmacro %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in a post 1.0 world, do we need to keep these?

Copy link
Contributor Author

@jtcohen6 jtcohen6 Nov 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are definitely used in other plugins today. My preference would be to:

  • call out that they're just pass-throughs for other macros (via these comments)
  • avoid calling them in any global materializations/macros
  • update our (Labs-maintained) plugins to avoid using them
  • not to remove them entirely — I don't see a ton of harm in keeping them around, and it's one more bump in the road for plugins/projects upgrading

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In retrospect, load_relation actually offers a real convenience. I'm going to add it back

Comment on lines 1 to 5
{% macro column_list(columns) %}
{%- for col in columns %}
{{ col.name }} {% if not loop.last %},{% endif %}
{% endfor -%}
{% endmacro %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps justifying this PR further, but this is the first time I've seen this macro before... ah! /global_project/macros/materializations/helpers.sql! guess i never looked there before... now I'm racking my brain for instances of this macro not being used where it should be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, we've probably reinvented this wheel a dozen+ times. I didn't realize it existed until I did the work for this PR either. We have some utilities in core that we've used for various materializations, but some of them aren't even being used anymore. At this point, we could delete them entirely, or to keep them around forever—either feels like a valid approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see column_list or column_list_for_create_table being used in any of our global or plugin macros/materializations, so I think I'm just going to snip these out

Comment on lines 15 to 21
{% macro should_store_failures() %}
{% set config_store_failures = config.get('store_failures') %}
{% if config_store_failures is none %}
{% set config_store_failures = flags.STORE_FAILURES %}
{% endif %}
{% do return(config_store_failures) %}
{% endmacro %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this used outside of the test materialization? for me this configs.sql is the once file i feel iffy about existing, but I see where you're coming from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, should_store_failures() feels very closely related to should_full_refresh(). They have almost identical logic. The should_full_refresh() macro is used by both the incremental model materialization + seeds. Do you think it would make more sense to move configs.sql to macros/materializations/ instead of macros/etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decided, yes, it would make more sense to have these in macros/materializations/configs.sql

Comment on lines 27 to 42
{% macro diff_columns(source_columns, target_columns) %}

{% set result = [] %}
{% set source_names = source_columns | map(attribute = 'column') | list %}
{% set target_names = target_columns | map(attribute = 'column') | list %}

{# --check whether the name attribute exists in the target - this does not perform a data type check #}
{% for sc in source_columns %}
{% if sc.name not in target_names %}
{{ result.append(sc) }}
{% endif %}
{% endfor %}

{{ return(result) }}

{% endmacro %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious (but no strong opinion), on the home for helper macros that are only currently used for a specific materialization... I like the categorization but maybe unnecessary complexity to go to another file when no other macros are using it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good thought. As above, the question for me is:

  1. Should we consider these "private" methods that exist for internal (global materialization) use only, not to be documented, and used at users' own risk
  2. Should dbt-core's global project become more of a global repository of handy, nice-to-have helper macros like this one
  3. Somewhere in between: these are mostly internal methods, they're useful for adapter plugin maintainers when extending dbt-core to new databases, but they shouldn't be called / relied on any old project

When I put it like that, I actually find myself leaning toward 2 or 3, rather than 1. Which is to say, they should probably live in a file like materializations/internal_helpers

Comment on lines 1 to 16
{% macro create_table_as(temporary, relation, sql) -%}
{{ adapter.dispatch('create_table_as', 'dbt')(temporary, relation, sql) }}
{%- endmacro %}

{% macro default__create_table_as(temporary, relation, sql) -%}
{%- set sql_header = config.get('sql_header', none) -%}

{{ sql_header if sql_header is not none }}

create {% if temporary: -%}temporary{%- endif %} table
{{ relation.include(database=(not temporary), schema=(not temporary)) }}
as (
{{ sql }}
);

{% endmacro %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

love that these macros be broken into their own files! nit: can we make it create_table_as.sql? and do the same thing for create_view_as macro (create_view_as.sql)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, absolutely!

@@ -574,7 +574,7 @@ def _verify_generic_macro_structure(self, manifest):
if k not in {'macro_sql'}
}
# Windows means we can't hard-code these.
helpers_path = Normalized('macros/materializations/helpers.sql')
helpers_path = Normalized('macros/etc/column_helpers.sql')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this scares me. should I be scared?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No cause for fear. The comment a few lines up:

        # just test a known global macro to avoid having to update this every
        # time they change.

We want to verify that global/internal macros are included in manifest.json, and to do that we just test that one of them exists, with all of its properties, including its path.

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Nov 3, 2021

@swanderz Thank you for the detailed + thoughtful feedback!!

Questions:

would now (or at least pre-1.0) be a good time to rename adapter macros based on whether they return sql strings or actually execute SQL (get_x_sql vs alter_x_do_y)?

Yes. I think create_table_as and create_view_as are the biggest apostates from this convention. I just updated them to be get_create_table_as_sql and get_create_view_as_sql by default, with the old macros serving as passthroughs for backwards compatibility.

what's the ideal ordering of macros? alphabetically? most used to least used? those without default__ versions first?

Good question! My informal guidelines have been:

  • We should split up macros into enough files that they are grouped more meaningfully, and so that you don't need to scroll very much or at all (<100 lines)
  • Materializations should always be alone in their own files
  • Within each (much smaller) file, macros should proceed in as "logical" of an order as we can muster. The dispatched macro should precede the default__ implementation. If macro_a is called by macro_b, macro_a should precede macro_b.
  • Beyond that, I don't have super-strong feelings. I think this reorganization is a very partial step on the way toward real macro discovery/discoverability, and we'll need other tools to help.

related to above, anyway we can sandbox the macros that are required to be implemented for a custom adapter? or maybe we update the scaffolding to give the filetree and files for macros they have to implement? I guess I'd be interested to see what an adapter's filetree would look like in this new model...

Yes! All of these macros are sandboxed in global_project/macros/adapters. These are the macros we expect an adapter plugin will need or want to override, since it's behavior that differs across databases.

Right now, I have the macros grouped by function (columns vs. freshness vs. persist_docs), but we could instead group them by which ones have default implementations vs. which must be implemented for a new custom adapter. I think my preference is, as you say, to update the adapter scaffolding to use the same functional file tree when they override/reimplement, rather than one huge file named adapters.sql.

Of course, some adapters will need to reimplement the core materializations in addition to the macros in the adapters/ subdirectory, but we should aspire to evade that necessity as often as we can.

To your earlier comment, all of these tweaks together will enable someone to quickly glance at an adapter's include/<adapter>/macros directory to see how much it differs from core/postgres/default.

@jtcohen6 jtcohen6 requested a review from leahwicz November 3, 2021 11:54
@jtcohen6 jtcohen6 self-assigned this Nov 7, 2021
@jtcohen6 jtcohen6 force-pushed the reorg/global-macros branch from 9d9cfaf to ec98984 Compare November 7, 2021 18:14
@jtcohen6 jtcohen6 merged commit 8442fb6 into main Nov 8, 2021
@jtcohen6 jtcohen6 deleted the reorg/global-macros branch November 8, 2021 18:09
@courentin courentin mentioned this pull request Dec 19, 2021
5 tasks
courentin added a commit to courentin/dbt-athena that referenced this pull request Dec 20, 2021
courentin added a commit to courentin/dbt-athena that referenced this pull request Dec 21, 2021
courentin added a commit to courentin/dbt-athena that referenced this pull request Dec 21, 2021
courentin added a commit to courentin/dbt-athena that referenced this pull request Dec 21, 2021
courentin added a commit to courentin/dbt-athena that referenced this pull request Dec 21, 2021
Tomme pushed a commit to Tomme/dbt-athena that referenced this pull request Feb 7, 2022
* Use the dbt Events Module

* Upgrade dbt-core to 1.0.0

* Add a profile_template.yml file

* Reorganize macros following dbt-labs/dbt-core#4154

* Use MANIFEST.in instead of package_data
see: https://stackoverflow.com/questions/7522250/how-to-include-package-data-with-setuptools-distutils

* Do not quote seed columns
see dbt-labs/dbt-core@64fc3a3

* Upgrade to 1.0.1
iknox-fa pushed a commit that referenced this pull request Feb 8, 2022
* Add integration tests

* Reorganize + dispatch more global macros

* Reorg materializations subdir

* Move around + document generic tests

* Fix failing tests

* Fix merge conflict

* Grab fix from #4148

* PR feedback

* Fixup

* Add load_relation back, it was nice

* Last few test fixes

* Rm incremental_upsert, now unused

* Add changelog entry

automatic commit by git-black, original commits:
  8442fb6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants