Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CT 2196, CT2121 constraints column order #7161
CT 2196, CT2121 constraints column order #7161
Changes from 2 commits
b3002e1
e257ef1
4d06b6c
d64810b
7780040
bbc18b7
561a4a7
3ea1c0a
39fa7b1
852a4b8
bdd555f
6a98e60
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously column types were compared based on their formatted representation, which could be data platform specific as implemented by the
<adapter>__format_column
macro.For example, BigQuery's
format_column
implementation compares data_type values bycolumn.data_type
as opposed tocolumn.dtype
in order to make comparisons on nested data types.Strictly comparing SQL and yml data_type values by
dtype
would allow the following contract to be accepted in BigQuery:One workaround would be to do this comparison using
format_column
instead, i.e:adapter.dispatch('format_column', 'dbt')(sql_col) != adapter.dispatch('format_column', 'dbt')(yaml_col[0])
. This would also ensure the comparison and error messaging are using consistent logic. An alternative would be to implement adefault__compare_data_types
macro to enable adapter-specific implementations.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at using format_column but the default implementation of that just uses the dtype. Is that different in the other adapters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually see any implementations of format_column in the adapters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I was on the wrong bigquery branch. Bigquery is the only adapter that implements it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we include the formatted column in the Column structures returned by "get_column_schema_from_query"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or concatenate the other parts of the column structure for comparison?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a lot of logic to put in a jinja template. Could we put this in a python function and then wrap that python function in this macro? I'm cutting corners, but this is what's in my head:
columns_spec_ddl.sql
dbt/adapters/base/impl.py
(for lack of a better spot)I think the python version would be much easier to unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking through what this might look like across all potential adapters, and if we need to add contract-related items in the future. With that context, I have the following questions:
default__get_select_subquery
if we already validatedget_assert_columns_equivalent
? We validate that the number of columns are the same, and the names are the same. So I think the subquery already limits to just the columns that we want.get_assert_columns_equivalent
might have been renamedassert_columns_equivalent
.With my assumptions (not necessarily true of course):
This would maintain backwards compatibility because we're keeping the macro
create_table_as
, which is what would have been overridden. And if we need to update only one of these in the future, it isolates the updates, instead of impacting all "create table" workflows. I'm open to feedback and I'm just trying to communicate a thought here. Please let me know what you think of these recommendations.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need the get_select_subquery in order to put the columns in the right order. We removed validating the order because we added the subquery. As far as splitting out the macro into two, probably @jtcohen6 and @MichelleArk should weigh in on that.