-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow for Explicit Schema Definitions (DDLs) when Creating a Table / Model #2191
Comments
Thanks for opening this thoughtful issue @ericxiao251 -- see also a previous discussion over here: #1438 I don't have any problem at all with this on BigQuery and Snowflake. I don't think this should be the default though. We also have an issue for adding a "strict" config to schema.yml files: #1570 . Maybe there's some sensible interop between these two features? Maybe something like:
I think |
We're still very new to DBT, but my initial thoughts were to enable this option in a new version, looking at #1570, we would probably enforce this if |
hi! is there any development on this? thanks |
This would be a positive development for our team. Some positives:
Can take advantage of some features offered by table definitions:
Now, as an alternative for those looking to utilize
some pseudocode:
or
|
I built a working example in BigQuery that @smomen suggested, so let me share it.
|
Friends, worth keeping an eye on #6079 which might satisfy this issue mostly. You'll get not-null enforcement at execution time. @mmiyahara what do you think? |
Is this issue solved by #6271 ? Model contracts are a new feature that dbt-labs has been working on. |
Short answer: yes! |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
Describe the feature
For a given DBT Model, we would like the ability to specify the column names, type, description and nullability explicitly. We are using BigQuery and this is possible within a
CTAS
query.We want this feature mainly so that we can prevent creating any tables with columns that should not be null that have null values in it. We do not want to do a check afterwards with a
dbt test
, as downstream datasets /reports can potentially be using "corrupted" data.i.e.
reference: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_table_examples
Describe alternatives you've considered
Adding a nullability check in a model's test, so that
dbt test
can do this check, but there are a lot of downsides to doing it this way:Additional context
We mainly work with BigQuery and we know that this is possible with BigQuery, not sure about the other data warehouses.
I think it would just be overall beneficial to have the check done at creation time, rather than post creation time.
This would give more context to users in the warehouse, when they look at the table schema, ie. what is this column? is this column nullable? etc.
Who will this benefit?
Anyone using BigQuery, this would reduce their warehousing costs significantly.
The text was updated successfully, but these errors were encountered: