-
Notifications
You must be signed in to change notification settings - Fork 1.7k
This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs+tests directly in a model #2084
Comments
Thanks for opening this issue @dubravcik! I do like the idea of colocating the docs for a model along with its definition, but I have reservations about how well this scheme will work in practice. What if you wanted to add column-level tests in here too - would you want to encode all of that information in the model config? And what if there are dozens and dozens of columns? I think that cramming all of that information into a single |
Theoretically, I'd like to move the yaml inside a model, but it is not possible afaik. We have to write in jinja expression in python, i.e. dictionary. Or is there any other option? If it has to be a dictionary, can we have multiple config expressions? Then it just about preference between the syntax of python or yaml. If we put the docs and tests closer to the output I think it is not so noisy. {{
config(
materialized = "table"
)
}}
WITH cte AS (
....
)
SELECT a.[SK_Account_Master]
,[Account Name] = [NAME_AccountName]
,[Account Level] = [NAME_AccountLevel]
,[Account Type] = [NAME_AccountType]
,[Is Testing Account] = [FLAG_IsTestingAccount]
,[Partner Revenue] = [AMT_PartnerRevenue_cur]
,[Partner Revenue LFY] = [AMT_PartnerRevenue_LFY_cur]
,[Partner Revenue TFY] = [AMT_PartnerRevenue_TFY_cur]
,[Partner Revenue Y] = [AMT_PartnerRevenue_12M_cur]
,a.[SK_User_Master_Inserted]
,a.[SK_User_Master_Modified]
,a.[SK_Account_Master_Parent]
,a.[SK_Account_Master_SuperParent]
,a.[SK_Account_Master_Partner]
,a.[SK_Partners_Partner_Master]
,a.[SK_Geography_Master]
,a.[SK_Currency_Master]
,a.[SK_User_Master_TSM]
,a.[SK_User_Master_AccountManager]
,a.[SK_User_Master_AccountExecutive]
,a.[SK_User_Master_CSM]
,a.[SK_Campaign_Master_SourceCampaign]
,[DTIME Inserted] = a.[DTIME_Inserted]
,[DATE Inserted] = a.[SK_Date_Inserted]
,[DTIME Modified] = a.[DTIME_Modified]
,[DATE Modified] = a.[SK_Date_Modified]
FROM {{ source('account') }} a
LEFT JOIN cte ...
{{
config(
description = "account is company"
columns = [
{
"name": "SK_Account_Master",
"description": "primary key of Account",
"tests": ["unique", "not_null"]
},
{
"name": "Account name",
"description": "business name" },
{
"name": "Account Level",
"description": "basic advanced pro"
},
{
"name": "Account Type",
"description": "partner or client"
},
{
"name": "SK_User_Master_Inserted",
"description": "User who created the Account",
"tests": { "relationships": { "to": "ref('user')", "field": "id" } }
}
]
tests = [
{ "accepted_values": {"column_name": "[account type]", "values": ['partner', 'client']}},
{ "unique": {"column_name": "concat([account name], [account type])"}}
]
)
}} I don't say this is exactly how I want it and I would use stright away, just suggestion for discussion :) |
Honestly I like having it separate. git-based version control systems are more "friendly" with multiple smaller files than they are with fewer larger ones. More, smaller files makes it easier to eyeball quickly where changes have taken place and also makes merge conflicts less likely. |
I am coming with another idea. I don't know whether it is a problem for others, but for me the problem is that the documentation
The idea is to have the documentation inside the model using comments. Other programming languages use comments for documentation, so it could work as well. For example a comment --docs < column > < description > could be parsed and used for documentation. The position of such comment would not be important as it holds the column name, so it would work also in long models with many CTEs. It could be another option to document model and could be overridden by schema.yml SELECT
--docs full_name full name of customer
full_name
--docs country country where customer is located based on IP location
,country
--docs browser_language language set in customer browser
,browser_language
--docs date_created_at date customer created
--this is an unrelated commented to docs
,date_created_at
--docs date_last_login date customer logged in
,date_last_login
FROM analytics.customer I haven't thought about column level tests yet. |
I'm thinking about this in the same way I'm thinking about #2401, which is on our 1.0 to-do list, and could be essentially summarized as: "Reconcile node configs and resource properties, where possible and it makes sense." Today, it's not totally clear what can be defined in one vs. the other—or both ( Personally, I find myself agreeing with @jrandrews's point above. I think that, by and large, we'll want to maintain the functional distinction that exists today, recast as a convention:
|
We had a similar problem and wanted to post an issue. The worst thing for us was that we often forgot to edit the schema file when we added / removed columns. But I think having a description in |
This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Describe the feature
When I started to learn dbt, I thought that data documentation is part of a model. Which is true, once it is published on the docs server. But when talking about source code, documentation is separated to
![image](https://user-images.githubusercontent.com/2481962/73534389-4aeee080-4421-11ea-8969-41e9ae01645a.png)
schema.yml
which is not intuitive for me. I would expect to have an option to write the docs close to the code. I can imagine a part of the config for this :Who will this benefit?
When one is making changes in a model, he can see the docs and moreover edit it immediately + he sees names of columns produced in the model.
Thanks :)
The text was updated successfully, but these errors were encountered: