This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
[Feature] dbt should know about Attributes
(foundation for Metrics/Dimensions)
#4090
Comments
At my current company I have a script that generates yaml for a table which has been created via
I feel like if we want to solve consistent documentation, we should perhaps consider a new builtin CLI command? Something that could generate docs with existing descriptions, as well as point out inconsistent definitions, and perhaps edit the
I'm leaning that way because I think attributes as they are proposed above are sort of trying to be too many things: are we solving metric definitions or consistent docs? Let's be explicit and narrow in on the problem we want to solve 😄
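The "point out inconsistent definitions" part of that idea could be sketched roughly as follows. This is not an existing dbt command: the function name, the input shape (model name to column descriptions), and the sample descriptions are all assumptions made up for illustration.

```python
from collections import defaultdict


def find_inconsistent_descriptions(models):
    """Report column names that carry more than one distinct description.

    `models` maps model name -> {column name -> description}.
    Returns {column name -> {description -> sorted list of model names}}.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for model, columns in models.items():
        for column, description in columns.items():
            seen[column][description.strip()].append(model)
    # Keep only columns whose descriptions disagree somewhere in the project.
    return {
        column: {desc: sorted(names) for desc, names in by_desc.items()}
        for column, by_desc in seen.items()
        if len(by_desc) > 1
    }


# Hypothetical project metadata, invented for this example.
project = {
    "fct_orders": {"sales_revenue": "Gross revenue in USD", "order_id": "Order key"},
    "agg_daily_sales": {"sales_revenue": "Net revenue in USD"},
    "dim_orders": {"order_id": "Order key"},
}

conflicts = find_inconsistent_descriptions(project)
print(conflicts)
```

Here only `sales_revenue` is reported, because `order_id` is described identically everywhere it appears.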
I would like to offer another PoV from GoodData (I'm an employee there), a company that has implemented its own ROLAP engine (historically inspired by MicroStrategy). Same as the issue author, we (and our customers) see great value in building metrics on top of an abstraction over the physical DB (instead of directly over DB columns). We would like to offer our long-term experience in this area to help shape the universal semantic layer design. Let me briefly outline our approach.
What you outline here is basically what we in GoodData call a Logical Data Model (LDM). The LDM consists of
Once an LDM is set up, the end users can then define their metrics using facts and attributes via a custom language called MAQL. A typical simple metric could look like
Such a metric could then be used in many different reports, in each of them broken down by any attributes "compatible" with the Price fact, e.g. Product, Customer, Campaign. (For more info about LDM you can read the intro docs.)
I hope that our approach could be of some inspiration in your proposed design, which I generally like quite a lot, especially the overall direction of your thinking. I may be biased, though :-)
Couple of questions:
Anyway, thanks for a great kick off!
Big thank you @aaronsteers for the clear and thorough write-up, and @tnightengale @david-kubecka for the thoughtful replies! I'm way overdue for responding to this one, in part because it sent me down a flurry of mental paths, and it's taken me a while to gather my thoughts together into anything serviceable.
I believe the move into defining metrics in
Semantically meaningful column properties ought to be inherited, and not re-typed every single time. That inheritance can run in two ways:
The separation/combination I described above is something I want to pursue, and it's something we can do with existing constructs (models and columns) plus a few new capabilities.
So, do we need another construct here? What does a notion of an
I've become increasingly convinced that there's real value in defining column "types." The closest thing to column "types" in dbt today is a mix of descriptions (achieved with reusable
So, imagine a
Then we work toward a synthesized approach whereby:
Lots of questions:
Ok! This is just the beginning. I'm going to turn this into a discussion, in the hope that more folks join in :)
Is there an existing feature request for this?
Describe the Feature
Following from #4071, I called out in my comment that I believe metrics would be better established on top of "attributes" instead of building directly on "columns" and "models" abstractions in dbt.
As a foundation for deeper metadata understandings within dbt and to unify the documentation effort for existing dbt projects, we should first establish some type of ontological definition of what columns "mean", as they relate to an analytical framing.
Proposal
dimension_key, dimension_property, fact, etc.
as_of_date marked as the primary temporal attribute for the table, this will inform how metrics calculations can be performed - and at what grain they are possible.)
sales_revenue, we could set an auto_map_by_name: true property to find and map all references of the column sales_revenue, or we could explicitly map to models and column references.
Benefits
Sample Code
Adapted from my comment on the related metrics topic.
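As a rough illustration of the mapping rule described in the proposal (and not the original sample from that comment), here is a small Python sketch. The `Attribute` class, its fields other than `auto_map_by_name`, the `resolve_columns` helper, and the model and column names other than `sales_revenue` and `as_of_date` are hypothetical and not part of dbt.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """Hypothetical attribute definition; not an actual dbt construct."""
    name: str
    description: str
    auto_map_by_name: bool = False
    # Explicit (model, column) pairs, for columns named differently.
    explicit_refs: list = field(default_factory=list)


def resolve_columns(attr, models):
    """Return the sorted (model, column) pairs an attribute maps to.

    `models` maps model name -> list of column names.
    """
    refs = set(attr.explicit_refs)
    if attr.auto_map_by_name:
        # Pick up every column whose name matches the attribute name.
        for model, columns in models.items():
            if attr.name in columns:
                refs.add((model, attr.name))
    return sorted(refs)


# Hypothetical project, invented for this example.
models = {
    "fct_orders": ["order_id", "as_of_date", "sales_revenue"],
    "agg_daily_sales": ["as_of_date", "sales_revenue"],
    "stg_payments": ["payment_id", "revenue_amt"],
}

sales_revenue = Attribute(
    name="sales_revenue",
    description="Placeholder description for the example.",
    auto_map_by_name=True,
    explicit_refs=[("stg_payments", "revenue_amt")],
)

mapped = resolve_columns(sales_revenue, models)
print(mapped)
```

With name-based mapping on, the two `sales_revenue` columns are found automatically, while the differently named `revenue_amt` column is included only because it is mapped explicitly.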
Describe alternatives you've considered
Metrics: The alternative for metrics is to create mappings directly over all tables and columns. The greater the number of tables at different aggregation levels, and/or projections for query optimization, the larger the redundancy of those metrics mappings will be.
Documentation: This actually stems from another question I ran into a couple of years ago: how to document all columns in all models without having to put the same text description on every single instance. (And then, how to keep them up to date when you want to update how "sales revenue" is calculated on all of them.) As far as I'm aware, there isn't yet a good solution for this documentation problem, and so my general guidance has been "don't worry about descriptions on columns", because it's just too much work and there is no single source of truth to keep them in sync across tables.
Adding attributes would hopefully change that, since in each of the many places the column exists, it will always carry (or link to) the same text description for users of the project.
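A minimal sketch of that single-source-of-truth idea, assuming attribute descriptions are simply looked up by column name. The function name, data shapes, and sample text are hypothetical and do not reflect how dbt stores or renders descriptions.

```python
def apply_attribute_descriptions(attributes, models):
    """Return a copy of `models` with descriptions taken from attributes.

    `attributes` maps attribute name -> description (the single source).
    `models` maps model name -> {column name -> description or None}.
    Columns without a matching attribute keep their existing description.
    """
    return {
        model: {
            column: attributes.get(column, description)
            for column, description in columns.items()
        }
        for model, columns in models.items()
    }


# Hypothetical inputs, invented for this example.
attributes = {"sales_revenue": "Defined once, shown everywhere."}
models = {
    "fct_orders": {"sales_revenue": None, "order_id": "Order key"},
    "agg_daily_sales": {"sales_revenue": "Stale copy of the text"},
}

documented = apply_attribute_descriptions(attributes, models)
print(documented)
```

Changing the text in `attributes` and re-running the function updates every model that contains `sales_revenue`, which is the property the proposal is after.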
Who will this benefit?
This would benefit teams who want column-level descriptions for themselves and their users.
This benefits users, because they get better documentation on columns and can better understand the equivalency (or lack thereof) of similarly named columns across a project.
Are you interested in contributing this feature?
Sure!
Anything else?
Prior art
Inspired by OLAP platforms and BI layers which support ROLAP capabilities: