Setting Metabase Metrics using dbt properties #25
Comments
So, looking at the above compiler, let's look at the output we are looking for.
With a little deep diving, I am pretty confident we can compile the expected input without needing to use their exact source code. The above code can translate 1 to 1 to Python pretty easily. The challenge is translating yml meaningfully into the expected format whilst keeping it simple and readable for the end user. Kind of spitballing below, but it can get a little complicated if we don't approach it from a readability-first perspective. So I guess my end opinion is that the compilation in Python isn't as much the constraint in my eyes as the format for defining these in the yml.
|
@z3z1ma Thanks for looking into this. I think the simple column-level aggregations are easy to generate directly on the UI (on the Simple Question sidebar) so I wouldn't worry too much about them. The main problem I would like to solve is to implement Metrics across multiple fields such as:
I think it would be ideal that these expression strings are copy/pastable from the Metabase UI so I would add them directly as strings in dbt's properties. For example:
The compiler you sent seems easy enough to convert to Python but if we try to parse string expressions we would need to implement all of this: https://github.com/metabase/metabase/blob/master/frontend/src/metabase/lib/expressions/parser.js and its dependencies. We need the Lexer, parser and grammar. It seems like too much work. Right? |
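To illustrate the "expressions as strings in dbt properties" idea, here is a minimal sketch of how a tool might collect such strings once the properties file has been parsed into a dict. The `metabase.metrics` key, the `name`/`expression` fields, and the model name are assumptions made up for this example; they are not an agreed format.

```python
from __future__ import annotations

from typing import Any

# Hypothetical parsed dbt properties for one model. The "metabase.metrics"
# key and the name/expression fields are assumptions for illustration only.
model_properties: dict[str, Any] = {
    "name": "candidate_steps",
    "meta": {
        "metabase.metrics": [
            {
                "name": "Hired candidates",
                "expression": 'Distinct(case([step] = "hired", [candidate_user_id]))',
            }
        ]
    },
}


def collect_metric_strings(model: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (metric name, raw expression string) pairs declared on a model."""
    metrics = model.get("meta", {}).get("metabase.metrics", [])
    return [(m["name"], m["expression"]) for m in metrics]


pairs = collect_metric_strings(model_properties)
```

Collecting the strings is the easy part; turning each string into the tree the API expects is where the lexer, parser and grammar come in.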
Hey @remigabillet. Let me know what you think of this project (not sure if you have seen it). They interface with BI as a REST service using Presto/Trino-configured integrations. The con is the need for an additional service, and it's still in early stages. Interesting though. Either way, I hear what you're saying on the string parsing. Convenient and cool, but it would definitely take an external dependency like Metabase's parser. Outside that, my approach isn't as copy-paste easy, since you define metrics in a different way. However, if you only need to define them in one place and we back out the expected JSON for API consumption on Metabase's side, then it shouldn't matter, provided it's still intuitive relative to the general flavor of dbt yml files. |
I agree with you that declaring metrics in a tree format in line with the Metabase API format would definitely be a big step in the right direction. Next we would have to resolve the field references before hitting the Metabase API with updates.
Anyway, I tried packaging the Metabase expression parser into a JS package executable with Node, but after many hours I don't think it's going to work. I got stuck on browser dependencies which would need to be mocked, so I'll pass on that approach.
metriql is very cool. They point to a few really interesting articles in their docs, highlighting how critical it is to solve "Metrics" at the edges. I love it.
Another approach is to let metriql lead this work, but it would be in their project. They already have a way to define metrics in dbt, and a parser in Kotlin. Next they would need to pull the Metabase schema info like we do here, remap fields, and convert their metrics to the Metabase format.
Separately, I'm still curious to explore building a Python parser for Metabase expressions. Maybe it's not as hard as it seems 🤔 |
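As a rough feel for what a Python parser for expressions could involve, here is a minimal recursive-descent sketch covering only function calls, `[field]` references, string and number literals, and a single comparison. The grammar subset and the nested-list output shape are assumptions for illustration; this is not Metabase's grammar, and the output is not claimed to match the exact tree the Metabase API expects.

```python
import re

# Tokens for a tiny, assumed subset of the expression syntax.
TOKEN_RE = re.compile(
    r"""\s*(?:
        (?P<field>\[[^\]]+\])        # [field name]
      | (?P<string>"[^"]*")          # "string literal"
      | (?P<number>\d+(?:\.\d+)?)    # 42 or 3.14
      | (?P<ident>[A-Za-z_]\w*)      # function name
      | (?P<op><=|>=|!=|=|<|>|[(),]) # comparison or punctuation
    )""",
    re.VERBOSE,
)
COMPARISONS = {"=", "!=", "<", ">", "<=", ">="}


def tokenize(text):
    """Split the input into (kind, value) pairs, failing on unknown input."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}: {text[pos:]!r}")
        pos = match.end()
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self, expected=None):
        kind, value = self.peek()
        if kind is None or (expected is not None and value != expected):
            raise ValueError(f"expected {expected!r}, got {value!r}")
        self.pos += 1
        return kind, value

    def parse_expression(self):
        # operand, optionally followed by one comparison and another operand
        left = self.parse_operand()
        kind, value = self.peek()
        if kind == "op" and value in COMPARISONS:
            self.take()
            return [value, left, self.parse_operand()]
        return left

    def parse_operand(self):
        kind, value = self.take()
        if kind == "field":
            return ["field", value[1:-1]]
        if kind == "string":
            return value[1:-1]
        if kind == "number":
            return float(value) if "." in value else int(value)
        if kind == "ident":
            # function call: name "(" [expression {"," expression}] ")"
            self.take("(")
            args = []
            if self.peek()[1] != ")":
                args.append(self.parse_expression())
                while self.peek()[1] == ",":
                    self.take(",")
                    args.append(self.parse_expression())
            self.take(")")
            return [value.lower(), *args]
        raise ValueError(f"unexpected token {value!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.parse_expression()
    if parser.peek()[0] is not None:
        raise ValueError("trailing input after expression")
    return tree


tree = parse('Distinct(case([step] = "hired", [candidate_user_id]))')
```

A real implementation would also need arithmetic, boolean operators, operator precedence, and escaping rules, which is where most of the effort in the upstream lexer and grammar goes.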
Thanks for mentioning metriql, @z3z1ma! We're working extensively on the integration with BI tools at the moment, and Metabase is on our short-term radar. However, rather than embedding native Metabase expressions, we want to use Jinja just like how you use dbt:
We actually use Python for other BI tool integrations (Tableau, Looker). My plan with Metabase is to parse these Jinja expressions in Python and generate Metabase expressions. Here are the benefits of this approach:
If we manage to do that, the parser can also be used in dbt-metabase without metriql as it will just be a Python module. What do you think? |
Also, I agree that maintaining another service might be an overhead, but metriql tries to solve the following problems in addition to the metric definitions:
Join relations: While Metabase supports JOINs in data models, it's pretty limited. Metriql has MQL, a subset of Trino's SQL dialect. It exposes the semantic models to Metabase through the Trino interface as denormalized tables and columns, and generates the JOINs in the engine.
Performance: If your tables are huge, the performance of ad-hoc queries in Metabase suffers. Looker has Aggregate Awareness to address this problem, and others such as Tableau and PowerBI pull the data into their system. In case your dbt models and sources are huge, metriql can automatically create roll-up tables inside your dbt project as dbt models and make use of them when Metabase queries the data via Aggregates.
Central metric definitions: If you have other data tools in your stack, you may want to use the data models inside those tools as well. Some other tools (Jupyter notebooks, CLIs, etc.) don't have a UI. Instead, they need an API (similar to Airbnb's Minerva) to query the data.
If you're heavily on Metabase and you use dbt models extensively, you might not need these features, so I want to keep the BI tool integrations separate from metriql if possible. I would love to contribute to dbt-metabase, as I will also be using it in metriql for updating the column types via the Metabase API. |
Hi! Concerning metrics, there is another dbt-related project called lightdash that aims to provide dataviz directly on top of dbt's models. Lightdash reads the meta of dbt schemas to infer measures, dimensions and metrics, and proposes a way to define ad hoc metrics directly in dbt schemas. I thought it could be worth mentioning here. Thank you for providing us with this dbt/Metabase bridge! |
Thanks for the detailed reply!
Can you elaborate further on what you mean here? What are Metabase expressions in this context? Do you mean to parse the above into a string like this?
My original thought went to using the Metric feature in Metabase, since the metrics become available intuitively through the front end (see below) both to data analysts composing views for the company and to end users performing self-service analytics. The only hurdle to the implementation is compiling the JSON expected:
These are simple examples (which in all likelihood cover the 80% use case, since our users are using dbt to output nice clean models), but in a perfect world we would include Metabase expressions like:
It's really not too bad to compile the above once you analyze it; I'm just wrapping my head around how we can define these intuitively in dbt yamls. The dev cycle for the above could actually be pretty short if we find direction with acceptable meta tag yaml definitions. That being said, I'm open to suggestions, pros, cons. Very interested in what you have in mind though. |
@buremba what you're proposing sounds amazing. If I understand correctly, you would first build a Python module that generates Metabase expressions (https://www.metabase.com/docs/latest/users-guide/expressions.html). As part of that Python module, do you also plan on converting field and table references into their Metabase internal IDs? It would require pulling the schema from Metabase (like this project does). |
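The reference-resolution step asked about here could look roughly like the sketch below. It assumes an expression tree made of nested lists with hypothetical `["field", name]` leaves, and a name-to-ID mapping already pulled from the Metabase schema; the IDs and the tree shape are invented for illustration.

```python
def resolve_fields(node, field_ids):
    """Replace ["field", name] leaves with ["field", id], recursing through lists."""
    if isinstance(node, list):
        if len(node) == 2 and node[0] == "field" and isinstance(node[1], str):
            name = node[1]
            if name not in field_ids:
                # Fail loudly rather than send a dangling reference to the API.
                raise KeyError(f"unknown field: {name}")
            return ["field", field_ids[name]]
        return [resolve_fields(child, field_ids) for child in node]
    return node  # literals and operator names pass through unchanged


# Hypothetical mapping, as it might be built from the synced Metabase schema.
field_ids = {"step": 101, "candidate_user_id": 102}

tree = ["distinct", ["case", ["=", ["field", "step"], "hired"], ["field", "candidate_user_id"]]]
resolved = resolve_fields(tree, field_ids)
```

Raising on an unknown name keeps a typo in the properties file from turning into a silently broken metric.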
@buremba I feel that your work on metriql is super relevant to this issue. Do you mind giving us an update and let me know if I'm understanding it correctly? |
I think I am getting somewhere. Perhaps we can transcribe what you envisioned. Will keep building this out. |
Very cool @z3z1ma! So you're actually starting to write a parser! Do you want to open a Draft PR? I'm happy to have a look, play with it and maybe contribute. |
OK, after a nice chunk of work, I think I have the "money shot" here @remigabillet @gouline. I can open a draft PR later today to start working through and testing further, as well as to allow a closer look at the code. |
Sorry for the delay, I was away from work on a 10-day trip with limited internet access.
Metabase expressions are the expression syntax the Metabase API can understand in this context. metriql requires one of
For the metric definitions, we need to know the aggregation type and aggregation filters separately in metriql as seen above. The Python project that we will develop needs to convert the expression above to a native Metabase expression, and synchronize them via the API. For example:
The JSON blob above will be used as part of the
Our approach is similar to LookML from Looker because the engine needs to understand the metric in a better way for better composability. If you have a filtered measure (your
@remigabillet, yes, that's the plan! I will use dbt-metabase in metriql for that use case, implement the metrics, and hopefully contribute back to dbt-metabase if there is any interest.
@z3z1ma I'm assuming that you're planning to let people embed Metabase expressions into YML files, right? Let me know if you think that this approach is complementary or out of your scope.
Also, this compilation will only be one-way, so you won't be able to use existing Metabase expressions. Would that be a problem? What do you think? |
dbt now has metrics as well: dbt-labs/dbt-core#4071 (discussion with link to the implementation PR) |
Yeah. Had a look into that. dbt metrics are sort of something different, as they result in building pre-aggregated tables in the DW, not just defining metric expressions.
IMHO a more limited-scope version of the requested feature is valuable: simply being able to put a meta: metabase/metrics list with the Metabase native expressions in the manifest. Although this does not attack the much harder problems discussed above, it would be enough to provide a test -> production workflow and a "keep metrics in sync with dbt models" mechanism, which for me at least would be very valuable and is directly in line with the other functionality this project offers.
One awkward thing is that, since there is no persistence mechanism between dbt-metabase runs, we can't store the key of the created metric. So we would have to just look for them by name and update or create if missing. This implies renaming or removing metrics can't really be supported. Not the end of the world, but thought I'd flag it. |
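The "look them up by name, then update or create" idea can be sketched as below. The in-memory store stands in for the Metabase API, and its method names and the payload fields (`name`, `definition`, `table_id`) are assumptions for illustration, not the real client or endpoint shapes.

```python
from __future__ import annotations


class InMemoryMetricStore:
    """Stand-in for the Metabase API; a real client would make HTTP calls."""

    def __init__(self) -> None:
        self.metrics: dict[int, dict] = {}
        self._next_id = 1

    def list_metrics(self) -> list[dict]:
        return list(self.metrics.values())

    def create_metric(self, payload: dict) -> dict:
        metric = {**payload, "id": self._next_id}
        self.metrics[self._next_id] = metric
        self._next_id += 1
        return metric

    def update_metric(self, metric_id: int, payload: dict) -> dict:
        self.metrics[metric_id] = {**payload, "id": metric_id}
        return self.metrics[metric_id]


def sync_metrics(store, table_id: int, desired: list[dict]) -> dict[str, list[str]]:
    """Create or update metrics matched by name; never deletes anything."""
    existing = {
        m["name"]: m for m in store.list_metrics() if m["table_id"] == table_id
    }
    result: dict[str, list[str]] = {"created": [], "updated": [], "unchanged": []}
    for metric in desired:
        payload = {**metric, "table_id": table_id}
        current = existing.get(metric["name"])
        if current is None:
            store.create_metric(payload)
            result["created"].append(metric["name"])
        elif {k: v for k, v in current.items() if k != "id"} != payload:
            store.update_metric(current["id"], payload)
            result["updated"].append(metric["name"])
        else:
            result["unchanged"].append(metric["name"])
    return result


store = InMemoryMetricStore()
first = sync_metrics(store, 7, [{"name": "Hires", "definition": "v1"}])
second = sync_metrics(store, 7, [{"name": "Hires", "definition": "v2"}])
# A rename in dbt looks like a brand-new metric; the old one is left behind.
third = sync_metrics(store, 7, [{"name": "Hired candidates", "definition": "v2"}])
```

The last call shows the limitation flagged above: with the name as the only key, a renamed metric is created again and the old one lingers until someone removes it by hand.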
Significant changes are coming on dbt side so might be worth revisiting: https://www.getdbt.com/blog/dbt-semantic-layer-whats-next/ |
I was very excited to have found metriql through this thread, thank you. However, it seems the project is not very active. Are there any similar solutions? |
No current plans to implement this, closing for now. If anyone is interested in contributing an implementation after 1.0 is released, we can re-open and continue the discussion. |
Metabase Metrics (and in particular custom aggregations) are very useful to our end-users. I'm finding myself spending a lot of time in Metabase managing Metrics, and the UI is very inconvenient.
I'd like to be able to define a list of Metabase Metrics on dbt models as schema model meta properties. I wonder if this fits this project's scope. The primary challenge is parsing the Metabase expressions, which are defined as strings (ex: Distinct(case([step] = "hired", [candidate_user_id]))), into trees that the API can handle. I think it's easiest to maintain if I compile the JS parser from the Metabase source code into a small Node script that can be committed and called from this project. This would add a good amount of complexity, so it's probably beyond the scope of the project. I thought I would share for the sake of discussion.