Enhance support for parameterised SQL query datasets #1089
Comments
Super easy solution: don't use a yaml list, just a literal string which makes the SQL query valid:
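The snippet that originally followed is not preserved here. As a minimal sketch of the idea, assuming plain Python string formatting as a stand-in for the config templating, and with a made-up `orders` table and `ids` value:

```python
import sqlite3

# Hypothetical value, as it might be written in globals.yml as a plain
# string (ids: "1, 2, 3") rather than as a YAML list.
ids = "1, 2, 3"

# Hypothetical query template; the table and column names are made up.
template = "SELECT id FROM orders WHERE id IN ({ids}) ORDER BY id"
query = template.format(ids=ids)

# Run the rendered query against an in-memory database to check it is valid SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(1, 6)])
rows = conn.execute(query).fetchall()
conn.close()
```

Because the value is already text, it is substituted unchanged and the rendered query is valid. Interpolating strings into SQL like this is only reasonable for trusted configuration values.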
This is undoubtedly quite ugly and not a very general solution though. However, I think the already existing Jinja 2 support in @datajoely will be able to advise better here, but when it comes to templating SQL queries I think you may be straying into the territory of dbt rather than kedro. This is something they do really well. We're hoping to improve SQL support in kedro, but I think it's very feasible (even recommended?) to use dbt as well as kedro if you're trying to do relatively complex SQL stuff.
I think currently we're taking a view that SQL in Kedro should be mostly about extraction and saving rather than transformative and business logic. It took me a while to be convinced of the reasoning but it can be best explained as this:
This isn't 100% adhered to, since we support things like UPSERTS and INSERTS in some cases, but in general SQL isn't Kedro's strong point, and Kedro should be used when you need to use DataFrame-like APIs in the Python world.
Thanks for the tips. I think your suggestion @AntonyMilneQB should work for my particular use-case.
Hi @jstammers do you need more help with this issue or can we close it?
Hi @MerelTheisenQB, please feel free to close this issue. I've been able to resolve my problem elsewhere.
Okay great! I'm glad it's resolved 🙂
Description
It would be useful to extend the ability of SQLQueryDatasets to make use of parameterised queries that can be given parameters at runtime.
Context
With the current SQLQueryDataset, parameters can be used in some cases. For example, parameters can be passed in from globals.yml.
However, when these values are loaded from a yaml file, the string representation of the corresponding Python object is used. This is a problem for lists, as the following would not produce a valid SQL query
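The failure mode can be reproduced outside Kedro. This is a sketch only: it assumes plain Python string formatting behaves like the config substitution described above, and the `orders` table and `ids` values are hypothetical.

```python
# Hypothetical value, as it would arrive after loading a YAML list.
ids = [1, 2, 3]

# Hypothetical query template; the table and column names are made up.
template = "SELECT * FROM orders WHERE id IN ({ids})"

# str(list) is used for the substitution, so the square brackets leak into the SQL.
rendered = template.format(ids=ids)

# A list of strings fares no better: Python's repr quoting is carried over too.
rendered_names = "SELECT * FROM orders WHERE name IN ({names})".format(
    names=["a", "b"]
)
```

Here `rendered` ends in `IN ([1, 2, 3])` and `rendered_names` ends in `IN (['a', 'b'])`, neither of which is the `IN (1, 2, 3)` form the query needs.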
Possible Implementation
The jinjasql package provides some utilities for parsing templates using jinja syntax.
This can then be passed into pandas.read_sql_query as follows
From a configuration point of view, it might be useful to add a keyword to SQLQueryDataset to make it explicit that the query is a template, rather than a valid SQL string, e.g.
I haven't given much thought yet as to how this could take values from runtime parameters. I think it would require some additional validation, e.g. checking that all the parameters in the template have a value.
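The validation step mentioned above could look roughly like the following. This is a standard-library sketch, not jinjasql or Kedro code: the `prepare_query` helper, the `{{ name }}` placeholder regex and the `?` bind style are all assumptions made for illustration.

```python
import re

# Matches simple "{{ name }}" placeholders only; this is a stand-in for a
# real template engine, not Jinja itself.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def prepare_query(template, params):
    """Return (sql, bind_values), raising if any placeholder has no value."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(params))
    if missing:
        raise KeyError(f"No value supplied for template parameters: {missing}")

    bind_values = []

    def substitute(match):
        value = params[match.group(1)]
        # Lists and tuples expand to one "?" per element; scalars to a single "?".
        values = list(value) if isinstance(value, (list, tuple)) else [value]
        bind_values.extend(values)
        return ", ".join("?" for _ in values)

    # Substitutions run left to right, so bind_values follows placeholder order.
    return PLACEHOLDER.sub(substitute, template), bind_values


sql, values = prepare_query(
    "SELECT id FROM orders WHERE id IN ({{ ids }}) AND status = {{ status }}",
    {"ids": [1, 3], "status": "open"},
)
```

The resulting `sql` and `values` could then be handed to `pandas.read_sql_query` through its `params` argument. The placeholder style (`?` here) depends on the database driver in use.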