Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect quoting config in dbt-py models #6620

Merged
merged 3 commits into from
Jan 19, 2023
Merged

Conversation

jtcohen6
Copy link
Contributor

resolves #6103
resolves #6619

Description

From our code comment:

Quoting seems like something very specific to sql so far. For all python implementations we are seeing, there's no quoting.

Initial user feedback has proven otherwise!

  • We do need to respect quoting in methods that return stringified relation names (dbt.ref(), dbt.source(), dbt.this())
  • Because relation names are interpolated into Python dataframe code as strings, we need to escape relation names that contain double quotes ("). This is principally relevant to Snowflake today, but several other data platforms also use " as their quoting character.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR — relevant tests in adapter repos
  • I have opened an issue to add/update docs, or docs changes are not required/relevant for this PR
  • I have run changie new to create a changelog entry

@jtcohen6 jtcohen6 requested a review from ChenyuLInx January 16, 2023 11:38
@jtcohen6 jtcohen6 requested a review from a team as a code owner January 16, 2023 11:38
@jtcohen6 jtcohen6 requested a review from a team January 16, 2023 11:38
@jtcohen6 jtcohen6 requested a review from a team as a code owner January 16, 2023 11:38
@jtcohen6 jtcohen6 requested a review from VersusFacit January 16, 2023 11:38
@cla-bot cla-bot bot added the cla:yes label Jan 16, 2023
Copy link
Contributor

@ChenyuLInx ChenyuLInx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good over all, thanks for fixing this! Just want to confirm the quoting " is not tied only to snowflake.

# quoating seems like something very specific to sql so far
# for all python implementations we are seeing there's no quating.
# TODO try to find better way to do this, given that
original_quoting = self.config.quoting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am so glad these lines are removed!

Copy link
Contributor

@VersusFacit VersusFacit Jan 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any day we kill a todo is a good day.

@@ -3,7 +3,7 @@
{%- set ref_dict = {} -%}
{%- for _ref in model.refs -%}
{%- set resolved = ref(*_ref) -%}
{%- do ref_dict.update({_ref | join("."): resolved.quote(database=False, schema=False, identifier=False) | string}) -%}
{%- do ref_dict.update({_ref | join("."): resolved | string | replace('"', '\"')}) -%}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the " here tied to specific quoting for snowflake?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it would be relevant to any data platform that uses " as its quoting character (it's our default), or any identifier that contains a quote character (") for any reason

Copy link
Contributor

@VersusFacit VersusFacit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reading the principle change here as quoting all necessary "s very \ escapes within the Python environment.

Will any downstream adapters need to override this or can we trust this behavior holds for all? I'm concerned about the quote rules being different in, idk, Bigquery from the simple transform applied here. Ergo, this PR could introduce a bug for Bigquery (or whatever db has its own quoting rules).

Or is this quoting restricted to the Python environment? [And as a corollary, this is just making sure Python doesn't alter the string before handing it to the underlying db, which will figure out if the relation name string makes sense or not]

@jtcohen6
Copy link
Contributor Author

I'm reading the principle change here as quoting all necessary "s very \ escapes within the Python environment.

Correct! This isn't specific to Snowflake, it would also be relevant on any other data platform that uses " as its quoting character. (As it happens, both BigQuery + Spark/Databricks use backticks, and don't allow other special characters in identifier names.)

Big idea: in Python models, dbt.ref() is really producing this code:

session.table("database.schema.table")

Where the table identifier is a Python string, i.e. quoted. If there are any instances of quotation marks (") in database, schema, or table, we need to escape, just as you would for any Python string containing quotes:

>>> string = "hello, \"world\"!"
>>> print(string)
hello, "world"!

The automated test confirming that this change works is really in dbt-labs/dbt-snowflake#399, which also resolves a dbt-snowflake-specific bug in its own right.

@jtcohen6 jtcohen6 merged commit fa7c4d1 into main Jan 19, 2023
@jtcohen6 jtcohen6 deleted the jerco/fix-python-model-quoting branch January 19, 2023 08:34
jtcohen6 added a commit that referenced this pull request Jan 20, 2023
jtcohen6 added a commit that referenced this pull request Jan 20, 2023
jtcohen6 added a commit that referenced this pull request Jan 20, 2023
database = "{{ this.database }}"
schema = "{{ this.schema }}"
identifier = "{{ this.identifier }}"
{% set this_relation_name = this | string | replace('"', '\\"') %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One potential option is to make this a macro and have adapters overwrite it

Copy link
Contributor

@ChenyuLInx ChenyuLInx Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is related to the change of respecting quoting schema would break bigquery dataproc python model

Copy link
Contributor Author

@jtcohen6 jtcohen6 Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just chatted with @colin-rogers-dbt about this. Simplest solution is to turn this processing into a dispatched macro that can be overridden on dbt-bigquery.

Something like:

{% macro render_relation_for_python_model(relation) %}
  {{ adapter.dispatch('render_relation_for_python_model', 'dbt')(relation)
{% endmacro %}

{% macro default__render_relation_for_python_model(relation) %}
  {{ return(relation | string | replace('"', \\"')) }}
{% endmacro %}

{% macro bigquery__render_relation_for_python_model(relation) %}
  {{ return(relation.quote(database=False, schema=False, identifier=False) | string) }}
{% endmacro %}

And use that macro in all three spots that I changed here (ref, source, this)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants