Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When dbt finds existing approximate matches via adapter.get_relation() use the same quoting when printing error message #4187

Closed
1 task done
dataders opened this issue Nov 2, 2021 · 4 comments
Labels
enhancement New feature or request stale Issues that have gone stale Team:Adapters Issues designated for the adapter area of the code

Comments

@dataders
Copy link
Contributor

dataders commented Nov 2, 2021

Is there an existing feature request for this?

  • I have searched the existing issues

Describe the Feature

A user reported this issue to me (https://github.com/firebolt-analytics/dbt-firebolt/issues/62 private repo). But their error message is copied below and it looks as if there are two models: one that is quoted ("oz_account") and one that isn't (oz_account). In reality, I think there is a namespace overlap? What's driving me crazy is that 4/5 of current users are not having this quoting issue, and one user is having this issue. So I don't understand how this can happen when the adapter code is the same...

My suggestion would be to make them both be quoted so users don't think there's a quoting issue going on, but rather should know that it is indeed a problem with the model name already being used.

Completed with 3 errors and 0 warnings:

Compilation Error in model account (models/adtech/staging/account.sql)
  When searching for a relation, dbt found an approximate match. Instead of guessing 
  which relation to use, dbt will move on. Please delete "oz_account", or rename it to be less ambiguous.
  Searched for: oz_account
  Found: "oz_account"
  
  > in macro materialization_table_firebolt (macros/materializations/table.sql)
  > called by model account (models/adtech/staging/account.sql)

Describe alternatives you've considered

Unless this user is experiencing #3835. I'm going to have the user try the fix proposed in #4076 and see if that fixes their problem. If so, we know what's going on.

If #4076 does not fix the user's issue, it seems I have to deep dive more into adapter.get_relation() to see what's going on

Who will this benefit?

users of dbt-firebolt!

Are you interested in contributing this feature?

yep.GIF

Anything else?

nope.GIF

@jtcohen6
Copy link
Contributor

@swanderz Thanks for the report! This certainly brings me back to an early day of dbt, when quoting, casing, and caching were all the rage.

I haven't been able to reproduce this on other adapters, as much as I mess with quoting, casing, aliasing, etc. Would you be able to create a simple reproduction case that manages to produce this error on the adapter? I'd be curious to step line by line through the matches logic. Ultimately, these methods do live on an adapter object (BaseRelation), so their logic can be overridden + reimplemented in the adapter plugin if needed.

The approximate_relation_match exception is raised in a very specific case: dbt finds a relation that's an approximate_match (case-insensitive comparison), but not an _is_exactish_match. The logic for _is_exactish_match is a bit peculiar for adapters that turn off quoting (e.g. Snowflake, Firebolt), as dbt will perform a case-insensitive comparison, but only for relations it believes it has created:

if self.dbt_created and self.quote_policy.get_part(field) is False:
return self.path.get_lowered_part(field) == value.lower()

How does it know which relations it has created? Based on the relations returned by each materialization:

for relation in self._materialization_relations(result, model):
self.adapter.cache_added(relation.incorporate(dbt_created=True))

(I took a quick glance, and it seems that dbt-firebolt's reimplemented materializations are all doing this properly. I did notice that the test task reimplements execute, and is thereby missing the piece that adds returned relations to the cache... so perhaps it's theoretically possible to trip this error with an ambiguously aliased test + --store-failures, but I haven't been able to do it.)

I believe this is a different error from #3835. In that case, we want to raise the approximate_relation_match error, to help out the end user, but we don't because one relation has an extra set of explicit quotes. In this case, we are raising the error, so it seems like it can't be a quoting issue...

In reality, I think there is a namespace overlap?

Given what I said about the dbt_created property above, you could be right about this!

@jtcohen6 jtcohen6 removed the triage label Nov 17, 2021
@jtcohen6 jtcohen6 added the Team:Adapters Issues designated for the adapter area of the code label Nov 18, 2021
@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label May 18, 2022
@github-actions
Copy link
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

@joemirizio
Copy link

I believe we are experiencing this issue with the Netezza adapter, which has an identifier quoting policy of False. When relations are loaded into the cache and then matched against, the BaseRelation._is_exactish_match is failing - this happens in certain macros and dbt server.

After debugging we’ve found that BaseRelation._is_exactish_match is returning False because it is comparing the table identifiers exactly, rather than lowercasing both sides of the comparison - which is ONLY done if the quoting policy is set to False (it is) AND the relations dbt_created field is True .

When we dove into the cache_added method, it appears that the incoming relation’s dbt_created field is correctly set to True, but when retrieving the value from the cache, it is set to False. Due to a relation's field dbt_created not preserving after a cache read, it causes the match to fail.

Steps to reproduce

  1. Using any adapter, set a breakpoint at https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/adapters/base/impl.py#L447
  2. dbt run a project with a model
  3. Evaluate the following in the debugger
> relation.dbt_created
True
> [cache_relation for cache_relation in self.cache.get_relations(relation.database, relation.schema) if cache_relation.identifier == relation.identifier.lower()][0].dbt_created
False

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request stale Issues that have gone stale Team:Adapters Issues designated for the adapter area of the code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants