
catch error in multi-query batches #140

Merged
merged 7 commits into from
Jun 22, 2021
Conversation

dataders
Collaborator

Fixes: #139, #135

#124 turned core adapter macros (e.g. `create_view_as()`) into multi-query batches where they were previously single queries.

This PR adopts @jeroen-mostert's solution of iterating through each query in a query batch, looking for errors.

@dataders
Collaborator Author

@jtcohen6, I'm struggling with how to catch and then handle pyodbc's ProgrammingError exception. More generally, I'm trying to understand how the SQLConnectionManager class works with query batches...

current approach to fixing

I've implemented @jeroen-mostert's approach. I'm not exactly sure what cursor.nextset() does, but when you call it on a batch query whose 2nd query has a syntax error, pyodbc's ProgrammingError exception pops up. This error doesn't surface if you only call cursor.execute().
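The pattern described above can be sketched as follows. Since pyodbc needs a live SQL Server connection, this sketch uses a fake cursor to stand in for pyodbc's; `FakeError` stands in for `pyodbc.ProgrammingError`, and `execute_batch` is a hypothetical helper name, not part of this PR.

```python
class FakeError(Exception):
    """Stands in for pyodbc.ProgrammingError in this illustration."""


class FakeCursor:
    """Minimal stand-in for a pyodbc cursor over a three-statement batch
    whose second statement has a syntax error."""

    def __init__(self):
        self._sets = [None, FakeError("Incorrect syntax near 'SELEC'"), None]
        self._i = 0

    def execute(self, sql):
        # Executing the batch appears to succeed; the error only
        # surfaces once we advance past the failing statement.
        self._i = 0

    def nextset(self):
        self._i += 1
        if self._i >= len(self._sets):
            return False
        item = self._sets[self._i]
        if isinstance(item, Exception):
            raise item
        return True


def execute_batch(cursor, sql):
    """Execute a multi-statement batch, then advance through every
    result set so errors in later statements are surfaced."""
    cursor.execute(sql)
    while True:
        try:
            if not cursor.nextset():
                return
        except FakeError as exc:  # with pyodbc: pyodbc.ProgrammingError
            raise RuntimeError(f"Database Error in batch: {exc}") from exc


try:
    execute_batch(FakeCursor(), "USE db; SELEC 1; SELECT 2")
except RuntimeError as exc:
    print(exc)
```

With a real pyodbc cursor the loop is the same; only the exception type changes.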

This PR now surfaces an "unhandled" error, but doesn't return the Database Error line that says where the error occurred.

[screenshot: unhandled error output]

desired (and previous) state

This is what the output should look like, using a version of dbt-sqlserver from before the adapter macros had query batches.

[screenshot: expected Database Error output]

alternative solution: split the queries and add/execute them independently?

One approach would be to implement something like SnowflakeConnectionManager._split_queries(), but my reservations are:

  1. Would this work for an adapter macro like sqlserver__create_view_as()?
  2. Will executing a database switch command in two different batches work? (I have to test this with a SQL Server instance.)
  3. Where does snowflake.connector.util_text.split_statements() come from?

@jeroen-mostert

jeroen-mostert commented Jun 15, 2021

If it helps (actually I'm pretty sure it doesn't but I'm putting it out here anyway :-)) there's something weird with how pyodbc treats multi-statement batches for SQL Server. Calling .nextset() advances to the next result set of a batch -- this is a general mechanism surfaced also in, for example, .NET's SqlDataReader.NextResult(). However, this should not generally be necessary for statements that do not produce result sets -- USE [database]; SELECT 1 a is an example of a batch with two statements that produces one result set, not two. In .NET you would never need to call .NextResult() to make this batch return, but for some reason pyodbc will treat every statement result as a distinct result set, even if the statement just completed and returned nothing more.

And instead of surfacing execution errors, it will treat an error that results from .nextset() as a programming error (?). Basically, pyodbc demands that you know in advance exactly how many statements you have and what each of them will return, even if they don't return anything, which is quite a nasty restriction that I haven't seen in DB libs for other languages.

I was unable to make the error handling "neat" in this case; for our purpose we were already happy with just getting the error, even if not as a "proper" database error, but I get that that would be desirable in general.

All this is not to suggest this is something that could or should be changed in the pyodbc layer just to make dbt happy, of course, since that might be quite impactful; executing individual statements is still the safest way to go. A USE database persists on the connection across batches, provided of course that 1) you do not close it and open a new one and 2) you do not use a separate EXEC for the statement, as that gets its own context and the DB is reverted at the end. (EXEC 'USE db; <other stuff>' works, of course, but you'll probably run into the same multi-statement result set gotchas again.)

Parsing batches and splitting them manually on anything but a dedicated batch separator like good old GO requires a T-SQL parser with full fidelity if there's any chance your statements contain escaped or quoted semicolons anywhere; I wouldn't recommend that path. T-SQL's a hideously byzantine language and the fact that the statement separator is mostly optional doesn't help; you really want to split your statements on a higher level if possible.
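Splitting on a dedicated batch separator like GO is tractable precisely because GO is a client-side convention (sqlcmd, SSMS) that must appear on its own line, so no T-SQL parsing is needed. A minimal sketch of that idea, not code from this PR:

```python
import re

def split_on_go(script: str):
    """Split a T-SQL script on the client-side batch separator GO.

    GO is not T-SQL itself; tools like sqlcmd require it alone on a
    line, which makes this split safe without a full T-SQL parser.
    Quoted or escaped semicolons inside a batch are left untouched.
    """
    batches, current = [], []
    for line in script.splitlines():
        if re.fullmatch(r"\s*GO\s*;?\s*", line, flags=re.IGNORECASE):
            if current:
                batches.append("\n".join(current).strip())
                current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):
        batches.append("\n".join(current).strip())
    return batches


script = """CREATE VIEW v AS SELECT 1 a
GO
SELECT ';' AS semicolon_in_string
GO"""
print(split_on_go(script))
```

Note that the quoted semicolon in the second batch survives intact, which is exactly what naive semicolon splitting would break.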

@semcha
Contributor

semcha commented Jun 15, 2021

Awesome fix! 👍 I ran into this today in my work with the snapshot functionality.

@jtcohen6
Contributor

@jeroen-mostert I really find myself agreeing. It's not good how pyodbc treats errors from multi-statement batches. It's also preferable, wherever possible, to supply individual queries one at a time, so long as they can be executed within a single transaction/session. EXEC perhaps can't be helped. Lacking other alternatives, pushing the error to stdout, even if imperfectly, is still a huge improvement.

where does snowflake.connector.util_text.split_statements() come from?

That comes from the Snowflake python connector here. The logic in this method is somewhere between "just split on semicolon" and "SQL parser with full fidelity." If I remember right, the Snowflake python connector does not properly handle multiple query statements at once (semicolon-delimited), so this is what we have to use instead.
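That "somewhere between" middle ground can be sketched as a quote- and comment-aware semicolon splitter. This is an illustrative approximation, not the actual snowflake-connector-python implementation (which also handles $$-quoting, block comments, and more):

```python
def split_statements(sql: str):
    """Split on semicolons, skipping those inside single/double quotes
    and -- line comments. A sketch of the middle ground between
    'just split on semicolon' and a full-fidelity SQL parser."""
    stmts, buf = [], []
    in_quote = None          # the quote char we are inside, if any
    in_line_comment = False
    i = 0
    while i < len(sql):
        ch = sql[i]
        if in_line_comment:
            buf.append(ch)
            if ch == "\n":
                in_line_comment = False
        elif in_quote:
            buf.append(ch)
            if ch == in_quote:
                in_quote = None
        elif ch in ("'", '"'):
            in_quote = ch
            buf.append(ch)
        elif sql.startswith("--", i):
            in_line_comment = True
            buf.append(ch)
        elif ch == ";":
            stmt = "".join(buf).strip()
            if stmt:
                stmts.append(stmt)
            buf = []
        else:
            buf.append(ch)
        i += 1
    tail = "".join(buf).strip()
    if tail:
        stmts.append(tail)
    return stmts


print(split_statements("use db; select ';' as x; select 2"))
# → ["use db", "select ';' as x", "select 2"]
```

Even this sketch shows why the approach is fragile for T-SQL: doubled quotes, block comments, and procedure bodies all need extra cases.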

@visch

visch commented Jun 17, 2021

Thank you for pushing this up!

Sorry I didn't give better instructions for how to reproduce the error :D

So far everything (even with the cursor2 oddity) runs with this new setup, and I'm now seeing failures where I would expect them. And I'm catching errors I was missing :O

(Installed in a new venv with pip install git+https://github.com/dbt-msft/dbt-sqlserver.git@cb12056e18ad60859eeaa34217bc3a8762551d02. Is there an easier way to install over an existing installation? pip install --upgrade git+https://github.com/dbt-msft/dbt-sqlserver.git@cb12056e18ad60859eeaa34217bc3a8762551d02 wasn't doing it for me, and cloning and running python setup.py install wasn't either; it looked like pip thought the existing package was the most recent.)

@visch

visch commented Jun 20, 2021

Using this right now in my environment. So far so good, I've caught 20 plus errors I was missing :O

@dataders
Collaborator Author

@jeroen-mostert, you are my hero -- not only for the fix but also for the depth of knowledge you were able to share. You certainly make me feel a lot saner, now that I know that both pyodbc and T-SQL are at fault here.

That said, these EXEC statements are only needed to support the multi-database scenario, which is only possible with on-premises SQL Server. There's a part of me that wants to kick this behavior into its own adapter (dbt-sqlserver-multi-db) so that single-database SQL Server users and Azure SQL users don't have to experience this workaround's stdout behavior. @jtcohen6 @periclesrocha, the database version telemetry MSFT has on connections coming from dbt would greatly inform the right path to take...

For now, we should definitely merge and release this to everyone. @mikaelene would you do the honors of releasing a v0.19.2 once this is merged?


Successfully merging this pull request may close these issues.

dbt doesn't properly return error message on wrong syntax