
catch error in multi-query batches #140

Merged
merged 7 commits into from
Jun 22, 2021
Conversation

dataders
Collaborator

Fixes: #139, #135

#124 turned core adapter macros (e.g. `create_view_as()`) into multi-query batches where they were previously single queries.

This PR adopts @jeroen-mostert's solution of iterating through each query in a query batch, looking for errors.

@dataders
Collaborator Author

@jtcohen6, I'm struggling with how to catch and then handle pyodbc's ProgrammingError exception. More generally, I'm trying to understand how the SQLConnectionManager class works with query batches...

current approach to fixing

I've implemented @jeroen-mostert's approach. I'm not exactly sure what cursor.nextset() does, but when you call it on a batch query whose 2nd query has a syntax error, pyodbc's ProgrammingError exception pops up. This error doesn't surface if you only call cursor.execute().
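The pattern described above can be sketched as follows. Since pyodbc needs a live SQL Server connection, this sketch uses a fake cursor to stand in for pyodbc's; `FakeError` stands in for `pyodbc.ProgrammingError`, and `execute_batch` is a hypothetical helper name, not part of this PR.

```python
class FakeError(Exception):
    """Stands in for pyodbc.ProgrammingError in this illustration."""


class FakeCursor:
    """Minimal stand-in for a pyodbc cursor over a three-statement batch
    whose second statement has a syntax error."""

    def __init__(self):
        self._sets = [None, FakeError("Incorrect syntax near 'SELEC'"), None]
        self._i = 0

    def execute(self, sql):
        # Executing the batch appears to succeed; the error only
        # surfaces once we advance past the failing statement.
        self._i = 0

    def nextset(self):
        self._i += 1
        if self._i >= len(self._sets):
            return False
        item = self._sets[self._i]
        if isinstance(item, Exception):
            raise item
        return True


def execute_batch(cursor, sql):
    """Execute a multi-statement batch, then advance through every
    result set so errors in later statements are surfaced."""
    cursor.execute(sql)
    while True:
        try:
            if not cursor.nextset():
                return
        except FakeError as exc:  # with pyodbc: pyodbc.ProgrammingError
            raise RuntimeError(f"Database Error in batch: {exc}") from exc


try:
    execute_batch(FakeCursor(), "USE db; SELEC 1; SELECT 2")
except RuntimeError as exc:
    print(exc)
```

With a real pyodbc cursor the loop is the same; only the exception type changes.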

This PR now surfaces an "unhandled" error, but doesn't return the Database Error line that says where the error occurred.

[screenshot: unhandled error output]

desired (and previous) state

This is what the output should look like, using a version of dbt-sqlserver from before the adapter macros had query batches.

[screenshot: expected Database Error output]

alternative solution: split the queries and add/execute them independently?

One approach would be to implement something like SnowflakeConnectionManager._split_queries(), but my reservations are:

  1. Would this work for an adapter macro like sqlserver__create_view_as()?
  2. Will executing a database switch command in two different batches work? (I have to test this with a SQL Server instance.)
  3. Where does snowflake.connector.util_text.split_statements() come from?

@jeroen-mostert

jeroen-mostert commented Jun 15, 2021

If it helps (actually I'm pretty sure it doesn't but I'm putting it out here anyway :-)) there's something weird with how pyodbc treats multi-statement batches for SQL Server. Calling .nextset() advances to the next result set of a batch -- this is a general mechanism surfaced also in, for example, .NET's SqlDataReader.NextResult(). However, this should not generally be necessary for statements that do not produce result sets -- USE [database]; SELECT 1 a is an example of a batch with two statements that produces one result set, not two. In .NET you would never need to call .NextResult() to make this batch return, but for some reason pyodbc will treat every statement result as a distinct result set, even if the statement just completed and returned nothing more.

And instead of surfacing execution errors, it will treat an error that results from .nextset() as a programming error (?). Basically, pyodbc demands that you know in advance exactly how many statements you have and what each of them will return, even if they don't return anything, which is quite a nasty restriction that I haven't seen in DB libs for other languages.

I was unable to make the error handling "neat" in this case; for our purpose we were already happy with just getting the error, even if not as a "proper" database error, but I get that that would be desirable in general.

All this is not to suggest this is something that could or should be changed in the pyodbc layer just to make dbt happy, of course, since that might be quite impactful; executing individual statements is still the safest way to go. A USE database persists on the connection across batches, provided of course that 1) you do not close it and open a new one and 2) you do not use a separate EXEC for the statement, as that gets its own context and the DB is reverted at the end. (EXEC 'USE db; <other stuff>' works, of course, but you'll probably run into the same multi-statement result set gotchas again.)

Parsing batches and splitting them manually on anything but a dedicated batch separator like good old GO requires a T-SQL parser with full fidelity if there's any chance your statements contain escaped or quoted semicolons anywhere; I wouldn't recommend that path. T-SQL's a hideously byzantine language and the fact that the statement separator is mostly optional doesn't help; you really want to split your statements on a higher level if possible.
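Splitting on a dedicated batch separator like GO is tractable precisely because GO is a client-side convention (sqlcmd, SSMS) that must appear on its own line, so no T-SQL parsing is needed. A minimal sketch of that idea, not code from this PR:

```python
import re

def split_on_go(script: str):
    """Split a T-SQL script on the client-side batch separator GO.

    GO is not T-SQL itself; tools like sqlcmd require it alone on a
    line, which makes this split safe without a full T-SQL parser.
    Quoted or escaped semicolons inside a batch are left untouched.
    """
    batches, current = [], []
    for line in script.splitlines():
        if re.fullmatch(r"\s*GO\s*;?\s*", line, flags=re.IGNORECASE):
            if current:
                batches.append("\n".join(current).strip())
                current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):
        batches.append("\n".join(current).strip())
    return batches


script = """CREATE VIEW v AS SELECT 1 a
GO
SELECT ';' AS semicolon_in_string
GO"""
print(split_on_go(script))
```

Note that the quoted semicolon in the second batch survives intact, which is exactly what naive semicolon splitting would break.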

@semcha
Contributor

semcha commented Jun 15, 2021

Awesome fix! 👍 I ran into this today in my work with the snapshot functionality.

@jtcohen6
Contributor

@jeroen-mostert I really find myself agreeing. It's not good how pyodbc treats errors from multi-statement batches. It's also preferable, wherever possible, to supply individual queries one at a time, so long as they can be executed within a single transaction/session. EXEC perhaps can't be helped. Lacking other alternatives, pushing the error to stdout, even if imperfectly, is still a huge improvement.

where does snowflake.connector.util_text.split_statements() come from?

That comes from the Snowflake python connector here. The logic in this method is somewhere between "just split on semicolon" and "SQL parser with full fidelity." If I remember right, the Snowflake python connector does not properly handle multiple query statements at once (semicolon-delimited), so this is what we have to use instead.
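That "somewhere between" middle ground can be sketched as a quote- and comment-aware semicolon splitter. This is an illustrative approximation, not the actual snowflake-connector-python implementation (which also handles $$-quoting, block comments, and more):

```python
def split_statements(sql: str):
    """Split on semicolons, skipping those inside single/double quotes
    and -- line comments. A sketch of the middle ground between
    'just split on semicolon' and a full-fidelity SQL parser."""
    stmts, buf = [], []
    in_quote = None          # the quote char we are inside, if any
    in_line_comment = False
    i = 0
    while i < len(sql):
        ch = sql[i]
        if in_line_comment:
            buf.append(ch)
            if ch == "\n":
                in_line_comment = False
        elif in_quote:
            buf.append(ch)
            if ch == in_quote:
                in_quote = None
        elif ch in ("'", '"'):
            in_quote = ch
            buf.append(ch)
        elif sql.startswith("--", i):
            in_line_comment = True
            buf.append(ch)
        elif ch == ";":
            stmt = "".join(buf).strip()
            if stmt:
                stmts.append(stmt)
            buf = []
        else:
            buf.append(ch)
        i += 1
    tail = "".join(buf).strip()
    if tail:
        stmts.append(tail)
    return stmts


print(split_statements("use db; select ';' as x; select 2"))
# → ["use db", "select ';' as x", "select 2"]
```

Even this sketch shows why the approach is fragile for T-SQL: doubled quotes, block comments, and procedure bodies all need extra cases.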

@visch

visch commented Jun 17, 2021

Thank you for pushing this up!

Sorry I didn't give better instructions for how to reproduce the error :D

So far everything (even with the cursor2 oddity) runs with this new setup, and I'm now seeing failures where I would expect them. And I'm catching errors I was missing :O

(Installed in a new venv with pip install git+https://github.com/dbt-msft/dbt-sqlserver.git@cb12056e18ad60859eeaa34217bc3a8762551d02. Is there an easier way to install over an existing installation? pip install --upgrade git+https://github.com/dbt-msft/dbt-sqlserver.git@cb12056e18ad60859eeaa34217bc3a8762551d02 wasn't doing it for me, and cloning and running python setup.py install wasn't either; it looked like pip thought the existing package was the most recent.)

@visch

visch commented Jun 20, 2021

Using this right now in my environment. So far so good, I've caught 20 plus errors I was missing :O

@dataders
Collaborator Author

@jeroen-mostert, you are my hero -- not only for the fix but also for the depth of knowledge you were able to share. You certainly make me feel a lot saner, now that I know that both pyodbc and T-SQL are at fault here.

That said, these EXEC statements are only needed to support the multi-database scenario, which is only possible with on-premises SQL Server. There's a part of me that wants to kick this behavior into its own adapter (dbt-sqlserver-multi-db) so that single-database SQL Server users and Azure SQL users don't have to experience this workaround's stdout behavior. @jtcohen6 @periclesrocha, the database version telemetry MSFT has on connections coming from dbt would greatly inform the right path to take...

For now, we should definitely merge and release this to everyone. @mikaelene would you do the honors of releasing a v0.19.2 once this is merged?


Successfully merging this pull request may close these issues.

dbt doesn't properly return error message on wrong syntax