Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For bulk insert, failure in later batch isn't reported to the caller #2347

Closed
hlms opened this issue Mar 14, 2024 · 4 comments
Closed

For bulk insert, failure in later batch isn't reported to the caller #2347

hlms opened this issue Mar 14, 2024 · 4 comments

Comments

@hlms
Copy link

hlms commented Mar 14, 2024

Driver version

Provide the JDBC driver version: mssql-jdbc.12.4.2.jre11.jar

SQL Server version

Provide the output of executing SELECT @@VERSION on your target SQL Server:

Microsoft Azure SQL Edge Developer (RTM) - 16.0.5100.7245 (X64)   Jun 13 2023 12:34:33   Copyright (C) 2022 Microsoft Corporation  Linux (Ubuntu 20.04.6 LTS) <X64>

SQL server is running inside docker 4.27.2.

Client Operating System

Provide the Operating System the client application is running on: Windows 11 enterprise

JAVA/JVM version

Provide the JAVA/JVM version (e.g. java version "1.8.0").

sh-5.2# java -version
openjdk version "11.0.22" 2024-01-16 LTS
OpenJDK Runtime Environment Corretto-11.0.22.7.1 (build 11.0.22+7-LTS)
OpenJDK 64-Bit Server VM Corretto-11.0.22.7.1 (build 11.0.22+7-LTS, mixed mode)

Table schema

Provide the table schema to repro the issue:

CREATE TABLE ST_Orders ([OrderID$] BIGINT,[CustomerID$] NVARCHAR(5),[EmployeeID$] BIGINT,[OrderDate$] DATE,[RequiredDate$] DATE,[ShippedDate$] DATE,[ShipVia$] BIGINT,[Freight$] FLOAT,[ShipName$] NVARCHAR(34),[ShipAddress$] NVARCHAR(46),[ShipCity$] NVARCHAR(15),[ShipRegion$] NVARCHAR(13),[ShipPostalCode$] NVARCHAR(9),[ShipCountry$] NVARCHAR(11),[DW_DM_ORDERS_ORDERS_CKSUM$] INTEGER)

CREATE CLUSTERED COLUMNSTORE INDEX [CS_ST_Orders] ON ST_Orders

ALTER TABLE [ST_Orders]
ADD UNIQUE (OrderID$);

INSERT INTO [ST_Orders] (OrderID$)
VALUES (11065);

Problem description

With bulk insert command execution through jdbc driver, when batchsize is used, if insertion for any batch (probably second batch onwards) fails, the jdbc driver can't detect the failure. It responds as success to the caller.

Steps to reproduce the issue

  1. As mentioned above, create a table, add unique constraint, and add one row manually. (FYI: With the help of unique constraint and already added data, we would get bulk insert fail.)
  2. Use the following files:
    Format.txt
    Orders.txt
    Notice that in Orders.txt, the last record has 11065 as OrderID$, so bulk insert would fail due to unique constraint. 11065 was already added manually above. That's why.
  3. Use following code:
String bulkInsertQuery = "BULK INSERT ST_Orders FROM '/.../data/Orders.txt' WITH (FORMATFILE='/.../data/Format.txt',FIRSTROW=1,ROWS_PER_BATCH=830,BATCHSIZE=400,DATAFILETYPE='widechar')";

// Notice that aforementioned query has BATCHSIZE=400, and the number of records are 830 in the file. So it will execute in three batches: 400 + 400 + 30.

st.executeLargeUpdate(bulkInsertQuery);

Expected behavior

A clear and concise description of what you expected to happen:

  • Insertion of the first two batches should succeed.
  • Insertion of the third batch should fail.
  • The execution should throw exception for the failure.

Actual behavior

Output of what you actually see.
The execution doesn't throw any exception even though the third batch fails. (First two batches are inserted as expected.)

Error message/stack trace

No exception or error message. That's the issue.

Any other details that can be helpful

Add any other context about the problem here.
When we attempt to do the same in SQL Server management studio, it does display the errors:
bulk-insert-failure-in-studio-github

Output:

(400 rows affected)

(400 rows affected)
Msg 2627, Level 14, State 1, Line 19
Violation of UNIQUE KEY constraint 'UQ__ST_Order__E65CA761A9B383C1'. Cannot insert duplicate key in object 'S_bcabbbbf_7118_48dc_a0e0_bbd075ce01de.ST_Orders'. The duplicate key value is (11065).
The statement has been terminated.

So looks like the driver just reads the first message of response and NOT the further messages (response from SQL server).

I haven't yet tried failing the first batch itself.

JDBC trace logs

Provide the JDBC driver trace logs. Instructions can be found here: https://docs.microsoft.com/sql/connect/jdbc/tracing-driver-operation

@hlms
Copy link
Author

hlms commented Mar 14, 2024

Adding some more detail:

st.executeLargeUpdate(bulkInsertQuery) returns a long value.

Attempted to check the behaviour of this when the first batch fails, second batch fails and when the third batch fails.

First batch fails

When the first batch itself fails, the statement throws exception as expected:

2024-03-14 14:18:22 2024-03-14 08:48:22,912+0000 [toe=acgubk80vow677] [Pool Worker - 55] [,,] ERROR - Violation of UNIQUE KEY constraint 'UQ__ST_Order__E65CA761D1FED3F3'. Cannot insert duplicate key in object 'S_bcabbbbf_7118_48dc_a0e0_bbd075ce01de.ST_Orders'. The duplicate key value is (11065). com.microsoft.sqlserver.jdbc.SQLServerException: Violation of UNIQUE KEY constraint 'UQ__ST_Order__E65CA761D1FED3F3'. Cannot insert duplicate key in object 'S_bcabbbbf_7118_48dc_a0e0_bbd075ce01de.ST_Orders'. The duplicate key value is (11065).
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:259) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.TDSTokenHandler.onEOF(tdsparser.java:304) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.TDSParser.parse(tdsparser.java:137) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1691) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:920) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:814) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7675) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:4137) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:272) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:246) ~[mssql-jdbc-12.4.2.jre11.jar:?]
2024-03-14 14:18:22     at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeLargeUpdate(SQLServerStatement.java:774) ~[mssql-jdbc-12.4.2.jre11.jar:?]

When the second batch or third batch fails

st.executeLargeUpdate(bulkInsertQuery) returns 400 (size of first batch).
No exception is thrown.

Successful case

  • Inserted 830 records with batchsize of 400.
  • But the return value is 400 suprisingly.

Question

When does jdbc driver return control to the caller / java application?

  • When the first batch is inserted successful, does the jdbc driver immediately respond with success to the caller (leaving other batches to the SQL Server)?
  • Or when all the batches have been inserted successfully (or with failure), then the jdbc driver return the response to the caller java application?

@Jeffery-Wasty
Copy link
Contributor

Hi @hlms,

I'm not familiar with expected behavior of the driver in this scenario. We'll investigate further and get back to you with an answer/resolution.

@Jeffery-Wasty
Copy link
Contributor

Jeffery-Wasty commented Mar 26, 2024

This is actually a common scenario and we have wiki/MS doc pages for it:
https://docs.microsoft.com/en-us/sql/connect/jdbc/parsing-the-results?view=sql-server-ver16
https://github.com/microsoft/mssql-jdbc/wiki/Handling-SQLExceptions

You can also see previous issues - #367, #399, #826, #937, #995, #1171

Therefore, this is expected driver behavior. Please let us know if you have any further questions, if not we will be closing the issue.

@hlms
Copy link
Author

hlms commented Mar 27, 2024

I'm yet to go through the provided links in detail.

What about introducing a new argument (e.g. RAISE_FAILURES) for Bulk insert (similar to BATCHSIZE argument)? So that if users want bulk insert to surface failure in insertion of any of the batches, then based on this parameter, the driver would raise failures.

The point is - SQL server raises the errors. Shouldn't the driver pass the response as-is instead of swallowing something?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Closed Issues
Development

No branches or pull requests

2 participants