Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sendStringParametersAsUnicode not considered for SQLServerBulkCopy? #1181

Closed
pbcornelius opened this issue Nov 8, 2019 · 7 comments
Closed

Comments

@pbcornelius
Copy link

Driver version

7.4.1

Problem description

  1. Expected behaviour:
    SQLServerBulkCopy should treat VARCHAR and LONGVARCHAR like LONGNVARCHAR and NVARCHAR if sendStringParametersAsUnicode = True; or, alternatively, provide a seperate configuration for this (if there is one, I apologise, but I could not find it here or here).

  2. Actual behaviour:
    It does not. As a result, unicode characters in VARCHAR columns are lost during transfer. This means one cannot transfer unicode from databases that do not distinguish between VARCHAR and NVARCHAR (such as H2).

  3. Error message/stack trace:
    None

  4. Any other details that can be helpful:
    A workaround is to wrap the ResultSetMetaData and manually swap VARCHAR for NVARCHAR, e.g.:

public class UnicodeResultSetMetaData implements ResultSetMetaData {

...

private static List<Integer> UNICODE_EQUIVALENT = List.of(Types.VARCHAR, Types.LONGVARCHAR);

private static List<String> UNICODE_EQUIVALENT_NAMES = List.of("VARCHAR",
		"CHARACTER VARYING",
		"LONGVARCHAR",
		"VARCHAR2",
		"VARCHAR_CASESENSITIVE");

public int getColumnType(int column) throws SQLException {
	int type = meta.getColumnType(column);
	if (UNICODE_EQUIVALENT.contains(type)) {
		return Types.NVARCHAR;
	} else {
		return type;
	}
}

public String getColumnTypeName(int column) throws SQLException {
	String type = meta.getColumnTypeName(column);
	if (UNICODE_EQUIVALENT_NAMES.contains(type)) {
		return "NVARCHAR";
	} else {
		return type;
	}
}
@peterbae
Copy link
Contributor

peterbae commented Nov 8, 2019

@pbcornelius thanks for letting us know, the team will look into this.

@ulvii
Copy link
Contributor

ulvii commented Nov 15, 2019

Hi @pbcornelius ,
There are multiple ways of performing bulk copy operation. For me to get a better idea about your scenario, please provide standalone JDBC reproduction code and I will take a look into it. It is fine if your code involves H2 connection.

@pbcornelius
Copy link
Author

pbcornelius commented Nov 15, 2019

Thanks for looking into this. Here's my code. It should reproduce that error with any H2 -> SQL Server 2017 non-UTF CP, VARCHAR to NVARCHAR transfer.

try (Connection con1 = DriverManager.getConnection("h2 url", user1, pw1);
		Connection con2 = DriverManager.getConnection("mssql url", user2, pw2);
		Statement stmt1 = con1.createStatement();
		Statement stmt2 = con2.createStatement();
		SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(con2)) {
	
	// source data
	CustomResultSet rs1 = new CustomResultSet("SELECT * FROM some_table");
	
	// destination table
	bulkCopy.setDestinationTableName("dest_table");
	
	// other options
	SQLServerBulkCopyOptions options = new SQLServerBulkCopyOptions();
	options.setKeepIdentity(true);
	options.setKeepNulls(true);
	options.setBatchSize(5000);
	options.setBulkCopyTimeout(0);
	bulkCopy.setBulkCopyOptions(options);
	
	// execute
	bulkCopy.writeToServer(rs1);
}

A workaround that's better than the one I posted above is to again wrap the ResultSetMetaData of the source table and use meta data from the destination table to overwrite which columns should be reported back as VARCHAR and which as NVARCHAR:

// destination table unicode columns? manually set ResultSet1 column types to NVARCHAR.
ResultSet rs2 = stmt2.executeQuery("SELECT TOP 0 * FROM dest_table");
ResultSetMetaData rs2Meta = rs2.getMetaData();
for (int i = 1; i <= rs2Meta.getColumnCount(); i++) {
	if (rs2Meta.getColumnType(i) == Types.NVARCHAR || rs2Meta.getColumnType(i) == Types.LONGNVARCHAR) {
		// assumes matching i across tables / can be easily changed to match on column names
		rs1.getMetaData().addUnicodeColumn(i);
	}
}

Where rs1.getMetaData() is a ResultSetMetaData wrapper with custom implementations:

private List<Integer> unicodeColumns = new ArrayList<>();

...

public void addUnicodeColumn(int column) {
	unicodeColumns.add(column);
}

public int getColumnType(int column) throws SQLException {
	if (unicodeColumns.contains(column)) {
		return Types.NVARCHAR;
	} else {
		return meta.getColumnType(column);
	}
}

public String getColumnTypeName(int column) throws SQLException {
	if (unicodeColumns.contains(column)) {
		return "NVARCHAR";
	} else {
		return meta.getColumnTypeName(column);
	}
}

@ulvii
Copy link
Contributor

ulvii commented Nov 26, 2019

Hi @pbcornelius ,
I think I have a fix for your scenario. Please try the attached
jars and let me know if the issue is resolved. I will create a PR as soon as I complete testing.
issue1181.zip

@pbcornelius
Copy link
Author

It seemed to work in a quick test with this as the source SELECT statement:
SELECT 'testabcဦ'

ဦ should be outside of the CP1252 scope.

I did not set sendStringParametersAsUnicode in the connection URL, so I guess that's the default behaviour now for bulk copy?

@ulvii
Copy link
Contributor

ulvii commented Nov 27, 2019

Hi @pbcornelius ,
Thanks for confirming. Yes the behavoiur is by default, if the destination column is unicode, it makes sense to send the data as unicode. sendStringParametersAsUnicode is for PreparedStatement parameters, should not be used for bulk copy.

@ulvii
Copy link
Contributor

ulvii commented Dec 6, 2019

#1193 merged

@ulvii ulvii closed this as completed Dec 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants