Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Garbage indicators for mandatory columns #86

Merged
merged 6 commits into from
Dec 19, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 11 additions & 2 deletions src/reader/text.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
use std::{char::decode_utf16, cmp::min, ffi::CStr, num::NonZeroUsize, sync::Arc};

use arrow::array::{ArrayRef, StringBuilder};
use log::warn;
use odbc_api::{
buffers::{AnySlice, BufferDesc},
DataType as OdbcDataType,
Expand Down Expand Up @@ -59,6 +60,11 @@ fn narrow_text_strategy(
assume_indicators_are_memory_garbage: bool,
) -> Box<dyn ReadStrategy> {
if assume_indicators_are_memory_garbage {
warn!(
"Ignoring indicators, because we expect the ODBC driver of your database to return \
garbage memory. We can not distiguish between empty strings and NULL. Everything is \
empty."
);
Box::new(NarrowUseTerminatingZero::new(octet_len))
} else {
Box::new(NarrowText::new(octet_len))
Expand Down Expand Up @@ -172,8 +178,11 @@ impl ReadStrategy for NarrowUseTerminatingZero {
let str = c_str
.to_str()
.expect("ODBC driver had been expected to return valid utf8, but did not.");
let option = (!str.is_empty()).then_some(str);
builder.append_option(option);
// We always assume the string to be non NULL. Original implementation had mapped empty
// strings to NULL, but this of course does not play well with schemas which have
// mandatory values. Better to accept that here empty strings and NULL are
// indistinguishable, and empty strings are the representation that always work.
builder.append_option(Some(str));
}
Ok(Arc::new(builder.finish()))
}
Expand Down
44 changes: 43 additions & 1 deletion tests/integration.rs
Original file line number Diff line number Diff line change
Expand Up @@ -308,7 +308,11 @@ fn fetch_varchar() {
#[test]
fn fetch_varchar_using_terminating_zeroes_to_indicate_string_length() {
let table_name = function_name!().rsplit_once(':').unwrap().1;
let cursor = cursor_over(table_name, "VARCHAR(50)", "('Hello'),('Bonjour'),(NULL)");
let cursor = cursor_over(
table_name,
"VARCHAR(50)",
"('Hello'),('Bonjour'),(NULL)",
);

let mut quirks = Quirks::new();
quirks.indicators_returned_from_bulk_fetch_are_memory_garbage = true;
Expand All @@ -329,7 +333,45 @@ fn fetch_varchar_using_terminating_zeroes_to_indicate_string_length() {
// Assert that the correct values are found within the arrow batch
assert_eq!("Hello", array_vals.value(0));
assert_eq!("Bonjour", array_vals.value(1));

// This workaround is currently only active for UTF-8. Which in turn is only active on Linux
#[cfg(target_os = "windows")]
assert!(array_vals.is_null(2));
// Due to the ambiguity between empty and NULL we map everything to empty. This can not be
// mapped to NULL, due to the fact, that the schema might be a mandatory column. The
// representation is ambigious, because we need to ignore the indicator buffer.
#[cfg(not(target_os = "windows"))]
assert_eq!("", array_vals.value(2));
}

/// A corner case which has not been accounted for in the first workaround for the indicator
/// garbage returned by DB2. Due to the invalid indicators, in DB2 an empty string and a NULL are
/// indistiguishable. The first iteration had been to map empty strings to NULL, yet, of course,
/// this does not work well for mandatory columns. Therfore, we now use empty strings and never
/// map them to NULL.
#[test]
fn fetch_empty_string_from_non_null_varchar_using_terminating_zeroes_to_indicate_string_length() {
let table_name = function_name!().rsplit_once(':').unwrap().1;
let cursor = cursor_over(table_name, "VARCHAR(50) NOT NULL", "('')");

let mut quirks = Quirks::new();
quirks.indicators_returned_from_bulk_fetch_are_memory_garbage = true;
let mut reader = OdbcReaderBuilder::new()
.with_max_num_rows_per_batch(1)
.with_shims(quirks)
.build(cursor)
.unwrap()
.into_concurrent()
.unwrap();
let record_batch = reader.next().unwrap().unwrap();
let array_vals = record_batch
.column(0)
.as_any()
.downcast_ref::<StringArray>()
.unwrap();

// Assert that value is an empty string
assert_eq!("", array_vals.value(0));
}

/// Fill a record batch of Strings from a nvarchar source column
Expand Down
Loading