StatisticsConverter::row_group_null_counts incorrect for missing column #10926
Comments
take
@alamb Should we also change the signature and return something like:

    let Some(parquet_index) = self.parquet_index else {
        return Ok(self
            .make_null_array(&DataType::UInt64, metadatas)
            .as_any()
            .downcast_ref::<UInt64Array>()
            .expect("failed to downcast array")
            .clone());
    };

If we change the signature, we also have to make some changes downstream, e.g. in row_groups.rs.
EDIT: I guess this answers the second question:
Yes, I think so.
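For illustration only, here is a std-only model of why the downcast-based snippet proposed above is safe: the as_any().downcast_ref::<UInt64Array>() step succeeds only because the null array is built with DataType::UInt64. A null array of the column's own type (the behavior this issue describes) would make the downcast fail and the expect panic. DataType, UInt64Array, Int32Array and make_null_array below are hypothetical stand-ins built on std::any::Any, not the arrow-rs types or DataFusion's helper.

```rust
use std::any::Any;

/// Hypothetical stand-in for a few arrow DataTypes (NOT arrow's enum).
#[derive(Clone, Copy, Debug)]
enum DataType {
    UInt64,
    Int32,
}

/// Hypothetical stand-ins for typed arrow arrays: one Option per row group.
#[derive(Clone, Debug, PartialEq)]
struct UInt64Array(Vec<Option<u64>>);

#[derive(Clone, Debug, PartialEq)]
struct Int32Array(Vec<Option<i32>>);

/// Hypothetical make_null_array: an all-null array of the requested type,
/// returned type-erased (Box<dyn Any>) the way an ArrayRef hides its type.
fn make_null_array(data_type: DataType, len: usize) -> Box<dyn Any> {
    match data_type {
        DataType::UInt64 => Box::new(UInt64Array(vec![None; len])),
        DataType::Int32 => Box::new(Int32Array(vec![None; len])),
    }
}

/// Proposed shape: build the null array as UInt64, then downcast and clone.
/// The expect cannot fire here because the element type is fixed to UInt64.
fn null_counts_for_missing_column(num_row_groups: usize) -> UInt64Array {
    make_null_array(DataType::UInt64, num_row_groups)
        .downcast_ref::<UInt64Array>()
        .expect("failed to downcast array")
        .clone()
}
```

In this model, null_counts_for_missing_column(3) yields three None entries, while downcasting an all-null Int32Array to UInt64Array returns None, which is what the expect would turn into a panic if the null array followed the column's type.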
I actually think using

I actually hit this issue when working on #10924, and I had to change it to

    let Some(parquet_index) = self.parquet_index else {
        let num_row_groups = metadatas.into_iter().count();
        return Ok(Arc::new(UInt64Array::from_iter(
            std::iter::repeat(None).take(num_row_groups),
        )));
    };

to make a test pass.
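A std-only sketch of the let-else approach above, assuming hypothetical stand-ins for UInt64Array and the row group metadata (these are not the arrow-rs or parquet crate types, and the per-column null_counts field is invented for illustration). When the column has no Parquet index, the function returns one null per row group, always typed as u64, so no downcast or clone is needed.

```rust
/// Hypothetical stand-in for arrow's UInt64Array: one optional count per row group.
#[derive(Debug, PartialEq)]
struct UInt64Array(Vec<Option<u64>>);

impl FromIterator<Option<u64>> for UInt64Array {
    fn from_iter<I: IntoIterator<Item = Option<u64>>>(iter: I) -> Self {
        UInt64Array(iter.into_iter().collect())
    }
}

/// Hypothetical stand-in for per-row-group metadata:
/// null_counts holds one entry per Parquet leaf column.
struct RowGroupMetaData {
    null_counts: Vec<u64>,
}

fn row_group_null_counts<'a, I>(parquet_index: Option<usize>, metadatas: I) -> UInt64Array
where
    I: IntoIterator<Item = &'a RowGroupMetaData>,
{
    let Some(parquet_index) = parquet_index else {
        // Missing column: one null per row group, always u64,
        // regardless of the column's own Arrow type.
        let num_row_groups = metadatas.into_iter().count();
        return UInt64Array::from_iter(std::iter::repeat(None).take(num_row_groups));
    };
    // Present column: read the null count for that column from each row group.
    metadatas
        .into_iter()
        .map(|md| md.null_counts.get(parquet_index).copied())
        .collect()
}
```

With two row groups, row_group_null_counts(None, &groups) returns two None entries: the length follows the number of row groups and the element type stays u64. Building the array directly this way avoids the runtime type check that the downcast-based variant relies on.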
I passed @alamb just to confirm my understanding (regarding my second question).
Describe the bug
I noticed this while working on #10852 with @marvinlanhenke
Basically, when generating statistics for a non-existent column, the StatisticsExtractor returns a null array of the column's type rather than a UInt64Array.
Specifically, see datafusion/datafusion/core/src/datasource/physical_plan/parquet/statistics.rs, lines 871 to 886 at commit 2f43476.
The same problem exists for data_page_null_counts and data_page_row_counts (but not for row_group_row_counts).
To Reproduce
Try to call row_group_null_counts for a column that isn't in the Parquet file.
Expected behavior
A UInt64Array (not an ArrayRef).
Additional context
No response