Should empty nullable dictionary be parsed as null from arrow-csv? #6821

Closed
edmondop opened this issue Dec 1, 2024 · 2 comments · Fixed by #6830
Labels
arrow Changes to the arrow crate bug good first issue Good for newcomers help wanted

Comments

@edmondop
Contributor

edmondop commented Dec 1, 2024

The DataFusion issue apache/datafusion#12041 showcases a scenario where an empty value in a Dictionary column is effectively not null, so filtering by null doesn't return that row.

I have traced the problem to arrow-csv and created a small test case to reproduce it: https://github.com/apache/arrow-rs/compare/main...edmondop:arrow-rs:datafusion-12041?expand=1

I am unsure whether we should close apache/datafusion#12041, change the behavior of arrow-csv, or perhaps provide this as an option on the reader.

@tustvold
Contributor

tustvold commented Dec 1, 2024

This indeed looks like a bug, and should be a relatively straightforward fix. The issue can be clearly seen when one compares the logic for StringArray with that for DictionaryArray:

rows.iter()
  .map(|row| {
      let s = row.get(i);
      (!null_regex.is_null(s)).then_some(s)
  })
  .collect::<StringArray>(),

vs

rows.iter()
  .map(|row| row.get(i))
  .collect::<DictionaryArray<Int8Type>>(),
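
To make the difference concrete, here is a minimal std-only sketch of the two code paths. It does not use the arrow crates: `is_null` is a hypothetical stand-in for `null_regex.is_null` that assumes the empty string counts as null when no custom null regex is configured, and `Vec<Option<&str>>` stands in for the resulting arrays.

```rust
/// Hypothetical stand-in for `null_regex.is_null`: assumes that, with no
/// custom null regex configured, only the empty string counts as null.
fn is_null(s: &str) -> bool {
    s.is_empty()
}

/// Mirrors the StringArray path: values matching the null rule become None.
fn parse_with_null_check<'a>(column: &[&'a str]) -> Vec<Option<&'a str>> {
    column
        .iter()
        .map(|&s| (!is_null(s)).then_some(s))
        .collect()
}

/// Mirrors the DictionaryArray path quoted above: every value is kept,
/// so an empty field ends up as a valid empty string rather than a null.
fn parse_without_null_check<'a>(column: &[&'a str]) -> Vec<Option<&'a str>> {
    column.iter().map(|&s| Some(s)).collect()
}

/// Counts the entries that would be null in the resulting array.
fn null_count(values: &[Option<&str>]) -> usize {
    values.iter().filter(|v| v.is_none()).count()
}
```

For a column `["a", "", "b"]`, the first function yields one null and the second yields none, which matches the symptom in the DataFusion issue: a null filter never sees the empty row. Applying the same null check in the dictionary path would presumably be the shape of the fix, though the merged change may differ in detail.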

@alamb alamb added the arrow Changes to the arrow crate label Dec 17, 2024
@alamb
Contributor

alamb commented Dec 17, 2024

label_issue.py automatically added labels {'arrow'} from #6830
