-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug with handling of null values in dictionaries #70
Conversation
Unfortunately:
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #70 +/- ##
==========================================
+ Coverage 82.80% 83.02% +0.22%
==========================================
Files 15 15
Lines 1128 1143 +15
Branches 1128 1143 +15
==========================================
+ Hits 934 949 +15
Misses 132 132
Partials 62 62 ☔ View full report in Codecov by Sentry. |
Let's add a test case to show that we return nulls in the dictionary keys in cases when we would produce null values. I wonder if we should switch to using dictionary builders immediately if we're going through the effort to re-pack dictionaries at the end anyway. |
I pushed a test and simplified to just set keys to null as you suggested. I can confirm the test used to fail. |
Currently given the query
json_get_text(col, 'a')
on the data['{'x': 0}', '{'x': 0}', '{'a': 1}']
wherecol
is a dictionary encoded column originally with keys[0, 0, 1]
and values['{'x': 0}', '{'x': 1}']
we return a dictionary with keys[0, 0, 1]
and values[null, null, 1]
.But if you look at how
arrow-rs
builds up dictionaries they always put the nulls in the keys, not the values. The spec does not require this, but I think that things inarrow-rs
or DataFusion assume it is so (based on panics I've seen in prod).This PR works around those bugs elsewhere while we investigate them further.