Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(python): Handle boolean comparisons in Iceberg predicate pushdown #18199

Merged
merged 1 commit into from
Jan 25, 2025

Conversation

dimfeld
Copy link
Contributor

@dimfeld dimfeld commented Aug 15, 2024

Filtering Iceberg table boolean columns on boolean literals fails trying to parse pa.compute.scalar(True). This updates the Call predicate parser to handle the scalar type.

I originally posted about this in the Discord here which has a few more details: https://discord.com/channels/908022250106667068/1273409357685592159

table = pl.scan_iceberg(iceberg.load_table("codes.icd10cm"))

# is_header is a boolean column in the Iceberg table
# Both of these statements fail in the same way
table.filter(pl.col("is_header") == False).collect()
table.sql("SELECT count(*) from self where is_header = false").collect()

The error was

File .venv/lib/python3.12/site-packages/polars/lazyframe/frame.py:2027, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, no_optimization, streaming, engine, background, _eager, **_kwargs)
       2025 # Only for testing purposes
       2026 callback = _kwargs.get("post_opt_callback", callback)
    -> 2027 return wrap_df(ldf.collect(callback))
    
    ComputeError: ValueError: Could not convert predicate to PyIceberg: (pa.compute.field('is_header') == pa.compute.scalar(False))

which I traced back to the function that I've modified in this PR.

In addition to the parsing test I added here, it might also advisable to update the Iceberg table fixture as well and modify test_scan_iceberg_filter_on_column. But I'm not really sure what's the proper way to add a column to the fixture.

@github-actions github-actions bot added fix Bug fix python Related to Python Polars labels Aug 15, 2024
@dimfeld
Copy link
Contributor Author

dimfeld commented Aug 15, 2024

CI failures in some Delta table tests which seem unrelated.

@@ -163,3 +163,12 @@ def test_parse_lteq(self) -> None:

expr = _to_ast("(pa.compute.field('ts') <= '2023-08-08')")
assert _convert_predicate(expr) == LessThanOrEqual("ts", "2023-08-08")

def test_compare_boolean(self) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a test to validate you get the correct result?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to, yes. As I mentioned in the last paragraph of the PR description I’m not really sure how to go about that. Is there some other way that doesn’t involve recreating the iceberg table fixture with a new Boolean column?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't really know about iceberg. Ok, let's merge for now.

@ritchie46
Copy link
Member

@dimfeld can you rebase and make CI green?

Filtering columns on boolean literals failed trying to parse
`pa.compute.scalar(True)`. This updates the Call predicate parser to
handle the `scalar` type.
@dimfeld dimfeld force-pushed the iceberg-filter-boolean branch from 261193b to 69fc732 Compare August 27, 2024 18:43
Copy link

codecov bot commented Aug 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.89%. Comparing base (d12131a) to head (69fc732).
Report is 1124 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #18199      +/-   ##
==========================================
- Coverage   79.89%   79.89%   -0.01%     
==========================================
  Files        1495     1495              
  Lines      200214   200216       +2     
  Branches     2867     2868       +1     
==========================================
- Hits       159966   159955      -11     
- Misses      39702    39715      +13     
  Partials      546      546              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dimfeld
Copy link
Contributor Author

dimfeld commented Aug 27, 2024

Looks like we're all good now. Thanks!

@ritchie46
Copy link
Member

Wow, totally missed this one. Merging! 🙈

@ritchie46 ritchie46 merged commit 8552f60 into pola-rs:main Jan 25, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants