join_where ColumnNotFoundError if predicate only uses columns from one side #18745

Closed
2 tasks done
cmdlineluser opened this issue Sep 14, 2024 · 1 comment · Fixed by #19020
Labels: accepted (Ready for implementation), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@cmdlineluser (Contributor)

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

left  = pl.DataFrame({"a": [1, 2]}).with_row_index()
right = pl.DataFrame({"b": [1, 2]}).with_row_index()

left.join_where(right, pl.col.index >= pl.col.a)
# ColumnNotFoundError: could not find column a_right in the right table during join operation

Log output

No response

Issue description

It's a nonsense example, but the a_right in the error message suggests that the order of the col() calls matters in this case.

Reversing the order does produce a result:

>>> left.join_where(right, pl.col.a <= pl.col.index)
# shape: (1, 4)
# ┌───────┬─────┬─────────────┬─────┐
# │ index ┆ a   ┆ index_right ┆ b   │
# │ ---   ┆ --- ┆ ---         ┆ --- │
# │ u32   ┆ i64 ┆ u32         ┆ i64 │
# ╞═══════╪═════╪═════════════╪═════╡
# │ 0     ┆ 1   ┆ 1           ┆ 2   │
# └───────┴─────┴─────────────┴─────┘

That result looks like it actually compared a with index_right?

It seems like a predicate that only refers to columns from one side should be considered Invalid?

Expected behavior

I'm not sure.

Installed versions

--------Version info---------
Polars:               1.7.1
Index type:           UInt32
Platform:             macOS-13.6.1-arm64-arm-64bit
Python:               3.12.2 (main, Feb  6 2024, 20:19:44) [Clang 15.0.0 (clang-1500.1.0.2.5)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
great_tables:         <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
nest_asyncio:         <not installed>
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.1
pyarrow:              15.0.2
pydantic:             <not installed>
pyiceberg:            <not installed>
sqlalchemy:           <not installed>
torch:                <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
@cmdlineluser cmdlineluser added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Sep 14, 2024
@ritchie46 (Member)

It seems like a predicate that only refers to columns from one side should be considered Invalid?

Yes, it should. We should be able to catch that.
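
For illustration only, here is a rough sketch of the kind of check being discussed: classify each column name used by the predicate as left or right, and reject the predicate unless both sides appear. This is not the Polars implementation. The helper names `predicate_sides` and `check_join_predicate` are hypothetical, and the name-resolution rule (left names win, clashing right names are reachable only via the `_right` suffix) is an assumption based on the behaviour shown in this issue.

```python
from typing import Iterable, Set


def predicate_sides(
    left_cols: Iterable[str],
    right_cols: Iterable[str],
    predicate_cols: Iterable[str],
    suffix: str = "_right",
) -> Set[str]:
    """Return which sides ("left", "right") the predicate's column names resolve to.

    Hypothetical helper, not part of Polars.
    """
    left = set(left_cols)
    # Assumption: a right-hand column whose name clashes with a left-hand one
    # is only addressable through the suffixed name (e.g. "index_right").
    right_visible = {name + suffix if name in left else name for name in right_cols}

    sides: Set[str] = set()
    for name in predicate_cols:
        if name in left:
            sides.add("left")
        elif name in right_visible:
            sides.add("right")
        else:
            raise KeyError(f"column {name!r} not found on either side")
    return sides


def check_join_predicate(
    left_cols: Iterable[str],
    right_cols: Iterable[str],
    predicate_cols: Iterable[str],
    suffix: str = "_right",
) -> None:
    """Raise ValueError unless the predicate refers to both tables."""
    sides = predicate_sides(left_cols, right_cols, predicate_cols, suffix)
    if sides != {"left", "right"}:
        raise ValueError(
            f"join predicate must refer to both tables, got sides: {sorted(sides)}"
        )
```

With the frames from the report (left columns `index`, `a`; right columns `index`, `b`), the names `index` and `a` both resolve to the left table, so this sketch would reject `pl.col.index >= pl.col.a`, while a predicate over `a` and `index_right` would pass. In Polars itself, the column names of a predicate can probably be obtained with `predicate.meta.root_names()`, though how the real fix collects and resolves them is not stated in this thread.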

@c-peters c-peters added the accepted Ready for implementation label Oct 6, 2024
@c-peters c-peters moved this from Ready to Done in Backlog Oct 6, 2024
@github-project-automation github-project-automation bot moved this to Ready in Backlog Oct 6, 2024