
Scanning and sinking a file without changing the file name causes a crash #7841

Open
2 tasks done
lucazanna opened this issue Mar 28, 2023 · 4 comments
Labels
A-io-parquet (Area: reading/writing Parquet files), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

@lucazanna

lucazanna commented Mar 28, 2023

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

Scanning a Parquet file with scan_parquet and sinking the result back to the same path with sink_parquet crashes the system.

Reproducible example

```python
import polars as pl

df = pl.DataFrame({
    'a': [1, 2, 3]
})

df.write_parquet('df.parquet')

# Works
pl.read_parquet('df.parquet').write_parquet('df.parquet')

# Works
pl.scan_parquet('df.parquet').collect(streaming=True).write_parquet('df.parquet')

# Crashes the system
pl.scan_parquet('df.parquet').sink_parquet('df.parquet')

# Works
pl.scan_parquet('df.parquet').sink_parquet('df2.parquet')
```
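Until such a check exists, one way to rewrite a file in place with a streaming sink is to sink into a temporary file next to it and then rename that file over the original, which is what the working df2.parquet case does by hand. The helper write_via_temp below is a hypothetical sketch, not a Polars API. It assumes the sink has finished reading the source by the time it returns, and it relies on os.replace, which is atomic on POSIX when both paths are on the same filesystem. On Windows, replacing a file that is still open or mapped may fail.

```python
import os
import tempfile
from typing import Callable


def write_via_temp(target: str, write: Callable[[str], None]) -> None:
    """Call write(tmp_path), then move the finished file onto target.

    Hypothetical helper (not part of Polars): the sink reads from target
    while writing somewhere else, so the input is never truncated mid-read.
    """
    directory = os.path.dirname(os.path.abspath(target))
    # Create the temp file in the same directory so os.replace is a plain
    # rename on a single filesystem (atomic on POSIX).
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    os.close(fd)
    try:
        write(tmp_path)
        os.replace(tmp_path, target)
    except BaseException:
        # Remove the partial output if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With the file from the reproducible example, the call would be write_via_temp('df.parquet', lambda tmp: pl.scan_parquet('df.parquet').sink_parquet(tmp)).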

Expected behavior

I would expect an error message stating that the source and the sink cannot be the same file.

Installed versions

---Version info---
Polars: 0.16.16
Index type: UInt32
Platform: Linux-5.10.147+-x86_64-with-glibc2.31
Python: 3.9.16 (main, Dec  7 2022, 01:11:51) 
[GCC 9.4.0]
---Optional dependencies---
numpy: 1.22.4
pandas: 1.4.4
pyarrow: 11.0.0
connectorx: <not installed>
deltalake: <not installed>
fsspec: 2023.3.0
matplotlib: 3.7.1
xlsx2csv: <not installed>
xlsxwriter: <not installed>
@lucazanna lucazanna added the bug (Something isn't working) and python (Related to Python Polars) labels Mar 28, 2023
@lucazanna lucazanna changed the title from "Same operation works in eager mode and crashes in streaming mode" to "Scanning and sinking a file without changing the file name causes a crash" Mar 28, 2023
@ritchie46
Member

Yes, the OS will complain that we are writing to a file we have opened as an mmap. We can add a check for this.
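A minimal sketch of such a check, assuming it runs before the sink opens its output file. The helper name ensure_distinct_paths and the choice of ValueError are hypothetical, not part of Polars. It compares the resolved paths with os.path.realpath and, when both files exist, also uses os.path.samefile so that hard links to the same file are caught.

```python
import os


def ensure_distinct_paths(source: str, target: str) -> None:
    """Raise ValueError if target refers to the same file as source.

    Hypothetical helper (not part of Polars): a streaming sink that truncates
    target while source is still memory-mapped would corrupt its own input.
    """
    # realpath resolves symlinks and segments such as "./" or "..".
    same = os.path.realpath(source) == os.path.realpath(target)
    # When both paths exist, samefile compares device and inode numbers,
    # which also catches hard links.
    if not same and os.path.exists(source) and os.path.exists(target):
        same = os.path.samefile(source, target)
    if same:
        raise ValueError(
            f"cannot sink to {target!r}: it is the same file as the scan source {source!r}"
        )
```

With the paths from the reproducible example, ensure_distinct_paths('df.parquet', 'df.parquet') raises ValueError, while ensure_distinct_paths('df.parquet', 'df2.parquet') returns without error.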

@lucazanna
Author

Sounds good. I think an error message is all that's needed.

It took me some time to understand why streaming was not working, but it makes sense that you cannot stream from and to the same file.

@ritchie46
Member

I shall link to this: https://www.urbandictionary.com/define.php?term=Don%27t%20shit%20where%20you%20eat

^^

@lucazanna
Author

That would make for a very memorable error message :)

@stinodego stinodego added the needs triage (Awaiting prioritization by a maintainer) and A-io (Area: reading and writing data) labels Jan 13, 2024
@stinodego stinodego added the A-io-parquet (Area: reading/writing Parquet files) label and removed the A-io (Area: reading and writing data) label Jan 21, 2024