feat(python): add include_index option on init from pandas frames #6847

Merged: 3 commits, Feb 14, 2023

@alexander-beedie (Collaborator) commented Feb 13, 2023

Closes #6763.

Adds an include_index option that loads non-default pandas frame indexes as columns. It defaults to False so that existing behaviour does not change; we can revisit that default later, if requested.

Setup

from datetime import datetime
import pandas as pd
import polars as pl

pdf = pd.DataFrame(
    {
        "dtm": [datetime(2023, 1, 1), datetime(2023, 1, 2)],
        "val": [100, 200],
        "misc": ["x", "y"],
    }
).set_index("dtm")

#             val misc
# dtm                 
# 2023-01-01  100    x
# 2023-01-02  200    y

Before (caller has to manually reset the index if they want to load it)

pl.from_pandas(pdf)

# shape: (2, 2)
# ┌─────┬──────┐
# │ val ┆ misc │
# │ --- ┆ ---  │
# │ i64 ┆ str  │
# ╞═════╪══════╡
# │ 100 ┆ x    │
# │ 200 ┆ y    │
# └─────┴──────┘

After (new param allows for easy/optimised index load)

pl.from_pandas(pdf, include_index=True)

# shape: (2, 3)
# ┌─────────────────────┬─────┬──────┐
# │ dtm                 ┆ val ┆ misc │
# │ ---                 ┆ --- ┆ ---  │
# │ datetime[ns]        ┆ i64 ┆ str  │
# ╞═════════════════════╪═════╪══════╡
# │ 2023-01-01 00:00:00 ┆ 100 ┆ x    │
# │ 2023-01-02 00:00:00 ┆ 200 ┆ y    │
# └─────────────────────┴─────┴──────┘

Update

Passing include_index=True is now preferable to the caller invoking reset_index themselves, as we are able to avoid the pandas-side copy that reset_index would trigger.

@github-actions bot added the enhancement and python labels on Feb 13, 2023

@ritchie46 (Member) commented:

Nice! Shall we immediately make it keyword only?

@alexander-beedie (Collaborator, Author) commented Feb 13, 2023

> Nice! Shall we immediately make it keyword only?

Sure, let's do it; it is this season's on-trend coding style... :)
Will update that along with tweaking the index reset to guarantee no unnecessary pandas-side copies.

@alexander-beedie (Collaborator, Author) commented Feb 13, 2023

@ritchie46: done - include_index is now kwarg-only, and the index load has been memory-optimised 👍

@ritchie46 (Member) commented:

Thanks, everybody, for the good reviews and solutions!

@ritchie46 ritchie46 merged commit d9a683a into pola-rs:master Feb 14, 2023
@alexander-beedie alexander-beedie deleted the pandas-frame-indexes branch February 14, 2023 08:49

Successfully merging this pull request may close these issues.

pl.from_pandas() ignores index