from_pandas deletes named index info #12727

Closed
2 tasks done
paddymul opened this issue Nov 27, 2023 · 2 comments
Labels: bug (Something isn't working), python (Related to Python Polars)

Comments

@paddymul

Checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import pandas as pd
import polars as pl

named_index_df = pd.DataFrame({'a':[1,2,3]}, index=['foo', 'bar', 'baz'])
pl.from_pandas(named_index_df)

results in a single-column Polars DataFrame. The ['foo', 'bar', 'baz'] index data is dropped.

Log output

No response

Issue description

It's surprising that data encoded in a pandas index is dropped by Polars. I realize Polars doesn't support indexes, but it should be able to keep that data in a column named index.

Expected behavior

I wrote a quick convenience function that does the right thing.

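import pandas as pd
import polars as pl

def convert_named_index_df(df):
    pldf = pl.from_pandas(df)
    # Only handle a plain pd.Index (not RangeIndex, MultiIndex, etc.) that holds object values such as strings.
    if type(df.index) == pd.core.indexes.base.Index:
        if pd.api.types.is_object_dtype(df.index):
            # Add the index values as a column named "index" and move it to the front.
            return pldf.with_columns(
                pl.Series(name="index", values=df.index.to_list())).select(['index'] + pldf.columns)
    # Otherwise return the converted frame unchanged.
    return pldf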

Installed versions

--------Version info---------
Polars:              0.19.14
Index type:          UInt32
Platform:            macOS-11.7.4-arm64-arm-64bit
Python:              3.11.5 (main, Sep 11 2023, 08:31:25) [Clang 14.0.6 ]

----Optional dependencies----
adbc_driver_sqlite:  <not installed>
cloudpickle:         <not installed>
connectorx:          <not installed>
deltalake:           <not installed>
fsspec:              <not installed>
gevent:              <not installed>
matplotlib:          <not installed>
numpy:               1.26.2
openpyxl:            <not installed>
pandas:              2.0.3
pyarrow:             11.0.0
pydantic:            1.10.9
pyiceberg:           <not installed>
pyxlsb:              <not installed>
sqlalchemy:          <not installed>
xlsx2csv:            <not installed>
xlsxwriter:          <not installed>

paddymul added the bug and python labels on Nov 27, 2023
@mcrumiller

mcrumiller commented Nov 27, 2023

As per #6847 from way back, you need to set include_index=True:

import pandas as pd
import polars as pl

named_index_df = pd.DataFrame({"a": [1, 2, 3]}, index=["foo", "bar", "baz"])
df = pl.from_pandas(named_index_df, include_index=True)
print(df)
shape: (3, 2)
┌──────┬─────┐
│ None ┆ a   │
│ ---  ┆ --- │
│ str  ┆ i64 │
╞══════╪═════╡
│ foo  ┆ 1   │
│ bar  ┆ 2   │
│ baz  ┆ 3   │
└──────┴─────┘

@paddymul

Sorry. I looked at existing bug reports, but probably didn't read enough of the documentation.
