Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiindex column in to_excel #2701

Closed
hayd opened this issue Jan 15, 2013 · 18 comments
Closed

multiindex column in to_excel #2701

hayd opened this issue Jan 15, 2013 · 18 comments
Labels
Enhancement IO Data IO issues that don't fit into a more specific label IO Excel read_excel, to_excel Output-Formatting __repr__ of pandas objects, to_string
Milestone

Comments

@hayd
Copy link
Contributor

hayd commented Jan 15, 2013

link to #1651

Saving a multiindex column to_excel saves a sparse index.

Migrated from this StackOverflow question, with a smaller DataFrame.

In [1]: m = MultiIndex.from_tuples([(1,1),(1,2)], names=['a','b'])

In [2]: df = DataFrame([[1,2],[3,4]], columns=m)

In [3]: df
Out[3]: 
a  1   
b  1  2
0  1  2
1  3  4

In [4]: df.to_excel('test.xls')

Saves the xls:

    1.1 .2
0   1   2
1   3   4

This differs from how to_csv (which is not sparse):

,"(1L, 1L)","(1L, 2L)"
0,1,2
1,3,4
@joeb1415
Copy link

Both to_excel and to_csv should allow a multiindex on the columns to print on multiple rows

@jreback
Copy link
Contributor

jreback commented Mar 22, 2013

I guess then read_csv would have to take a list for the header argument to reconstruct the mi

@ghost
Copy link

ghost commented Mar 22, 2013

#2478, this is already hiding in the codebase, just needs some TLC.

@jreback
Copy link
Contributor

jreback commented Sep 21, 2013

@hayd @jtratner IIRC this should still be open, right?

@jtratner
Copy link
Contributor

yes, definitely.

@jmcnamara
Copy link
Contributor

I think that has been fixed by #5423:

import pandas as pd

m = pd.MultiIndex.from_tuples([(1,1),(1,2)], names=['a','b'])
df = pd.DataFrame([[1,2],[3,4]], columns=m)

df.to_excel('test.xls')

Output:

screenshot

Can we close this?

@jreback jreback closed this as completed Nov 10, 2013
@mappingvermont
Copy link

Hi all,

I'm trying to write a multiindex dataframe to Excel using the example above, and not getting the same results.

Code:

import pandas as pd

m = pd.MultiIndex.from_tuples([(1,1),(1,2)], names=['a','b'])
df = pd.DataFrame([[1,2],[3,4]], columns=m)

df.to_excel('test.xls')

Output:
capture

There's an extra line being added-- different than the output generated above.

I'm using pandas version 0.19.2 on Ubuntu 14.04 and have also experienced this on Windows 10. I've tried this using pandas version 0.15.0 and it works properly, replicating the output above:

image

Is this a bug? And if so, can we re-open this issue?

Thanks,

Charlie

@juandavid-bolanos
Copy link

@mappingvermont I'm getting the same result with a real example. I think it is a bug. Using latest version pandas 0.19.2

@ronanpaixao
Copy link

@jreback Can this be reopened? I also get this problem.

I've commented this on #6618, which seems related. From what I can see, the problem is that pandas is reserving the first column of the header rows to the MultiIndex, and creates a new line for the row index name even if it is unnamed:

>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=pd.MultiIndex.from_tuples([('A'
,''),('B','C'),('B','D')]))
>>> df.to_excel("out.xlsx", index_label="Foo")

pandas_to_excel_bug2

This bug was fixed some time ago for simple columns, but it was probably not a very good solution, since it is still buggy for MultiIndex columns:

>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['A','B','C'])
>>> df.to_excel("out.xlsx", index_label="Foo")

pandas_to_excel_bug3

Version information:

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@Tangjiahui26
Copy link

@jreback still got the same problem for the multi-index, wish this issue can be reopened.

@etiennecaldo
Copy link

etiennecaldo commented May 9, 2019

Same here, I will open a new issue and try to find the problem

@Ronkiro
Copy link

Ronkiro commented Jun 6, 2019

@etiennecaldo Did you open the issue?
I'm having the same problem with the latest version of pandas.

@etiennecaldo
Copy link

@Ronkiro no I workarounded it on my side...!

@ferdiaoh
Copy link

This still appears to be an issue for multi-index columns (pandas 0.24.2) - is there any way this can be re-opened?

data = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
multi_index = pd.MultiIndex.from_product([['a', 'b'], ['one', 'two', 'three']])
df = pd.DataFrame(data, columns=multi_index_one, index=[1, 2])

df.to_excel('test_files/test.xlsx')

Screenshot 2019-07-23 at 10 32 53

pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_IE.UTF-8
pandas: 0.24.2
pytest: 4.1.1
pip: 18.1
setuptools: 40.6.3
Cython: 0.28.2
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.11.1
xarray: None
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: 1.1.8
lxml.etree: 4.3.0
bs4: 4.7.1
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: 0.2.0
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@Cattes
Copy link

Cattes commented Jul 23, 2019

I got the same issue with a dataset with multiple column and row indizes and would really appreciate a solution.

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-25-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.4
pytest: 4.5.0
pip: 19.0.3
setuptools: 41.0.0
Cython: None
numpy: 1.15.2
scipy: None
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: 1.1.2
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.3.3
pymysql: 0.9.3
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@stefanopassador-bip
Copy link

Same issue with pandas 0.25.0

pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.24.2
pytest: None
pip: 19.0.3
setuptools: 40.8.0
Cython: None
numpy: 1.16.4
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: 2.6.2
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: 0.2.3

@jmcnamara
Copy link
Contributor

Rather than replying to a feature request issue that was implemented and closed 6 years ago it would probably be best to open a new issue with an example(s) that demonstrate the issue.

@Ronkiro
Copy link

Ronkiro commented Aug 6, 2019

For reference, this issue has been created and can be found at #27772 .

For any more comments, please post there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO Data IO issues that don't fit into a more specific label IO Excel read_excel, to_excel Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

No branches or pull requests