to_excel with MultiIndex adds a blank line #27772
Comments
Note, there is a typo in the code snippet. It should be:
import pandas as pd
data = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
multi_index = pd.MultiIndex.from_product([['a', 'b'], ['one', 'two', 'three']])
df = pd.DataFrame(data, columns=multi_index, index=[1, 2])
df.to_excel('test.xlsx')
And I suppose the expected output should be:
The blank line is probably there to allow for the index label. For example:
import pandas as pd
data = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
multi_index = pd.MultiIndex.from_product([['a', 'b'], ['one', 'two', 'three']])
df = pd.DataFrame(data, columns=multi_index, index=[1, 2])
df.to_excel('test.xlsx', index_label="Foo")
Output:
This matches the REPL output if the index name is set:
>>> df.index.name = 'foo'
>>> df
      a             b
    one two three one two three
foo
1     1   2     3   4   5     6
2     7   8     9  10  11    12
It is probably a separate question/feature request on whether it should be something like: |
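The point about the extra row being reserved for the index label can be checked without writing any file. The following is a minimal sketch that compares the text rendering of the same frame with and without an index name; it only illustrates the REPL behaviour mentioned above and says nothing about how `to_excel` is implemented.

```python
import pandas as pd

data = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
multi_index = pd.MultiIndex.from_product([["a", "b"], ["one", "two", "three"]])
df = pd.DataFrame(data, columns=multi_index, index=[1, 2])

# Without an index name: two header rows followed directly by the data rows.
unnamed_lines = df.to_string().splitlines()

# With an index name: one extra row is inserted to hold the name.
named = df.copy()
named.index.name = "foo"
named_lines = named.to_string().splitlines()

print(len(unnamed_lines), len(named_lines))
print(named_lines[2].strip())
```

The named rendering has one more line than the unnamed one, and that line carries only the index name. In the Excel output the same row is written even when the name is missing, which is the blank line reported here.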
I edited my code snippet to fix the typo. About my problem, how should I remove that line? |
@stefanopassador |
I have tried to_csv just now, and the blank row disappears when the index name is set to None. |
@yuylyp I haven't had any luck with it. |
I might have found the reason. I found some code in /pandas/io/formats/excel.py as below.
|
@stefanopassador
|
I managed to get a workaround to this problem using a mix of offsets and writing the DataFrame in segments. By extracting the level_0 column names and writing the index, you can overcome the blank line problem
and then you can write the DataFrame, splitting by key:
However, I found a problem. This is the expected output: But what I get is this: You can see that the months are not in order. This problem comes from the line
This gets you the level_0 column names in alphabetical order instead of the original order, and I haven't been able to overcome this. However, if your keys do not depend on order, I see no problem with this approach.
|
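On the ordering problem described above: the line that produced the alphabetical order is not preserved in this thread, so the following is only a sketch under the assumption that the keys were taken from something like `df.columns.levels[0]`, which holds the sorted unique labels when the MultiIndex is built with `from_product`. Reading the keys with `get_level_values(0).unique()` keeps the order of first appearance instead. The month and column names are made up for illustration.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar"]
columns = pd.MultiIndex.from_product([months, ["low", "high"]])
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=columns)

# .levels holds the unique labels of each level, which from_product sorts.
sorted_keys = list(df.columns.levels[0])

# get_level_values(...).unique() keeps the order of first appearance.
ordered_keys = list(df.columns.get_level_values(0).unique())

# Split the frame by top-level key, in the original column order.
segments = {key: df[key] for key in ordered_keys}

print(sorted_keys)
print(ordered_keys)
```

Each entry of `segments` is a sub-frame with single-level columns, which can then be written at its own column offset.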
@yuylyp |
@bartkim0426 |
I dug into this issue for a while and found that this code may be intentional because of the index name. If I add the condition, the problem above is solved. However, some MultiIndex tests failed (in While Example below: NaN for first row Add blank row
As you can see, Not add blank row
I figured out that this code was written 4 years ago by Chris and cleaned up by Will 5 months ago (#26473). So I think this issue can be closed without considering an ambiguous Summary
@WillAyd Can you give an idea about this? |
@bartkim0426 Thanks. That is a good summary of the problems involved here and why the solution isn't straightforward. Another solution might be to have an option to turn off the blank line for cases where the user isn't worried about round-tripping the file through Pandas. In which case the existing tests/behaviour could still be the same. |
@jmcnamara Good idea! |
+1 |
It has always done this, as far back as 0.23.4 that I can remember. That blank line is for the names of the MultiIndex levels |
I was debugging it yesterday out of curiosity and can confirm it's due to the name of MultiIndex. I'll tackle this bug sometime in the future. |
I was under the impression it was intended behavior |
For now, here is a quick and dirty workaround to apply after you have done the export:
|
It looks like xlwings doesn't work on Linux. Is there any workaround for Linux? |
I guess |
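For platforms where xlwings is not available, one possible post-processing sketch uses openpyxl to delete the row directly below the column header after the export. The helper name `drop_blank_header_row` and its parameters are hypothetical, not part of pandas or openpyxl. It only deletes the row when every cell in it is empty, so it does nothing if the index name was written there.

```python
from openpyxl import load_workbook


def drop_blank_header_row(path: str, sheet_name: str, n_column_levels: int) -> bool:
    """Delete the row right below the column header if it is empty.

    Hypothetical helper: returns True if a row was removed.
    """
    workbook = load_workbook(path)
    sheet = workbook[sheet_name]
    # Rows are 1-based in openpyxl; the header occupies rows 1..n_column_levels.
    candidate = n_column_levels + 1
    is_blank = all(cell.value is None for cell in sheet[candidate])
    if is_blank:
        sheet.delete_rows(candidate)
        workbook.save(path)
    workbook.close()
    return is_blank
```

It would be called after `df.to_excel(path, sheet_name="Sheet1")` with `n_column_levels=df.columns.nlevels`. Note that `delete_rows` does not move merged ranges, so this assumes no merged cells below the header, which holds for a frame with a single-level row index.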
Thanks for your tips. I found a workaround:
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter') |
Hi is there a workaround when writing multiple dataframes into one excel sheet? Thanks Edit:
Thanks! |
For some reason this solution only worked once for me. The second time I ran the code (regenerating the same df and exporting the file), it hung and nothing happened. Is there any update on this issue in general? Are there any solutions/fixes planned for this? |
In the workaround you mention, the problem may have been that an invisible Excel process sometimes stays open in the background. I have updated the above code so that it is more stable. The problem should be solved now. |
Hi @tomwojcik is there any update on this issue? :) |
I stopped working on it after someone mentioned that's the expected behavior. Even if it wasn't I'm not interested in fixing it anymore. |
I resolved this problem by overriding these classes (pandas 2.0.3):
from pandas.io.formats.style import (
    Styler,
    Sequence,
    Hashable,
    IndexLabel,
    StorageOptions,
)
import numpy as np
from pandas.io.formats.excel import (
    ExcelFormatter,
    ExcelCell,
    Index,
    MultiIndex,
    PeriodIndex,
    CssExcelCell,
    com,
    get_level_lengths,
)
from typing import Iterable
class CustomStyler(Styler):
    def to_excel(
        self,
        excel_writer,
        sheet_name: str = "Sheet1",
        na_rep: str = "",
        float_format: str | None = None,
        columns: Sequence[Hashable] | None = None,
        header: Sequence[Hashable] | bool = True,
        index: bool = True,
        index_label: IndexLabel | None = None,
        startrow: int = 0,
        startcol: int = 0,
        engine: str | None = None,
        merge_cells: bool = True,
        encoding: str | None = None,
        inf_rep: str = "inf",
        verbose: bool = True,
        freeze_panes: tuple[int, int] | None = None,
        storage_options: StorageOptions = None,
    ) -> None:
        formatter = CustomExcelFormatter(
            self,
            na_rep=na_rep,
            cols=columns,
            header=header,
            float_format=float_format,
            index=index,
            index_label=index_label,
            merge_cells=merge_cells,
            inf_rep=inf_rep,
        )
        formatter.write(
            excel_writer,
            sheet_name=sheet_name,
            startrow=startrow,
            startcol=startcol,
            freeze_panes=freeze_panes,
            engine=engine,
            storage_options=storage_options,
        )
class CustomExcelFormatter(ExcelFormatter):
    def _format_regular_rows(self) -> Iterable[ExcelCell]:
        if self._has_aliases or self.header:
            self.rowcounter += 1
        # output index and index_label?
        if self.index:
            # check aliases
            # if list only take first as this is not a MultiIndex
            if self.index_label and isinstance(
                self.index_label, (list, tuple, np.ndarray, Index)
            ):
                index_label = self.index_label[0]
            # if string good to go
            elif self.index_label and isinstance(self.index_label, str):
                index_label = self.index_label
            else:
                index_label = self.df.index.names[0]
            # Commented out to suppress the blank row:
            # if isinstance(self.columns, MultiIndex):
            #     self.rowcounter += 1
            if index_label and self.header is not False:
                yield ExcelCell(self.rowcounter - 1, 0, index_label, self.header_style)
            # write index_values
            index_values = self.df.index
            if isinstance(self.df.index, PeriodIndex):
                index_values = self.df.index.to_timestamp()
            for idx, idxval in enumerate(index_values):
                yield CssExcelCell(
                    row=self.rowcounter + idx,
                    col=0,
                    val=idxval,
                    style=self.header_style,
                    css_styles=getattr(self.styler, "ctx_index", None),
                    css_row=idx,
                    css_col=0,
                    css_converter=self.style_converter,
                )
            coloffset = 1
        else:
            coloffset = 0
        yield from self._generate_body(coloffset)

    def _format_hierarchical_rows(self) -> Iterable[ExcelCell]:
        if self._has_aliases or self.header:
            self.rowcounter += 1
        gcolidx = 0
        if self.index:
            index_labels = self.df.index.names
            # check for aliases
            if self.index_label and isinstance(
                self.index_label, (list, tuple, np.ndarray, Index)
            ):
                index_labels = self.index_label
            # MultiIndex columns require an extra row
            # with index names (blank if None) for
            # unambiguous round-trip, unless not merging,
            # in which case the names all go on one row Issue #11328
            # Commented out to suppress the blank row:
            # if isinstance(self.columns, MultiIndex) and self.merge_cells:
            #     self.rowcounter += 1
            # if index labels are not empty go ahead and dump
            if com.any_not_none(*index_labels) and self.header is not False:
                for cidx, name in enumerate(index_labels):
                    yield ExcelCell(self.rowcounter - 1, cidx, name, self.header_style)
            if self.merge_cells:
                # Format hierarchical rows as merged cells.
                level_strs = self.df.index.format(
                    sparsify=True, adjoin=False, names=False
                )
                level_lengths = get_level_lengths(level_strs)
                for spans, levels, level_codes in zip(
                    level_lengths, self.df.index.levels, self.df.index.codes
                ):
                    values = levels.take(
                        level_codes,
                        allow_fill=levels._can_hold_na,
                        fill_value=levels._na_value,
                    )
                    for i, span_val in spans.items():
                        mergestart, mergeend = None, None
                        if span_val > 1:
                            mergestart = self.rowcounter + i + span_val - 1
                            mergeend = gcolidx
                        yield CssExcelCell(
                            row=self.rowcounter + i,
                            col=gcolidx,
                            val=values[i],
                            style=self.header_style,
                            css_styles=getattr(self.styler, "ctx_index", None),
                            css_row=i,
                            css_col=gcolidx,
                            css_converter=self.style_converter,
                            mergestart=mergestart,
                            mergeend=mergeend,
                        )
                    gcolidx += 1
            else:
                # Format hierarchical rows with non-merged values.
                for indexcolvals in zip(*self.df.index):
                    for idx, indexcolval in enumerate(indexcolvals):
                        yield CssExcelCell(
                            row=self.rowcounter + idx,
                            col=gcolidx,
                            val=indexcolval,
                            style=self.header_style,
                            css_styles=getattr(self.styler, "ctx_index", None),
                            css_row=idx,
                            css_col=gcolidx,
                            css_converter=self.style_converter,
                        )
                    gcolidx += 1
        yield from self._generate_body(gcolidx)
def pd_to_excel(
    self,
    excel_writer,
    sheet_name: str = "Sheet1",
    na_rep: str = "",
    float_format: str | None = None,
    columns: Sequence[Hashable] | None = None,
    header: Sequence[Hashable] | bool = True,
    index: bool = True,
    index_label: IndexLabel | None = None,
    startrow: int = 0,
    startcol: int = 0,
    engine: str | None = None,
    merge_cells: bool = True,
    encoding: str | None = None,
    inf_rep: str = "inf",
    verbose: bool = True,
    freeze_panes: tuple[int, int] | None = None,
    storage_options: StorageOptions = None,
) -> None:
    formatter = CustomExcelFormatter(
        self,
        na_rep=na_rep,
        cols=columns,
        header=header,
        float_format=float_format,
        index=index,
        index_label=index_label,
        merge_cells=merge_cells,
        inf_rep=inf_rep,
    )
    formatter.write(
        excel_writer,
        sheet_name=sheet_name,
        startrow=startrow,
        startcol=startcol,
        freeze_panes=freeze_panes,
        engine=engine,
        storage_options=storage_options,
    )
|
It's not working now. |
@secsilm It should still work. You just need to change
import pandas as pd
data = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]]
multi_index = pd.MultiIndex.from_product([["a", "b"], ["one", "two", "three"]])
df = pd.DataFrame(data, columns=multi_index, index=[1, 2])
writer = pd.ExcelWriter("test.xlsx", engine="xlsxwriter")
df.to_excel(writer, sheet_name="test1")
writer.sheets["test1"].set_row(2, None, None, {"hidden": True})
writer.close()
Output (note that row 3 is hidden):
Tested with |
Thanks. I tried your code and it works. Maybe there were some mistakes in my code. |
Code Sample
Problem description
When exporting an xlsx file from a DataFrame with MultiIndex columns, a blank line is added below the header.
Here is a screenshot showing the problem:
![image](https://user-images.githubusercontent.com/6884584/62526705-161b1b80-b83a-11e9-885d-6ed742cb497e.png)
This is happening with pandas 0.25.0.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.0
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.2
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : 0.2.3
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None