Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Support for Excel features when writing #45572

Closed
rhshadrach opened this issue Jan 23, 2022 · 9 comments · Fixed by #45795
Closed

ENH: Support for Excel features when writing #45572

rhshadrach opened this issue Jan 23, 2022 · 9 comments · Fixed by #45795
Labels

Comments

@rhshadrach
Copy link
Member

rhshadrach commented Jan 23, 2022

Context

Currently DataFrame.to_excel supports freeze_panes, and there are requests to support other Excel features (#42560). I've gone through the features of openpyxl and Xlsxwriter, here are other Excel features that pandas might support:

  • autofilter
  • hiding rows/column/worksheets
  • password protection
  • conditional formatting
  • data validation
  • dropdown lists
  • defined names
  • cell comments

Not listed here are styles as I believe the styler supports most of these (cc @ahawryluk).

It seems difficult to justify pandas supporting some of these, but not others. On the other hand, implementing all or even just some of them for each engine adds a lot of complexity and maintenance burden to pandas. I haven't checked, but it may also be the case that some features are supported by certain engines but not by others.

While pandas documents the writer of ExcelWriter as having no public attributes, third party engines document using writer.book e.g. xlsxwriter.

Proposal 1: Add more arguments to write_excel.

Add other arguments to to_excel, such as autofilter, hiding row/columns/worksheets, conditional formatting. At least these three I perceive to be commonly used.

This is the most user friendly option in that it allows users access to these features with a single function call and the ability to switch engines without having to refactor their code. However as mentioned above, this adds complexity to pandas and is a maintenance burden.

Proposal 2: Make writer.book public, prefix all or most other attributes with an _.

This would allow users to use the features that the engine supports directly with ExcelWriter, e.g.

with ExcelWriter("test.xlsx") as writer: 
    df.to_excel(writer)
    worksheet = writer.book.get_worksheet_by_name("Sheet1")
    worksheet...

This would require refactoring the writer.sheets as in current state it can get out of sync with writer.book if e.g. a user does writer.book.add_sheet(...).

This proposal is less complex/maintenance than Proposal 1, more user friendly than Proposal 3.

Proposal 3: Don't support any Excel features.

Users wanting these features would either be required to (a) write their DataFrames directly with the engine of choice or (b) write their DataFrames with pandas, then open a book with the engine of their choice if the engine supports modifying existing Excel workbooks.

Less complex/maintenance than Proposal 2, but also less user friendly.

I am currently in favor of Proposal 2. I also see a mix of Proposal 1 and Proposal 2 as viable, where some features are supported directly as arguments in to_excel with other features require the user to use writer.book for others.

cc @pandas-dev/pandas-core

@twoertwein
Copy link
Member

might also help #44868

@twoertwein
Copy link
Member

Proposal 2 would go well together with making .book a property: There are quite a few issues where users replace .book with an externally loaded workbook.

@WillAyd
Copy link
Member

WillAyd commented Jan 23, 2022

What would prevent us from extending the styler class to support most of these?

@bashtage
Copy link
Contributor

bashtage commented Jan 23, 2022

To me 2 makes more sense as it avoids a lot of complexity while allowing end-users to customize to their heart's content. In the past I have used to_excel followed by my own custom code to add features that cannot be directly accomplished using pandas., essentially 3(b)

@jreback
Copy link
Contributor

jreback commented Jan 23, 2022

+1 on option 2

@ahawryluk
Copy link
Contributor

ahawryluk commented Jan 23, 2022 via email

@attack68
Copy link
Contributor

What would prevent us from extending the styler class to support most of these?

I think some of these would just require a lot of complicated, and non-universal, code to get it working. And it would have have to be coded for Styler, and then parsed by the DataFrame.to_excel, method, duplicating the work.

At the moment styles and number formats to excel is a hacked on component. Styler does not have its own renderer for excel like it does for HTML, LaTeX and String.

For a user keen to do these things, I think its better to just open up some generic API and let them customise. Providing a user guide with good examples is probably worth a lot more. +1 for Proposal 2.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jan 24, 2022

+1 for Proposal 2. I also think we should expose writer.sheets in addition to writer.book based on some code I wrote a while back.

@m12t
Copy link

m12t commented Oct 20, 2022

I don't think this merits a new issue on its own just yet, so I'll ask it here as a comment:

tldr: how to set the ExcelWriter.book and .sheets attributes in 1.5.0 and beyond

I'm one of the users using the .book and .sheets attributes before they became public in 1.5.0. The following code broke with the change from 1.4.4 to 1.5.0

book = load_workbook(template_path)  # use the openpyxl function, load_workbook() to grab the template book
writer = pd.ExcelWriter(out_path, engine="openpyxl")
writer.book = book
writer.sheets = {ws.title: ws for ws in book.worksheets}

After updating to 1.5.0, this causes two AttributeErrors, one on book and the other on sheets

After updating to 1.5.1, setting the book attribute no longer errors, but now gives:

  1. FutureWarning: Setting the `book` attribute is not part of the public API, usage can give unexpected or corrupted results and will be removed in a future version

  2.   ....
        self.writer.sheets = {ws.title: ws for ws in self.file.worksheets}
    AttributeError: can't set attribute
    

In the What’s new in 1.5.0 there's a quote

The following attributes are now public and considered safe to access.
book
...
sheets
...

Does this mean book and sheets are read-only?

In the source code for 1.5.1, I noticed that there's a setter for book (which explains why no AttributeError was raised there), but no similar setter exists for sheets. Is the public book setter being deprecated already as is alluded to in the FutureWarning?

In essence, what's the recommended process to set the book and sheet attributes on ExcelWriter?

I've found a solution to my problem, and described it on stack overflow for anybody in a similar situation. It doesn't access any private attributes or use deprecated setters, so should be pretty safe going forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants