Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Added autofilter parameter to Excel output for both xlsxwriter and openpyxl engines #42560

Closed
wants to merge 4 commits into from

Conversation

Upabjojr
Copy link

@Upabjojr Upabjojr commented Jul 16, 2021

Added autofilter parameter to Excel output for both xlsxwriter and openpyxl engines

@Upabjojr Upabjojr changed the title Added autofilter parameter to Excel output for both xlsxwriter and openpyxl engines ENH: Added autofilter parameter to Excel output for both xlsxwriter and openpyxl engines Jul 19, 2021
@Upabjojr
Copy link
Author

Upabjojr commented Jul 22, 2021

Hi @jreback , I've seen you have followed the freeze_pane discussion for the Excel output some years ago. Could you provide some feedback on this PR?

@simonjayhawkins simonjayhawkins added Enhancement IO Excel read_excel, to_excel labels Jul 22, 2021
@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Aug 22, 2021
@Upabjojr
Copy link
Author

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

It would be nice to have this PR merged, it's a fairly simple addition that adds a useful feature to the excel exporter.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is only a feature of openpyxl, let's just do this via the engine_kwargs option. a test with docs would be sufficient. -1 on adding additional kwargs to the already long list.

@Upabjojr
Copy link
Author

Upabjojr commented Sep 9, 2021

since this is only a feature of openpyxl, let's just do this via the engine_kwargs option. a test with docs would be sufficient.

Well, this PR adds a backend for both openpyxl and xlsxwriter. It's not limited to openpyxl only, and could potentially be extended...

engine_kwargs will not work because there is no parameter in both libraries to add autofilters. In fact, we have to use multiple operations in both xlsxwriter and openpyxl in order to add autofilters (they need to be added to every column independently).

-1 on adding additional kwargs to the already long list.

Consider also the simplicity of looking up the documentation with df.to_excel? in Jupyter/IPython. This feature is analogous to freeze_panes and perfectly fits together.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 18, 2021
@Upabjojr Upabjojr requested a review from jreback October 18, 2021 05:26
@Upabjojr
Copy link
Author

I think this PR is fine the way it is. Don't agree with the previous review at all.

@jreback
Copy link
Contributor

jreback commented Nov 28, 2021

@rhshadrach thoughts here

@rhshadrach
Copy link
Member

rhshadrach commented Dec 1, 2021

This feature is analogous to freeze_panes and perfectly fits together.

I find this reasoning compelling. But it is also compelling for the numerous other features of excel, all with slightly different APIs that might or might not be implemented in different engines. Going down this road seems to me like a maintenance nightmare. Instead, I find your reasoning above to make a compelling case to remove freeze_panes.

It is easy for the user to add an autofilter today:

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

with pd.ExcelWriter('test.xlsx', engine='openpyxl') as writer:
    df.to_excel(writer, sheet_name='test')
    writer.sheets['test'].auto_filter.ref = "B1:C3"
    

with pd.ExcelWriter('test2.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='test')
    writer.sheets['test'].autofilter(0, 1, 2, 2)

This is why I think we should implement #43088 - make all protected attributes prefixed with _ and make attributes such as sheets public. Numerous third party libraries have already documented using writer.sheets.

@jreback
Copy link
Contributor

jreback commented Jan 16, 2022

agree with @rhshadrach here, let' privatize things more.

closing this in light.

@jreback jreback closed this Jan 16, 2022
@Upabjojr
Copy link
Author

It is easy for the user to add an autofilter today:

I wouldn't call those examples "easy".

@Upabjojr
Copy link
Author

Instead, I find your reasoning above to make a compelling case to remove freeze_panes.

I just know that an easier API for excel output formatting would make life easier to lots of pandas users.

@rhshadrach
Copy link
Member

I wouldn't call those examples "easy".

Can you expand on this? What do you find difficult?

@Upabjojr
Copy link
Author

Can you expand on this? What do you find difficult?

  • XlsxWriter and OpenPyXL have two different APIs requiring the range to be specified for adding autofilter. In Microsoft Excel and LibreOffice that's just a single operation (no need to specify the range). That is a very simple operation vs a complicated one.
  • End-users of Pandas may not be familiar with programming, let alone backends such as XlsxWriter or OpenPyXL. At this point it's way faster to open the file in Microsoft Excel and add the autofilters there rather than by using python. I see this operation all too often.
  • It's easy to make mistakes when specifying the range of the autofilter and potentially leaving some rows out.

I understand you are not happy with having too many parameters in the .to_excel( ) method and the consequent maintainence problems, but I believe this parameter would ease the life to many people.

I would even go as far as suggesting that the autofilter should be on by default when producing excel outputs.

@rhshadrach
Copy link
Member

Thanks for the response @Upabjojr. I don't believe autofilter should be considered in isolation, so I've opened #45572. Please feel free to share any thoughts there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Add AutoFilter option to to_excel
5 participants