Handling kwargs to be passed to pandas.ExcelFile #877
Detects kwargs passed to `IamDataFrame.__init__` and to `read_pandas` that need to be passed to `pandas.ExcelFile.__init__` to be handled properly.
Thanks! One suggestion for adding a test: add a few more keyword arguments, like header, usecols, skiprows, nrows, then make a copy of Line 116 in bc56c53
import_df = IamDataFrame(TEST_DATA_DIR / "test_df.xls", nrows=2)
assert_iamframe_equal(test_df_year.filter(scenario="scen_a"), import_df)
Codecov Report
Attention: Patch coverage is […]
Additional details and impacted files
@@         Coverage Diff          @@
##          main    #877    +/-  ##
=====================================
  Coverage   95.0%   95.0%
=====================================
  Files         64      64
  Lines       6134    6216    +82
=====================================
+ Hits        5828    5910    +82
  Misses       306     306
Added a test that kwargs to `IamDataFrame.__init__` when reading from an Excel file are passed on to `pandas.read_excel` and `pandas.ExcelFile`.
Looks good, but you need to add the pandas-arguments to make that test strategy work.
I guess that the pytest-legacy test (with pandas 2.1.2) is failing because […]
Yes. In fact, both of the added tests fail with pandas < 2.2, for different reasons.
pandas < 2.2.0 does not support calamine as an engine. It also hard-codes some engine_kwargs and therefore does not fully support passing a custom `engine_kwargs` keyword argument to `pandas.ExcelFile`.
Updated workflows to include python-calamine in the testing environment.
I pushed a new version now, with a warning if anyone tries to use […]
In order to actually test the new functionality in CI, I also added a new optional dependency group […]
Only warn about the `engine_kwargs` parameter not being fully supported at pandas < 2.2.0 for users who both have a lower pandas version *and* actually attempt to use the `engine_kwargs` keyword.
Co-authored-by: Daniel Huppmann <dh@dergelbesalon.at>
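For readers who want to see the shape of such a conditional warning, here is a minimal sketch. It is not the code from this PR: the function names `parse_version` and `warn_if_engine_kwargs_unsupported` and the warning text are hypothetical, and the pandas version is passed in as a string so that the sketch runs without pandas installed. Real code would more likely compare versions with a dedicated version-parsing library.

```python
import warnings
from itertools import takewhile


def parse_version(version):
    """Parse up to three leading numeric components, e.g. '2.1.2' -> (2, 1, 2)."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits, so '0rc1' is read as 0.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '2.2' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def warn_if_engine_kwargs_unsupported(pandas_version, kwargs):
    """Warn only if `engine_kwargs` is used *and* pandas is older than 2.2.0.

    Hypothetical helper for illustration. Returns True if a warning was issued.
    """
    if "engine_kwargs" in kwargs and parse_version(pandas_version) < (2, 2, 0):
        warnings.warn(
            "`engine_kwargs` is not fully supported with pandas < 2.2.0",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

Checking for the keyword first means that users on an older pandas who never pass `engine_kwargs` see no warning at all.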
Thanks so much, very nice improvement!
Detects kwargs passed to `IamDataFrame.__init__` and to `read_pandas` that need to be passed to `pandas.ExcelFile.__init__` to be handled properly.
This PR closes #876
Please confirm that this PR has done the following:
[ ] Tests Added
[ ] Documentation Added
[ ] Name of contributors Added to AUTHORS.rst
Tests: I have run existing tests, and tested locally that the new code reads Excel files correctly with `engine="calamine"` as a kwarg to `pyam.IamDataFrame`. But a new test under `/tests/` that tests the added/fixed functionality (rather than just regression testing) would require extra packages that probably aren't intended to be part of the core requirements. If the lead developers would like me to do this, I would appreciate guidance on how to add a test that requires extra optional packages, without breaking the existing test suite.
Documentation: As far as I can tell, this is a bug fix, and shouldn't require new documentation. The one new helper function, `utils.get_excel_file_with_kwargs`, has a docstring that explains how it's used.
Name in AUTHORS.rst: This is a fairly trivial addition, and I don't need my name in the AUTHORS file for it. But I can add it in an update commit if you do want it there.
Description of PR
The PR modifies lines in `pyam.core.IamDataFrame.__init__` and `pyam.utils.read_pandas` that create a `pandas.ExcelFile` instance to read an Excel file. The modifications ensure that keyword arguments that are recognized by `pandas.ExcelFile.__init__` are passed on to the `pandas.ExcelFile` instance and removed from calls to later methods (which would cause, e.g., `pandas.read_excel` to raise an exception).
The modifications add a new helper function `get_excel_file_with_kwargs` to `pyam.utils`. The function takes an Excel file path and arbitrary kwargs, extracts the keyword arguments that are accepted by `pandas.ExcelFile.__init__`, creates a new `ExcelFile` instance with those keyword arguments, and returns it along with a dict of the other keyword arguments that were not used.
The new functionality makes it possible to use the keyword arguments `engine`, `storage_options` and `engine_kwargs` with `IamDataFrame.__init__` and `utils.read_pandas` when reading Excel files, which could cause an exception to be raised in the previous version.
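The kwarg-splitting behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation merged in this PR: `StandInExcelFile` is a hypothetical stand-in for `pandas.ExcelFile` (so the sketch runs without pandas), and `split_kwargs_for` and `get_excel_file_with_kwargs_sketch` are names invented for this example. The sketch uses `inspect.signature` to find which keywords the constructor accepts; the actual helper may determine them differently.

```python
import inspect


class StandInExcelFile:
    """Hypothetical stand-in for pandas.ExcelFile, used only for this sketch."""

    def __init__(self, path_or_buffer, engine=None, storage_options=None,
                 engine_kwargs=None):
        self.path_or_buffer = path_or_buffer
        self.engine = engine
        self.storage_options = storage_options
        self.engine_kwargs = engine_kwargs


def split_kwargs_for(cls, kwargs):
    """Split kwargs into those accepted by cls's constructor and the rest."""
    params = list(inspect.signature(cls).parameters.values())
    # Skip the first parameter (the file path), which is passed positionally.
    accepted = {
        p.name
        for p in params[1:]
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    used = {k: v for k, v in kwargs.items() if k in accepted}
    unused = {k: v for k, v in kwargs.items() if k not in accepted}
    return used, unused


def get_excel_file_with_kwargs_sketch(path, excel_cls=StandInExcelFile, **kwargs):
    """Create an Excel-file object and return it with the leftover kwargs."""
    init_kwargs, other_kwargs = split_kwargs_for(excel_cls, kwargs)
    return excel_cls(path, **init_kwargs), other_kwargs


# Usage: `engine` goes to the constructor, the rest is kept for later calls.
xls, remaining = get_excel_file_with_kwargs_sketch(
    "data.xlsx", engine="calamine", nrows=2, sheet_name="data"
)
```

The leftover dict (`remaining` here) is what would then be forwarded to later read calls, so that constructor-only keywords such as `engine` never reach them.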