DEPS: require updated python-dateutil, openpyxl #18182

jreback · 2017-11-08T22:23:10Z

closes #15184

reminder to change: https://github.com/conda-forge/pandas-feedstock on release

jreback · 2017-11-08T22:24:44Z

cc @neirbowj since you wrote most of the code I just ripped out :>

neirbowj · 2017-11-08T23:21:59Z

👍 It had a good run. I hope folks found it useful. I'm glad to see it retired.

And incidentally, no concerns for the FreeBSD port because the applicable dependencies are already at:

textproc/py-openpyxl 2.4.9
deve/py-dateutil 2.6.1

jreback · 2017-11-08T23:27:12Z

should we bump openpxl to even newer?

neirbowj · 2017-11-09T00:25:32Z

I'm probably not the right person to answer that question. I'm not an openpyxl user and haven't studied the codebase since the last time I got roped into responding to one of that project's minor updates that broke backwards compatibility. Therefore I cannot say whether a more aggressive version bump would be a good idea, bad idea, or indifferent.

From a port maintainer perspective, is there anything you'd like me to investigate?

jreback · 2017-11-09T00:35:10Z

i just want to make sure that i didn’t remove any relevant tests (that were not tied to earlier versions)

neirbowj · 2017-11-09T01:12:30Z

That would be non-trivial for me to verify, and I'm afraid can't justify spending time on it at the moment unless the test removal impacts the FreeBSD port.

codecov · 2017-11-09T22:27:59Z

Codecov Report

Merging #18182 into master will increase coverage by 0.15%.
The diff coverage is 87.5%.

@@            Coverage Diff             @@
##           master   #18182      +/-   ##
==========================================
+ Coverage    91.4%   91.55%   +0.15%     
==========================================
  Files         163      162       -1     
  Lines       50064    49947     -117     
==========================================
- Hits        45759    45730      -29     
+ Misses       4305     4217      -88

Flag	Coverage Δ
#multiple	`89.36% <87.5%> (+0.16%)`	⬆️
#single	`40.42% <62.5%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/io/excel.py	`90.13% <100%> (+9.73%)`	⬆️
pandas/compat/__init__.py	`58.71% <66.66%> (+0.6%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.8% <0%> (-0.1%)`	⬇️
pandas/io/formats/format.py	`96.01% <0%> (ø)`	⬆️
pandas/core/config_init.py	`98.34% <0%> (ø)`	⬆️
pandas/core/generic.py	`95.72% <0%> (ø)`	⬆️
pandas/core/groupby.py	`92.04% <0%> (+0.01%)`	⬆️
pandas/core/indexes/datetimes.py	`95.48% <0%> (+0.09%)`	⬆️
pandas/plotting/_converter.py	`65.2% <0%> (+1.81%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f9d4fb...e2d6b53. Read the comment docs.

codecov · 2017-11-09T22:28:10Z

Codecov Report

Merging #18182 into master will increase coverage by 0.14%.
The diff coverage is 87.5%.

@@            Coverage Diff             @@
##           master   #18182      +/-   ##
==========================================
+ Coverage   91.44%   91.59%   +0.14%     
==========================================
  Files         157      156       -1     
  Lines       51449    51328     -121     
==========================================
- Hits        47048    47013      -35     
+ Misses       4401     4315      -86

Flag	Coverage Δ
#multiple	`89.45% <87.5%> (+0.15%)`	⬆️
#single	`40.66% <62.5%> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/excel.py	`90.13% <100%> (+9.73%)`	⬆️
pandas/compat/__init__.py	`58.77% <66.66%> (+0.58%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.81% <0%> (-0.1%)`	⬇️
pandas/util/testing.py	`82.08% <0%> (-0.1%)`	⬇️
pandas/plotting/_converter.py	`65.25% <0%> (+1.81%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bd9a3e0...f04637b. Read the comment docs.

Themanwithoutaplan · 2017-11-14T13:00:52Z

My suggestion would be to go with >= 2.4.0 as this includes utilities and styles for working with Pandas dataframes which should make integration code a lot easier and reliable. NB. Merged cells are still a problem but openpyxl 2.5 introduces something that, in time, will make these easier to work with.

2.4 also gives you the ws.values property which makes reading XLSX files a lot easier than the current xlrd spaghetti. xlrd is slightly faster on really big files due to a lower memory overhead but my suggestion would be to allow openpyxl in read-only mode (parallelisation of worksheets should be possible).

Would be happy to discuss this with anyone but haven't time to do it myself at the moment.

2.4 is stable and I haven't seen anything 2.3 specific issues for a good long time.

jreback · 2017-11-14T13:05:27Z

@Themanwithoutaplan thanks for the suggestion.

Themanwithoutaplan · 2017-11-14T13:06:52Z

http://openpyxl.readthedocs.io/en/default/pandas.html explains everything (based on the work we did together in Montreal).

Themanwithoutaplan · 2017-11-14T13:10:36Z

Also, support for a dedicated datetime celltype is coming 2.5. Removes some of the problems associated with Excel's moronic handling of dates and times.

jreback · 2017-11-14T13:24:51Z

@Themanwithoutaplan is this api supported in >= 2.4? (would you do a PR to change the pandas code)?

Themanwithoutaplan · 2017-11-14T14:45:09Z

New datetimes are 2.5 only. Not sure when I can do a PR. We are having a sprint here in Düsseldorf next week but I was hoping to be able work on something else…

I would like to get 2.5 out the door. It's just missing the code for adding/removing rows and columns.

jreback · 2017-11-16T11:02:45Z

any comments @jorisvandenbossche @TomAugspurger

jorisvandenbossche

For openpyxl, it seems all code you removed was for versions < 2.2. Is there a reason to not just bump the minimum required to 2.2 instead of directly 2.4 in the PR?

For dateutil, do we win (code cleanup wise) a lot by not leaving the minimum version at eg 2.4 ?

Further:

Can you also update the io.rst docs? (there is a mention of the needed openpyxl version somewhere I think)

jorisvandenbossche · 2017-11-16T12:33:48Z

For dateutil, do we win (code cleanup wise) a lot by not leaving the minimum version at eg 2.4 ?

Although, by the time 0.22 will be released, dateutil 2.5 will probably be almost 2 years old, so that is probably fine.

jreback · 2017-11-16T13:11:08Z

For openpyxl, it seems all code you removed was for versions < 2.2. Is there a reason to not just bump the minimum required to 2.2 instead of directly 2.4 in the PR?

why? that's a pretty old version, I don't see a reason to not require newer things in developing libraries (esp there is a speedup coming), see the above comments.

jorisvandenbossche · 2017-11-16T13:13:58Z

openpyxl 2.4 is just 1 year old, which is not that much. If it doesn't cost us anything to provide users the ability to use older version, I would do that (at the moment it is simply the check of the version that would need to be changed no?). If we would start using those newer features, then sure.

jreback · 2017-11-27T11:36:50Z

@jorisvandenbossche @TomAugspurger

jorisvandenbossche · 2017-11-27T14:21:59Z

doc/source/whatsnew/v0.22.0.txt

+   | openpyxl        | 2.4.0           |          |
+   +-----------------+-----------------+----------+
+
+- :func:`Series.fillna` now raises a ``TypeError`` instead of a ``ValueError`` when passed a list, tuple or DataFrame as a ``value`` (:issue:`18293`)


merge issue here

jorisvandenbossche · 2017-11-27T14:27:05Z

Please also update io.rst (there are several mentions of openpyxl and its supported version)

What is the advantage of hard dropping support for older pytz versions? It does not cost us anything? (I mean in code complexity)

Definitely drop support for openpyxl < 2.4. There were changes in styles in all 2.x releases. 2.4 was built to provide an API for Pandas so that the spaghetti code could be cleared out. < 2.4 is not maintained.

But we do not use this API at this point? As we only have special cased code for openpyxl < 2.2. If there have been many changes in style, for me that is a reason more to not drop support (as long as we don't use features of the newer releases), if people want to preserve the actual styling they have right now.

jreback · 2017-11-28T11:39:25Z

What is the advantage of hard dropping support for older pytz versions? It does not cost us anything? (I mean in code complexity)

this is true, yeah just wanted to set some sort of minimum, though I guess its not really needed.

But we do not use this API at this point? As we only have special cased code for openpyxl < 2.2. If there have been many changes in style, for me that is a reason more to not drop support (as long as we don't use features of the newer releases), if people want to preserve the actual styling they have right now.

not the point, there was too much churn in the < 2.4 versions.

Themanwithoutaplan · 2017-11-28T11:44:20Z

openpyxl used to have unreliable and incorrect style implementation so backwards-incompatible changes were essential. Not doing users any favours by keeping support for these around, especially as code coverage was never complete I've just managed to update my fork and hope to work on the > 2.4 implementation in the near future. Using the API will make it much easier in future to decide who's responsible.

pytest pandas/tests/io/test_excel.py is all I need to worry about, right?

Themanwithoutaplan · 2017-11-29T08:31:43Z

I'm not having a lot of fun getting tests to run: segfault with Python 3.6 on Mac OS and unable to run test_excel.py in isolation. I'd better check the docs.

jreback · 2017-11-29T13:28:32Z

@jorisvandenbossche

Themanwithoutaplan · 2017-11-29T18:32:49Z

I've started work on this.

Themanwithoutaplan · 2017-11-30T10:55:42Z

Looking at the implementation: is it going to be possible to work directly with a dataframe? openpyxl doesn't need the intermediate ExcelCell object.

jreback · 2017-11-30T11:44:23Z

@Themanwithoutaplan the implementation is orthogonal to us changing versions
what would hapoen if you had a version of openpyxl that can process things more efficiently
then we could dispatch it that way, keeping existing code for older versions
if it passes all tests then that would be fine

so i assume this is not possible currently?

Themanwithoutaplan · 2017-11-30T13:36:06Z

I'm going to look at interfacing directly from df.to_excel() using openpyxl.utils.dataframe.dataframe_to_rows() This removes the need for an ExcelWriter class entirely. The older code could then be binned entirely.

closes pandas-dev#15184

Themanwithoutaplan · 2017-12-04T16:13:17Z

After some further investigation It looks to me that the best fit for the modern openpyxl API would be have a drop-in replacement for ExcelFormatter class.

jreback added Dependencies Required and optional dependencies IO Excel read_excel, to_excel labels Nov 8, 2017

jreback added this to the 0.22.0 milestone Nov 8, 2017

jreback force-pushed the versions branch from 56d0dc7 to e2d6b53 Compare November 9, 2017 22:27

jreback force-pushed the versions branch 6 times, most recently from 8d28316 to 034528a Compare November 13, 2017 11:29

jreback force-pushed the versions branch from 034528a to 81d05dc Compare November 15, 2017 11:05

jorisvandenbossche reviewed Nov 16, 2017

View reviewed changes

jreback changed the title ~~DEPS: require updated python-dateutil & openpyxl~~ DEPS: require updated python-dateutil, openpyxl, pytz Nov 23, 2017

jreback force-pushed the versions branch 3 times, most recently from 258f5e9 to b653b47 Compare November 27, 2017 00:38

jorisvandenbossche reviewed Nov 27, 2017

View reviewed changes

jreback force-pushed the versions branch from b653b47 to 6cf1dd2 Compare November 28, 2017 12:11

jreback force-pushed the versions branch 3 times, most recently from 25594ac to 38a339e Compare December 3, 2017 18:49

jreback changed the title ~~DEPS: require updated python-dateutil, openpyxl, pytz~~ DEPS: require updated python-dateutil, openpyxl Dec 4, 2017

jreback added 7 commits December 4, 2017 06:34

BLD: require python-dateutil>=2.5.0

371a257

DEPS: require openpyxl >= 2.3.2

3fad668

closes pandas-dev#15184

require pytz >= 2016.1

c8d18a3

more build deps

d2f2e48

revert pytz

07b4895

update setup.py

468b028

edit whatsnew

a0730b5

jreback force-pushed the versions branch from f04637b to a0730b5 Compare December 4, 2017 11:35

jreback merged commit 3e4e4b3 into pandas-dev:master Dec 4, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DEPS: require updated python-dateutil, openpyxl #18182

DEPS: require updated python-dateutil, openpyxl #18182

jreback commented Nov 8, 2017 •

edited

Loading

jreback commented Nov 8, 2017

neirbowj commented Nov 8, 2017

jreback commented Nov 8, 2017

neirbowj commented Nov 9, 2017

jreback commented Nov 9, 2017

neirbowj commented Nov 9, 2017

codecov bot commented Nov 9, 2017

codecov bot commented Nov 9, 2017 •

edited

Loading

Themanwithoutaplan commented Nov 14, 2017

jreback commented Nov 14, 2017

Themanwithoutaplan commented Nov 14, 2017

Themanwithoutaplan commented Nov 14, 2017

jreback commented Nov 14, 2017

Themanwithoutaplan commented Nov 14, 2017

jreback commented Nov 16, 2017

jorisvandenbossche left a comment

jorisvandenbossche commented Nov 16, 2017

jreback commented Nov 16, 2017

jorisvandenbossche commented Nov 16, 2017 •

edited

Loading

jreback commented Nov 27, 2017

jorisvandenbossche Nov 27, 2017

jorisvandenbossche commented Nov 27, 2017

jreback commented Nov 28, 2017

Themanwithoutaplan commented Nov 28, 2017 •

edited

Loading

Themanwithoutaplan commented Nov 29, 2017

jreback commented Nov 29, 2017

Themanwithoutaplan commented Nov 29, 2017

Themanwithoutaplan commented Nov 30, 2017

jreback commented Nov 30, 2017

Themanwithoutaplan commented Nov 30, 2017

Themanwithoutaplan commented Dec 4, 2017

DEPS: require updated python-dateutil, openpyxl #18182

DEPS: require updated python-dateutil, openpyxl #18182

Conversation

jreback commented Nov 8, 2017 • edited Loading

jreback commented Nov 8, 2017

neirbowj commented Nov 8, 2017

jreback commented Nov 8, 2017

neirbowj commented Nov 9, 2017

jreback commented Nov 9, 2017

neirbowj commented Nov 9, 2017

codecov bot commented Nov 9, 2017

Codecov Report

codecov bot commented Nov 9, 2017 • edited Loading

Codecov Report

Themanwithoutaplan commented Nov 14, 2017

jreback commented Nov 14, 2017

Themanwithoutaplan commented Nov 14, 2017

Themanwithoutaplan commented Nov 14, 2017

jreback commented Nov 14, 2017

Themanwithoutaplan commented Nov 14, 2017

jreback commented Nov 16, 2017

jorisvandenbossche left a comment

Choose a reason for hiding this comment

jorisvandenbossche commented Nov 16, 2017

jreback commented Nov 16, 2017

jorisvandenbossche commented Nov 16, 2017 • edited Loading

jreback commented Nov 27, 2017

jorisvandenbossche Nov 27, 2017

Choose a reason for hiding this comment

jorisvandenbossche commented Nov 27, 2017

jreback commented Nov 28, 2017

Themanwithoutaplan commented Nov 28, 2017 • edited Loading

Themanwithoutaplan commented Nov 29, 2017

jreback commented Nov 29, 2017

Themanwithoutaplan commented Nov 29, 2017

Themanwithoutaplan commented Nov 30, 2017

jreback commented Nov 30, 2017

Themanwithoutaplan commented Nov 30, 2017

Themanwithoutaplan commented Dec 4, 2017

jreback commented Nov 8, 2017 •

edited

Loading

codecov bot commented Nov 9, 2017 •

edited

Loading

jorisvandenbossche commented Nov 16, 2017 •

edited

Loading

Themanwithoutaplan commented Nov 28, 2017 •

edited

Loading