BUG: fixes #12405 by eliding values index by NaT in MPLPlot._get_xticks #14540

Merged
merged 1 commit into from
May 20, 2017

Contributor

@tsdlovell tsdlovell commented Oct 30, 2016

Proposed solution: elide rows where the x value would be NaT in MPLPlot._get_xticks (which seems like a minor misnomer: it's getting x values, not what will ultimately be ticks).

Reordering already occurs in MPLPlot._get_xticks, in a block that is conditional on use_index and is_datetype, so it seems like a reasonable place to elide the values. Mutation of self.data already occurs there (the reordering), and a read-through suggests nothing has saved information about self.data's shape before this point.
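
A minimal sketch of the idea, not the code from this PR: the helper name elide_nat_and_sort is hypothetical, and it works on a plain Series rather than on self.data inside MPLPlot._get_xticks. It only illustrates dropping rows whose index is NaT before the index is sorted.

```python
import pandas as pd


def elide_nat_and_sort(data):
    """Hypothetical helper: drop rows indexed by NaT, then sort by index."""
    # Keep only the rows whose index value is a real timestamp
    data = data[data.index.notna()]
    # Sort the remaining rows by their datetime index
    return data.sort_index()


idx = pd.to_datetime(["2016-01-03", None, "2016-01-01", "2016-01-02"])
s = pd.Series([3.0, 99.0, 1.0, 2.0], index=idx)

cleaned = elide_nat_and_sort(s)
print(cleaned)
```

The row with value 99.0 is the one indexed by NaT, so it is the only row removed; the rest come back in chronological order.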

@tsdlovell
Contributor Author

I will add a test in test_datetimelike.py, but it will probably just repeat the sorting and eliding that occurs in the use_index/is_datetype branch of _get_xticks. Alternatively, the test could check that all the x values are within the nan{min,max} bounds of the original index.
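
The bounds check mentioned above could look roughly like the following. This is a sketch under assumptions: xvalues_within_index_bounds is a hypothetical name, and the x values are passed in as datetimes instead of being read back from a matplotlib line.

```python
import pandas as pd


def xvalues_within_index_bounds(xvalues, index):
    """Hypothetical check: every x value lies inside the NaT-ignoring
    min/max of the original index."""
    xvalues = pd.DatetimeIndex(xvalues)
    # DatetimeIndex.min() and .max() skip NaT by default
    lo, hi = index.min(), index.max()
    return bool(((xvalues >= lo) & (xvalues <= hi)).all())


index = pd.to_datetime(["2016-01-02", None, "2016-01-01", "2016-01-03"])

# x values as they should look after eliding NaT and sorting
good = index.dropna().sort_values()
# x values as in the bug report, where NaT shows up as Timestamp.min
bad = good.insert(0, pd.Timestamp.min)

print(xvalues_within_index_bounds(good, index))
print(xvalues_within_index_bounds(bad, index))
```

A check of this shape does not repeat the sort-and-elide logic, which is the appeal of the alternative.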

@codecov-io

codecov-io commented Oct 30, 2016

Codecov Report

Merging #14540 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #14540      +/-   ##
==========================================
+ Coverage   90.41%   90.41%   +<.01%     
==========================================
  Files         161      161              
  Lines       50997    50999       +2     
==========================================
+ Hits        46107    46109       +2     
  Misses       4890     4890

Flag        Coverage Δ
#multiple   88.24% <100%> (ø) ⬆️
#single     40.19% <50%> (ø) ⬆️

Impacted Files             Coverage Δ
pandas/plotting/_core.py   81.89% <100%> (+0.02%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0f55de1...eae5c0e.

@jreback
Contributor

jreback commented Oct 30, 2016

i would add some edge cases
e.g. all nans, single nan first, and single nan last

@jorisvandenbossche jorisvandenbossche added the Bug, Datetime (Datetime data dtype), and Visualization (plotting) labels Oct 30, 2016
@tsdlovell
Contributor Author

@jreback , I think an all-NaT index should be a failure: with the elide-NaTs approach no rows are left, so the call is equivalent to pd.Series().plot(), which results in TypeError: Empty 'DataFrame': no numeric data to plot. Perhaps we should warn that we've elided values and that's why there is no data left.

Otherwise, all cases are the same since sort_values() results in the same series regardless of number of non-NaTs, barring all NaTs.
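
As a rough illustration of why the "NaT first" and "NaT last" edge cases collapse into one, the sketch below sorts and drops NaT for both layouts and compares the results. It uses plain Series of datetimes rather than the plotting code, and the variable names are only for illustration.

```python
import pandas as pd

dates = ["2016-01-01", "2016-01-02", "2016-01-03"]

# Same dates, with the single NaT placed first or last
nat_first = pd.Series(pd.to_datetime([None] + dates))
nat_last = pd.Series(pd.to_datetime(dates + [None]))

# Sort, drop the NaT, and discard the original positions
a = nat_first.sort_values().dropna().reset_index(drop=True)
b = nat_last.sort_values().dropna().reset_index(drop=True)
print(a.equals(b))

# The all-NaT case is the odd one out: nothing is left after eliding
all_nat = pd.Series(pd.to_datetime([None, None]))
print(len(all_nat.sort_values().dropna()))
```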

@tsdlovell
Contributor Author

tsdlovell commented Oct 30, 2016

Incidentally, re @jorisvandenbossche's comment in the original bug report: matplotlib complains if you try to plot with NaT as an x value in a new plot. But once you're plotting on an axis that has already plotted data with a datetime index, it accepts NaT values (as the min datetime value).

plt.clf()
ax = plt.gca()

s = (tm.makeTimeSeries()
Contributor

this is overly complicated, can you simplify the creation?

.pipe(inject_nat_into_index_col)
.set_index('index', drop=True))
s.plot(ax=ax)
xdata = ax.get_lines()[0].get_xdata()
Contributor

add a _check_plot_works call

@jreback
Contributor

jreback commented Oct 31, 2016

pls add a release note, 0.19.1 bug fixes is fine.

@tsdlovell tsdlovell force-pushed the fix-gh12405 branch 2 times, most recently from 7963fea to 72f9d88 Compare October 31, 2016 15:44
@jorisvandenbossche
Member

Matplotlib deals with missing values (for numerical data) by leaving a gap in the line, so it can actually handle NaN values. We could therefore also opt for this behaviour, rather than removing those NaN values from the data before plotting.
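
A small sketch of the numerical-data behaviour described here, assuming matplotlib and NumPy are installed. It uses the non-interactive Agg backend so it runs without a display, and it only inspects the line's data; the gap itself is visible when the figure is rendered.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np

x = np.arange(5)
y = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

fig, ax = plt.subplots()
(line,) = ax.plot(x, y)

# The NaN is kept in the line's data; the drawn line is interrupted there
ydata = np.asarray(line.get_ydata(), dtype=float)
nan_positions = np.flatnonzero(np.isnan(ydata))
print(nan_positions)

plt.close(fig)
```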

@tsdlovell
Contributor Author

@jorisvandenbossche , we sort the index before we plot the data. Where would we insert the NaT index value?

@@ -39,6 +39,7 @@ Bug Fixes
- Bug in ``pd.read_csv`` for Python 2.x in which Unicode quote characters were no longer being respected (:issue:`14477`)
- Bug in localizing an ambiguous timezone when a boolean is passed (:issue:`14402`)
- Bug in ``TimedeltaIndex`` addition with a Datetime-like object where addition overflow in the negative direction was not being caught (:issue:`14068`, :issue:`14453`)
- Bug in ``plot`` where ``NaT`` in ``DatetimeIndex`` results in ``Timestamp.min`` (:issue: `12405`)
Contributor

can you move to 0.20.0

Contributor

moved

@jreback
Contributor

jreback commented Dec 16, 2016

@tsdlovell can you rebase / update

@jreback
Contributor

jreback commented Jan 21, 2017

status of this?

@jreback
Contributor

jreback commented Mar 14, 2017

@tsdlovell ping!

@jreback
Contributor

jreback commented Apr 18, 2017

@tsdlovell can you rebase?

@tsdlovell
Contributor Author

@jreback , I had some issues rebasing onto master. I ended up just applying the changes manually to the current head and creating a new branch: fix-gh12405-2.

Can I just push that to the original fork branch for this PR or will that mess something up?

@jreback
Contributor

jreback commented Apr 21, 2017

yes i think u can just force push it and will work

@jreback
Contributor

jreback commented May 13, 2017

@TomAugspurger how does this look?

@@ -1643,6 +1643,7 @@ Plotting
- Bug in ``DataFrame.boxplot`` where ``fontsize`` was not applied to the tick labels on both axes (:issue:`15108`)
- Bug in the date and time converters pandas registers with matplotlib not handling multiple dimensions (:issue:`16026`)
- Bug in ``pd.scatter_matrix()`` could accept either ``color`` or ``c``, but not both (:issue:`14855`)
- Bug in ``plot`` where ``NaT`` in ``DatetimeIndex`` results in ``Timestamp.min`` (:issue: `12405`)
Contributor

can you move to 0.20.2

Contributor

@TomAugspurger TomAugspurger left a comment

Just a few changes @tsdlovell and can you move the release note to 0.20.2? Thanks!

@@ -815,6 +815,23 @@ def test_mixed_freq_shared_ax(self):
# self.assertEqual(ax1.lines[0].get_xydata()[0, 0],
# ax2.lines[0].get_xydata()[0, 0])

def test_nat_handling(self):

import matplotlib.pyplot as plt # noqa
Contributor

You can remove this import and use self.plt, which comes from the base class


import matplotlib.pyplot as plt # noqa

fig = plt.gcf()
Contributor

Is this necessary for the test to work? Our base class should take care of closing all the figures, so there shouldn't be any hanging around from a previous test.

s.plot(ax=ax)
xdata = ax.get_lines()[0].get_xdata()
# plot x data is bounded by index values
self.assertLessEqual(s.index.min(), Series(xdata).min())
Contributor

We just switched over to pytest, so this can be assert s.index.min() <= Series(xdata).min(). Same thing with the next line.

# plot x data is bounded by index values
self.assertLessEqual(s.index.min(), Series(xdata).min())
self.assertLessEqual(Series(xdata).max(), s.index.max())
_check_plot_works(s.plot)
Contributor

I think you can remove this line. It's just calling s.plot() again.

…._get_xticks

TST: add test for fix of pandas-dev#12405
DOC: update whatsnew/v0.20.0.txt
@tsdlovell
Contributor Author

@TomAugspurger , let me know if there's something I still need to do.

@TomAugspurger TomAugspurger merged commit a6fcec6 into pandas-dev:master May 20, 2017
@TomAugspurger
Contributor

Thanks!

pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017
…._get_xticks (pandas-dev#14540)

TST: add test for fix of pandas-dev#12405
DOC: update whatsnew/v0.20.2.txt
stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017
…._get_xticks (pandas-dev#14540)

TST: add test for fix of pandas-dev#12405
DOC: update whatsnew/v0.20.2.txt
Labels: Bug, Datetime (Datetime data dtype), Visualization (plotting)
Successfully merging this pull request may close these issues.

BUG: plotting with DatetimeIndex containing NaT
6 participants