Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

COMPAT: map infers all-nan / empty correctly #18491

Merged
merged 5 commits into from
Nov 26, 2017
Merged

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Nov 25, 2017

xref #18482

@jreback jreback added Compat pandas objects compatability with Numpy or Python functions Dtype Conversions Unexpected or buggy dtype conversions labels Nov 25, 2017
@jreback jreback added this to the 0.22.0 milestone Nov 25, 2017
@jreback jreback mentioned this pull request Nov 25, 2017
3 tasks
@codecov
Copy link

codecov bot commented Nov 26, 2017

Codecov Report

❗ No coverage uploaded for pull request base (master@38f41e6). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #18491   +/-   ##
=========================================
  Coverage          ?   91.32%           
=========================================
  Files             ?      163           
  Lines             ?    49791           
  Branches          ?        0           
=========================================
  Hits              ?    45473           
  Misses            ?     4318           
  Partials          ?        0
Flag Coverage Δ
#multiple 89.12% <100%> (?)
#single 40.71% <0%> (?)
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.42% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 38f41e6...9447d2a. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 26, 2017

Codecov Report

Merging #18491 into master will increase coverage by 0.02%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18491      +/-   ##
==========================================
+ Coverage    91.3%   91.32%   +0.02%     
==========================================
  Files         163      163              
  Lines       49781    49791      +10     
==========================================
+ Hits        45451    45473      +22     
+ Misses       4330     4318      -12
Flag Coverage Δ
#multiple 89.12% <100%> (+0.02%) ⬆️
#single 40.71% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.42% <100%> (+0.01%) ⬆️
pandas/plotting/_converter.py 65.25% <0%> (+1.81%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 38f41e6...9447d2a. Read the comment docs.

@jreback jreback merged commit 68b66ab into pandas-dev:master Nov 26, 2017
@jorisvandenbossche
Copy link
Member

I am not sure I find this special casing a good idea. I would rather keep .map 'dummy' and simply create a new index from the newly created values, without trying to infer anything more than pd.Index(..) already does.
Also if we do this, we should do it consistently and always try to use the original dtype (eg also in Series.map)

In [12]: pd.date_range("2012-01-01", periods=3).map({})
Out[12]: DatetimeIndex(['NaT', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

In [13]: pd.Series(pd.date_range("2012-01-01", periods=3)).map({})
Out[13]: 
0   NaN
1   NaN
2   NaN
dtype: float64

Do you have a specific use case for this? (for wanting to be smart?)
I think the user can always change the dtype after doing the map if it wants something specific or to preserve the index.

(also please try to keep PRs with substantial changes open for a bit longer)

@jreback
Copy link
Contributor Author

jreback commented Nov 27, 2017

@jorisvandenbossche this is not special casing at all, rather the reverse. We DO try to infer based on the return values. However nan by-definition a case that needs handling. It defaults to all float which is kind of arbitrary. Having this retain the character of the original is pretty reasonable here.

[13] is a bug I think.

@jschendel
Copy link
Member

Unless I'm misunderstanding, it doesn't look like this fully hits all of the different types of index. Each index that can hold NA should return the same index class with all NA values, right?

On master:

In [2]: pd.__version__
Out[2]: '0.22.0.dev0+241.gf745e52'

In [3]: pd.date_range('20170101', periods=4).map({})
Out[3]: DatetimeIndex(['NaT', 'NaT', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

IntervalIndex, CategoricalIndex, and Index with object dtype get coerced to Float64Index:

In [4]: pd.interval_range(0, 5).map({})
Out[4]: Float64Index([nan, nan, nan, nan, nan], dtype='float64')

In [5]: pd.CategoricalIndex(list('abca')).map({})
Out[5]: Float64Index([nan, nan, nan, nan], dtype='float64')

In [6]: pd.Index(list('abca')).map({})
Out[6]: Float64Index([nan, nan, nan, nan], dtype='float64')

PeriodIndex and TimedeltaIndex get coerced DatetimeIndex:

In [7]: pd.period_range('2017Q1', periods=4, freq='Q').map({})
Out[7]: DatetimeIndex(['NaT', 'NaT', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

In [8]: pd.timedelta_range(1, periods=4).map({})
Out[8]: DatetimeIndex(['NaT', 'NaT', 'NaT', 'NaT'], dtype='datetime64[ns]', freq=None)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants