Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add type hints to dtypes/dtypes.py (CategoricalDtype) #26327

Merged
merged 5 commits into from
Jun 28, 2019

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented May 9, 2019

This PR adds type hints to pandas/core/dtypes/dtypes.py down to and including the CategoricalDtype part of the module. I'm more familiar with Categoricals than DateTimeArrays etc. and because I think there are some difficulties in implementing type hints, I think it would make sense to add a general ruleset for adding type hints to pandas, before too much work is put into this.

I'm putting some thoughts up for discussion in #25601 about how we add type hints to pandas, that probably should be discussed before this goes further.

@codecov
Copy link

codecov bot commented May 9, 2019

Codecov Report

Merging #26327 into master will decrease coverage by 51.29%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26327      +/-   ##
==========================================
- Coverage   92.04%   40.74%   -51.3%     
==========================================
  Files         175      175              
  Lines       52297    52304       +7     
==========================================
- Hits        48138    21313   -26825     
- Misses       4159    30991   +26832
Flag Coverage Δ
#multiple ?
#single 40.74% <100%> (-0.14%) ⬇️
Impacted Files Coverage Δ
pandas/core/arrays/categorical.py 41.36% <100%> (-54.6%) ⬇️
pandas/core/dtypes/dtypes.py 73.7% <100%> (-22.72%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-99.37%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.16%) ⬇️
pandas/io/sas/sas_xport.py 0% <0%> (-90.1%) ⬇️
... and 132 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 17247ed...69b4bb1. Read the comment docs.

@codecov
Copy link

codecov bot commented May 9, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@de0867f). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #26327   +/-   ##
=========================================
  Coverage          ?   92.04%           
=========================================
  Files             ?      175           
  Lines             ?    52304           
  Branches          ?        0           
=========================================
  Hits              ?    48142           
  Misses            ?     4162           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.59% <100%> (?)
#single 40.75% <100%> (?)
Impacted Files Coverage Δ
pandas/core/arrays/categorical.py 95.96% <100%> (ø)
pandas/core/dtypes/dtypes.py 96.71% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update de0867f...8f6c2e0. Read the comment docs.

"""
Parameters
----------
dtype : ExtensionDtype
"""
if not issubclass(dtype, (PandasExtensionDtype, ExtensionDtype)):
if not issubclass(dtype, ExtensionDtype):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All sublcasses of PandasExtensionDtype are now also ExtensionDtype, so I think this check is not needed anymore.

@TomAugspurger, do you agree?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds correct.

@topper-123 topper-123 force-pushed the typing_dtypes.py branch 2 times, most recently from 3580266 to ea08c99 Compare May 9, 2019 20:37
@WillAyd WillAyd added the Typing type annotations, mypy/pyright type checking label May 9, 2019
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! FYI this is the first PR to add annotations without fixing an existing type failure, so might take a few iterations to work out both the contribution and review side of things

@@ -1,13 +1,15 @@
""" define extension dtypes """
import re
from typing import Any, Dict, Optional, Tuple, Type
import typing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preferred idiom is to just use from typing import ... so can add cast to the below line (if needed)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Not even sure using cast is correct, it feels ugly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the typing, just import as indicated

@@ -18,7 +20,8 @@
str_type = str


def register_extension_dtype(cls):
def register_extension_dtype(cls: Type['ExtensionDtype'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on referencing object instead of using a reference

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, pls change this

return self.name

def __str__(self):
@property
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What error was this causing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was an AttributeError, caused by the call to .name in def __unicode__

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do subclasses assign name as a property or a class variable? If not the former then doing it this way will cause typing issue in the subclasses

raise NotImplementedError("sub-classes should implement an __hash__ "
"method")

def __getstate__(self):
def __getstate__(self) -> Dict[str_type, Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to minimize the use of Any as much as possible. Can't the value here not be str as well? May need to add type to self._metadata to really make use of this

Copy link
Contributor

@TomAugspurger TomAugspurger May 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Dict[str, Union[Optional[Index], bool]] will do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__getstate__ is also used by subclasses, so the strings in _metadata that I find are needed are (with associated guess for types for attributes):

  • "categories": Optional[Index]
  • "ordered": Optional[bool]
  • "unit": str (+other?)
  • "tz": str, I think (+ others?)
  • "freq": Not sure ATM
  • "subtype": Not sure ATM (_typing.Dtype?)

If the type hints are important to be specific to the ExtensionsDtype, maybe subclass __getstate__ in order to set the correct type hints? (in different PR IMO)?

self._finalize(categories, ordered, fastpath=False)

@classmethod
def _from_fastpath(cls, categories=None, ordered=None):
def _from_fastpath(cls,
categories=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you annotate categories?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only imprecisely. The most similar would be a plain Iterable, but that would accept e.g. "categories='abc'", which would ideally flag as a wrong type in mypy, but that's not possible ATM.

I'm a bit apprehensive about accepting wrong types, but maybe that would be the correct compromise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

categories should be ListLike which is ArrayLike + list + tuple + Iterable (define it in _typing)

def _from_values_or_dtype(cls, values=None, categories=None, ordered=None,
dtype=None):
def _from_values_or_dtype(cls,
values=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you annotate values / categories?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above. The most precise here would again be Iterable, but same problems as above.


@property
def ordered(self):
def ordered(self) -> Optional[bool]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this optional? I thought ordered was always sure or values (categories are optional).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes:

In [1]: pd.CategoricalDtype()
Out[1]: CategoricalDtype(categories=None, ordered=None)

I think this has somwthing to to with updating dtypes (don't update None, but do update False).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grr, this isn't needed, and isn't how we document things. #26336

@@ -1,13 +1,15 @@
""" define extension dtypes """
import re
from typing import Any, Dict, Optional, Tuple, Type
import typing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the typing, just import as indicated

@@ -18,7 +20,8 @@
str_type = str


def register_extension_dtype(cls):
def register_extension_dtype(cls: Type['ExtensionDtype'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, pls change this

@@ -126,16 +131,20 @@ class PandasExtensionDtype(_DtypeOpsMixin):
isnative = 0
_cache = {} # type: Dict[str_type, 'PandasExtensionDtype']

def __unicode__(self):
def __unicode__(self) -> str_type:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what this is, str should work here

def __str__(self):
@property
def name(self) -> str_type:
raise AbstractMethodError(self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you using str_type? just str

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think str_type came about in a previous PR because the class has a variable also called str which was conflicting with the builtin.

@gwrome

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is not a good idea at all. pls revert that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the other option that was proposed was to namespace it as builtins.str within the scope of the class - do you prefer that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The discussions about str getting shadowed were in #26029, for reference. I still prefer the explicit builtins.str approach.

self._finalize(categories, ordered, fastpath=False)

@classmethod
def _from_fastpath(cls, categories=None, ordered=None):
def _from_fastpath(cls,
categories=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

categories should be ListLike which is ArrayLike + list + tuple + Iterable (define it in _typing)

@jreback
Copy link
Contributor

jreback commented Jun 8, 2019

@topper-123 can you merge master

@@ -126,28 +130,28 @@ class PandasExtensionDtype(ExtensionDtype):
isnative = 0
_cache = {} # type: Dict[str_type, 'PandasExtensionDtype']

def __str__(self):
def __str__(self) -> str_type:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd on your how-to-type-annotate can you add when / why to use str_type

@jreback
Copy link
Contributor

jreback commented Jun 28, 2019

@WillAyd ok with merging this, though maybe should fix the import itself, you commented?) anything else?

@jreback jreback added this to the 0.25.0 milestone Jun 28, 2019
@WillAyd
Copy link
Member

WillAyd commented Jun 28, 2019

taking a look

@WillAyd
Copy link
Member

WillAyd commented Jun 28, 2019

Simplified a few things and removed the casts. OK to merge if green

@jreback jreback merged commit be4b48e into pandas-dev:master Jun 28, 2019
@jreback
Copy link
Contributor

jreback commented Jun 28, 2019

thanks @topper-123

I think we can deal with the .ordered annotation later, cc @jschendel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants