-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add type hints to dtypes/dtypes.py (CategoricalDtype) #26327
Conversation
2c32937
to
01b3ab8
Compare
01b3ab8
to
69b4bb1
Compare
Codecov Report
@@ Coverage Diff @@
## master #26327 +/- ##
==========================================
- Coverage 92.04% 40.74% -51.3%
==========================================
Files 175 175
Lines 52297 52304 +7
==========================================
- Hits 48138 21313 -26825
- Misses 4159 30991 +26832
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #26327 +/- ##
=========================================
Coverage ? 92.04%
=========================================
Files ? 175
Lines ? 52304
Branches ? 0
=========================================
Hits ? 48142
Misses ? 4162
Partials ? 0
Continue to review full report at Codecov.
|
pandas/core/dtypes/dtypes.py
Outdated
""" | ||
Parameters | ||
---------- | ||
dtype : ExtensionDtype | ||
""" | ||
if not issubclass(dtype, (PandasExtensionDtype, ExtensionDtype)): | ||
if not issubclass(dtype, ExtensionDtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All sublcasses of PandasExtensionDtype are now also ExtensionDtype, so I think this check is not needed anymore.
@TomAugspurger, do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds correct.
3580266
to
ea08c99
Compare
ea08c99
to
8be18a0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! FYI this is the first PR to add annotations without fixing an existing type failure, so might take a few iterations to work out both the contribution and review side of things
pandas/core/dtypes/dtypes.py
Outdated
@@ -1,13 +1,15 @@ | |||
""" define extension dtypes """ | |||
import re | |||
from typing import Any, Dict, Optional, Tuple, Type | |||
import typing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The preferred idiom is to just use from typing import ...
so can add cast to the below line (if needed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Not even sure using cast
is correct, it feels ugly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the typing, just import as indicated
pandas/core/dtypes/dtypes.py
Outdated
@@ -18,7 +20,8 @@ | |||
str_type = str | |||
|
|||
|
|||
def register_extension_dtype(cls): | |||
def register_extension_dtype(cls: Type['ExtensionDtype'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment on referencing object instead of using a reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, pls change this
pandas/core/dtypes/dtypes.py
Outdated
return self.name | ||
|
||
def __str__(self): | ||
@property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What error was this causing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was an AttributeError
, caused by the call to .name in def __unicode__
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do subclasses assign name
as a property or a class variable? If not the former then doing it this way will cause typing issue in the subclasses
raise NotImplementedError("sub-classes should implement an __hash__ " | ||
"method") | ||
|
||
def __getstate__(self): | ||
def __getstate__(self) -> Dict[str_type, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would like to minimize the use of Any
as much as possible. Can't the value here not be str
as well? May need to add type to self._metadata
to really make use of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Dict[str, Union[Optional[Index], bool]]
will do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__getstate__
is also used by subclasses, so the strings in _metadata that I find are needed are (with associated guess for types for attributes):
- "categories": Optional[Index]
- "ordered": Optional[bool]
- "unit": str (+other?)
- "tz": str, I think (+ others?)
- "freq": Not sure ATM
- "subtype": Not sure ATM (
_typing.Dtype
?)
If the type hints are important to be specific to the ExtensionsDtype, maybe subclass __getstate__
in order to set the correct type hints? (in different PR IMO)?
self._finalize(categories, ordered, fastpath=False) | ||
|
||
@classmethod | ||
def _from_fastpath(cls, categories=None, ordered=None): | ||
def _from_fastpath(cls, | ||
categories=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you annotate categories?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only imprecisely. The most similar would be a plain Iterable
, but that would accept e.g. "categories='abc'", which would ideally flag as a wrong type in mypy, but that's not possible ATM.
I'm a bit apprehensive about accepting wrong types, but maybe that would be the correct compromise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
categories should be ListLike which is ArrayLike + list + tuple + Iterable (define it in _typing)
def _from_values_or_dtype(cls, values=None, categories=None, ordered=None, | ||
dtype=None): | ||
def _from_values_or_dtype(cls, | ||
values=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you annotate values / categories?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above. The most precise here would again be Iterable
, but same problems as above.
|
||
@property | ||
def ordered(self): | ||
def ordered(self) -> Optional[bool]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this optional? I thought ordered was always sure or values (categories are optional).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes:
In [1]: pd.CategoricalDtype()
Out[1]: CategoricalDtype(categories=None, ordered=None)
I think this has somwthing to to with updating dtypes (don't update None, but do update False).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grr, this isn't needed, and isn't how we document things. #26336
pandas/core/dtypes/dtypes.py
Outdated
@@ -1,13 +1,15 @@ | |||
""" define extension dtypes """ | |||
import re | |||
from typing import Any, Dict, Optional, Tuple, Type | |||
import typing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the typing, just import as indicated
pandas/core/dtypes/dtypes.py
Outdated
@@ -18,7 +20,8 @@ | |||
str_type = str | |||
|
|||
|
|||
def register_extension_dtype(cls): | |||
def register_extension_dtype(cls: Type['ExtensionDtype'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, pls change this
pandas/core/dtypes/dtypes.py
Outdated
@@ -126,16 +131,20 @@ class PandasExtensionDtype(_DtypeOpsMixin): | |||
isnative = 0 | |||
_cache = {} # type: Dict[str_type, 'PandasExtensionDtype'] | |||
|
|||
def __unicode__(self): | |||
def __unicode__(self) -> str_type: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what this is, str should work here
pandas/core/dtypes/dtypes.py
Outdated
def __str__(self): | ||
@property | ||
def name(self) -> str_type: | ||
raise AbstractMethodError(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you using str_type? just str
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think str_type
came about in a previous PR because the class has a variable also called str
which was conflicting with the builtin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is not a good idea at all. pls revert that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the other option that was proposed was to namespace it as builtins.str within the scope of the class - do you prefer that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The discussions about str getting shadowed were in #26029, for reference. I still prefer the explicit builtins.str approach.
self._finalize(categories, ordered, fastpath=False) | ||
|
||
@classmethod | ||
def _from_fastpath(cls, categories=None, ordered=None): | ||
def _from_fastpath(cls, | ||
categories=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
categories should be ListLike which is ArrayLike + list + tuple + Iterable (define it in _typing)
@topper-123 can you merge master |
@@ -126,28 +130,28 @@ class PandasExtensionDtype(ExtensionDtype): | |||
isnative = 0 | |||
_cache = {} # type: Dict[str_type, 'PandasExtensionDtype'] | |||
|
|||
def __str__(self): | |||
def __str__(self) -> str_type: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd on your how-to-type-annotate can you add when / why to use str_type
@WillAyd ok with merging this, though maybe should fix the import itself, you commented?) anything else? |
taking a look |
Simplified a few things and removed the casts. OK to merge if green |
thanks @topper-123 I think we can deal with the .ordered annotation later, cc @jschendel |
This PR adds type hints to pandas/core/dtypes/dtypes.py down to and including the
CategoricalDtype
part of the module. I'm more familiar with Categoricals than DateTimeArrays etc. and because I think there are some difficulties in implementing type hints, I think it would make sense to add a general ruleset for adding type hints to pandas, before too much work is put into this.I'm putting some thoughts up for discussion in #25601 about how we add type hints to pandas, that probably should be discussed before this goes further.