implement libmissing; untangles _libs dependencies #18357
Codecov Report
@@ Coverage Diff @@
## master #18357 +/- ##
==========================================
- Coverage 91.38% 91.36% -0.02%
==========================================
Files 164 164
Lines 49790 49791 +1
==========================================
- Hits 45501 45493 -8
- Misses 4289 4298 +9
Codecov Report
@@ Coverage Diff @@
## master #18357 +/- ##
==========================================
- Coverage 91.35% 91.33% -0.02%
==========================================
Files 163 163
Lines 49714 49716 +2
==========================================
- Hits 45415 45408 -7
- Misses 4299 4308 +9
    return get_timedelta64_value(val) == NPY_NAT
elif util.is_array(val):
    return False
else:
is there a reason for not pulling in util._checknull here as well, as seems logical? (or just future PR)
You mean defining it here instead of in util? Or importing it into the namespace? I'd be +1 on the former, indifferent to the latter.
I think for sure should define it here. but then this puts missing as a dep of things like hashing.pyx. ok with it being a dep of any of the tslibs though.
(pandas) bash-3.2$ find pandas -name '*.pyx' | xargs grep _checknull
pandas/_libs/hashing.pyx:from util cimport _checknull
pandas/_libs/hashing.pyx: elif _checknull(val):
pandas/_libs/interval.pyx: if util._checknull(interval):
pandas/_libs/lib.pyx:from util cimport is_array, _checknull, _checknan
pandas/_libs/lib.pyx: return _checknull(val)
pandas/_libs/lib.pyx: return _checknull(val)
pandas/_libs/lib.pyx: result[i] = val is NaT or util._checknull_old(val)
pandas/_libs/lib.pyx: _checknull(x) and _checknull(y)):
pandas/_libs/lib.pyx: if _checknull(val):
pandas/_libs/lib.pyx: if _checknull(x):
pandas/_libs/lib.pyx: if _checknull(x):
pandas/_libs/lib.pyx: elif _checknull(y):
pandas/_libs/src/inference.pyx: if util._checknull(val):
pandas/_libs/src/inference.pyx: elif util._checknull(v):
pandas/_libs/src/inference.pyx: if util._checknull(v):
pandas/_libs/src/inference.pyx: if util._checknull(v):
pandas/_libs/src/inference.pyx: if util._checknull(v):
pandas/_libs/src/inference.pyx: if util._checknull(v):
pandas/_libs/src/inference.pyx: return util._checknull(value)
pandas/_libs/src/inference.pyx: bint is_generic_null = util._checknull(value)
pandas/_libs/tslib.pyx:from tslibs.nattype cimport _checknull_with_nat, NPY_NAT
pandas/_libs/tslib.pyx: if _checknull_with_nat(val):
pandas/_libs/tslib.pyx: if _checknull_with_nat(val):
pandas/_libs/tslib.pyx: if _checknull_with_nat(val):
pandas/_libs/tslib.pyx: if _checknull_with_nat(val):
pandas/_libs/tslib.pyx: if _checknull_with_nat(val):
pandas/_libs/tslib.pyx: if _checknull_with_nat(val):
pandas/_libs/tslibs/nattype.pyx:cdef inline bint _checknull_with_nat(object val):
pandas/_libs/tslibs/strptime.pyx:from nattype cimport _checknull_with_nat, NPY_NAT
pandas/_libs/tslibs/strptime.pyx: if _checknull_with_nat(val):
pandas/_libs/tslibs/timedeltas.pyx:from nattype cimport _checknull_with_nat, NPY_NAT
pandas/_libs/tslibs/timedeltas.pyx: if _checknull_with_nat(ts):
pandas/_libs/tslibs/timedeltas.pyx: if _checknull_with_nat(other):
pandas/_libs/tslibs/timedeltas.pyx: elif _checknull_with_nat(value):
If we're moving util._checknull anyway, I'd advocate renaming it to e.g. check_none_or_nan
sure
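For readers who have not looked at the helper being discussed, here is a rough pure-Python sketch of what a check with the proposed name would do. It assumes the helper treats only None and float NaN as null; the actual Cython implementation in util is not shown in this thread, so the body is an illustration, not the pandas code.

```python
import math


def check_none_or_nan(val):
    """Illustrative stand-in for the proposed rename of util._checknull.

    Assumption: only None and float NaN count as null here.
    NaT and other datetime-like nulls are deliberately not handled.
    """
    return val is None or (isinstance(val, float) and math.isnan(val))


print(check_none_or_nan(None))
print(check_none_or_nan(float("nan")))
print(check_none_or_nan("nan"))
```

The name makes the narrow scope explicit, which is the point of the rename: callers that also need NaT handling would keep using a separate check such as _checknull_with_nat.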
I think for sure should define it here. but then this puts missing as a dep of things like hashing.pyx. ok with it being a dep of any of the tslibs though.
I'll take a look and see which util funcs can be moved without messing with dependencies.
FWIW this PR already adds missing to the 'pxdfiles' key of hashtable, which cimports missing.checknull. Previously it was an undeclared dependency on lib.
Looks like util._checknull_old can be moved to missing (it is used there once, nowhere else). Let's save util._checknull for later, since it is used in a bunch of places.
ok with leaving these for later as well
Sounds good. Just pushed commit with docstrings.
pandas/_libs/missing.pyx
Outdated
@cython.wraparound(False)
@cython.boundscheck(False)
def isnaobj(ndarray arr):
    cdef:
ideally add some doc-strings
pandas/_libs/missing.pyx
Outdated
cdef int64_t NPY_NAT = util.get_nat()


cdef inline bint is_null_datetimelike(v):
prob should rename this for consistency (checknull_datetimelike), can be TODO
@@ -533,22 +533,6 @@ cpdef object infer_datetimelike_array(object arr):
    return 'mixed'


cdef inline bint is_null_datetimelike(v):
is there a reason these are not in nattype? (and maybe namespaces to missing, but actual definition in nattype)
I'd be fine putting this in nattype. The one below I considered moving but decided against because it's only used once here in inference.pyx.
pandas/_libs/tslib.pyx
Outdated
@@ -830,24 +829,6 @@ class Timestamp(_Timestamp):
# ----------------------------------------------------------------------


cdef inline bint _check_all_nulls(object val):
    """ utility to check if a value is any type of null """
is this used anywhere?
Yes, it's used once here in _libs.missing.
pandas/tests/test_lib.py
Outdated
@@ -208,14 +208,14 @@ class TestNAObj(object):

    def _check_behavior(self, arr, expected):
should move these tests to test_missing
Sure. Existing test_missing in tests.dtypes or a new test_libmissing?
i think tests.dtypes is ok, maybe add test_libmissing at later point (orthogonal to this)
@@ -83,7 +83,7 @@ def rank_1d_{{dtype}}(object in_arr, ties_method='average', ascending=True,
  nan_value = {{neg_nan_value}}

  {{if dtype == 'object'}}
- mask = lib.isnaobj(values)
+ mask = missing.isnaobj(values)
these could be cimports instead?
ATM these are not cdef
pandas/io/formats/format.py
Outdated
@@ -1860,7 +1860,7 @@ def _format_strings(self):
  (lambda x: pprint_thing(x, escape_chars=('\t', '\r', '\n'))))

  def _format(x):
- if self.na_rep is not None and lib.checknull(x):
+ if self.na_rep is not None and libmissing.checknull(x):
in followup (or here), change this to is_scalar(x) and isna(x); this is reaching too much into the internals.
and all below here
maybe worth defining an is_scalar_na function
@@ -50,7 +50,7 @@ def isna(obj):

def _isna_new(obj):
    if is_scalar(obj):
        return lib.checknull(obj)
see this here is ok, as this is really the python api for missing values.
pandas/io/formats/excel.py
Outdated
@@ -381,12 +381,12 @@ def __init__(self, df, na_rep='', float_format=None, cols=None,
  self.inf_rep = inf_rep

  def _format_value(self, val):
- if lib.checknull(val):
+ if libmissing.checknull(val):
use is_scalar and isna
pandas/io/formats/excel.py
Outdated
  val = self.na_rep
  elif is_float(val):
- if lib.isposinf_scalar(val):
+ if libmissing.isposinf_scalar(val):
isposinf_scalar should be exposed in core/dtypes/missing.py as a python function
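A rough pure-Python sketch of what such a Python-level function could look like. The body is an assumption based on the name (a float scalar equal to positive infinity); it does not reproduce the Cython code in _libs and is only meant to show the shape of the wrapper.

```python
import math


def isposinf_scalar(val):
    """Illustrative only: True for a float scalar equal to +inf.

    Assumption: non-float inputs (strings, None, ints) are never
    considered positive infinity.
    """
    return isinstance(val, float) and math.isinf(val) and val > 0


print(isposinf_scalar(float("inf")))
print(isposinf_scalar(float("-inf")))
```

With a wrapper like this in core/dtypes/missing.py, formatting code such as excel.py would not need to import from _libs directly.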
_1d_methods = ['isnaobj', 'isnaobj_old']
_2d_methods = ['isnaobj2d', 'isnaobj2d_old']

def _check_behavior(self, arr, expected):
if you can parametrize/use fixtures would be good
rebase
pls rebase
pandas/_libs/missing.pyx
Outdated
cimport util

from tslibs.np_datetime cimport get_timedelta64_value, get_datetime64_value
from tslibs.nattype import NaT, iNaT
instead of importing iNaT, just use NPY_NAT to avoid perf issues.
pandas/_libs/missing.pyx
Outdated
cdef int64_t NPY_NAT = util.get_nat()


cdef inline bint is_null_datetimelike(v):
type as object
Just pushed with both these changes. Also:
|
setup.py
Outdated
@@ -516,6 +517,10 @@ def pxd(name):
'_libs.lib': {
    'pyxfile': '_libs/lib',
    'depends': lib_depends + tseries_depends},
need to add missing to lib as well
Can do. At the moment we've basically only implemented the pxfiles "rule" for tslibs. Should we go through and fill them out for the others?
In case I haven't mentioned it recently: I'm really looking forward to using cythonize and not worrying about this.
thanks!
not sure this started showing up on this PR, but pls add to the list
Looks like these are in the new timedelta_struct functions, likely leftovers from copy/paste. Easy to fix in-place, but I'd advocate not having them in src/ to begin with. Put them directly in tslibs.np_datetime (or tslibs.np_timedelta).
In the status quo, algos, hashtable, parsers, and period depend on lib. On top of that, groupby, join, and index depend on algos, and intervaltree depends on hashtable. (In fairness, some of these dependency relationships are Python imports, not cimports.) lib depends on tslib, which depends on essentially all of tslibs. So we've got a whole bunch of dependencies (many of which aren't declared in setup.py).

But the lib functions needed by algos, hashtable, and period are basically all a small handful of null-checking functions (checknull, isnaobj, isnaobj2d), which depend only on util and NaT.

So this PR takes those functions and puts them into _libs.missing, then updates the relevant imports.