Commit

Merge branch 'master' into Development
Varun270 committed Sep 12, 2021
2 parents a06149a + 5deec13 commit 4460d19
Showing 19 changed files with 158 additions and 138 deletions.
1 change: 0 additions & 1 deletion ci/code_checks.sh
@@ -89,7 +89,6 @@ fi
### DOCSTRINGS ###
if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then


MSG='Validate docstrings (GL03, GL04, GL05, GL06, GL07, GL09, GL10, SS01, SS02, SS04, SS05, PR03, PR04, PR05, PR06, PR10, EX04, RT01, RT04, RT05, SA02, SA03)' ; echo $MSG
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=GL03,GL04,GL05,GL06,GL07,GL09,GL10,SS02,SS04,SS05,PR03,PR04,PR05,PR10,EX04,RT01,RT04,RT05,SA02,SA03

21 changes: 4 additions & 17 deletions doc/source/whatsnew/v1.3.3.rst
@@ -1,6 +1,6 @@
.. _whatsnew_133:

What's new in 1.3.3 (September ??, 2021)
What's new in 1.3.3 (September 12, 2021)
----------------------------------------

These are the changes in pandas 1.3.3. See :ref:`release` for a full changelog
@@ -15,7 +15,6 @@ including other versions of pandas.
Fixed regressions
~~~~~~~~~~~~~~~~~
- Fixed regression in :class:`DataFrame` constructor failing to broadcast for defined :class:`Index` and a length-one list of :class:`Timestamp` (:issue:`42810`)
- Performance regression in :meth:`core.window.ewm.ExponentialMovingWindow.mean` (:issue:`42333`)
- Fixed regression in :meth:`.GroupBy.agg` incorrectly raising in some cases (:issue:`42390`)
- Fixed regression in :meth:`.GroupBy.apply` where ``nan`` values were dropped even with ``dropna=False`` (:issue:`43205`); see the sketch after this list
- Fixed regression in :meth:`.GroupBy.quantile` which was failing with ``pandas.NA`` (:issue:`42849`)
@@ -29,8 +28,8 @@ Fixed regressions
- Fixed regression in :meth:`DataFrame.corr` where Kendall correlation would produce incorrect results for columns with repeated values (:issue:`43401`)
- Fixed regression in :meth:`DataFrame.groupby` where aggregation on columns with object types dropped results on those columns (:issue:`42395`, :issue:`43108`)
- Fixed regression in :meth:`Series.fillna` raising ``TypeError`` when filling ``float`` ``Series`` with a list-like fill value having a dtype which couldn't be cast losslessly (like ``float32`` filled with ``float64``) (:issue:`43424`)
- Fixed regression in :func:`read_csv` throwing an ``AttributeError`` when the file handle is an ``tempfile.SpooledTemporaryFile`` object (:issue:`43439`)
-
- Fixed regression in :func:`read_csv` raising ``AttributeError`` when the file handle is a ``tempfile.SpooledTemporaryFile`` object (:issue:`43439`)
- Fixed performance regression in :meth:`core.window.ewm.ExponentialMovingWindow.mean` (:issue:`42333`)
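
As a quick illustration of the :meth:`.GroupBy.apply` ``dropna=False`` fix listed above, a minimal sketch (hypothetical data; output abbreviated):

import pandas as pd

df = pd.DataFrame({"key": ["a", None, "a", None], "val": [1, 2, 3, 4]})
# With the fix, the null key keeps its own group instead of being dropped:
df.groupby("key", dropna=False)["val"].apply(lambda s: s.sum())
# key
# a      4
# NaN    6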

.. ---------------------------------------------------------------------------
@@ -39,26 +38,14 @@ Fixed regressions
Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~
- Performance improvement for :meth:`DataFrame.__setitem__` when the key or value is not a :class:`DataFrame`, or key is not list-like (:issue:`43274`)
-
-

.. ---------------------------------------------------------------------------
.. _whatsnew_133.bug_fixes:

Bug fixes
~~~~~~~~~
- Bug in :meth:`.DataFrameGroupBy.agg` and :meth:`.DataFrameGroupBy.transform` with ``engine="numba"`` where ``index`` data was not being correctly passed into ``func`` (:issue:`43133`)
-

.. ---------------------------------------------------------------------------
.. _whatsnew_133.other:

Other
~~~~~
-
-
- Fixed bug in :meth:`.DataFrameGroupBy.agg` and :meth:`.DataFrameGroupBy.transform` with ``engine="numba"`` where ``index`` data was not being correctly passed into ``func`` (:issue:`43133`)

.. ---------------------------------------------------------------------------
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.0.rst
@@ -107,6 +107,7 @@ Other enhancements
- :meth:`DataFrame.to_stata` and :meth:`StataWriter` now accept the keyword only argument ``value_labels`` to save labels for non-categorical columns
- Methods that rely on hashmap-based algorithms, such as :meth:`DataFrameGroupBy.value_counts`, :meth:`DataFrameGroupBy.count` and :func:`factorize`, no longer ignore the imaginary component of complex numbers (:issue:`17927`)
- Add :meth:`Series.str.removeprefix` and :meth:`Series.str.removesuffix` introduced in Python 3.9 to remove pre-/suffixes from string-type :class:`Series` (:issue:`36944`)
- Attempting to write into a file whose parent directory is missing with :meth:`DataFrame.to_csv`, :meth:`DataFrame.to_html`, :meth:`DataFrame.to_excel`, :meth:`DataFrame.to_feather`, :meth:`DataFrame.to_parquet`, :meth:`DataFrame.to_stata`, :meth:`DataFrame.to_json`, :meth:`DataFrame.to_pickle`, and :meth:`DataFrame.to_xml` now raises an error that explicitly names the missing parent directory; the same is true for the :class:`Series` counterparts (:issue:`24306`). See the sketch after this list.
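
A minimal sketch of the last two enhancements above, assuming pandas 1.4.0 (the exact error text may differ):

import pandas as pd

s = pd.Series(["str_foo", "str_bar"])
s.str.removeprefix("str_")  # 0    foo
                            # 1    bar

df = pd.DataFrame({"a": [1]})
# Writers now name the missing parent directory in the error, roughly:
# OSError: Cannot save file into a non-existent directory: 'missing_dir'
df.to_csv("missing_dir/out.csv")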


.. ---------------------------------------------------------------------------
5 changes: 0 additions & 5 deletions pandas/_libs/sparse_op_helper.pxi.in
@@ -301,9 +301,4 @@ cpdef sparse_{{opname}}_{{dtype}}({{dtype}}_t[:] x,
else:
raise NotImplementedError


cpdef sparse_fill_{{opname}}_{{dtype}}({{dtype}}_t xfill,
{{dtype}}_t yfill):
return {{(opname, 'xfill', 'yfill', dtype) | get_op}}

{{endfor}}
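
For orientation, the removed template generated one tiny function per (op, dtype) pair that applied the op to the two arrays' fill values; a rough Python rendering of one instance (hypothetical rendering, not the generated Cython):

def sparse_fill_add_float64(xfill: float, yfill: float) -> float:
    # For opname="add", {{(opname, 'xfill', 'yfill', dtype) | get_op}}
    # expands to `xfill + yfill`; the helper just applied the op to the fills.
    return xfill + yfill

sparse_fill_add_float64(0.0, 1.5)  # 1.5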
18 changes: 0 additions & 18 deletions pandas/_libs/util.pxd
@@ -1,19 +1,8 @@
cimport numpy as cnp
from numpy cimport ndarray

from pandas._libs.tslibs.util cimport *


cdef extern from "numpy/ndarraytypes.h":
void PyArray_CLEARFLAGS(ndarray arr, int flags) nogil


cdef extern from "numpy/arrayobject.h":
enum:
NPY_ARRAY_C_CONTIGUOUS
NPY_ARRAY_F_CONTIGUOUS


cdef extern from "src/headers/stdint.h":
enum: UINT8_MAX
enum: UINT16_MAX
@@ -42,10 +31,3 @@ ctypedef fused numeric:

cnp.float32_t
cnp.float64_t


cdef inline void set_array_not_contiguous(ndarray ao) nogil:
# Numpy>=1.8-compliant equivalent to:
# ao->flags &= ~(NPY_ARRAY_C_CONTIGUOUS | NPY_ARRAY_F_CONTIGUOUS);
PyArray_CLEARFLAGS(ao,
(NPY_ARRAY_C_CONTIGUOUS | NPY_ARRAY_F_CONTIGUOUS))
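
Context for the removed helper: ``PyArray_CLEARFLAGS`` clears NumPy's contiguity flags at the C level; from Python those flags can only be inspected. A small sketch:

import numpy as np

a = np.arange(6).reshape(2, 3)
print(a.flags["C_CONTIGUOUS"], a.flags["F_CONTIGUOUS"])  # True False
# The removed Cython helper did the C-level equivalent of:
#   ao->flags &= ~(NPY_ARRAY_C_CONTIGUOUS | NPY_ARRAY_F_CONTIGUOUS)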
9 changes: 2 additions & 7 deletions pandas/core/arrays/base.py
@@ -449,7 +449,7 @@ def __ne__(self, other: Any) -> ArrayLike: # type: ignore[override]

def to_numpy(
self,
dtype: Dtype | None = None,
dtype: npt.DTypeLike | None = None,
copy: bool = False,
na_value=lib.no_default,
) -> np.ndarray:
@@ -478,12 +478,7 @@ def to_numpy(
-------
numpy.ndarray
"""
# error: Argument "dtype" to "asarray" has incompatible type
# "Union[ExtensionDtype, str, dtype[Any], Type[str], Type[float], Type[int],
# Type[complex], Type[bool], Type[object], None]"; expected "Union[dtype[Any],
# None, type, _SupportsDType, str, Union[Tuple[Any, int], Tuple[Any, Union[int,
# Sequence[int]]], List[Any], _DTypeDict, Tuple[Any, Any]]]"
result = np.asarray(self, dtype=dtype) # type: ignore[arg-type]
result = np.asarray(self, dtype=dtype)
if copy or na_value is not lib.no_default:
result = result.copy()
if na_value is not lib.no_default:
7 changes: 2 additions & 5 deletions pandas/core/arrays/masked.py
@@ -224,12 +224,9 @@ def __len__(self) -> int:
def __invert__(self: BaseMaskedArrayT) -> BaseMaskedArrayT:
return type(self)(~self._data, self._mask.copy())

# error: Argument 1 of "to_numpy" is incompatible with supertype "ExtensionArray";
# supertype defines the argument type as "Union[ExtensionDtype, str, dtype[Any],
# Type[str], Type[float], Type[int], Type[complex], Type[bool], Type[object], None]"
def to_numpy( # type: ignore[override]
def to_numpy(
self,
dtype: NpDtype | None = None,
dtype: npt.DTypeLike | None = None,
copy: bool = False,
na_value: Scalar = lib.no_default,
) -> np.ndarray:
8 changes: 3 additions & 5 deletions pandas/core/arrays/numpy_.py
@@ -10,6 +10,7 @@
Dtype,
NpDtype,
Scalar,
npt,
)
from pandas.compat.numpy import function as nv

@@ -365,12 +366,9 @@ def skew(
# ------------------------------------------------------------------------
# Additional Methods

# error: Argument 1 of "to_numpy" is incompatible with supertype "ExtensionArray";
# supertype defines the argument type as "Union[ExtensionDtype, str, dtype[Any],
# Type[str], Type[float], Type[int], Type[complex], Type[bool], Type[object], None]"
def to_numpy( # type: ignore[override]
def to_numpy(
self,
dtype: NpDtype | None = None,
dtype: npt.DTypeLike | None = None,
copy: bool = False,
na_value=lib.no_default,
) -> np.ndarray:
8 changes: 3 additions & 5 deletions pandas/core/arrays/string_arrow.py
@@ -24,6 +24,7 @@
Scalar,
ScalarIndexer,
SequenceIndexer,
npt,
)
from pandas.compat import (
pa_version_under1p0,
@@ -199,12 +200,9 @@ def __arrow_array__(self, type=None):
"""Convert myself to a pyarrow Array or ChunkedArray."""
return self._data

# error: Argument 1 of "to_numpy" is incompatible with supertype "ExtensionArray";
# supertype defines the argument type as "Union[ExtensionDtype, str, dtype[Any],
# Type[str], Type[float], Type[int], Type[complex], Type[bool], Type[object], None]"
def to_numpy( # type: ignore[override]
def to_numpy(
self,
dtype: NpDtype | None = None,
dtype: npt.DTypeLike | None = None,
copy: bool = False,
na_value=lib.no_default,
) -> np.ndarray:
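
The recurring change in ``arrays/base.py``, ``masked.py``, ``numpy_.py``, and ``string_arrow.py`` above widens the ``dtype`` parameter of ``to_numpy`` to ``npt.DTypeLike``, the same family ``np.asarray`` accepts, which is what lets the ``type: ignore`` comments go (including in ``core/base.py`` below). A minimal usage sketch, assuming a recent pandas:

import numpy as np
import pandas as pd

arr = pd.array([1, 2, pd.NA], dtype="Int64")
# Strings, np.dtype instances, and scalar types are all DTypeLike:
arr.to_numpy(dtype="float64", na_value=np.nan)            # array([ 1.,  2., nan])
arr.to_numpy(dtype=np.dtype("float64"), na_value=np.nan)  # same result
arr.to_numpy(dtype=float, na_value=np.nan)                # same result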
10 changes: 1 addition & 9 deletions pandas/core/base.py
@@ -516,16 +516,8 @@ def to_numpy(
"""
if is_extension_array_dtype(self.dtype):
# error: Too many arguments for "to_numpy" of "ExtensionArray"

# error: Argument 1 to "to_numpy" of "ExtensionArray" has incompatible type
# "Optional[Union[dtype[Any], None, type, _SupportsDType[dtype[Any]], str,
# Union[Tuple[Any, int], Tuple[Any, Union[SupportsIndex,
# Sequence[SupportsIndex]]], List[Any], _DTypeDict, Tuple[Any, Any]]]]";
# expected "Optional[Union[ExtensionDtype, Union[str, dtype[Any]],
# Type[str], Type[float], Type[int], Type[complex], Type[bool],
# Type[object]]]"
return self.array.to_numpy( # type: ignore[call-arg]
dtype, copy=copy, na_value=na_value, **kwargs # type: ignore[arg-type]
dtype, copy=copy, na_value=na_value, **kwargs
)
elif kwargs:
bad_keys = list(kwargs.keys())[0]
41 changes: 4 additions & 37 deletions pandas/core/groupby/generic.py
@@ -792,24 +792,6 @@ def count(self) -> Series:
)
return self._reindex_output(result, fill_value=0)

def pct_change(self, periods=1, fill_method="pad", limit=None, freq=None):
"""Calculate pct_change of each value to previous entry in group"""
# TODO: Remove this conditional when #23918 is fixed
if freq:
return self.apply(
lambda x: x.pct_change(
periods=periods, fill_method=fill_method, limit=limit, freq=freq
)
)
if fill_method is None: # GH30463
fill_method = "pad"
limit = 0
filled = getattr(self, fill_method)(limit=limit)
fill_grp = filled.groupby(self.grouper.codes)
shifted = fill_grp.shift(periods=periods, freq=freq)

return (filled / shifted) - 1

@doc(Series.nlargest)
def nlargest(self, n: int = 5, keep: str = "first"):
f = partial(Series.nlargest, n=n, keep=keep)
@@ -1086,14 +1068,10 @@ def _aggregate_item_by_item(self, func, *args, **kwargs) -> DataFrame:
# test_resample_apply_product

obj = self._obj_with_exclusions
result: dict[int | str, NDFrame] = {}
for i, item in enumerate(obj):
ser = obj.iloc[:, i]
colg = SeriesGroupBy(
ser, selection=item, grouper=self.grouper, exclusions=self.exclusions
)
result: dict[int, NDFrame] = {}

result[i] = colg.aggregate(func, *args, **kwargs)
for i, (item, sgb) in enumerate(self._iterate_column_groupbys(obj)):
result[i] = sgb.aggregate(func, *args, **kwargs)

res_df = self.obj._constructor(result)
res_df.columns = obj.columns
@@ -1168,11 +1146,7 @@ def _wrap_applied_output_series(
applied_index = self._selected_obj._get_axis(self.axis)
singular_series = len(values) == 1 and applied_index.nlevels == 1

# assign the name to this series
if singular_series:
keys = self.grouper.group_keys_seq
values[0].name = keys[0]

# GH2893
# we have series in the values array, we want to
# produce a series:
@@ -1372,14 +1346,7 @@ def _transform_item_by_item(self, obj: DataFrame, wrapper) -> DataFrame:
# gets here with non-unique columns
output = {}
inds = []
for i, col in enumerate(obj):
subset = obj.iloc[:, i]
sgb = SeriesGroupBy(
subset,
selection=col,
grouper=self.grouper,
exclusions=self.exclusions,
)
for i, (colname, sgb) in enumerate(self._iterate_column_groupbys(obj)):
try:
output[i] = sgb.transform(wrapper)
except TypeError:
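
Both refactors in this file route column-by-column work through ``_iterate_column_groupbys``, which yields one ``SeriesGroupBy`` per column instead of constructing them by hand. A rough public-API sketch of that pattern (illustrative data, not the internal helper itself):

import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b"], "x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})
gb = df.groupby("g")

# One SeriesGroupBy per value column, aggregated and keyed by position:
cols = df.columns.drop("g")
result = {i: gb[col].agg("sum") for i, col in enumerate(cols)}
out = pd.DataFrame(result)
out.columns = cols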
4 changes: 3 additions & 1 deletion pandas/core/groupby/groupby.py
@@ -1037,7 +1037,7 @@ def reset_identity(values):
if self.as_index:

# possible MI return case
group_keys = self.grouper.group_keys_seq
group_keys = self.grouper.result_index
group_levels = self.grouper.levels
group_names = self.grouper.names

@@ -3236,6 +3236,7 @@ def shift(self, periods=1, freq=None, axis=0, fill_value=None):
)
return res

@final
@Substitution(name="groupby")
@Appender(_common_see_also)
def pct_change(self, periods=1, fill_method="pad", limit=None, freq=None, axis=0):
@@ -3247,6 +3248,7 @@ def pct_change(self, periods=1, fill_method="pad", limit=None, freq=None, axis=0
Series or DataFrame
Percentage changes within each group.
"""
# TODO: Remove this conditional for SeriesGroupBy when GH#23918 is fixed
if freq is not None or axis != 0:
return self.apply(
lambda x: x.pct_change(
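
With the ``SeriesGroupBy`` override removed, ``pct_change`` lives once on ``GroupBy`` (now marked ``@final``); the recipe is fill, shift within the group, divide. A minimal sketch of that equivalence (illustrative data):

import pandas as pd

s = pd.Series([1.0, 2.0, 4.0, 3.0, 6.0], index=list("aabbb"))
g = s.groupby(level=0)

filled = s.groupby(level=0).ffill()         # fill_method="pad", applied per group
shifted = filled.groupby(level=0).shift(1)  # previous entry within each group
assert g.pct_change().equals(filled / shifted - 1)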
13 changes: 13 additions & 0 deletions pandas/core/internals/blocks.py
@@ -228,6 +228,11 @@ def get_values(self, dtype: DtypeObj | None = None) -> np.ndarray:
# expected "ndarray")
return self.values # type: ignore[return-value]

def values_for_json(self) -> np.ndarray:
# Incompatible return value type (got "Union[ndarray[Any, Any],
# ExtensionArray]", expected "ndarray[Any, Any]")
return self.values # type: ignore[return-value]

@final
@cache_readonly
def fill_value(self):
@@ -1375,6 +1380,9 @@ def get_values(self, dtype: DtypeObj | None = None) -> np.ndarray:
# TODO(EA2D): reshape not needed with 2D EAs
return np.asarray(values).reshape(self.shape)

def values_for_json(self) -> np.ndarray:
return np.asarray(self.values)

def interpolate(
self, method="pad", axis=0, inplace=False, limit=None, fill_value=None, **kwargs
):
@@ -1805,6 +1813,11 @@ class DatetimeLikeBlock(NDArrayBackedExtensionBlock):
is_numeric = False
values: DatetimeArray | TimedeltaArray

def values_for_json(self) -> np.ndarray:
# special casing datetimetz to avoid conversion through
# object dtype
return self.values._ndarray


class DatetimeTZBlock(DatetimeLikeBlock):
"""implement a datetime64 block with a tz attribute"""
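
``values_for_json`` lets each block hand the JSON writer the cheapest ndarray it has; for tz-aware datetimes that is the ``datetime64[ns]`` backing array rather than a conversion through object dtype. A sketch (``_ndarray`` is internal and may change):

import numpy as np
import pandas as pd

dta = pd.array(pd.date_range("2021-01-01", periods=2, tz="UTC"))
np.asarray(dta)  # object-dtype array of Timestamps; costly to serialize
dta._ndarray     # datetime64[ns] view, what values_for_json returns here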
33 changes: 17 additions & 16 deletions pandas/core/internals/managers.py
@@ -998,24 +998,25 @@ def column_arrays(self) -> list[np.ndarray]:
"""
Used in the JSON C code to access column arrays.
This optimizes compared to using `iget_values` by converting each
block.values to a np.ndarray only once up front
"""
# special casing datetimetz to avoid conversion through object dtype
arrays = [
blk.values._ndarray
if isinstance(blk, DatetimeTZBlock)
else np.asarray(blk.values)
for blk in self.blocks
]
result = []
for i in range(len(self.items)):
arr = arrays[self.blknos[i]]
if arr.ndim == 2:
values = arr[self.blklocs[i]]
# This is an optimized equivalent to
# result = [self.iget_values(i) for i in range(len(self.items))]
result: list[np.ndarray | None] = [None] * len(self.items)

for blk in self.blocks:
mgr_locs = blk._mgr_locs
values = blk.values_for_json()
if values.ndim == 1:
# TODO(EA2D): special casing not needed with 2D EAs
result[mgr_locs[0]] = values

else:
values = arr
result.append(values)
return result
for i, loc in enumerate(mgr_locs):
result[loc] = values[i]

# error: Incompatible return value type (got "List[None]",
# expected "List[ndarray[Any, Any]]")
return result # type: ignore[return-value]

def iset(self, loc: int | slice | np.ndarray, value: ArrayLike):
"""
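
The rewritten ``column_arrays`` converts each block once and scatters its rows into a per-column list through ``mgr_locs``; a self-contained sketch of that scatter step (synthetic block layout):

import numpy as np

n_items = 3
# (column positions, converted values) pairs standing in for blocks:
blocks = [
    (np.array([0, 2]), np.array([[1, 2], [3, 4]])),  # 2D block holding columns 0 and 2
    (np.array([1]), np.array([5.0, 6.0])),           # 1D (EA-backed) block for column 1
]
result = [None] * n_items
for locs, values in blocks:
    if values.ndim == 1:
        result[locs[0]] = values      # a 1D block owns exactly one column
    else:
        for i, loc in enumerate(locs):
            result[loc] = values[i]   # row i of the block is column `loc`
# result == [array([1, 2]), array([5., 6.]), array([3, 4])]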