From 0ea35483d22fabb1f73a75d5dafb2ac7d655250c Mon Sep 17 00:00:00 2001 From: pilkibun Date: Tue, 18 Jun 2019 09:36:30 +0300 Subject: [PATCH 01/31] DOC/EA: developer docs for implementing Series.round/sum/etc in EA --- doc/source/development/extending.rst | 170 +++++++++++++++++++++++++++ 1 file changed, 170 insertions(+) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 8bee0452c2207..398f5dc811aad 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -208,6 +208,176 @@ will 2. call ``result = op(values, ExtensionArray)`` 3. re-box the result in a ``Series`` +:class:`~pandas.api.extensions.ExtensionArray` Series Operations Support +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. versionadded:: 0.25.0 + +In addition to operators like `__mul__` and `__add__`, the pandas Series +namespace provides a long list of useful operations such as :meth:`Series.round`, +:meth:`Series.sum`, :meth:`Series.abs`, etc'. Some of these are handled by +pandas own algorithm implementations (via a dispatch function), while others +simply call an equivalent numpy function with data from the underlying array. +In order to support this operations in a new ExtensionArray, you must provide +an implementation for them. + +.. note:: + + There is a third category of operation which live on the `pandas` + namespace, for example `:meth:pd.concat`. There is an equivalent + numpy function `:meth:np.concatenate`, but this is not used. n + general, these function should just work with your EA, you do not + need to impelment more than the general EA interface. 
+ +As of 0.25.0, the list of series operations which pandas' provides its own +implementations are: :meth:`Series.any`, :meth:`Series.all`, +:meth:`Series.min`, :meth:`Series.max`, :meth:`Series.sum`, +:meth:`Series.mean`, :meth:`Series.median`, :meth:`Series.prod` +(and its alias :meth:`Series.product`), :meth:`Series.std`, +:meth:`Series.var`, :meth:`Series.sem`, +:meth:`Series.kurt`, and :meth:`Series.skew`. + +In order to implement any of this functions, your ExtensionArray must include +an Implementation of :meth:`ExtensionArray._reduce`. Once you provide an +implementation of :meth:`ExtensionArray._reduce` which handles a particular +method, calling the method on the Series will invoke the implementation +on your ExtensionArray. All these methods are reduction functions, and +so are expected to return a scalr value of some type. However it is perfectly +acceptable to return some instance of an :class:`pandas.api.extensions.ExtensionDtype`. + +Series operations which are not handled by :meth:`ExtensionArray._reduce`, +such as :meth:`Series.round`, will generally invoke an equivalent numpy +function with your extension array as the argument. Pandas only guarantees +that your array will be passed to a numpy function, it does not dictate +how your ExtensionArray should interact with numpy's dispatch logic +in order to achieve its goal, since there are several alternative ways +of achieving similar results. + +However, the details of numpy's dispatch logic are not entirely simple, and +there nuances which you should be aware of. For that reason, and in order to +make it easier to create new pandas extensions, we will now cover some of +possible approaches of dealing with numpy. + +The first alternative, and the simplest, is to simply provide an `__array__` +method for your ExtensionArray. This is a standard numpy function documented +here (TBD), which must return a numpy array equivalent of your ExtensionArray. 
+This will usually be an array whose dtype is `object` and whose values are +instances of some class which your ExtensionArray wraps into an array. For +example, the pandas tests include an ExtensionArray example called +`DecimalArray`, if it used this method, its `__array__` method would return an +ndarray of `decimal.Decimal` objects. + +Implementing `__array__` is easy, but it usually isn't satisfactory because +it means most Series operations will return a Series of object dtype, instead +of maintaining your ExtensionArray's dtype. + +The second approach is more involved, but it does a proper job of maintaining +the ExtensionArray's dtype through operations. It requires a detailed +understanding of how numpy functions operate on non ndarray objects. + +Just as pandas handles some operation via :meth:`ExtensionArray._reduce` +and others by delegating to numpy, numpy makes a distinction between +between two types of opersions: ufuncs (such as `np.floor`, `np.ceil`, +and `np.abs`), and non-ufuncs (for example `np.round`, and `np.repeat`). + +We will deal with ufuncs first. You can find a list of numpy's ufuncs here +(TBD). In order to support numpy ufuncs, a convenient approach is to implement +numpy's `__array_ufunc__` interface, specified in +[NEP13](https://www.numpy.org/neps/nep-0013-ufunc-overrides.html). In brief, +if your ExtensionArray implements a compliant `__array_ufunc__` interface, +when a numpy ufunc such as `np.floor` is invoked on your array, its +implementation of `__array_ufunc__` will bec called first and given the +opportunity to compute the function. The return value needn't be a numpy +ndarray (though it can be). In general, you want the return value to be an +instance of your ExtensionArray. In some cases, your implementation can +calculate the result itself (see for example TBD), or, if your ExtensionArray +already has a numeric ndarray backing it, your implementation will itself +invoke the numpy ufunc itself on it (see for example TBD). 
In either case, +after computing the values, you will usually want to wrap the result as a new +ExtensionArray instance and return it to the caller. Pandas will automatically +use that Array as the backing ExtensionArray for a new Series object. + +.. note:: + Before [NEP13](https://www.numpy.org/neps/nep-0013-ufunc-overrides.html), + numpy already provides a way of wrapping ufunc functions via `__array_prepare__` + and `__array_wrap__`, as documented in the Numpy docuemntation section + ["Subclassing Numpy"](http://docs.python.org/doc/numpy/user/basics.subclassing.html). + However, NEP13 seems to have largely superceded that mechanism. + + +With ufuncs out of the way, we turn to the remaining numpy operations, such +as `np.round`. The simplest way to support these operations is to simply +implement a compatible method on your ExtensionArray. For example, if your +ExtensionArray has a compatible `round` method on your ExtensionArray, +When python involes `ser.round()`, :meth:``Series.round` will invoke +`np.round(self.array)`, which will pass your ExtensionArray to the `np.round` +method. Numpy will detect that your EA implements a compatible `round` +and will invoke it to perform the operation. As in the ufunc case, +your implemntation will generally perform the calculaion itself, +or call numpy on its own acking numeric array, and in either case +will wrap the result as a new instance of ExtensionArray and return that +as a result. It is usually possible to write generic code to handle +most ufuncs without having to provide a special case for each. For an example, see TBD. + +.. important:: + + When providing implementations of numpy functions such as `np.round`, + It essential that function signature is compatible with the numpy original. + Otherwise,, numpy will ignore it. + + For example, the signature for `np.round` is `np.round(a, decimals=0, out=None)`. + if you implement a round function which omits the `out` keyword: + +.. 
code-block:: python + + def round(self, decimals=0): + pass + + + numpy will ignore it. The following will work however: + +.. code-block:: python + + def round(self, decimals=0, **kwds): + pass + + +An alternative to providing individual functions, is to use the `__array_function__` +mechanism introduced by [NEP18](https://www.numpy.org/neps/nep-0018-array-function-protocol.html). +This is an opt-in mechanism in numpy 1.16 (by setting an environment variable), and +is enabled by default starting with numpy 1.17. As of 1.17 it is still considered +experimental, and its design is actively being revised. We will not discuss it further +here, but it is certainly possible to make use of it to achieve the same goal. Your +mileage may vary. + +.. important:: + Implementing `__array_function__` is not a substitute for implementing `__array_ufunc__`. + The `__array_function__` mechanism complements (and to a degree copies) the`__array_ufunc__` + mechanism, by providing the same flexibility for non-ufuncs. + +.. important:: + `__array_function__` is an "all-in" solution. That means that if you cannot mix it with + explicit implementations for some methods and using `__array_function__` for some. + If you both `__array_function__` and also provide an implementation of `round`, numpy + will invoke `__array_function__` for all the operations in the specification, **including** + `round`. + +With this overview in hand, you hopefully have the necessary information in order +to develop rich, full-featured ExtensionArrays that seamlessly plug in to pandas. + +.. important:: + +The above description currently leads the state of the code considerably. Many Series +methods need to be updated to conform to this model of EA support. If you find a +bug, or something else which does not behave as described, please report it to +the pandas team by opening an issue. + + +Formatting Extension Arrays +^^^^^^^^^^^^^^^^^^^^^^^^ + +TBD + .. 
_extending.extension.testing: Testing Extension Arrays From d137a1018a7c5679fa37fa7faa1671626a935c3d Mon Sep 17 00:00:00 2001 From: pilkibun Date: Tue, 18 Jun 2019 10:24:16 +0300 Subject: [PATCH 02/31] DOC: describe one more approach --- doc/source/development/extending.rst | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 398f5dc811aad..56e162cc0d36e 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -341,8 +341,17 @@ most ufuncs without having to provide a special case for each. For an example, s def round(self, decimals=0, **kwds): pass +An alternative approach to implementing individual functions, is to override +`__getattr__` in your ExtensionArray, and to intercept requests for method +names which you wish to support (such as `round`). For most functions, +you can return a dynamiclly generated function, which simply calls +the numpy function on your existing backing numeric array, wraps +the result in your ExtensionArray, and returns it. This approach can +reduce boilerplate significantly, but you do have to maintain a whitelist, +and may require more than one case, based on signature. -An alternative to providing individual functions, is to use the `__array_function__` + +A third possible approach, is to use the `__array_function__` mechanism introduced by [NEP18](https://www.numpy.org/neps/nep-0018-array-function-protocol.html). This is an opt-in mechanism in numpy 1.16 (by setting an environment variable), and is enabled by default starting with numpy 1.17. 
As of 1.17 it is still considered From 92d4b9bf6fb83fbd9162012e0f5d86c3ee83cb64 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Tue, 18 Jun 2019 10:27:00 +0300 Subject: [PATCH 03/31] DOC: add note about incremental implementation --- doc/source/development/extending.rst | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 56e162cc0d36e..0dd4ef4ad58d3 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -374,6 +374,15 @@ mileage may vary. With this overview in hand, you hopefully have the necessary information in order to develop rich, full-featured ExtensionArrays that seamlessly plug in to pandas. +.. important:: + You are not required to provide implementations for the full complemnt of Series + operations in your ExtensionArray. In fact, some of them may not even make sense + within your context. You amay also choose to ass implementations incrementally, + as the need arised. + + TBD: should we have a standard way of signalling not supported instead of a + random AttributeError exception being thrown. + .. important:: The above description currently leads the state of the code considerably. Many Series From d1bf105436aa8e4187b791cd4d7fac8d9e350af1 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 05:29:36 +0300 Subject: [PATCH 04/31] Reference the right class --- doc/source/development/extending.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 0dd4ef4ad58d3..41d512e4a793b 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -243,7 +243,7 @@ implementation of :meth:`ExtensionArray._reduce` which handles a particular method, calling the method on the Series will invoke the implementation on your ExtensionArray. 
All these methods are reduction functions, and so are expected to return a scalr value of some type. However it is perfectly -acceptable to return some instance of an :class:`pandas.api.extensions.ExtensionDtype`. +acceptable to return some instance of an :class:`pandas.api.extensions.ExtensionArray`. Series operations which are not handled by :meth:`ExtensionArray._reduce`, such as :meth:`Series.round`, will generally invoke an equivalent numpy From 81cd3cfffbffe3045773bc9fa606197628931fad Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 09:42:26 +0300 Subject: [PATCH 05/31] Fix review comment --- doc/source/development/extending.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 41d512e4a793b..cf8c3148d952d 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -224,10 +224,10 @@ an implementation for them. .. note:: There is a third category of operation which live on the `pandas` - namespace, for example `:meth:pd.concat`. There is an equivalent - numpy function `:meth:np.concatenate`, but this is not used. n - general, these function should just work with your EA, you do not - need to impelment more than the general EA interface. + namespace, for example `:meth:pd.concat`. There is an equivalent numpy + function `:meth:np.concatenate`, but it is not called by the pandas + method. In general, these function should just work with your EA, you do + not need to impelment more than the general EA interface. 
As of 0.25.0, the list of series operations which pandas' provides its own implementations are: :meth:`Series.any`, :meth:`Series.all`, From cb3fd56f23aa260899fa6ad7ba2468fca5c2e047 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 09:43:30 +0300 Subject: [PATCH 06/31] Fix review comment --- doc/source/development/extending.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index cf8c3148d952d..c1e2c7409e242 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -227,10 +227,10 @@ an implementation for them. namespace, for example `:meth:pd.concat`. There is an equivalent numpy function `:meth:np.concatenate`, but it is not called by the pandas method. In general, these function should just work with your EA, you do - not need to impelment more than the general EA interface. + not need to implement more than the general EA interface. -As of 0.25.0, the list of series operations which pandas' provides its own -implementations are: :meth:`Series.any`, :meth:`Series.all`, +As of 0.25.0, pandas provides its own implementations for the following +operations: :meth:`Series.any`, :meth:`Series.all`, :meth:`Series.min`, :meth:`Series.max`, :meth:`Series.sum`, :meth:`Series.mean`, :meth:`Series.median`, :meth:`Series.prod` (and its alias :meth:`Series.product`), :meth:`Series.std`, From c90c5978b62a917ac5b52daa7af795a6c5a9d212 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 09:43:49 +0300 Subject: [PATCH 07/31] Fix review comment --- doc/source/development/extending.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index c1e2c7409e242..d57a91065f541 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -237,7 +237,7 @@ operations: :meth:`Series.any`, :meth:`Series.all`, 
:meth:`Series.var`, :meth:`Series.sem`, :meth:`Series.kurt`, and :meth:`Series.skew`. -In order to implement any of this functions, your ExtensionArray must include +In order to implement any of these functions, your ExtensionArray must include an Implementation of :meth:`ExtensionArray._reduce`. Once you provide an implementation of :meth:`ExtensionArray._reduce` which handles a particular method, calling the method on the Series will invoke the implementation From ee3ad20918dea2434cc7f6745ab952dffe422f8b Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 09:44:08 +0300 Subject: [PATCH 08/31] Fix review comment --- doc/source/development/extending.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index d57a91065f541..9f141d23ddff4 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -238,7 +238,7 @@ operations: :meth:`Series.any`, :meth:`Series.all`, :meth:`Series.kurt`, and :meth:`Series.skew`. In order to implement any of these functions, your ExtensionArray must include -an Implementation of :meth:`ExtensionArray._reduce`. Once you provide an +an implementation of :meth:`ExtensionArray._reduce`. Once you provide an implementation of :meth:`ExtensionArray._reduce` which handles a particular method, calling the method on the Series will invoke the implementation on your ExtensionArray. 
All these methods are reduction functions, and From 901579e832f8a8514a9f38289c3fc055aef69351 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 09:46:52 +0300 Subject: [PATCH 09/31] Fix more typos --- doc/source/development/extending.rst | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 9f141d23ddff4..e94f9f0e3984c 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -277,7 +277,7 @@ understanding of how numpy functions operate on non ndarray objects. Just as pandas handles some operation via :meth:`ExtensionArray._reduce` and others by delegating to numpy, numpy makes a distinction between -between two types of opersions: ufuncs (such as `np.floor`, `np.ceil`, +between two types of operations: ufuncs (such as `np.floor`, `np.ceil`, and `np.abs`), and non-ufuncs (for example `np.round`, and `np.repeat`). We will deal with ufuncs first. You can find a list of numpy's ufuncs here @@ -300,21 +300,21 @@ use that Array as the backing ExtensionArray for a new Series object. .. note:: Before [NEP13](https://www.numpy.org/neps/nep-0013-ufunc-overrides.html), numpy already provides a way of wrapping ufunc functions via `__array_prepare__` - and `__array_wrap__`, as documented in the Numpy docuemntation section + and `__array_wrap__`, as documented in the Numpy documentation section ["Subclassing Numpy"](http://docs.python.org/doc/numpy/user/basics.subclassing.html). - However, NEP13 seems to have largely superceded that mechanism. + However, NEP13 seems to have largely superseded that mechanism. With ufuncs out of the way, we turn to the remaining numpy operations, such as `np.round`. The simplest way to support these operations is to simply implement a compatible method on your ExtensionArray. 
For example, if your ExtensionArray has a compatible `round` method on your ExtensionArray, -When python involes `ser.round()`, :meth:``Series.round` will invoke +When python involves `ser.round()`, :meth:``Series.round` will invoke `np.round(self.array)`, which will pass your ExtensionArray to the `np.round` method. Numpy will detect that your EA implements a compatible `round` and will invoke it to perform the operation. As in the ufunc case, -your implemntation will generally perform the calculaion itself, -or call numpy on its own acking numeric array, and in either case +your implementation will generally perform the calculation itself, +or call numpy on its own backing numeric array, and in either case will wrap the result as a new instance of ExtensionArray and return that as a result. It is usually possible to write generic code to handle most ufuncs without having to provide a special case for each. For an example, see TBD. @@ -344,7 +344,7 @@ most ufuncs without having to provide a special case for each. For an example, s An alternative approach to implementing individual functions, is to override `__getattr__` in your ExtensionArray, and to intercept requests for method names which you wish to support (such as `round`). For most functions, -you can return a dynamiclly generated function, which simply calls +you can return a dynamically generated function, which simply calls the numpy function on your existing backing numeric array, wraps the result in your ExtensionArray, and returns it. This approach can reduce boilerplate significantly, but you do have to maintain a whitelist, @@ -375,10 +375,10 @@ With this overview in hand, you hopefully have the necessary information in orde to develop rich, full-featured ExtensionArrays that seamlessly plug in to pandas. .. 
important:: - You are not required to provide implementations for the full complemnt of Series + You are not required to provide implementations for the full complement of Series operations in your ExtensionArray. In fact, some of them may not even make sense - within your context. You amay also choose to ass implementations incrementally, - as the need arised. + within your context. You may also choose to ass implementations incrementally, + as the need arises. TBD: should we have a standard way of signalling not supported instead of a random AttributeError exception being thrown. @@ -402,7 +402,7 @@ Testing Extension Arrays ^^^^^^^^^^^^^^^^^^^^^^^^ We provide a test suite for ensuring that your extension arrays satisfy the expected -behavior. To use the test suite, you must provide several pytest fixtures and inherit +behaviour. To use the test suite, you must provide several pytest fixtures and inherit from the base test class. The required fixtures are found in https://github.com/pandas-dev/pandas/blob/master/pandas/tests/extension/conftest.py. From 1e253c7be639aa83b7bd55b0217fc708899e21da Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 19:47:13 +0300 Subject: [PATCH 10/31] Remove redundant note --- doc/source/development/extending.rst | 8 -------- 1 file changed, 8 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index e94f9f0e3984c..a899641351d91 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -297,14 +297,6 @@ after computing the values, you will usually want to wrap the result as a new ExtensionArray instance and return it to the caller. Pandas will automatically use that Array as the backing ExtensionArray for a new Series object. -.. 
note:: - Before [NEP13](https://www.numpy.org/neps/nep-0013-ufunc-overrides.html), - numpy already provides a way of wrapping ufunc functions via `__array_prepare__` - and `__array_wrap__`, as documented in the Numpy documentation section - ["Subclassing Numpy"](http://docs.python.org/doc/numpy/user/basics.subclassing.html). - However, NEP13 seems to have largely superseded that mechanism. - - With ufuncs out of the way, we turn to the remaining numpy operations, such as `np.round`. The simplest way to support these operations is to simply implement a compatible method on your ExtensionArray. For example, if your From 696b320ada06a8324ef767252ad3d51a4d3ad3b9 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 19:47:54 +0300 Subject: [PATCH 11/31] Remove redundant note --- doc/source/development/extending.rst | 8 -------- 1 file changed, 8 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index a899641351d91..8224406a085cc 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -221,14 +221,6 @@ simply call an equivalent numpy function with data from the underlying array. In order to support this operations in a new ExtensionArray, you must provide an implementation for them. -.. note:: - - There is a third category of operation which live on the `pandas` - namespace, for example `:meth:pd.concat`. There is an equivalent numpy - function `:meth:np.concatenate`, but it is not called by the pandas - method. In general, these function should just work with your EA, you do - not need to implement more than the general EA interface. 
- As of 0.25.0, pandas provides its own implementations for the following operations: :meth:`Series.any`, :meth:`Series.all`, :meth:`Series.min`, :meth:`Series.max`, :meth:`Series.sum`, From e70dbb9b17ad07f7350b3122d146db226e396cbb Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 19:58:43 +0300 Subject: [PATCH 12/31] snip --- doc/source/development/extending.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 8224406a085cc..4dda098da7409 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -231,11 +231,11 @@ operations: :meth:`Series.any`, :meth:`Series.all`, In order to implement any of these functions, your ExtensionArray must include an implementation of :meth:`ExtensionArray._reduce`. Once you provide an -implementation of :meth:`ExtensionArray._reduce` which handles a particular -method, calling the method on the Series will invoke the implementation -on your ExtensionArray. All these methods are reduction functions, and -so are expected to return a scalr value of some type. However it is perfectly -acceptable to return some instance of an :class:`pandas.api.extensions.ExtensionArray`. +implementation of :meth:`ExtensionArray._reduce`, calling the method on the +Series will invoke the implementation on your ExtensionArray. All these +methods are reduction functions, and so are expected to return a scalr value +of some type. However it is perfectly acceptable to return some instance of an +:class:`pandas.api.extensions.ExtensionArray`. 
Series operations which are not handled by :meth:`ExtensionArray._reduce`, such as :meth:`Series.round`, will generally invoke an equivalent numpy From ab418c1c0fbb1ee7efd08af737d164c99d8d59bb Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 20:07:30 +0300 Subject: [PATCH 13/31] typo --- doc/source/development/extending.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 4dda098da7409..9fb0078c87739 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -233,7 +233,7 @@ In order to implement any of these functions, your ExtensionArray must include an implementation of :meth:`ExtensionArray._reduce`. Once you provide an implementation of :meth:`ExtensionArray._reduce`, calling the method on the Series will invoke the implementation on your ExtensionArray. All these -methods are reduction functions, and so are expected to return a scalr value +methods are reduction functions, and so are expected to return a scalar value of some type. However it is perfectly acceptable to return some instance of an :class:`pandas.api.extensions.ExtensionArray`. From d5db7e6df9595bbb2a11b2f6964d019321dcc919 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 20:07:33 +0300 Subject: [PATCH 14/31] snip --- doc/source/development/extending.rst | 5 ----- 1 file changed, 5 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 9fb0078c87739..377c1f31d9fac 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -245,11 +245,6 @@ how your ExtensionArray should interact with numpy's dispatch logic in order to achieve its goal, since there are several alternative ways of achieving similar results. -However, the details of numpy's dispatch logic are not entirely simple, and -there nuances which you should be aware of. 
For that reason, and in order to -make it easier to create new pandas extensions, we will now cover some of -possible approaches of dealing with numpy. - The first alternative, and the simplest, is to simply provide an `__array__` method for your ExtensionArray. This is a standard numpy function documented here (TBD), which must return a numpy array equivalent of your ExtensionArray. From d7ebacf0ba17a7915681d1bf02a4613b978612fc Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 20:21:19 +0300 Subject: [PATCH 15/31] rephrase --- doc/source/development/extending.rst | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 377c1f31d9fac..8959fa0e8ec3c 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -245,18 +245,12 @@ how your ExtensionArray should interact with numpy's dispatch logic in order to achieve its goal, since there are several alternative ways of achieving similar results. -The first alternative, and the simplest, is to simply provide an `__array__` -method for your ExtensionArray. This is a standard numpy function documented -here (TBD), which must return a numpy array equivalent of your ExtensionArray. -This will usually be an array whose dtype is `object` and whose values are -instances of some class which your ExtensionArray wraps into an array. For -example, the pandas tests include an ExtensionArray example called -`DecimalArray`, if it used this method, its `__array__` method would return an -ndarray of `decimal.Decimal` objects. - -Implementing `__array__` is easy, but it usually isn't satisfactory because -it means most Series operations will return a Series of object dtype, instead -of maintaining your ExtensionArray's dtype. +For the most basic support, the default implemntation of :meth:`ExtensionArray.__array__` +will transperantly convert your EA to a numpy object array. 
You can also +override it to return any numpy array which suits your case. However, +this solution usually falls short, becase any series methods you then +use casts your EA into an object ndarray, while you usually want the +result to remain an instance of your EA. The second approach is more involved, but it does a proper job of maintaining the ExtensionArray's dtype through operations. It requires a detailed From 0fec603f843f80537f6f6b45d78f62f8cba372e0 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 20:26:32 +0300 Subject: [PATCH 16/31] snip --- doc/source/development/extending.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 8959fa0e8ec3c..7dd89af951e75 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -329,8 +329,7 @@ mechanism introduced by [NEP18](https://www.numpy.org/neps/nep-0018-array-functi This is an opt-in mechanism in numpy 1.16 (by setting an environment variable), and is enabled by default starting with numpy 1.17. As of 1.17 it is still considered experimental, and its design is actively being revised. We will not discuss it further -here, but it is certainly possible to make use of it to achieve the same goal. Your -mileage may vary. +here, but it is certainly possible to make use of it to achieve the same goal. .. important:: Implementing `__array_function__` is not a substitute for implementing `__array_ufunc__`. 
From 54b78225939c5809ab33b50539c1861d6df1e4f6 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 20:26:59 +0300 Subject: [PATCH 17/31] typo --- doc/source/development/extending.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 7dd89af951e75..9c78288ef793f 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -349,7 +349,7 @@ to develop rich, full-featured ExtensionArrays that seamlessly plug in to pandas .. important:: You are not required to provide implementations for the full complement of Series operations in your ExtensionArray. In fact, some of them may not even make sense - within your context. You may also choose to ass implementations incrementally, + within your context. You may also choose to add implementations incrementally, as the need arises. TBD: should we have a standard way of signalling not supported instead of a From d70ffec5672337071fcd8d30f0722aef80720c57 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 20:29:57 +0300 Subject: [PATCH 18/31] Rearrange --- doc/source/development/extending.rst | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 9c78288ef793f..c54f2c86b43f0 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -345,6 +345,8 @@ here, but it is certainly possible to make use of it to achieve the same goal. With this overview in hand, you hopefully have the necessary information in order to develop rich, full-featured ExtensionArrays that seamlessly plug in to pandas. +EA support is still being actively worked on, so if you encounter a bug, or behaviour +which does not behave as described, please report it to the team. .. 
important:: You are not required to provide implementations for the full complement of Series @@ -355,13 +357,6 @@ to develop rich, full-featured ExtensionArrays that seamlessly plug in to pandas TBD: should we have a standard way of signalling not supported instead of a random AttributeError exception being thrown. -.. important:: - -The above description currently leads the state of the code considerably. Many Series -methods need to be updated to conform to this model of EA support. If you find a -bug, or something else which does not behave as described, please report it to -the pandas team by opening an issue. - Formatting Extension Arrays ^^^^^^^^^^^^^^^^^^^^^^^^ From 12962679e5beb611513038975039c48a5f7f3974 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 20:37:42 +0300 Subject: [PATCH 19/31] typo --- doc/source/development/extending.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index c54f2c86b43f0..9ad333f50c04e 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -267,7 +267,7 @@ numpy's `__array_ufunc__` interface, specified in [NEP13](https://www.numpy.org/neps/nep-0013-ufunc-overrides.html). In brief, if your ExtensionArray implements a compliant `__array_ufunc__` interface, when a numpy ufunc such as `np.floor` is invoked on your array, its -implementation of `__array_ufunc__` will bec called first and given the +implementation of `__array_ufunc__` will be called first and given the opportunity to compute the function. The return value needn't be a numpy ndarray (though it can be). In general, you want the return value to be an instance of your ExtensionArray. 
In some cases, your implementation can From b8187b4d94c1f7c4669f7aa5bc110f825ededa75 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 20:38:19 +0300 Subject: [PATCH 20/31] Snip --- doc/source/development/extending.rst | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 9ad333f50c04e..e292efca5b879 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -270,13 +270,7 @@ when a numpy ufunc such as `np.floor` is invoked on your array, its implementation of `__array_ufunc__` will be called first and given the opportunity to compute the function. The return value needn't be a numpy ndarray (though it can be). In general, you want the return value to be an -instance of your ExtensionArray. In some cases, your implementation can -calculate the result itself (see for example TBD), or, if your ExtensionArray -already has a numeric ndarray backing it, your implementation will itself -invoke the numpy ufunc itself on it (see for example TBD). In either case, -after computing the values, you will usually want to wrap the result as a new -ExtensionArray instance and return it to the caller. Pandas will automatically -use that Array as the backing ExtensionArray for a new Series object. +instance of your ExtensionArray. With ufuncs out of the way, we turn to the remaining numpy operations, such as `np.round`. 
The simplest way to support these operations is to simply From 387fd680263f1164436a528514f8d95258e21287 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 20:40:59 +0300 Subject: [PATCH 21/31] US --- doc/source/development/extending.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index e292efca5b879..d15638b036aa8 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -363,7 +363,7 @@ Testing Extension Arrays ^^^^^^^^^^^^^^^^^^^^^^^^ We provide a test suite for ensuring that your extension arrays satisfy the expected -behaviour. To use the test suite, you must provide several pytest fixtures and inherit +behavior. To use the test suite, you must provide several pytest fixtures and inherit from the base test class. The required fixtures are found in https://github.com/pandas-dev/pandas/blob/master/pandas/tests/extension/conftest.py. From b800b6798c592457bf373a89f033c3815006858d Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 20:48:56 +0300 Subject: [PATCH 22/31] Remove explicit list --- doc/source/development/extending.rst | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index d15638b036aa8..a447cc75e7e83 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -221,18 +221,13 @@ simply call an equivalent numpy function with data from the underlying array. In order to support this operations in a new ExtensionArray, you must provide an implementation for them. 
-As of 0.25.0, pandas provides its own implementations for the following -operations: :meth:`Series.any`, :meth:`Series.all`, -:meth:`Series.min`, :meth:`Series.max`, :meth:`Series.sum`, -:meth:`Series.mean`, :meth:`Series.median`, :meth:`Series.prod` -(and its alias :meth:`Series.product`), :meth:`Series.std`, -:meth:`Series.var`, :meth:`Series.sem`, -:meth:`Series.kurt`, and :meth:`Series.skew`. - -In order to implement any of these functions, your ExtensionArray must include -an implementation of :meth:`ExtensionArray._reduce`. Once you provide an -implementation of :meth:`ExtensionArray._reduce`, calling the method on the -Series will invoke the implementation on your ExtensionArray. All these +As of 0.25.0, pandas provides its own implementations for some +reduction operations such as min/max/sum/etc'. For your ExtensionArray +to support these methods, it must include an implementation of +:meth:`ExtensionArray._reduce`. See its docstring for a complete list o +if the series operations it handles. Once your EA implements +:meth:`ExtensionArray._reduce`, your implementation will be cailled +whenever one of the related Series method is called. All these methods are reduction functions, and so are expected to return a scalar value of some type. However it is perfectly acceptable to return some instance of an :class:`pandas.api.extensions.ExtensionArray`. From be69f0f186fa3e252bb49e81031fe36255924129 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Wed, 19 Jun 2019 21:00:14 +0300 Subject: [PATCH 23/31] Move sentence to a note --- doc/source/development/extending.rst | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index a447cc75e7e83..28ba8fd07724f 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -229,8 +229,7 @@ if the series operations it handles. 
Once your EA implements :meth:`ExtensionArray._reduce`, your implementation will be cailled whenever one of the related Series method is called. All these methods are reduction functions, and so are expected to return a scalar value -of some type. However it is perfectly acceptable to return some instance of an -:class:`pandas.api.extensions.ExtensionArray`. +of some type. Series operations which are not handled by :meth:`ExtensionArray._reduce`, such as :meth:`Series.round`, will generally invoke an equivalent numpy @@ -247,15 +246,25 @@ this solution usually falls short, becase any series methods you then use casts your EA into an object ndarray, while you usually want the result to remain an instance of your EA. -The second approach is more involved, but it does a proper job of maintaining -the ExtensionArray's dtype through operations. It requires a detailed -understanding of how numpy functions operate on non ndarray objects. +In most cases, you will want to provide your own implementations of the +methods. This takes more work, but does a proper job of maintaining the +ExtensionArray's dtype through operations. Understanding how to do this +requires a more detailed understanding of how numpy functions operate on non +ndarray objects. Just as pandas handles some operation via :meth:`ExtensionArray._reduce` and others by delegating to numpy, numpy makes a distinction between between two types of operations: ufuncs (such as `np.floor`, `np.ceil`, and `np.abs`), and non-ufuncs (for example `np.round`, and `np.repeat`). +.. note:: + To be clear, although your code will override numpy's own functions, + It is perfectly common, and valid for your function to return an + an instance of :class:`pandas.api.extensions.ExtensionArray`, + usually your own. You are *not* required to return numpy arrays + from these function. + + We will deal with ufuncs first. You can find a list of numpy's ufuncs here (TBD). 
In order to support numpy ufuncs, a convenient approach is to implement numpy's `__array_ufunc__` interface, specified in From 4d948fa43aa1a1b7fdc0a9954971680b2ed35bb7 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Thu, 20 Jun 2019 06:37:23 +0300 Subject: [PATCH 24/31] cleanups --- doc/source/development/extending.rst | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 28ba8fd07724f..12822a2405da8 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -292,12 +292,14 @@ most ufuncs without having to provide a special case for each. For an example, s .. important:: - When providing implementations of numpy functions such as `np.round`, - It essential that function signature is compatible with the numpy original. - Otherwise,, numpy will ignore it. + When providing implementations of numpy functions such as `np.round`, + You muse ensure that the method signature is compatible with the numpy method + it implements. + Otherwise, numpy will ignore it. + + For example, the signature for `np.round` is `np.round(a, decimals=0, out=None)`. + if you implement a round function which omits the `out` keyword: - For example, the signature for `np.round` is `np.round(a, decimals=0, out=None)`. - if you implement a round function which omits the `out` keyword: .. code-block:: python @@ -312,6 +314,7 @@ most ufuncs without having to provide a special case for each. For an example, s def round(self, decimals=0, **kwds): pass + An alternative approach to implementing individual functions, is to override `__getattr__` in your ExtensionArray, and to intercept requests for method names which you wish to support (such as `round`). For most functions, @@ -321,7 +324,6 @@ the result in your ExtensionArray, and returns it. 
This approach can reduce boilerplate significantly, but you do have to maintain a whitelist, and may require more than one case, based on signature. - A third possible approach, is to use the `__array_function__` mechanism introduced by [NEP18](https://www.numpy.org/neps/nep-0018-array-function-protocol.html). This is an opt-in mechanism in numpy 1.16 (by setting an environment variable), and @@ -357,7 +359,7 @@ which does not behave as described, please report it to the team. Formatting Extension Arrays -^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^ TBD From eef58bc81a16565f2ecf067a4c7ac2c89a8f31c8 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Thu, 20 Jun 2019 06:59:00 +0300 Subject: [PATCH 25/31] Rewrites --- doc/source/development/extending.rst | 52 ++++++++++++++-------------- 1 file changed, 26 insertions(+), 26 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 12822a2405da8..554a631ba58d2 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -224,8 +224,8 @@ an implementation for them. As of 0.25.0, pandas provides its own implementations for some reduction operations such as min/max/sum/etc'. For your ExtensionArray to support these methods, it must include an implementation of -:meth:`ExtensionArray._reduce`. See its docstring for a complete list o -if the series operations it handles. Once your EA implements +:meth:`ExtensionArray._reduce`. See its docstring for a complete list of +the series operations it handles. Once your EA implements :meth:`ExtensionArray._reduce`, your implementation will be cailled whenever one of the related Series method is called. All these methods are reduction functions, and so are expected to return a scalar value @@ -258,17 +258,16 @@ between two types of operations: ufuncs (such as `np.floor`, `np.ceil`, and `np.abs`), and non-ufuncs (for example `np.round`, and `np.repeat`). .. 
note:: - To be clear, although your code will override numpy's own functions, - It is perfectly common, and valid for your function to return an - an instance of :class:`pandas.api.extensions.ExtensionArray`, - usually your own. You are *not* required to return numpy arrays - from these function. + Although your methods will override numpy's own methods, they + are *not* required to return numpy arrays or builtin python types. In + fact, you will often want your method to return a new instance of your + :class:`pandas.api.extensions.ExtensionArray` as the return value. We will deal with ufuncs first. You can find a list of numpy's ufuncs here (TBD). In order to support numpy ufuncs, a convenient approach is to implement numpy's `__array_ufunc__` interface, specified in -[NEP13](https://www.numpy.org/neps/nep-0013-ufunc-overrides.html). In brief, +`NEP-13 `__ if your ExtensionArray implements a compliant `__array_ufunc__` interface, when a numpy ufunc such as `np.floor` is invoked on your array, its implementation of `__array_ufunc__` will be called first and given the @@ -280,7 +279,7 @@ With ufuncs out of the way, we turn to the remaining numpy operations, such as `np.round`. The simplest way to support these operations is to simply implement a compatible method on your ExtensionArray. For example, if your ExtensionArray has a compatible `round` method on your ExtensionArray, -When python involves `ser.round()`, :meth:``Series.round` will invoke +When :meth:`Series.round` is called, it in turn calls `np.round(self.array)`, which will pass your ExtensionArray to the `np.round` method. Numpy will detect that your EA implements a compatible `round` and will invoke it to perform the operation. As in the ufunc case, @@ -294,8 +293,7 @@ most ufuncs without having to provide a special case for each. 
For an example, s When providing implementations of numpy functions such as `np.round`, You muse ensure that the method signature is compatible with the numpy method - it implements. - Otherwise, numpy will ignore it. + it implements. If the signatures do not match, numpy will ignore it. For example, the signature for `np.round` is `np.round(a, decimals=0, out=None)`. if you implement a round function which omits the `out` keyword: @@ -324,24 +322,26 @@ the result in your ExtensionArray, and returns it. This approach can reduce boilerplate significantly, but you do have to maintain a whitelist, and may require more than one case, based on signature. -A third possible approach, is to use the `__array_function__` -mechanism introduced by [NEP18](https://www.numpy.org/neps/nep-0018-array-function-protocol.html). -This is an opt-in mechanism in numpy 1.16 (by setting an environment variable), and -is enabled by default starting with numpy 1.17. As of 1.17 it is still considered -experimental, and its design is actively being revised. We will not discuss it further -here, but it is certainly possible to make use of it to achieve the same goal. +A third possible approach, is to use the `__array_function__` mechanism +introduced by numpy's +`NEP-18 `__ +proposal. NEP-18 is an experimental mechanism introduced in numpy 1.16, and is +enabled by default starting with numpy 1.17 (to enable it in 1.16, you must +set the environment variable `NUMPY_EXPERIMENTAL_ARRAY_FUNCTION` in your +shell). NEP-18 is an "opt-in, all-in" solution, meaning that if you choose to +make use of it in your class, by implementing the `__array_function__` +interface, it will always be used when (non-ufunc) numpy methods are called +with an instance of your EA as the argument. Numpy will not make use of an `__array__` +method if you have one. 
If you include both a `__array_function__` and an +implementation of `round`, for example, numpy will always invoke `__array_function__` +when `np.round` is passed an instance of your EA. .. important:: - Implementing `__array_function__` is not a substitute for implementing `__array_ufunc__`. - The `__array_function__` mechanism complements (and to a degree copies) the`__array_ufunc__` - mechanism, by providing the same flexibility for non-ufuncs. -.. important:: - `__array_function__` is an "all-in" solution. That means that if you cannot mix it with - explicit implementations for some methods and using `__array_function__` for some. - If you both `__array_function__` and also provide an implementation of `round`, numpy - will invoke `__array_function__` for all the operations in the specification, **including** - `round`. + If you choose to implement `__array_function__`, you will still need to + implement `__array_ufunc__` in order to override ufuncs. Each of these + two interfaces covers a seperate portion of numpy's functionality. + With this overview in hand, you hopefully have the necessary information in order to develop rich, full-featured ExtensionArrays that seamlessly plug in to pandas. From 068ff610ae1c732fdd7207a2682de8b7e8a33b77 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Thu, 20 Jun 2019 07:11:48 +0300 Subject: [PATCH 26/31] Rewrite --- doc/source/development/extending.rst | 32 +++++++++++++--------------- 1 file changed, 15 insertions(+), 17 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 554a631ba58d2..ee0fa5a77c71e 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -275,19 +275,19 @@ opportunity to compute the function. The return value needn't be a numpy ndarray (though it can be). In general, you want the return value to be an instance of your ExtensionArray. 
-With ufuncs out of the way, we turn to the remaining numpy operations, such -as `np.round`. The simplest way to support these operations is to simply +With ufuncs out of the way, we turn to the remaining numpy operations, such as +`np.round`. The simplest way to support these operations is to simply implement a compatible method on your ExtensionArray. For example, if your -ExtensionArray has a compatible `round` method on your ExtensionArray, -When :meth:`Series.round` is called, it in turn calls -`np.round(self.array)`, which will pass your ExtensionArray to the `np.round` -method. Numpy will detect that your EA implements a compatible `round` -and will invoke it to perform the operation. As in the ufunc case, -your implementation will generally perform the calculation itself, -or call numpy on its own backing numeric array, and in either case -will wrap the result as a new instance of ExtensionArray and return that -as a result. It is usually possible to write generic code to handle -most ufuncs without having to provide a special case for each. For an example, see TBD. +ExtensionArray has a compatible `round` method on your ExtensionArray, When +:meth:`Series.round` is called, it in turn calls `np.round(self.array)`, +passing your EA into numpy's dispatch logic. Numpy will detect that your EA +implements a compatible `round` method and use it instead of its own +version. As in the ufunc case, your implementation will perform +the calculation on its internal data, and then usually wrap the +result in anew instance of your EA class, and return that as the result. + +It is usually possible to write generic code to handle most ufuncs, +instead of providing a special case for each. For an example, see TBD. .. important:: @@ -296,8 +296,7 @@ most ufuncs without having to provide a special case for each. For an example, s it implements. If the signatures do not match, numpy will ignore it. 
For example, the signature for `np.round` is `np.round(a, decimals=0, out=None)`. - if you implement a round function which omits the `out` keyword: - + if you implement a round function which omits the `out` keyword, .. code-block:: python @@ -305,15 +304,14 @@ most ufuncs without having to provide a special case for each. For an example, s pass - numpy will ignore it. The following will work however: - +\... numpy will ignore it. The following will work however: .. code-block:: python def round(self, decimals=0, **kwds): pass -An alternative approach to implementing individual functions, is to override +An second possible approach to implementing individual operations, is to override `__getattr__` in your ExtensionArray, and to intercept requests for method names which you wish to support (such as `round`). For most functions, you can return a dynamically generated function, which simply calls From 40988256ade3c32ad9b8b4997261303468d262de Mon Sep 17 00:00:00 2001 From: pilkibun Date: Thu, 20 Jun 2019 07:16:23 +0300 Subject: [PATCH 27/31] Rephrase --- doc/source/development/extending.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index ee0fa5a77c71e..9a3a730c1329e 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -336,7 +336,7 @@ when `np.round` is passed an instance of your EA. .. important:: - If you choose to implement `__array_function__`, you will still need to + Even if you choose to implement `__array_function__`, you still need to implement `__array_ufunc__` in order to override ufuncs. Each of these two interfaces covers a seperate portion of numpy's functionality. 
From 2d94b82ebdf66436fc63e1ade6063b4cbe53b6f5 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Thu, 20 Jun 2019 07:16:41 +0300 Subject: [PATCH 28/31] whitespace --- doc/source/development/extending.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 9a3a730c1329e..07494247d4d93 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -347,6 +347,7 @@ EA support is still being actively worked on, so if you encounter a bug, or beha which does not behave as described, please report it to the team. .. important:: + You are not required to provide implementations for the full complement of Series operations in your ExtensionArray. In fact, some of them may not even make sense within your context. You may also choose to add implementations incrementally, From 5a9125b3e0628348cf679513cfadb2b81cf797a0 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Thu, 20 Jun 2019 07:20:39 +0300 Subject: [PATCH 29/31] whitespace --- doc/source/development/extending.rst | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 07494247d4d93..2b31c372937e5 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -258,7 +258,6 @@ between two types of operations: ufuncs (such as `np.floor`, `np.ceil`, and `np.abs`), and non-ufuncs (for example `np.round`, and `np.repeat`). .. note:: - Although your methods will override numpy's own methods, they are *not* required to return numpy arrays or builtin python types. In fact, you will often want your method to return a new instance of your @@ -305,6 +304,7 @@ instead of providing a special case for each. For an example, see TBD. \... numpy will ignore it. The following will work however: + .. 
code-block:: python def round(self, decimals=0, **kwds): @@ -335,7 +335,6 @@ implementation of `round`, for example, numpy will always invoke `__array_functi when `np.round` is passed an instance of your EA. .. important:: - Even if you choose to implement `__array_function__`, you still need to implement `__array_ufunc__` in order to override ufuncs. Each of these two interfaces covers a seperate portion of numpy's functionality. @@ -347,7 +346,6 @@ EA support is still being actively worked on, so if you encounter a bug, or beha which does not behave as described, please report it to the team. .. important:: - You are not required to provide implementations for the full complement of Series operations in your ExtensionArray. In fact, some of them may not even make sense within your context. You may also choose to add implementations incrementally, From 6866b66fb2372e386862e235d5cd55c09374da96 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Thu, 20 Jun 2019 07:21:17 +0300 Subject: [PATCH 30/31] cleanup --- doc/source/development/extending.rst | 3 --- 1 file changed, 3 deletions(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 2b31c372937e5..9eef052c37a93 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -351,9 +351,6 @@ which does not behave as described, please report it to the team. within your context. You may also choose to add implementations incrementally, as the need arises. - TBD: should we have a standard way of signalling not supported instead of a - random AttributeError exception being thrown. 
- Formatting Extension Arrays ^^^^^^^^^^^^^^^^^^^^^^^^^^^ From 1f689c18d9e16591a30296f6f109102799629940 Mon Sep 17 00:00:00 2001 From: pilkibun Date: Thu, 20 Jun 2019 07:21:50 +0300 Subject: [PATCH 31/31] reword --- doc/source/development/extending.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/development/extending.rst b/doc/source/development/extending.rst index 9eef052c37a93..9a218e0afadfd 100644 --- a/doc/source/development/extending.rst +++ b/doc/source/development/extending.rst @@ -348,7 +348,7 @@ which does not behave as described, please report it to the team. .. important:: You are not required to provide implementations for the full complement of Series operations in your ExtensionArray. In fact, some of them may not even make sense - within your context. You may also choose to add implementations incrementally, + within its context. You may also choose to add implementations incrementally, as the need arises.
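The signature-compatibility note in the patched text (a ``round`` method that omits ``out`` is ignored, one that accepts ``**kwds`` is used) can be illustrated with a small pure-Python sketch. This is only a model of the behavior the text describes, not numpy's actual dispatch code: ``dispatch_round``, ``NaiveArray`` and ``CompatibleArray`` are hypothetical names invented for this illustration, and the fallback to a plain list stands in for conversion to an ndarray.

```python
class NaiveArray:
    """Hypothetical EA-like container whose round() omits the `out` keyword."""

    def __init__(self, values):
        self.values = list(values)

    def round(self, decimals=0):
        return NaiveArray(round(v, decimals) for v in self.values)


class CompatibleArray(NaiveArray):
    """Same container, but round() tolerates extra keywords such as `out`."""

    def round(self, decimals=0, **kwds):
        return CompatibleArray(round(v, decimals) for v in self.values)


def dispatch_round(obj, decimals=0, out=None):
    """Toy stand-in for the dispatch step (an assumption, not numpy code).

    The object's own method is tried with the full keyword set; if the
    call fails with TypeError, the method is skipped and a plain list
    is produced instead.
    """
    method = getattr(obj, "round", None)
    if method is not None:
        try:
            return method(decimals=decimals, out=out)
        except TypeError:
            pass  # signature mismatch: the object's method is ignored
    return [round(v, decimals) for v in obj.values]


lost = dispatch_round(NaiveArray([1.26, 2.71]), decimals=1)
kept = dispatch_round(CompatibleArray([1.26, 2.71]), decimals=1)
print(type(lost).__name__, type(kept).__name__)
```

In this sketch the first call falls back to a plain list because ``NaiveArray.round`` rejects the ``out`` keyword, while the second call keeps the container type. Under the stated assumptions, that is the difference the docs warn about.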