Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add standard deviation as a function #566

Merged
merged 33 commits into from
Oct 13, 2023
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
33 commits
Select commit Hold shift + click to select a range
6eada72
STD for FDataBasis and FDataGrid
pcuestas Jul 29, 2023
036028a
A test for std(FData).
pcuestas Jul 29, 2023
b2f251a
FIX typing issues in function_to_fdatabasis
pcuestas Jul 30, 2023
f5d59a9
Fix style issues.
pcuestas Aug 17, 2023
3e16d90
Fix style issues.
pcuestas Aug 17, 2023
5f3631b
Fix some issues outlined in the pull request:
pcuestas Aug 25, 2023
4c2757e
Test for std_fdatagrid.
pcuestas Aug 25, 2023
23b2d4e
Fix style.
pcuestas Aug 25, 2023
7137aba
Test for std_fdatagrid.
pcuestas Aug 26, 2023
71f7071
More than 1 evaluation at a time in std_fdatabasis
pcuestas Aug 26, 2023
614aac5
Fix test_stats_std style issues.
pcuestas Aug 27, 2023
7a8ece0
Exact std function improved to accept more than one input and to retu…
pcuestas Aug 28, 2023
eb80f31
Undo test_stats changes
pcuestas Sep 4, 2023
32233c9
Import order
pcuestas Sep 4, 2023
0fcbb9c
Merge branch 'develop' into feature/541-add-standard-deviation-as-a-f…
pcuestas Sep 4, 2023
026d488
std_fdatabasis with dim>1 (domain or/and codomain) + tests
pcuestas Sep 14, 2023
88c7fd4
Do not return input in function_to_fdatabasis, but a copy. No sample …
pcuestas Sep 17, 2023
0457e39
Fix test_stats_std readability. Remove confusing Gaussian processes t…
pcuestas Sep 17, 2023
ccb8ed4
mypy ignore[no-any-return] on fixtures
pcuestas Sep 17, 2023
c7f70a1
Fix typing issues in std_function inside of std_fdatabasis
pcuestas Sep 17, 2023
0b51aa5
Reorganize std_function for clarity
pcuestas Sep 17, 2023
a0d7d7e
import FDatabasis ignore warning in function_to_fdatabasis
pcuestas Sep 17, 2023
0c11d71
Fix style
pcuestas Sep 17, 2023
533d13b
Add std to stats.rst
pcuestas Sep 20, 2023
f3845b5
Fix type checking for function_to_fdatabasis
pcuestas Sep 20, 2023
21967db
Isort
pcuestas Sep 20, 2023
ce105d9
Fix nested function warning
pcuestas Sep 20, 2023
23432e2
Merge branch 'develop' into feature/541-add-standard-deviation-as-a-f…
vnmabus Sep 20, 2023
5509c00
Fix tests, remove type from std.register
pcuestas Oct 4, 2023
6b5e47d
sum instead of diag to compute std in std_fdatagrid integrand
pcuestas Oct 12, 2023
8897f29
Merge branch 'fix/581-default-correction=0-for-cov-var' into feature/…
pcuestas Oct 12, 2023
bb5923d
Rename ddof -> correction
pcuestas Oct 12, 2023
3115a6d
Merge branch 'develop' into feature/541-add-standard-deviation-as-a-f…
vnmabus Oct 13, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions skfda/_utils/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
"_same_domain",
"_to_grid",
"_to_grid_points",
"function_to_fdatabasis",
"nquad_vec",
],
'_warping': [
Expand Down
36 changes: 34 additions & 2 deletions skfda/_utils/_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,6 @@

import functools
import numbers
from functools import singledispatch
from typing import (
TYPE_CHECKING,
Any,
Expand Down Expand Up @@ -36,7 +35,7 @@
ArrayDTypeT = TypeVar("ArrayDTypeT", bound="np.generic")

if TYPE_CHECKING:
from ..representation import FData, FDataGrid
from ..representation import FData, FDataBasis, FDataGrid
from ..representation.basis import Basis
from ..representation.extrapolation import ExtrapolationLike

Expand Down Expand Up @@ -609,3 +608,36 @@ def _classifier_get_classes(
f'one; got {classes.size} class',
)
return classes, y_ind


def function_to_fdatabasis(
f: Callable[[NDArrayFloat], NDArrayFloat],
new_basis: Basis,
) -> FDataBasis:
"""Express a math function as a FDataBasis with a given basis.

Args:
f: math function.
new_basis: the basis of the output.

Returns:
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
FDataBasis: FDataBasis with calculated coefficients and the new
basis.
"""
from .. import FDataBasis
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS442 Found outer scope names shadowing: FDataBasis

pcuestas marked this conversation as resolved.
Show resolved Hide resolved
from ..misc._math import inner_product_matrix

vnmabus marked this conversation as resolved.
Show resolved Hide resolved
if isinstance(f, FDataBasis) and f.basis == new_basis:
return f
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we should return a copy here or not. I am worried that not copying can cause subtle bugs in the future, as the semantics of this function imply that a new FDataBasis is returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think returning a copy is safer. Should it be a deep copy?


inner_prod = inner_product_matrix(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫 [mypy] reported by reviewdog 🐶
Value of type variable "Vector" of "inner_product_matrix" cannot be "object" [type-var]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because the typing of inner_product and inner_product_matrix is wrong. It should accept callables too. This should be fixed in another PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't inner_product and inner_product_matrix accept callables already?

def inner_product(
arg1: Vector,
arg2: Vector,
*,
_matrix: bool = False,
_domain_range: Optional[DomainRange] = None,
**kwargs: Any,
) -> NDArrayFloat:

def inner_product_matrix(
arg1: Vector,
arg2: Optional[Vector] = None,
**kwargs: Any,
) -> NDArrayFloat:

And

Vector = TypeVar(
"Vector",
bound=Union[NDArrayFloat, Basis, Callable[[NDArrayFloat], NDArrayFloat]],
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They accept them at runtime. However, they are typed as accepting Vector, a protocol which defines sum and scalar product. Normal callables do not define these operations (FData are an exception).

new_basis,
f,
_domain_range=new_basis.domain_range,
)

gram_matrix = new_basis.gram_matrix()

coefs = np.linalg.solve(gram_matrix, inner_prod)

return FDataBasis(new_basis, coefs.T)
2 changes: 2 additions & 0 deletions skfda/exploratory/stats/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
"gmean",
"mean",
"modified_epigraph_index",
"std",
"trim_mean",
"var",
],
Expand All @@ -37,6 +38,7 @@
gmean as gmean,
mean as mean,
modified_epigraph_index as modified_epigraph_index,
std as std,
trim_mean as trim_mean,
var as var,
)
65 changes: 63 additions & 2 deletions skfda/exploratory/stats/_stats.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,12 +4,13 @@
from builtins import isinstance
from typing import Callable, TypeVar, Union

import functools
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
import numpy as np
from scipy import integrate
from scipy.stats import rankdata

from ...misc.metrics._lp_distances import l2_distance
from ...representation import FData, FDataGrid
from ...representation import FData, FDataBasis, FDataGrid
from ...typing._metric import Metric
from ...typing._numpy import NDArrayFloat
from ..depth import Depth, ModifiedBandDepth
Expand Down Expand Up @@ -76,7 +77,7 @@ def gmean(X: FDataGrid) -> FDataGrid:

def cov(
X: FData,
ddof: int = 1
ddof: int = 1,
) -> Callable[[NDArrayFloat, NDArrayFloat], NDArrayFloat]:
"""
Compute the covariance.
Expand All @@ -99,6 +100,66 @@ def cov(
return X.cov(ddof=ddof)


@functools.singledispatch
def std(X: F, ddof: int = 1) -> F:
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
r"""
Compute the standard deviation of all the samples in a FData object.

.. math::
\text{std}_X(t) = \sqrt{\frac{1}{N-\text{ddof}}
\sum_{n=1}^{N}{\left(X_n(t) - \overline{X}(t)\right)^2}}

Args:
X: Object containing all the samples whose standard deviation is
wanted.
ddof: Means "Delta Degrees of Freedom". The divisor used in
calculations is `N - ddof`, where `N` represents the number of
samples in `X`. By default ddof is 1.

Returns:
Standard deviation of all the samples in the original object, as a
:term:`functional data object` with just one sample.

"""
raise NotImplementedError("Not implemented for this type")


@std.register(FDataGrid)
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
def std_fdatagrid(X: FDataGrid, ddof: int = 1) -> FDataGrid:
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
"""Standard deviation of a FDataGrid."""
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
return X.copy(
data_matrix=np.std(X.data_matrix, axis=0, ddof=ddof)[np.newaxis, ...],
sample_names=("standard deviation",),
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
)


@std.register(FDataBasis)
def std_fdatabasis(X: FDataBasis, ddof: int = 1) -> FDataBasis:
pcuestas marked this conversation as resolved.
Show resolved Hide resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS473 Found too many empty lines in def: 8 > 7

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the pep8 advice.

"""Standard deviation of a FDataBasis."""
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
from ..._utils import function_to_fdatabasis

if X.dim_domain != 1 or X.dim_codomain != 1:
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
raise NotImplementedError(
"Standard deviation only implemented "
"for univariate functions.",
)

basis = X.basis
coeff_matrix = np.cov(X.coefficients, rowvar=False, ddof=ddof)

def std_function(t_points: NDArrayFloat) -> NDArrayFloat:
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
assert len(t_points) == 1, (
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
"Standard deviation function only implemented for "
"one-point-at-a-time evaluations."
)
basis_evaluation = basis(t_points).reshape((-1, 1))
return np.sqrt(
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
basis_evaluation.T @ coeff_matrix @ basis_evaluation,
).reshape((1, -1, 1))

return function_to_fdatabasis(f=std_function, new_basis=X.basis)
pcuestas marked this conversation as resolved.
Show resolved Hide resolved


def modified_epigraph_index(X: FDataGrid) -> NDArrayFloat:
"""
Calculate the Modified Epigraph Index of a FDataGrid.
Expand Down
11 changes: 9 additions & 2 deletions skfda/tests/test_stats.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,8 +2,15 @@

import numpy as np

from skfda.datasets import fetch_phoneme, fetch_tecator, fetch_weather
from skfda.exploratory.stats import geometric_median, modified_epigraph_index
from skfda.datasets import (
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
fetch_phoneme,
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
fetch_tecator,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
I001 isort found an import in the wrong position

fetch_weather,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
I001 isort found an import in the wrong position

)
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
from skfda.exploratory.stats import (
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
geometric_median,
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
modified_epigraph_index,
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
I005 isort found an unexpected missing import

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
I005 isort found an unexpected missing import

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
I005 isort found an unexpected missing import

pcuestas marked this conversation as resolved.
Show resolved Hide resolved


class TestGeometricMedian(unittest.TestCase):
Expand Down
83 changes: 83 additions & 0 deletions skfda/tests/test_stats_std.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
"""Test stats functions."""

from __future__ import annotations

from typing import Any

import numpy as np
import pytest

from skfda import FDataGrid
from skfda.datasets import make_gaussian_process
from skfda.exploratory.stats import std
from skfda.misc.covariances import Gaussian
from skfda.representation.basis import FourierBasis

pcuestas marked this conversation as resolved.
Show resolved Hide resolved

@pytest.fixture(params=[61, 71])
def n_basis(request: Any) -> int:
"""Fixture for n_basis to test."""
return request.param
pcuestas marked this conversation as resolved.
Show resolved Hide resolved


@pytest.fixture
def start() -> int:
pcuestas marked this conversation as resolved.
Show resolved Hide resolved
"""Fixture for the infimum of the domain."""
return 0


@pytest.fixture
def stop() -> int:
"""Fixture for the supremum of the domain."""
return 1


@pytest.fixture
def n_features() -> int:
"""Fixture for the number of features."""
return 1000


@pytest.fixture
def gaussian_process(start: int, stop: int, n_features: int) -> FDataGrid:
"""Fixture for a Gaussian process."""
return make_gaussian_process(
start=start,
stop=stop,
n_samples=100,
n_features=n_features,
mean=0.0,
cov=Gaussian(variance=1, length_scale=0.1),
random_state=0,
)


def test_std_gaussian_fourier(
start: int,
stop: int,
n_features: int,
n_basis: int,
gaussian_process: FDataGrid,
) -> None:
"""Test standard deviation: Gaussian processes and a Fourier basis."""
fourier_basis = FourierBasis(n_basis=n_basis, domain_range=(0, 1))
fd = gaussian_process.to_basis(fourier_basis)

std_fd = std(fd)
grid = np.linspace(start, stop, n_features)
almost_std_fd = std(fd.to_grid(grid)).to_basis(fourier_basis)

inner_grid_limit = n_features // 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I do not get what you are trying to do in this test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was not well designed and that the newer tests are clearer. I removed this one.

inner_grid = grid[inner_grid_limit:-inner_grid_limit]
np.testing.assert_allclose(
std_fd(inner_grid),
almost_std_fd(inner_grid),
rtol=1e-3,
)

outer_grid = grid[:inner_grid_limit] + grid[-inner_grid_limit:]
np.testing.assert_allclose(
std_fd(outer_grid),
almost_std_fd(outer_grid),
rtol=1e-2,
)