Functional PLS for dimensionality reduction and regression #548

Merged (54 commits) on Nov 7, 2023

Conversation

@Ddelval (Contributor) commented Jun 24, 2023

This implementation of FPLS accepts multivariate or functional data: either X or Y can be of either kind. Regularization of functional blocks is also implemented.
Regarding the regression functionality, PLSRegression likewise accepts functional or multivariate data as the regressor or the response. However, a functional response of the PLS regressor cannot be regularized.
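For context, the core computation in the fully multivariate case is a NIPALS-style power iteration (this is an illustrative sketch of the general technique, not skfda's actual code; all names are made up here):

```python
import numpy as np

def pls_weights(X, Y, n_components):
    """Extract PLS X-weight vectors via power iteration on X'Y Y'X (sketch)."""
    X = X.astype(float).copy()
    W = []
    for _ in range(n_components):
        M = X.T @ Y @ Y.T @ X              # symmetric PSD; its dominant
        w = np.ones(X.shape[1])            # eigenvector is the weight vector
        w /= np.linalg.norm(w)
        for _ in range(200):               # power iteration
            w_new = M @ w
            w_new /= np.linalg.norm(w_new)
            if np.linalg.norm(w_new - w) < 1e-12:
                w = w_new
                break
            w = w_new
        t = X @ w                          # scores of this component
        p = X.T @ t / (t @ t)              # loadings
        X = X - np.outer(t, p)             # deflate before the next component
        W.append(w)
    return np.stack(W, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
Y = X @ rng.normal(size=(5, 2)) + 0.01 * rng.normal(size=(50, 2))
W = pls_weights(X, Y, n_components=2)
print(W.shape)  # (5, 2)
```

In the functional case the matrix products above become inner products in the relevant function space, which is where the integration weights and regularization of this PR enter.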

@codecov codecov bot commented Jun 24, 2023

Codecov Report

Attention: 16 lines in your changes are missing coverage. Please review.

Comparison: base (4207c4d) 86.27% vs. head (e0c4ab5) 86.67%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #548      +/-   ##
===========================================
+ Coverage    86.27%   86.67%   +0.40%     
===========================================
  Files          149      153       +4     
  Lines        11912    12390     +478     
===========================================
+ Hits         10277    10739     +462     
- Misses        1635     1651      +16     
Files Coverage Δ
skfda/ml/regression/_fpls_regression.py 100.00% <100.00%> (ø)
skfda/tests/test_fpls.py 100.00% <100.00%> (ø)
skfda/tests/test_fpls_regression.py 100.00% <100.00%> (ø)
skfda/preprocessing/dim_reduction/_fpls.py 93.54% <93.54%> (ø)


@Ddelval Ddelval marked this pull request as draft June 24, 2023 16:39
@Ddelval Ddelval requested a review from vnmabus August 1, 2023 05:54


# Ignore too many public instance attributes
class _FPLSBlock(Generic[BlockType]): # noqa: WPS230
Member:
I would rather have a Block abstract class and then separate classes implementing it, one for each kind of data. Then there will be no need to switch on the type of data inside each method.
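The refactor suggested here can be sketched as follows (class and method names are illustrative stand-ins, not the ones that ended up in the PR):

```python
from abc import ABC, abstractmethod
import numpy as np

class Block(ABC):
    """One block (X or Y) of the PLS problem; each subclass encodes a data kind."""

    @abstractmethod
    def transform(self, data):
        """Project data onto the extracted components."""

class MultivariateBlock(Block):
    def __init__(self, rotations):
        self.rotations = rotations         # (n_features, n_components)

    def transform(self, data):
        return np.asarray(data) @ self.rotations

class GridBlock(Block):
    def __init__(self, rotations, integration_weights):
        self.rotations = rotations
        self.integration_weights = integration_weights

    def transform(self, data):
        # Approximate the L2 inner product with a quadrature rule.
        return (np.asarray(data) * self.integration_weights) @ self.rotations

# No isinstance switch inside the methods: each subclass knows its own data kind.
block = MultivariateBlock(rotations=np.eye(3)[:, :2])
print(block.transform([[1.0, 2.0, 3.0]]))  # [[1. 2.]]
```

Each concrete class then implements its own inner products and regularization, and the main FPLS loop only talks to the abstract interface.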

Contributor Author:
Done. Note that the factory function was still implemented with a switch. I tried to implement it using multidispatch. However, in that case, mypy does not identify the dependency between the type of data and the type of the returned FPLSBlock.

Member:
It should, as BlockType is a TypeVar... What error did you get?

Contributor Author:
I agree, it should. But it seems that mypy can no longer associate the BlockType variable with InputTypeX and InputTypeY, which leads to typing errors every time a method from a block is called. Here are the errors:

skfda/preprocessing/dim_reduction/_fpls.py:814: error: Argument 1 to "transform" of "_FPLSBlock" has incompatible type "InputTypeX"; expected "BlockType"  [arg-type]
skfda/preprocessing/dim_reduction/_fpls.py:817: error: Argument 1 to "transform" of "_FPLSBlock" has incompatible type "InputTypeY"; expected "BlockType"  [arg-type]
skfda/preprocessing/dim_reduction/_fpls.py:839: error: Argument 1 to "transform" of "_FPLSBlock" has incompatible type "InputTypeX"; expected "BlockType"  [arg-type]
skfda/preprocessing/dim_reduction/_fpls.py:858: error: Argument 1 to "transform" of "_FPLSBlock" has incompatible type "InputTypeY"; expected "BlockType"  [arg-type]
skfda/preprocessing/dim_reduction/_fpls.py:886: error: Incompatible return value type (got "Tuple[BlockType, BlockType]", expected "Union[InputTypeX, Tuple[InputTypeX, InputTypeY]]")  [return-value]
skfda/preprocessing/dim_reduction/_fpls.py:888: error: Incompatible return value type (got "BlockType", expected "Union[InputTypeX, Tuple[InputTypeX, InputTypeY]]")  [return-value]
skfda/preprocessing/dim_reduction/_fpls.py:906: error: Incompatible return value type (got "BlockType", expected "InputTypeX")  [return-value]
skfda/preprocessing/dim_reduction/_fpls.py:924: error: Incompatible return value type (got "BlockType", expected "InputTypeY")  [return-value]

Note that the line numbers obviously do not match the contents of this pull request.
My implementation using multimethod is the following:

# Imports assumed by this snippet (the remaining names, such as _FPLSBlock
# and its subclasses, are defined earlier in skfda's _fpls.py):
from typing import Any, TypeVar, Union

import multimethod
import numpy as np

from skfda.misc.regularization import L2Regularization
from skfda.representation import FDataBasis, FDataGrid
from skfda.representation.basis import Basis
from skfda.typing._numpy import NDArrayFloat

BlockType = TypeVar("BlockType")


@multimethod.multidispatch
def _fpls_block_factory(
    data: BlockType,
    n_components: int,
    label: str,
    integration_weights: Union[NDArrayFloat, None],
    regularization: Union[L2Regularization[Any], None],
    weights_basis: Union[Basis, None],
) -> _FPLSBlock[BlockType]:
    raise TypeError("Invalid type for data")


@_fpls_block_factory.register
def _fpls_block_factory_array(
    data: np.ndarray,
    n_components: int,
    label: str,
    integration_weights: Union[np.ndarray, None],
    regularization: Union[L2Regularization[Any], None],
    weights_basis: Union[Basis, None],
) -> _FPLSBlock[NDArrayFloat]:
    return _FPLSBlockMultivariate(
        data=data,
        n_components=n_components,
        label=label,
    )


@_fpls_block_factory.register
def _fpls_block_factory_basis(
    data: FDataBasis,
    n_components: int,
    label: str,
    integration_weights: Union[np.ndarray, None],
    regularization: Union[L2Regularization[Any], None],
    weights_basis: Union[Basis, None],
) -> _FPLSBlock[FDataBasis]:
    return _FPLSBlockBasis(
        data=data,
        n_components=n_components,
        label=label,
        regularization=regularization,
        weights_basis=weights_basis,
    )


@_fpls_block_factory.register
def _fpls_block_factory_grid(
    data: FDataGrid,
    n_components: int,
    label: str,
    integration_weights: Union[np.ndarray, None],
    regularization: Union[L2Regularization[Any], None],
    weights_basis: Union[Basis, None],
) -> _FPLSBlock[FDataGrid]:
    return _FPLSBlockGrid(
        data=data,
        n_components=n_components,
        label=label,
        integration_weights=integration_weights,
        regularization=regularization,
    )

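A common workaround in this situation, sketched below with stand-in classes rather than the PR's real ones, is to keep the plain isinstance switch and declare the type dependency with typing.overload, which mypy can follow:

```python
from typing import Any, Generic, TypeVar, Union, overload

import numpy as np

BlockType = TypeVar("BlockType")

class Block(Generic[BlockType]):          # stand-in for skfda's _FPLSBlock
    def __init__(self, data: BlockType) -> None:
        self.data = data

class Grid:                               # stand-in for FDataGrid
    def __init__(self, values: np.ndarray) -> None:
        self.values = values

@overload
def block_factory(data: np.ndarray) -> "Block[np.ndarray]": ...
@overload
def block_factory(data: Grid) -> "Block[Grid]": ...

def block_factory(data: Union[np.ndarray, Grid]) -> "Block[Any]":
    # Plain runtime switch; the overloads above let mypy associate the
    # argument type with the parametrized return type.
    if isinstance(data, (np.ndarray, Grid)):
        return Block(data)
    raise TypeError("Invalid type for data")

block = block_factory(np.zeros(3))
print(type(block.data).__name__)  # ndarray
```

With this shape, calls like `block_factory(x_array)` get the precise `Block[np.ndarray]` type, so downstream method calls type-check without casts.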
@vnmabus (Member) commented Oct 14, 2023

The requested changes are minor. @Ddelval, if you do not have the time or inclination to make them, I can finish this PR myself and merge it. However, I would really appreciate it if you could spare a few minutes to answer my questions, as you are far more familiar with the algorithm.

@Ddelval (Contributor Author) commented Oct 14, 2023

Hi @vnmabus, I believe I should have some time towards the end of next week. I already had a look at your comments a while ago, but I didn't have the time to reply. I haven't had much time lately, but I'm hoping that'll improve a bit this week.

Commit messages:

This simplifies the notation and removes unnecessary parameters that are already stored in the FPLS class.

In sklearn, they set to zero the values that are very close to zero. Then they try to perform the power method. If all columns of the Y matrix are too small, they consider the algorithm to have converged. This is a very similar approach, but taking the magnitudes of X and Y into account to calculate the epsilons.
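The approach described in that commit message can be sketched as follows (an illustration of the idea with made-up names, not the PR's actual code):

```python
import numpy as np

def power_method_step(X, Y, rel_eps=1e-10):
    """One illustrative step: declare convergence when Y is numerically zero."""
    # The epsilon takes the magnitudes of X and Y into account, so a Y that
    # is negligible relative to the data counts as exactly zero.
    eps = rel_eps * max(np.abs(X).max(), np.abs(Y).max())
    Yc = np.where(np.abs(Y) < eps, 0.0, Y)       # snap tiny values to zero
    if not Yc.any():                             # all columns of Y too small:
        return None                              # the algorithm has converged
    u = Yc[:, np.argmax((Yc ** 2).sum(axis=0))]  # start from the largest column
    w = X.T @ u
    return w / np.linalg.norm(w)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
print(power_method_step(X, np.full((20, 2), 1e-30)))  # None: Y is negligible
w = power_method_step(X, rng.normal(size=(20, 2)))
print(round(float(np.linalg.norm(w)), 6))  # 1.0
```

Scaling the threshold by the data magnitude makes the convergence test invariant to rescaling X and Y, unlike a fixed absolute epsilon.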
@Ddelval (Contributor Author) commented Oct 28, 2023

Hi @vnmabus. Sorry for the delay. It seems I tend to overestimate the amount of free time I have 🙃.
Anyway, I replied to most of your comments. Additionally, I changed the behavior when not all requested components can be extracted, which will happen, for example, if the rank of the data matrices is less than n_components. Now, as many components as possible are returned, and a warning is raised when this happens.
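That behavior can be sketched like this (illustrative names, not the PR's actual code): extract what the data supports and warn when it is fewer than requested:

```python
import warnings
import numpy as np

def extract_components(X, n_components):
    """Return up to n_components left singular vectors, warning if fewer exist."""
    rank = np.linalg.matrix_rank(X)
    if rank < n_components:
        warnings.warn(
            f"Requested {n_components} components but the data only "
            f"supports {rank}; returning {rank} components.",
        )
        n_components = rank
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components]

X = np.outer(np.arange(6.0), np.ones(4))   # a rank-1 data matrix
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    comps = extract_components(X, n_components=3)
print(comps.shape)  # (6, 1)
print(len(caught))  # 1
```

Warning instead of raising keeps pipelines running on low-rank data while still surfacing the mismatch to the user.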

@Ddelval Ddelval requested a review from vnmabus October 28, 2023 23:57
@Ddelval (Contributor Author) commented Nov 2, 2023

Hi @vnmabus, I'm getting an error in test_regression in some test runs. Could you have a look at it? I checked, but I couldn't find any recent changes that could make the assertion fail.

@vnmabus (Member) left a comment

I think I will merge this now. If things need to change later, I can do that myself. Thank you for your help!

@vnmabus vnmabus merged commit 4a368bd into develop Nov 7, 2023
16 checks passed