Functional PLS for dimensionality reduction and regression #548
Conversation
Codecov Report
Attention:
Additional details and impacted files

@@            Coverage Diff             @@
##           develop     #548      +/-   ##
===========================================
+ Coverage    86.27%   86.67%    +0.40%
===========================================
  Files          149      153        +4
  Lines        11912    12390      +478
===========================================
+ Hits         10277    10739      +462
- Misses        1635     1651       +16

☔ View full report in Codecov by Sentry.
This reverts commit ec5471c.
# Ignore too many public instance attributes
class _FPLSBlock(Generic[BlockType]):  # noqa: WPS230
I would rather have a Block abstract class and then separate classes implementing it, one for each kind of data. Then there will be no need to switch on the type of data inside each method.
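The suggested design might look roughly like the following sketch. All names here (`Block`, `MultivariateBlock`, `GridBlock`) are hypothetical placeholders, not the actual skfda classes: an abstract base class declares the shared interface, and one subclass per kind of data implements it, so no method needs to switch on the data type.

```python
from abc import ABC, abstractmethod

import numpy as np


class Block(ABC):
    """Hypothetical abstract PLS block: one subclass per data type."""

    @abstractmethod
    def transform(self, data):
        """Project data onto this block's components."""


class MultivariateBlock(Block):
    """Block for plain multivariate (array) data."""

    def __init__(self, rotations):
        self.rotations = rotations

    def transform(self, data):
        # Multivariate case: a plain matrix product.
        return np.asarray(data) @ self.rotations


class GridBlock(Block):
    """Block for functional data discretized on a grid."""

    def __init__(self, rotations, integration_weights):
        self.rotations = rotations
        self.integration_weights = integration_weights

    def transform(self, data):
        # Discretized functional case: weight by quadrature weights
        # first, approximating an L2 inner product.
        return (np.asarray(data) * self.integration_weights) @ self.rotations
```

Each `transform` then contains only the logic for its own data kind; the type dispatch is handled once, when the block object is constructed.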
Done. Note that the factory function was still implemented with a switch. I tried to implement it using multidispatch. However, in that case, mypy does not identify the dependency between the type of data and the type of the returned FPLSBlock.
It should, as BlockType is a TypeVar... What error did you get?
I agree. It should. But it seems like mypy can no longer associate the BlockType variable with InputTypeX and InputTypeY, which leads to typing errors every time a method of a block is called. Here are the errors:
skfda/preprocessing/dim_reduction/_fpls.py:814: error: Argument 1 to "transform" of "_FPLSBlock" has incompatible type "InputTypeX"; expected "BlockType" [arg-type]
skfda/preprocessing/dim_reduction/_fpls.py:817: error: Argument 1 to "transform" of "_FPLSBlock" has incompatible type "InputTypeY"; expected "BlockType" [arg-type]
skfda/preprocessing/dim_reduction/_fpls.py:839: error: Argument 1 to "transform" of "_FPLSBlock" has incompatible type "InputTypeX"; expected "BlockType" [arg-type]
skfda/preprocessing/dim_reduction/_fpls.py:858: error: Argument 1 to "transform" of "_FPLSBlock" has incompatible type "InputTypeY"; expected "BlockType" [arg-type]
skfda/preprocessing/dim_reduction/_fpls.py:886: error: Incompatible return value type (got "Tuple[BlockType, BlockType]", expected "Union[InputTypeX, Tuple[InputTypeX, InputTypeY]]") [return-value]
skfda/preprocessing/dim_reduction/_fpls.py:888: error: Incompatible return value type (got "BlockType", expected "Union[InputTypeX, Tuple[InputTypeX, InputTypeY]]") [return-value]
skfda/preprocessing/dim_reduction/_fpls.py:906: error: Incompatible return value type (got "BlockType", expected "InputTypeX") [return-value]
skfda/preprocessing/dim_reduction/_fpls.py:924: error: Incompatible return value type (got "BlockType", expected "InputTypeY") [return-value]
Note that the line numbers obviously do not match the contents of this pull request.
My implementation using multimethod is the following:
@multimethod.multidispatch
def _fpls_block_factory(
    data: BlockType,
    n_components: int,
    label: str,
    integration_weights: Union[NDArrayFloat, None],
    regularization: Union[L2Regularization[Any], None],
    weights_basis: Union[Basis, None],
) -> _FPLSBlock[BlockType]:
    raise TypeError("Invalid type for data")


@_fpls_block_factory.register
def _fpls_block_factory_array(
    data: np.ndarray,
    n_components: int,
    label: str,
    integration_weights: Union[np.ndarray, None],
    regularization: Union[L2Regularization[Any], None],
    weights_basis: Union[Basis, None],
) -> _FPLSBlock[NDArrayFloat]:
    return _FPLSBlockMultivariate(
        data=data,
        n_components=n_components,
        label=label,
    )


@_fpls_block_factory.register
def _fpls_block_factory_basis(
    data: FDataBasis,
    n_components: int,
    label: str,
    integration_weights: Union[np.ndarray, None],
    regularization: Union[L2Regularization[Any], None],
    weights_basis: Union[Basis, None],
) -> _FPLSBlock[FDataBasis]:
    return _FPLSBlockBasis(
        data=data,
        n_components=n_components,
        label=label,
        regularization=regularization,
        weights_basis=weights_basis,
    )


@_fpls_block_factory.register
def _fpls_block_factory_grid(
    data: FDataGrid,
    n_components: int,
    label: str,
    integration_weights: Union[np.ndarray, None],
    regularization: Union[L2Regularization[Any], None],
    weights_basis: Union[Basis, None],
) -> _FPLSBlock[FDataGrid]:
    return _FPLSBlockGrid(
        data=data,
        n_components=n_components,
        label=label,
        integration_weights=integration_weights,
        regularization=regularization,
    )
The requested changes are minor. @Ddelval, if you do not have the time or inclination to do them, I can finish this PR myself and merge it. However, I would really appreciate it if you could spare a few minutes to answer my questions, as you are way more familiar with the algorithm.
Hi @vnmabus, I believe I should have some time towards the end of next week. I already had a look at your comments a while ago, but I didn't have the time to reply. I haven't had much time lately, but I'm hoping that'll improve a bit this week.
This simplifies the notation and removes unnecessary parameters that are already stored in the FPLS class
In sklearn, they set to zero the values that are very close to zero. Then they try to perform the power method. If all columns of the Y matrix are too small, then they consider the algorithm to have converged. This is a very similar approach, but taking into account the magnitudes of X and Y to calculate the epsilons
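The stopping rule described above might be sketched like this. The helper name, the `rtol` parameter, and the exact scaling are hypothetical choices for illustration, not the PR's actual code: entries negligible relative to the block magnitudes are zeroed, and the power method is declared converged once every column of Y falls below the scaled epsilon.

```python
import numpy as np


def power_method_converged(Y, x_scale, y_scale, rtol=1e-12):
    """Hypothetical convergence test for the NIPALS power method.

    The epsilon is scaled by the magnitudes of the X and Y blocks,
    mirroring the sklearn trick of zeroing near-zero values.
    """
    eps = rtol * max(x_scale, y_scale)
    # Zero out entries that are negligible at this scale.
    Y = np.where(np.abs(Y) < eps, 0.0, Y)
    # Converged if every column of Y is (numerically) zero.
    return bool(np.all(np.abs(Y).max(axis=0) < eps))
```

Scaling the epsilon by the data magnitude makes the test invariant to a uniform rescaling of the blocks, which an absolute threshold would not be.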
Hi @vnmabus. Sorry for the delay. It seems like I tend to overestimate the amount of free time I have 🙃.
Hi @vnmabus, I'm getting an error in
I think that I will merge this now. If I need to change things I think that I can do it by myself. Thank you for your help!
This implementation of FPLS accepts multivariate or functional data. Either X or Y can be multivariate or functional. Regularization in functional blocks is also implemented.
Regarding the regression functionality, PLSRegression also accepts functional or multivariate data as the regressor or the response. However, the functional response of the PLS regressor cannot be regularized.