-
-
Notifications
You must be signed in to change notification settings - Fork 25.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Fixed NMF IndexError #11667
[MRG] Fixed NMF IndexError #11667
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except nitpicks.
Can you also add a bugfix entry in doc/whats_new/v0.20.rst
?
Thanks !
sklearn/decomposition/nmf.py
Outdated
@@ -305,8 +306,12 @@ def _initialize_nmf(X, n_components, init=None, eps=1e-6, | |||
check_non_negative(X, "NMF initialization") | |||
n_samples, n_features = X.shape | |||
|
|||
if init == 'nndsvd' and n_components > min(n_samples, n_features): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NNDSVD is performed also with init = 'nndsvda'
and init = 'nndsvdar'
, which also need the same constraint.
You can use if init != 'random'
to handle all three cases.
Please also update your unit test accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Thanks!
sklearn/decomposition/nmf.py
Outdated
@@ -831,7 +838,7 @@ def _fit_multiplicative_update(X, W, H, beta_loss='frobenius', | |||
|
|||
|
|||
def non_negative_factorization(X, W=None, H=None, n_components=None, | |||
init='random', update_H=True, solver='cd', | |||
init=None, update_H=True, solver='cd', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we changing this default init?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the super late reply and thanks for pointing this out. I should have explain it better in the PR notes.
I made this changes because the documentation says here
Default: 'nndsvd' if n_components <= min(n_samples, n_features),
otherwise random.
But in init
is set to 'nndsvd'
or 'random'
only if init is None
. See here
if init is None:
if n_components <= min(n_samples, n_features):
init = 'nndsvd'
else:
init = 'random'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we need to update the documentation to reflect the true default, not make a backwards-incompatible change. You are welcome to update the documentation in a separate, focused PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Let me revert that and create a new PR on that.
sklearn/decomposition/nmf.py
Outdated
@@ -305,8 +306,14 @@ def _initialize_nmf(X, n_components, init=None, eps=1e-6, | |||
check_non_negative(X, "NMF initialization") | |||
n_samples, n_features = X.shape | |||
|
|||
if (init and init != 'random' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always use init is not None
rather than just testing init
as a bool
doc/whats_new/v0.20.rst
Outdated
@@ -597,6 +597,12 @@ Support for Python 3.3 has been officially dropped. | |||
- |Feature| A scorer based on :func:`metrics.brier_score_loss` is also | |||
available. :issue:`9521` by :user:`Hanmin Qin <qinhanmin2014>`. | |||
|
|||
- Fixed a bug in :class:`decomposition.NMF` where `init = 'nndsvd'`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in the wrong place and should be prefixed by |Fix|
@jnothman Thank you for your quick turnaround. The test is failing because Per your comment above, we want to update the documentation instead of making a backwards-incompatible change. However, not changing the code means that I am wondering what is your suggestion on this. |
I am okay to leave them inconsistent for now, or we could change the
default in `non_negative_factorization` with a deprecation process (see
http://scikit-learn.org/dev/developers/contributing.html#change-the-default-value-of-a-parameter
).
|
Tests are now failing |
doc/whats_new/v0.21.rst
Outdated
`n_components < n_features` instead of | ||
`n_components <= min(n_samples, n_features)`. | ||
:issue:`11650` by :user:`Hossein Pourbozorg <hossein-pourbozorg>` and | ||
`Zijie (ZJ) Poh <zjpoh>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use :user:
Thanks @zjpoh ! |
This reverts commit 147745e.
This reverts commit 147745e.
Reference Issues/PRs
Fixes #11650
What does this implement/fix? Explain your changes.
I changed the condition for using$A\in\mathbb{R}^{m\times n}_+$ , the condition for
nndsvd
to initialize NMF to k <= min(m,n) and raised an ValueError if k > min(m,n) andinit = 'nndsvd'
.Why?
According to the table 1 of SVD based initialization paper, for matrix
nndsvd
initialization is k < min(m, n). However, in the code,nndsvd
is used when k < n instead of k < min(m, n).Note that in the code, k = n_components, m = n_samples and n = n_features.
In addition, from my understanding of the paper, k <= min(m,n) is a sufficient condition because the initialization method is based on SVD and SVD only requires k <= min(m, n) instead of k < min(m, n). I have verified that setting k <= min(m, n) gives the correct solution and passes all unit tests.
I set
init = None
as the default parameter ofnon_negative_factorization
.Why?
non_negative_factorization
hasinit = 'random'
[here] while_initialize_nmf
setsinit = 'nndsvd'
only if it is called withinit = None
[here]. That is, the calculation ofnon_negative_factorization
with the default parameter will never useinit = 'nndsvd'
. Hence, the output ofnon_negative_factorization
andNMF
class will be different unlessinit = 'random'
is passed as a parameter to NMF.Note:
non_negative_factorization
haveinit = 'random'
as the default parameter is also inconsistent with the documentation.Any other comments?
I am aware that the changes that I made are inconsistent with the suggestions in the issue. But this is what I found out after reading the code and the paper. Please let me know if this makes sense. Thanks.