Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Fixed NMF IndexError #11667

Merged
merged 17 commits into from
Feb 12, 2019
Merged

[MRG] Fixed NMF IndexError #11667

merged 17 commits into from
Feb 12, 2019

Conversation

zjpoh
Copy link
Contributor

@zjpoh zjpoh commented Jul 24, 2018

Reference Issues/PRs

Fixes #11650

What does this implement/fix? Explain your changes.

  1. I changed the condition for using nndsvd to initialize NMF to k <= min(m,n) and raised an ValueError if k > min(m,n) and init = 'nndsvd'.
    Why?
    According to the table 1 of SVD based initialization paper, for matrix $A\in\mathbb{R}^{m\times n}_+$, the condition for nndsvd initialization is k < min(m, n). However, in the code, nndsvd is used when k < n instead of k < min(m, n).
    Note that in the code, k = n_components, m = n_samples and n = n_features.
    In addition, from my understanding of the paper, k <= min(m,n) is a sufficient condition because the initialization method is based on SVD and SVD only requires k <= min(m, n) instead of k < min(m, n). I have verified that setting k <= min(m, n) gives the correct solution and passes all unit tests.

  2. I set init = None as the default parameter of non_negative_factorization.
    Why?
    non_negative_factorization has init = 'random' [here] while _initialize_nmf sets init = 'nndsvd' only if it is called withinit = None [here]. That is, the calculation of non_negative_factorization with the default parameter will never use init = 'nndsvd'. Hence, the output of non_negative_factorization and NMF class will be different unless init = 'random' is passed as a parameter to NMF.
    Note: non_negative_factorization have init = 'random' as the default parameter is also inconsistent with the documentation.

Any other comments?

I am aware that the changes that I made are inconsistent with the suggestions in the issue. But this is what I found out after reading the code and the paper. Please let me know if this makes sense. Thanks.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test

@zjpoh zjpoh changed the title [MRG] Fixed NMF IndexError [WIP] Fixed NMF IndexError Jul 24, 2018
@zjpoh zjpoh changed the title [WIP] Fixed NMF IndexError [MRG] Fixed NMF IndexError Jul 25, 2018
Copy link
Member

@TomDLT TomDLT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except nitpicks.

Can you also add a bugfix entry in doc/whats_new/v0.20.rst?

Thanks !

@@ -305,8 +306,12 @@ def _initialize_nmf(X, n_components, init=None, eps=1e-6,
check_non_negative(X, "NMF initialization")
n_samples, n_features = X.shape

if init == 'nndsvd' and n_components > min(n_samples, n_features):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NNDSVD is performed also with init = 'nndsvda' and init = 'nndsvdar', which also need the same constraint.
You can use if init != 'random' to handle all three cases.
Please also update your unit test accordingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Thanks!

@@ -831,7 +838,7 @@ def _fit_multiplicative_update(X, W, H, beta_loss='frobenius',


def non_negative_factorization(X, W=None, H=None, n_components=None,
init='random', update_H=True, solver='cd',
init=None, update_H=True, solver='cd',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we changing this default init?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the super late reply and thanks for pointing this out. I should have explain it better in the PR notes.

I made this changes because the documentation says here

        Default: 'nndsvd' if n_components <= min(n_samples, n_features),
            otherwise random.

But in init is set to 'nndsvd' or 'random' only if init is None. See here

    if init is None:
        if n_components <= min(n_samples, n_features):
            init = 'nndsvd'
        else:
            init = 'random'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we need to update the documentation to reflect the true default, not make a backwards-incompatible change. You are welcome to update the documentation in a separate, focused PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Let me revert that and create a new PR on that.

@@ -305,8 +306,14 @@ def _initialize_nmf(X, n_components, init=None, eps=1e-6,
check_non_negative(X, "NMF initialization")
n_samples, n_features = X.shape

if (init and init != 'random'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always use init is not None rather than just testing init as a bool

@@ -597,6 +597,12 @@ Support for Python 3.3 has been officially dropped.
- |Feature| A scorer based on :func:`metrics.brier_score_loss` is also
available. :issue:`9521` by :user:`Hanmin Qin <qinhanmin2014>`.

- Fixed a bug in :class:`decomposition.NMF` where `init = 'nndsvd'`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in the wrong place and should be prefixed by |Fix|

@zjpoh
Copy link
Contributor Author

zjpoh commented Sep 14, 2018

@jnothman Thank you for your quick turnaround.

The test is failing because NMF and non_negative_factorization have different default init. NMF has init=None while non_negative_factorization has init='random'.

Per your comment above, we want to update the documentation instead of making a backwards-incompatible change. However, not changing the code means that NMF and non_negative_factorization are inconsistent.

I am wondering what is your suggestion on this.

@jnothman
Copy link
Member

jnothman commented Sep 15, 2018 via email

@TomDLT TomDLT added the Bug label Sep 24, 2018
@jnothman
Copy link
Member

Tests are now failing

@zjpoh zjpoh changed the title [MRG] Fixed NMF IndexError [WIP] Fixed NMF IndexError Jan 23, 2019
@zjpoh zjpoh changed the title [WIP] Fixed NMF IndexError [MRG] Fixed NMF IndexError Jan 23, 2019
`n_components < n_features` instead of
`n_components <= min(n_samples, n_features)`.
:issue:`11650` by :user:`Hossein Pourbozorg <hossein-pourbozorg>` and
`Zijie (ZJ) Poh <zjpoh>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use :user:

@jnothman jnothman merged commit 42073c2 into scikit-learn:master Feb 12, 2019
@jnothman
Copy link
Member

Thanks @zjpoh !

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants