sklearn using sparse data representation #27

Open
szilard opened this issue Nov 6, 2015 · 2 comments

Comments

@szilard
Owner

szilard commented Nov 6, 2015

I know from @glouppe that "RFs in sklearn now support sparse matrices too"
https://twitter.com/glouppe/status/660012865554903040

It would be interesting to see the results with sparse input for RF and for logistic regression too. We should see a lower memory footprint and perhaps faster runs. Does anyone want to help with the code (PR)?
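To make the memory expectation concrete, here is a minimal sketch that compares the size of a one-hot encoded categorical column stored densely and as a scipy CSR matrix. The row count, number of levels, and the helper `csr_nbytes` are made up for illustration and are not taken from the benchmark.

```python
import numpy as np
from scipy import sparse


def csr_nbytes(m):
    """Bytes held by the three arrays that back a CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


# Hypothetical sizes: one categorical column with 300 levels, 10,000 rows.
rng = np.random.default_rng(0)
n_rows, n_levels = 10_000, 300
codes = rng.integers(0, n_levels, size=n_rows)

# One-hot encoding: exactly one nonzero entry per row.
X_sparse = sparse.csr_matrix(
    (np.ones(n_rows, dtype=np.float32), (np.arange(n_rows), codes)),
    shape=(n_rows, n_levels),
)
X_dense = X_sparse.toarray()

dense_bytes = X_dense.nbytes
sparse_bytes = csr_nbytes(X_sparse)
print(f"dense:  {dense_bytes} bytes")
print(f"sparse: {sparse_bytes} bytes")
```

The dense array stores every zero, so its size grows with rows times levels, while the CSR arrays grow with the number of nonzeros (here one per row).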

@ghost

ghost commented May 5, 2016

Good guess, but the reality may be harsher: sparse matrices can cut memory use a lot, but there is no significant speedup. sklearn depends on scipy, so if you want to try it:
in 2-rf/2.py, use http://docs.scipy.org/doc/scipy/reference/sparse.html instead of pandas to create the training matrix.
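A rough sketch of what that swap could look like, assuming the categorical columns are one-hot encoded by hand into scipy CSR blocks. The column names, the toy data, and the helper `one_hot_sparse` are hypothetical and are not the contents of 2-rf/2.py.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier


def one_hot_sparse(col):
    """One-hot encode a single column (no missing values) as a CSR matrix."""
    codes, uniques = pd.factorize(col)
    n = len(codes)
    data = np.ones(n, dtype=np.float32)
    return sparse.csr_matrix((data, (np.arange(n), codes)), shape=(n, len(uniques)))


# Hypothetical toy data standing in for the real training file.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "carrier": rng.choice(["AA", "DL", "UA", "WN"], size=n),
    "origin": rng.choice(["SFO", "JFK", "ORD", "LAX", "SEA"], size=n),
    "distance": rng.integers(100, 3000, size=n),
})
y = rng.integers(0, 2, size=n)

# Build the training matrix from sparse blocks instead of a dense dummy-coded frame.
cat_blocks = [one_hot_sparse(df[c]) for c in ["carrier", "origin"]]
num_block = sparse.csr_matrix(df[["distance"]].to_numpy(dtype=np.float32))
X_sparse = sparse.hstack(cat_blocks + [num_block], format="csr")

# RandomForestClassifier accepts scipy sparse input directly.
rf = RandomForestClassifier(n_estimators=10, random_state=0)
rf.fit(X_sparse, y)
pred = rf.predict(X_sparse)
```

The numeric column is stacked as a sparse block too, so the whole matrix stays in one format; whether this beats the dense path in speed would have to be measured.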

@szilard
Owner Author

szilard commented May 5, 2016

Yeah, scipy's sparse is what I was thinking of, and I was hoping someone could take a look. You could try this simplified setup https://github.com/szilard/benchm-ml/tree/master/z-other-tools with the initial Python code here https://github.com/szilard/benchm-ml/blob/master/z-other-tools/2.py You could time this and the sparse version too, and submit the results here or as a PR.
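A minimal timing harness for that comparison might look like the sketch below. It uses synthetic one-hot data rather than the benchmark files, and the sizes, model settings, and the helper `time_fit` are assumptions chosen only to keep the example short.

```python
import time

import numpy as np
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def time_fit(model, X, y):
    """Return wall-clock seconds spent in model.fit."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


# Synthetic stand-in for the one-hot encoded training matrix.
rng = np.random.default_rng(0)
n_rows, n_levels = 5000, 200
codes = rng.integers(0, n_levels, size=n_rows)
X_sparse = sparse.csr_matrix(
    (np.ones(n_rows, dtype=np.float32), (np.arange(n_rows), codes)),
    shape=(n_rows, n_levels),
)
X_dense = X_sparse.toarray()
y = rng.integers(0, 2, size=n_rows)

# Fresh model per run so no fit reuses state from a previous one.
models = {
    "rf": lambda: RandomForestClassifier(n_estimators=20, random_state=0),
    "logreg": lambda: LogisticRegression(max_iter=200),
}

timings = {}
for name, make_model in models.items():
    for label, X in (("dense", X_dense), ("sparse", X_sparse)):
        timings[(name, label)] = time_fit(make_model(), X, y)

for (name, label), seconds in sorted(timings.items()):
    print(f"{name:7s} {label:7s} {seconds:.3f}s")
```

Single runs on small data are noisy, so repeating each fit a few times and using the real dataset sizes would give numbers more worth reporting.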
