skglm
is a library that provides fast and flexible sparse generalized linear models compatible with scikit-learn.
Its main features are:
- speed: problems with millions of features can be solved in seconds. Default solvers rely on efficient coordinate descent with Numba just-in-time compilation.
- flexibility: virtually any combination of datafit and penalty can be implemented in a few lines of code (a sketch is given after the datafit and penalty lists below).
- scikit-learn API: all estimators are drop-in replacements for scikit-learn (see the short example after this list).
- scope: support for many models missing from scikit-learn - weighted Lasso, arbitrary group penalties, non-convex sparse penalties, etc.
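To illustrate the drop-in scikit-learn API mentioned above, here is a minimal sketch of fitting an skglm estimator on toy data; the Lasso class and its alpha parameter are assumed from the skglm documentation, so check them against the version you install:

import numpy as np
from skglm import Lasso

# toy data: 100 samples, 300 features, only the first 10 features are informative
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 300))
w_true = np.zeros(300)
w_true[:10] = 1.0
y = X @ w_true + 0.01 * rng.standard_normal(100)

# same fit / predict / coef_ interface as sklearn.linear_model.Lasso
model = Lasso(alpha=0.1).fit(X, y)
print(np.flatnonzero(model.coef_))  # indices of the nonzero coefficients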
Currently, the package handles any combination of the following datafits:
- quadratic
- logistic loss
- multitask quadratic
and the following penalties:
- L1 norm
- weighted L1 norm
- L1 + L2 squared norm (elastic net)
- MCP
- L0.5 and L2/3 penalties
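As referenced in the flexibility bullet, the datafits and penalties above can be freely combined. The following is a hedged sketch assuming the GeneralizedLinearEstimator class together with the Quadratic datafit and MCPenalty penalty names from the skglm documentation; exact names and parameters may differ between versions:

import numpy as np
from skglm import GeneralizedLinearEstimator
from skglm.datafits import Quadratic
from skglm.penalties import MCPenalty

# small regression problem where only the first two features matter
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
y = X[:, 0] - 2 * X[:, 1] + 0.01 * rng.standard_normal(50)

# quadratic datafit combined with the non-convex MCP penalty, one of the
# combinations listed above
estimator = GeneralizedLinearEstimator(
    datafit=Quadratic(),
    penalty=MCPenalty(alpha=0.1, gamma=3.0),
)
estimator.fit(X, y)
print(estimator.coef_)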
The estimators follow the scikit-learn API, come with automated parallel cross-validation, and support both sparse and dense data.
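Because the estimators follow the scikit-learn API, they can be plugged directly into scikit-learn tools such as GridSearchCV, including on sparse inputs. A minimal sketch, again assuming the Lasso estimator and its alpha parameter from the skglm documentation:

import numpy as np
from scipy import sparse
from sklearn.model_selection import GridSearchCV
from skglm import Lasso

# sparse design matrix with 5% nonzero entries
rng = np.random.default_rng(0)
X = sparse.random(200, 500, density=0.05, format="csc", random_state=0)
w_true = np.concatenate([np.ones(5), np.zeros(495)])
y = X @ w_true + 0.01 * rng.standard_normal(200)

# parallel cross-validation over the regularization strength
search = GridSearchCV(Lasso(), {"alpha": [0.1, 0.01, 0.001]}, cv=5, n_jobs=-1)
search.fit(X, y)
print(search.best_params_)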
Please visit https://contrib.scikit-learn.org/skglm/ for the latest version of the documentation.
First clone the repository available at https://github.com/scikit-learn-contrib/skglm:
$ git clone https://github.com/scikit-learn-contrib/skglm.git
$ cd skglm/
Then, install the package with:
$ pip install -e .
To check if everything worked fine, you can do:
$ python -c 'import skglm'
and it should not give any error message.
In the example section of the documentation, you will find numerous examples on real-life datasets, timing comparisons with other estimators, easy and fast ways to perform cross-validation, etc.
All dependencies are specified in the setup.py file. They are installed automatically when pip install -e . is run.
If you use this code, please cite:
@online{skglm,
  title={Beyond L1: Faster and Better Sparse Models with skglm},
  author={Q. Bertrand and Q. Klopfenstein and P.-A. Bannier and G. Gidel and M. Massias},
  year={2022},
  url={https://arxiv.org/abs/2204.07826}
}
ArXiv links: