-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[python-package] [ci] switch to PEP 517 / 518 builds (remove setup.py
) (fixes #5061)
#5759
Conversation
Happy to say this is finally ready for review! I understand that what I'm proposing is a major breaking change... but I don't think we have a choice. We're relying on a pattern for the Python package which the Python packaging tools ( Given how impactful this change could be, I won't merge it until there are at least two approvals from some mix of @shiyu1994 , @guolinke , and @jmoralez . Thanks in advance for your thorough review on this, take all the time you need and please ask any questions you have. I'll do my best to answer them. |
python-package/README.rst
Outdated
@@ -84,7 +84,7 @@ Build GPU Version | |||
|
|||
.. code:: sh | |||
|
|||
pip install lightgbm --install-option=--gpu | |||
pip install lightgbm --config-setting=cmake.define.USE_GPU=ON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pip exposes --config-settings
option: https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-C. Is --config-setting
coming from somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, thank you! No you're absolutely right, this is a mistake on my part.
I was mixing two things... pip
has --config-settings
(with an s
), and build
has --config-setting
(no s
) (code link).
I was following from https://github.com/scikit-build/scikit-build-core/blob/a145985892e4c904f4e4f089f987b4fef491543d/README.md?plain=1#L112-L113 and didn't read closely enough.
Just pushed 41badff fixing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it interesting that you were able to use scikit-build-core
without adding any custom glue code. I suppose that build-python.sh
does much of the heavy-lifting of packaging the right files in the right place?
For now, XGBoost adds a custom PEP 517 glue code to marshal C++ source files, but in the future I may consider writing a bash script instead, should we decide to transition to scikit-build-core
. Thanks for the inspiration!
Thanks so much for the review help @hcho3 ! I know you're busy enough with the other projects you work on, I always really appreciate your help here.
Yep, exactly! I described my perspective on this in #5837 if you want to see a bit more detail on how I'm thinking about it. Essentially, I think any Python user of LightGBM whose needs aren't met by I'm also hoping that having a zero-glue-code setup with We've been used a similar approach with the R package in this repo for a few years, and I think it's worked well so far: https://github.com/microsoft/LightGBM/blob/master/build-cran-package.sh. |
"Operating System :: POSIX", | ||
"Operating System :: Unix", | ||
"Programming Language :: Python :: 3", | ||
"Programming Language :: Python :: 3.7", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't Python 3.6 be here given that we have requires-python = ">=3.6"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh sure, I can add that.
It's not there in setup.py
and I was just copying from that.
LightGBM/python-package/setup.py
Lines 341 to 355 in a323c75
classifiers=['Development Status :: 5 - Production/Stable', | |
'Intended Audience :: Science/Research', | |
'License :: OSI Approved :: MIT License', | |
'Natural Language :: English', | |
'Operating System :: MacOS', | |
'Operating System :: Microsoft :: Windows', | |
'Operating System :: POSIX', | |
'Operating System :: Unix', | |
'Programming Language :: Python :: 3', | |
'Programming Language :: Python :: 3.7', | |
'Programming Language :: Python :: 3.8', | |
'Programming Language :: Python :: 3.9', | |
'Programming Language :: Python :: 3.10', | |
'Topic :: Scientific/Engineering :: Artificial Intelligence', | |
'Typing :: Typed']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refer to #4891
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, thanks @guolinke for reminding me ... in #4891 we dropped support, meaning:
- dropped that trove classifier
'Programming Language :: Python :: 3.6'
from the PyPI labels - do not have any CI jobs testing Python 3.6
And we decided not to move up to requires-python = ">=3.7"
so that people could technically try to build for Python 3.6 if they needed that a bit longer (as @jmoralez also referred to in #5765 (comment)).
By the way, I tried tonight and it looks like lightgbm
is still at least minimally compatible with Python 3.6 🎉 .
docker run \
--rm \
-it python:3.6 \
bash
apt-get update
apt-get install \
--no-install-recommends \
-y \
cmake
git clone --recursive \
https://github.com/microsoft/LightGBM.git
cd LightGBM
sh ./build-python.sh install --nomp
pip install pandas
python ./examples/python-guide/simple_example.py
logs:
Loading data...
Starting training...
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.002058 seconds.
You can set `force_col_wise=true` to remove the overhead.
Training until validation scores don't improve for 5 rounds
Did not meet early stopping. Best iteration is:
[20] valid_0's l2: 0.198063 valid_0's l1: 0.431129
Saving model...
Starting predicting...
The RMSE of prediction is: 0.4450426449744025
I don't feel strongly either way about whether or not to have the Python 3.6 trove classifier. Since @StrikerRUS removed it in #4891, I think we should defer to his judgment and continue to omit it. Is that ok @jmoralez ? If, after reading this, you still feel strongly that it should be included since lightgbm
at least minimally supports Python 3.6, I'll add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's ok, I agree with not adding it back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, thanks for talking through it with us! In another PR, I might put up a minimal CI job that runs exactly that dockerized little script I just shared above. So we can rely on CI to catch issues of the form "this PR would be a hard compatibility break because it's not even syntactically valid Python 3.6".
I don't want to take on any more support than that for Python 3.6, but would be useful (I think) to get automated feedback at that level at least, and I'd expect that job shouldn't require too much maintenance burden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For those finding this conversation from search... here's the PR I promised: #5936.
Thanks again for the reviews (on this PR and the many that led up to it)!!! |
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Fixes #5061.
This PR proposes removing the project's
setup.py
, and instead usingscikit-build-core
(https://github.com/scikit-build/scikit-build-core) as a build backend.scikit-build-core
has been developed with support from an NSF grant (link) and is being developed by the same team developingscikit-build
, which is used by other popular stats and machine learning projects, like:cudf
(Enable builds with scikit-build rapidsai/cudf#10919)cuml
(Update Python build to scikit-build rapidsai/cuml#4818)opencv
(code link)vaex
(🐎🔨 large refactor of aggregator for better performance vaexio/vaex#1847)(I got that list from https://github.com/orgs/scikit-build/discussions/976)
scikit-build-core
is purpose-built for Python packages that useCMake
to compile one or more other artifacts (in our case, thelib_lightgbm
shared library).What would happen if we did nothing?
Users would have to pin to older versions of
setuptools
andpip
to be able to installlightgbm
(which will makelightgbm
harder to use alongside other packages). We've already started seeing evidence of that pain:Benefits of this approach
pip
,setuptools
, andbuild
in environments where they installlightgbm
setup.py
macosx-arm64
for the M1/M2 Macs)What other approaches could we take?
hatchling
likexgboost
has recently done (see Replace setup.py with pyproject.toml dmlc/xgboost#9021 (comment))lightgbm
packaging currently supports, at the expense of needing to still maintain significant Python glue codescikit-build
(scikit-build-core
's predecessor)scikit-build-core
as "The next generation of scikit-build," and the author of both tools referred toscikit-build
as "scikit-build classic" in https://github.com/orgs/scikit-build/discussions/815... implying thatscikit-build-core
is where most active development is goingcatboost
was doing this until very recentlymeson
,autotools
,bazel
, etc. and the corresponding options for Python build backendsnumpy
, for example, is in the process of switching tomeson-python
: BLD: switch to meson-python as the default build backend numpy/numpy#23838These are all possible routes to resolve #5061.
I'm proposing using
scikit-build-core
because it's powerful while also significantly simplifying the code that we have to manage in this project, and because it's maintained by members of PyPA who will keep it up to date with the latest changes in Python packaging (i.e. the latest accepted PEPs related to packaging).Existing features preserved with this approach
list of features (click me)
All of the follow commands are currently supported, and we should try to preserve as many of them as possible.
{command}
ispython setup.py
right now{dist}
is either the namelightgbm
(pull from PyPI) or a path to a*.tar.gz
or*.whl
Installing from source tree.
Building an sdist.
Building wheels.
{command} bdist_wheel --plat-name=macosx --python-tag py3 {command} bdist_wheel --integrated-opencl --plat-name=$PLATFORM --python-tag py3 {command} bdist_wheel --integrated-opencl --plat-name=win-amd64 --python-tag py3 {command} bdist_wheel --gpu {command} bdist_wheel --cuda {command} bdist_wheel --mpi
Customized installation of an sdist (possibly from PyPI).
Existing features lost with this approach
MSBuild
to compile withpip install {sdist}
, then multiple versions of MSVC (link)scikit-build
will check for multiple versions of MSVC: https://github.com/scikit-build/scikit-build/blob/master/docs/generators.rstMSBuild
will be to clone the repo and invokeMSBuild.exe
directlypip install {sdist}
(link)sh build-python.sh install --bit32