[ci] raise floors on CI dependencies #6375
Merged
21 commits
8364031 [ci] raise floors on CI dependencies (jameslamb)
b9a45e1 fix matplotlib-base uninstall, relax pyarrow pin (jameslamb)
673b324 Merge branch 'master' into ci/conda-floors (jameslamb)
5a301ec Merge branch 'master' into ci/conda-floors (jameslamb)
4ae15d6 looser pyarrow floor (jameslamb)
73fd870 Merge branch 'master' of github.com:microsoft/LightGBM into ci/conda-… (jameslamb)
a4828ed Merge branch 'ci/conda-floors' of github.com:microsoft/LightGBM into … (jameslamb)
5b4685f even looser pyarrow (jameslamb)
f6a8d99 add environment files (jameslamb)
5828e0d missed backtick (jameslamb)
30909e9 more powershell (jameslamb)
620daaa how does powershell work (jameslamb)
2d45f71 remove channel specifier (jameslamb)
9d1da9a add README (jameslamb)
6898502 Merge branch 'master' into ci/conda-floors (jameslamb)
730247d use bash task type (jameslamb)
ff7fb3b filePath not fileType (jameslamb)
95fff25 Revert "filePath not fileType" (jameslamb)
bfcf254 Revert "use bash task type" (jameslamb)
7e8b0c4 Merge branch 'master' into ci/conda-floors (jameslamb)
fc9679c older wheel, less strict Python 3.7 (jameslamb)
@@ -0,0 +1,11 @@
# conda-envs

This directory contains files used to create `conda` environments for development
and testing of LightGBM.

The `.txt` files here are intended to be used with `conda create --file`.

For details on that, see the `conda` docs:

* `conda create` docs ([link](https://conda.io/projects/conda/en/latest/commands/create.html))
* "Managing Environments" ([link](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html))
@@ -0,0 +1,56 @@
# [description]
#
# Similar to ci-core.txt, but specific to Python 3.7.
#
# Unlike ci-core.txt, this includes a Python version and uses
# `=` and `<=` pins to make solves faster and prevent against
# issues like https://github.com/microsoft/LightGBM/pull/6370.
#
# [usage]
#
# conda create \
# --name test-env \
# --file ./.ci/conda-envs/ci-core-py37.txt
#

# python
python=3.7.*

# direct imports
cffi=1.15.*
# older versions of Dask are incompatible with pandas>=2.0, but not all conda packages' metadata accurately reflects that
#
# ref: https://github.com/microsoft/LightGBM/issues/6030
dask=2022.2.*
distributed=2022.2.*
joblib=1.3.*
matplotlib-base=3.5.*
numpy=1.21.*
pandas=1.3.*
pyarrow=9.0.*
# python-graphviz 0.20.2 is not compatible with Python 3.7
# ref: https://github.com/microsoft/LightGBM/pull/6370
python-graphviz=0.20.1
scikit-learn=1.0.*
scipy=1.7.*

# testing-only dependencies
cloudpickle=2.2.*
pluggy=1.0.*
psutil=5.9.3
pytest=7.4.*

# other recursive dependencies, just
# pinned here to help speed up solves
bokeh=2.4.*
fsspec=2023.1.*
msgpack-python=1.0.*
pluggy=1.0.*
pytz=2024.1
setuptools=59.8.*
snappy=1.1.*
tomli=2.0.*
tornado=6.1.*
wheel=0.42.*
zict=2.2.*
zipp=3.15.*
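The Python 3.7 file above lists `pluggy=1.0.*` twice, once under the testing-only dependencies and once under the recursive dependencies. The two entries agree, so this is presumably harmless, but a small check can catch repeats that drift apart later. The following is a minimal sketch only: `package_names` and `duplicated` are hypothetical helpers, not part of LightGBM or `conda`, and the name-splitting rule is a simplification that assumes one `name<operator>version` spec per line.

```python
import re
from collections import Counter


def package_names(spec_text):
    """Return the package name from each non-comment, non-blank line."""
    names = []
    for line in spec_text.splitlines():
        # drop comments and surrounding whitespace
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # simplification: the name is everything before the first version operator
        names.append(re.split(r"[=<>!~ ]", line, maxsplit=1)[0])
    return names


def duplicated(spec_text):
    """Return the sorted names of packages that appear on more than one line."""
    counts = Counter(package_names(spec_text))
    return sorted(name for name, n in counts.items() if n > 1)


# Excerpt of the file shown above
SAMPLE = """\
# testing-only dependencies
cloudpickle=2.2.*
pluggy=1.0.*
psutil=5.9.3
pytest=7.4.*

# other recursive dependencies, just
# pinned here to help speed up solves
bokeh=2.4.*
pluggy=1.0.*
"""

print(duplicated(SAMPLE))
```

A check like this could run over every `.txt` file in the directory, assuming the files keep to the one-spec-per-line layout.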
@@ -0,0 +1,40 @@
# [description]
#
# Core dependencies used across most LightGBM continuous integration (CI) jobs.
#
# 'python' constraint is intentionally omitted, so this file can be reused across
# Python versions.
#
# These floors are not the oldest versions LightGBM supports... they're here just to make conda
# solves faster, and should generally be the latest versions that work for all CI jobs using this.
#
# [usage]
#
# conda create \
# --name test-env \
# --file ./.ci/conda-envs/ci-core.txt \
# python=3.10
#

# direct imports
cffi>=1.16
dask>=2023.5.0
joblib>=1.3.2
matplotlib-base>=3.7.3
numpy>=1.24.4
pandas>2.0
pyarrow>=6.0
python-graphviz>=0.20.3
scikit-learn>=1.3.2
scipy>=1.1
 
# testing-only dependencies
cloudpickle>=3.0.0
psutil>=5.9.8
pytest>=8.1.1

# other recursive dependencies, just
# pinned here to help speed up solves
pluggy>=1.4.0
setuptools>=69.2
wheel>=0.43
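The `[usage]` comments in the two files above imply a simple rule: Python 3.7 jobs use `ci-core-py37.txt`, which already pins `python=3.7.*`, while every other job uses `ci-core.txt` and passes the Python constraint on the command line. A rough sketch of that selection follows. `conda_create_command` is a hypothetical helper written for illustration (the CI scripts in this PR are not shown here), and the default environment name `test-env` is taken from the usage comments.

```python
import shlex

# Paths taken from the [usage] comments in the two files above
CORE_FILE = "./.ci/conda-envs/ci-core.txt"
PY37_FILE = "./.ci/conda-envs/ci-core-py37.txt"


def conda_create_command(python_version, env_name="test-env"):
    """Build a `conda create` argument list for one Python version (hypothetical helper)."""
    cmd = ["conda", "create", "--name", env_name]
    if python_version == "3.7":
        # ci-core-py37.txt already contains python=3.7.*
        cmd += ["--file", PY37_FILE]
    else:
        # ci-core.txt omits 'python', so the constraint comes from the command line
        cmd += ["--file", CORE_FILE, f"python={python_version}"]
    return cmd


# Print the commands rather than running them, so no conda installation is needed
print(shlex.join(conda_create_command("3.10")))
print(shlex.join(conda_create_command("3.7")))
```

Returning an argument list instead of a string keeps the helper usable with `subprocess.run` without shell quoting concerns.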
I'm not really convinced why a `.txt` file would be more appropriate than a `.yml` file here 🤔 I've been happily using `.yml` files for environment creation for years although I've been using `micromamba` instead of `conda`.
While I think that the former would be more fitting for use in CI in general, does anything prevent us from using `conda env create -f <file>.yml`? I'm almost certain that this works.
I've also used `micromamba install` on the base env (micromamba doesn't have a base env so we could keep the current name) which allows you to specify the python version and a yaml file for the dependencies (example)
I described this in "Why are these .txt files and not conda .yaml files?" in the PR description. I'll add more details here.
I wanted to be able to pass in a Python constraint from the command line, to avoid these alternatives I could think of for constraining Python version:
* `python=${PYTHON_VERSION}` in an `env.yml` file and needing to involve some templater like `envsubst` (docs) to use them
* `python=3.10.*`
As of the latest `conda` (v24.3.0), `conda env create` does not support mixing files and command-line constraints.
Given a file `env.yaml` with these contents:
This
Yields this
`conda env create` can read files like that... but you can't pass additional constraints from the command line.
I looked some more into this tonight and found this relevant issue with discussion about changes to these APIs in `conda`:
* `env.yaml` files (was closed with no changes to `conda`)
* `environment.yaml` v2 conda/conda#11341 = meta-issue covering various changes to environment creation from YAML files
* `environment.yml` configuration with command like arguments conda/conda#9506 = feature request from 2019 that was closed with the suggestion to do exactly what I'm proposing in this PR ... `conda create` with a `.txt` file
Oh cool! I didn't realize `micromamba` allowed for that. That behavior looks like exactly what I'm trying to get here.
however ... I don't support switching to `micromamba` here in LightGBM's CI. It's described as a "reimplementation" of `conda` / `mamba` in its docs (docs link). Part of the motivation of moving to environment files in this PR is to help make the development of LightGBM easier. I'd really like to avoid having that experience start with "install this specific other `conda` alternative". I also really would not be excited about the prospect of rebuilding the images from https://github.com/guolinke/lightgbm-ci-docker to include `micromamba`.
Especially just for the benefit of "use .yml files because .txt files seem weird".
@jmoralez @borchero can you think of any other functional reasons that what I've proposed in this PR (using `conda create` + .txt files + `python=${PYTHON_VERSION}` from the command line) is problematic?
I just don't think txt files are widely adopted in the conda commands. For example, suppose someone already has an environment with LightGBM in it and just wants to add the dependencies to develop, they could just use `conda/mamba env update -f env.yaml`. `conda env update` also allows passing multiple files, so if we end up having more than one file they could be passed in that single command.
The two step creation (python version first and dependencies second) isn't really problematic (I think that's what most people do) and I don't think it slows down the second resolution.
In my experience, it absolutely does speed up the total time-to-environment-ready to do everything in one step instead of creating an environment and then updating it: #5743.
I think that's reasonable, we're extensively using this mechanism at our company 👍
Thanks!
With the constraints "create the environment with all dependencies in a single command" and "keep using `conda` / `mamba`", there are 2 approaches we've identified:
1. `conda create --file .ci/ci-core.txt python=${PYTHON_VERSION}`
2. `conda env create --file ./.ci/conda-envs/ci-core-py${PYTHON_VERSION}.yml`
If I'm reading this correctly, @jmoralez prefers Option 1 and you prefer Option 2.
I also slightly prefer Option 1, at least right now where all not-Python-3.7 environments can share the same dependency constraints.
If we were to have more Python-version-specific lists of dependencies, I think I'd prefer Option 2 more.
@borchero is it alright with you if we move forward with the Option 1 approach right now and see how it goes? Since this is just affecting CI and commands that we document in a README, I think it should be low-risk to be wrong and end up changing this pattern in the future.
Yep, fine for me, please move ahead with option 1 to improve the CI setup :)
On a slightly related note, this PR prompted me to think about the use of lock files... this would ensure entirely reproducible environments (i.e. fewer CI failures) and would essentially fully alleviate the need for solving in CI jobs. Potentially something for a future PR... 😄
Ok thanks very much! I'll get this PR building and merge it on @jmoralez's existing approval of the current approach.
I think we're almost immediately gonna be talking about this again when we decide how far to go in "dropping" Python 3.8 support like discussed further up in this PR.
I'd support using lock files (or just generally `==`-type constraints) for the jobs that use end-of-life Python versions or operating systems.
But for the others, I think it's valuable to let dependencies float when we can. That means that LightGBM is continually tested against new releases of all its dependencies, which more evenly spreads out the work of reacting to breaking changes over time, and reduces the total amount of effort required.
For example, I'm thinking about cases like this:
Because CI broke here shortly after a new `dask` release came out, there was a fairly small changeset in `dask` to investigate to try to find the root cause. If instead we use `==` pins here and then only hit that error say a month or 2 months later, the debugging effort would have been much higher. And we would have missed the opportunity to get in a fix like dask/dask#11007 quickly to limit how many versions of the dependency exhibit the behavior that's problematic for `lightgbm`.
Letting things float also makes CI more closely match the real-world experience of someone starting a new project and running `pip install lightgbm scikit-learn numpy` or something, where they'd get the latest versions.
Since this project publishes a Python library used in a lot of different contexts, not an application that's distributed as a binary or container image, I think it's valuable to continue that practice of letting the dependencies somewhat float + regularly bumping up the floors to continue getting fast environment solves.
All that said though... if you disagree or see some other opportunities to improve the way we get dependencies in CI, I definitely encourage you to write up a proposal in an issue! We have a loosely-enforced convention of doing those in issues tagged `[RFC]` ("request for comment"). You can see some at https://github.com/microsoft/LightGBM/issues?q=%22%5BRFC%5D%22+is%3Aissue+is%3Aclosed.
Thanks to both of you for talking through all this with me, it's good that we keep challenging the current state of how things are set up.
Nice, I didn't know about the `[RFC]`-type issues yet, I'll think about it a bit more (taking into account your comment, thanks for the elaboration ;) and might come back to this then 😄
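The trade-off discussed in this thread, floors that let dependencies float versus `==` pins, can be illustrated with a toy comparison. This is a sketch under loud assumptions: `parse_version`, `satisfies_floor` and `satisfies_pin` are hypothetical helpers, the version ordering is a plain numeric tuple comparison rather than `conda`'s real version-matching logic, and the release list is illustrative only.

```python
def parse_version(text):
    """Turn '2024.4.1' into (2024, 4, 1). Simplified: numeric, dot-separated versions only."""
    return tuple(int(part) for part in text.split("."))


def satisfies_floor(version, floor):
    """True if `version` is at least `floor`, as with a `>=` constraint."""
    return parse_version(version) >= parse_version(floor)


def satisfies_pin(version, pin):
    """True only if `version` equals `pin`, as with a `==` constraint."""
    return parse_version(version) == parse_version(pin)


# Illustrative release history (assumption, not taken from the PR)
releases = ["2023.5.0", "2024.1.0", "2024.4.1"]

# A floor keeps admitting new releases, so CI exercises them as they appear
floor_ok = [v for v in releases if satisfies_floor(v, "2023.5.0")]
# An exact pin admits a single version until someone bumps it by hand
pin_ok = [v for v in releases if satisfies_pin(v, "2023.5.0")]

print(floor_ok)
print(pin_ok)
```

With a floor, every later release stays eligible, which is what lets a breaking change surface soon after the release that introduced it.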