Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ci] Fix R 3.6 tests, dask tests, compatibility with dask>=2024.3.1 #6357

Merged
merged 8 commits into from
Mar 18, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions .ci/test.sh
Original file line number Diff line number Diff line change
Expand Up @@ -126,9 +126,9 @@ fi
# older versions of Dask are incompatible with pandas>=2.0, but not all conda packages' metadata accurately reflects that
#
# ref: https://github.com/microsoft/LightGBM/issues/6030
CONSTRAINED_DEPENDENCIES="'dask-core>=2023.5.0' 'distributed>=2023.5.0' 'pandas>=2.0'"
CONSTRAINED_DEPENDENCIES="'dask>=2023.5.0' 'distributed>=2023.5.0' 'pandas>=2.0'"
if [[ $PYTHON_VERSION == "3.7" ]]; then
CONSTRAINED_DEPENDENCIES="'dask-core' 'distributed' 'pandas<2.0'"
CONSTRAINED_DEPENDENCIES="'dask' 'distributed' 'pandas<2.0'"
fi

# including python=version[build=*cpython] to ensure that conda doesn't fall back to pypy
Expand Down Expand Up @@ -322,7 +322,7 @@ matplotlib.use\(\"Agg\"\)\
# importing the library should succeed even if all optional dependencies are not present
conda uninstall -n $CONDA_ENV --force --yes \
cffi \
dask-core \
dask \
distributed \
joblib \
matplotlib \
Expand Down
2 changes: 2 additions & 0 deletions .ci/test_r_package.sh
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,8 @@ fi

# Installing R precompiled for Mac OS 10.11 or higher
if [[ $OS_NAME == "macos" ]]; then
brew update-reset --auto-update
brew update --auto-update
if [[ $R_BUILD_TYPE == "cran" ]]; then
brew install automake || exit 1
fi
Expand Down
6 changes: 5 additions & 1 deletion .ci/test_r_package_windows.ps1
Original file line number Diff line number Diff line change
Expand Up @@ -91,7 +91,11 @@ if ($env:R_MAJOR_VERSION -eq "3") {
$env:R_LIB_PATH = "$env:BUILD_SOURCESDIRECTORY/RLibrary" -replace '[\\]', '/'
$env:R_LIBS = "$env:R_LIB_PATH"
$env:PATH = "$env:RTOOLS_BIN;" + "$env:RTOOLS_MINGW_BIN;" + "$env:R_LIB_PATH/R/bin/x64;"+ $env:PATH
$env:CRAN_MIRROR = "https://cran.rstudio.com"
if ([version]$env:R_VERSION -lt [version]"4.0") {
$env:CRAN_MIRROR = "https://cran-archive.r-project.org"
} else {
$env:CRAN_MIRROR = "https://cran.rstudio.com"
}
$env:MIKTEX_EXCEPTION_PATH = "$env:TEMP\miktex"

# don't fail builds for long-running examples unless they're very long.
Expand Down
3 changes: 1 addition & 2 deletions .github/workflows/r_package.yml
Original file line number Diff line number Diff line change
Expand Up @@ -275,7 +275,6 @@ jobs:
clang-version:
- 16
- 17
- 18
runs-on: ubuntu-latest
container: rhub/debian-clang-devel
env:
Expand Down Expand Up @@ -316,7 +315,7 @@ jobs:
all-r-package-jobs-successful:
if: always()
runs-on: ubuntu-latest
needs: [test, test-r-sanitizers, test-r-debian-clang]
needs: [test, test-r-debian-clang]
steps:
- name: Note that all tests succeeded
uses: re-actors/alls-green@v1.2.2
Expand Down
12 changes: 11 additions & 1 deletion python-package/lightgbm/compat.py
Original file line number Diff line number Diff line change
Expand Up @@ -164,7 +164,17 @@ class _LGBMRegressorBase: # type: ignore
from dask.distributed import Client, Future, default_client, wait

DASK_INSTALLED = True
except ImportError:
# catching 'ValueError' here because of this:
# https://github.com/microsoft/LightGBM/issues/6365#issuecomment-2002330003
#
# That's potentially risky as dask does some significant import-time processing,
# like loading configuration from environment variables and files, and catching
# ValueError here might hide issues with that config-loading.
#
# But in exchange, it's less likely that 'import lightgbm' will fail for
# dask-related reasons, which is beneficial for any workloads that are using
# lightgbm but not its Dask functionality.
except (ImportError, ValueError):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, we might catch ValueError and re-raise it if the error message is not "Must install dask-expr to activate query planning". Obviously, this isn't ideal either as dask may change the message at any time...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, reading through #6365, I find it surprising that the ValueError is raised at all. If dask is a non-installed optional dependency, the ImportError should be raised first. If it is installed, it does not seem to be an issue, at least, for conda-forge -- is the pip package for dask not configured correctly? 🤔

Copy link
Collaborator

@jameslamb jameslamb Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-raise it if the error message is not "Must install dask-expr to activate query planning"

I'd rather not take on any coupling to specific error messages. I hope my PR to dask will be accepted, and then we could remove this addition of ValueError.

If dask is a non-installed optional dependency, the ImportError should be raised first

The ValueError doesn't happen at install time of dask. It happens at import time.

If you haven't yet, see the description and diff at dask/dask#11007. This ValueError comes from code that only runs when you try to import dask.dataframe.

is the pip package for dask not configured correctly?

The dask package on conda-forge lists dask-expr as a strong runtime dependency (code link).

The dask wheel on PyPI does not... but it does list dask-expr as a dependency of dask[dataframe] (code link).

Neither of those is "correct" or "incorrect"... they're just different.

DASK_INSTALLED = False

dask_array_from_delayed = None # type: ignore[assignment]
Expand Down
12 changes: 8 additions & 4 deletions tests/python_package_test/test_dask.py
Original file line number Diff line number Diff line change
Expand Up @@ -213,13 +213,17 @@ def _create_data(objective, n_samples=1_000, output="array", chunk_size=500, **k


def _r2_score(dy_true, dy_pred):
numerator = ((dy_true - dy_pred) ** 2).sum(axis=0, dtype=np.float64)
denominator = ((dy_true - dy_true.mean(axis=0)) ** 2).sum(axis=0, dtype=np.float64)
return (1 - numerator / denominator).compute()
y_true = dy_true.compute()
y_pred = dy_pred.compute()
numerator = ((y_true - y_pred) ** 2).sum(axis=0)
denominator = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
return 1 - numerator / denominator


def _accuracy_score(dy_true, dy_pred):
return da.average(dy_true == dy_pred).compute()
y_true = dy_true.compute()
y_pred = dy_pred.compute()
return (y_true == y_pred).mean()


def _constant_metric(y_true, y_pred):
Expand Down
Loading