-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R-package] require lgb.Dataset, remove support for passing 'colnames' and 'categorical_feature' for lgb.train() and lgb.cv() #6714
Conversation
…ing 'feature_name' and 'categorical_feature' to lgb.train() and lgb.cv()
Setting up a CI job to run reverse dependency checks is taking too long (#6734), let's not block this on that. Tomorrow, I will just test all the reverse dependencies manually. I won't merge this until I've done that and posted a comment here summarizing that. But opening this up now for review... @StrikerRUS or @jmoralez could you review? This + #5010 are the last 2 things I'd like to try to get into v4.6.0, and as a reminder we only have 5 more days to put up a v4.6.0 release on CRAN (#6791). Neither of those is critical, so if you do not have time to review please just let me know and we can skip them for this release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for removing deprecated args!
Reverse dependency checksListing out the reverse dependencies (from https://cran.r-project.org/web/packages/lightgbm/index.html). Setup details
Downloaded all the source tarballs of the latest releases from CRAN. get-reverse-deps.R (click me)loadNamespace("crandep")
PKG_DIR <- "~/repos/LightGBM/delete-me"
.log <- function(msg) {
cat(sprintf("[download-revdeps] %s\n", msg))
}
# get all of lightgbm's reverse dependencies
depDF <- crandep::get_dep(
name = "lightgbm"
, type = "all"
, reverse = TRUE
)
reverse_deps <- depDF[["to"]]
.log(sprintf("found %i reverse deps", length(reverse_deps)))
# skip some dependencies with known issues:
#
# * 'misspi' (https://github.com/microsoft/LightGBM/issues/6741)
# * 'qeML' (checks take 45+ minutes to run)
# deps_to_skip <- c(
# "misspi"
# , "qeML"
# )
# .log(sprintf("excluding %i reverse deps: %s", length(deps_to_skip), toString(deps_to_skip)))
# reverse_deps <- reverse_deps[!reverse_deps %in% deps_to_skip]
.log(sprintf("checking the following packages: %s", toString(reverse_deps)))
# get the superset of all their dependencies
# (all of the packages needed to run 'R CMD check' on lightgbm's reverse deps)
their_deps <- unlist(
unname(
sapply(
X = reverse_deps
, FUN = function(pkg) {
return(
crandep::get_dep(
name = pkg
, type = "all"
, reverse = FALSE
)[["to"]]
)
}
)
)
)
all_deps <- sort(unique(c(their_deps, reverse_deps)))
# don't try to install 'lightgbm', or packages that ship with the R standard library
all_deps <- all_deps[!all_deps %in% c("grid", "methods", "lightgbm", "parallel", "stats", "utils")]
# https://catboost.ai/docs/en/installation/r-installation-binary-installation
remotes::install_url(
"https://github.com/catboost/catboost/releases/download/v1.2.7/catboost-R-darwin-universal2-1.2.7.tgz"
, INSTALL_opts = c("--no-multiarch", "--no-staged-install", "--no-test-load")
)
# mixOmics moved to Bioconductor
# ref: https://www.bioconductor.org/packages/release/bioc/html/mixOmics.html
install.packages("BiocManager")
# this installs about 50 million other packages, ugh
bioconductor_deps <- "mixOmics"
BiocManager::install(
bioconductor_deps
, lib = .libPaths()[1]
, update = TRUE
, ask = FALSE
)
.log(sprintf("CRAN packages required to run these checks: %i", length(all_deps)))
.log(sprintf("Biocondutor packages required to run these checks: %i", length(bioconductor_deps)))
.log("installing all packages required to check reverse dependencies")
# install only the strong dependencies of all those packages
install.packages( # nolint: undesirable_function
pkgs = all_deps
, repos = "https://cran.r-project.org"
, dependencies = c("Depends", "Imports", "LinkingTo")
, type = "both"
, Ncpus = parallel::detectCores()
)
# get source tarballs, to be checked with 'R CMD check'
print(sprintf("--- downloading reverse dependencies to check (%i)", length(reverse_deps)))
download.packages(
pkgs = reverse_deps
, destdir = PKG_DIR
, repos = "https://cran.r-project.org"
, type = "source"
) Ran run-checks.sh (click me)#!/bin/bash
# Fix issues like the following that can show up running 'R CMD check' on
# specific clang versions:
#
# * checking for detritus in the temp directory ... NOTE
# Found the following files/directories:
# ‘dsymutil-63923a’ ‘dsymutil-9aa721’ ‘dsymutil-b7e1bb’
#
# These are unlikely to show up in CRAN's checks. They come from
# 'dsymutil ---gen-reproducer' being run (not something LightGBM explicitly does).
#
# ref:
# - https://github.com/llvm/llvm-project/issues/61920
# - https://github.com/golang/go/issues/59026#issuecomment-1520487072
export DSYMUTIL_REPRODUCER_PATH=/dev/null
# parallelize compilation (extra important for Linux, where CRAN doesn't supply pre-compiled binaries)
export MAKEFLAGS="-j4"
# hack to get around this:
# https://stat.ethz.ch/pipermail/r-package-devel/2020q3/005930.html
export _R_CHECK_SYSTEM_CLOCK_=0
# ignore R CMD CHECK NOTE checking how long it has
# been since the last submission
export _R_CHECK_CRAN_INCOMING_REMOTE_=0
# to skip errors like this in {sae.projection}:
# Error in .check_ncores(length(names)) : 8 simultaneous processes spawned
export _R_CHECK_LIMIT_CORES_=false
# CRAN ignores the "installed size is too large" NOTE
export _R_CHECK_PKG_SIZES_THRESHOLD_=100
# ensure that the locally-installed version of 'tidy' is used
# ref: https://cran.r-project.org/doc/manuals/R-exts.html#Checking-packages
export R_TIDYCMD='/usr/local/bin/tidy'
Rscript -e "remove.packages('lightgbm')"
sh build-cran-package.sh --no-build-vignettes
R CMD INSTALL --with-keep.source ./lightgbm_*.tar.gz
pushd ./delete-me
rm -rf *.Rcheck
# install Java and tell R about it
# brew install openjdk
# R CMD javareconf
# export JAVA_HOME=/opt/homebrew/Cellar/openjdk/23.0.2/libexec/openjdk.jdk/Contents/Home
# Imports
# NOTE: cbl has 0 tests (https://github.com/cran/cbl)
R CMD check --no-manual --run-donttest ./cbl_*.tar.gz || true
R CMD check --no-manual --run-donttest ./misspi_*.tar.gz || true
R CMD check --no-manual --run-donttest ./predhy_*.tar.gz || true
R CMD check --no-manual --run-donttest ./predhy.GUI_*.tar.gz || true
R CMD check --no-manual --run-donttest ./sae.projection_*.tar.gz || true
R CMD check --no-manual --run-donttest ./ubair_*.tar.gz || true
# Suggests
R CMD check --no-manual --run-donttest ./bonsai_*.tar.gz || true
R CMD check --no-manual --run-donttest ./EIX_*.tar.gz || true
R CMD check --no-manual --run-donttest ./fastml_*.tar.gz || true
R CMD check --no-manual --run-donttest ./mllrnrs_*.tar.gz || true
R CMD check --no-manual --run-donttest ./PatientLevelPrediction_*.tar.gz || true
R CMD check --no-manual --run-donttest ./qeML_*.tar.gz || true
R CMD check --no-manual --run-donttest ./r2pmml_*.tar.gz || true
R CMD check --no-manual --run-donttest ./SHAPforxgboost_*.tar.gz || true
R CMD check --no-manual --run-donttest ./stackgbm_*.tar.gz || true
R CMD check --no-manual --run-donttest ./treeshap_*.tar.gz || true
# Enhances
R CMD check --no-manual --run-donttest ./fastshap_*.tar.gz || true
R CMD check --no-manual --run-donttest ./shapviz_*.tar.gz || true
R CMD check --no-manual --run-donttest ./vip_*.tar.gz || true Those scripts are a little rough, will clean them up in #6734. Reverse Imports✅ cbl - OK (click me)
❌ misspi - 1 ERROR (click me)
This is reproducible with ✅ predhy - OK (click me)
✅ predhy.GUI - OK (click me)
✅ sae.projection - OK (click me)
✅ ubair - OK (click me)
Reverse Suggests✅ bonsai - OK (click me)
✅ EIX - OK (click me)
✅ fastml - OK (click me)
✅ mllrnrs - OK (click me)
✅ PatientLevelPrediction - OK (click me)
✅ qeML - OK (click me)
✅ r2pmml - OK (click me)
✅ SHAPforxgboost - OK (click me)
✅ stackgbm - OK (click me)
✅ treeshap - OK (click me)
Reverse Enhances✅ fastshap - OK (click me)
✅ shapviz - OK (click me)
✅ vip - OK (click me)
|
I like these changes,thanks for working on this. |
Looks like a new version of Leading to the
And this:
The for those is very non-controversial and small, so because we're getting close to release time, I just pushed them directly here. |
Also posting because edits wouldn't have generated a notification.... I've finished reverse dependency checks. Details here: #6714 (comment) Happy to say that I don't think v4.6.0 will cause any problems for projects on CRAN depending on |
Going to merge this on the stale approval to keep the release moving, and because the small set of linter-related changes I made should be non-controversial. #6796 will need to be approved anyway, so @jmoralez @StrikerRUS if you have any follow-up comments please make them there and we can incorporate them before releasing. Thanks! |
Fixes #6435
Enforces deprecations that have been in
{lightgbm}
sinvce v4.4.0:lgb.Dataset
todata
argument oflgb.cv()
colnames
andcategorical_feature
arguments fromlgb.train()
andlgb.cv()
Also fixes some new warnings from
{lintr}
3.2.0: #6714 (comment)Notes for Reviewers
Still need to do reverse dependency checks
Before this is merged, we should check that it won't break any projects on CRAN that depend on{lightgbm}
. I might propose introducing a comment-triggered CI job we can use for this purpose, maybe with https://github.com/r-lib/revdepcheck.Looks like no reverse dependencies would be broken by the v4.6.0 release!
#6714 (comment)
xgboost
did something similar recentlyAs @mayer79 mentioned on #6435. See dmlc/xgboost#10031