[R-package] make package installable with CRAN toolchain (fixes #2960) #3188
OK, I think this was just because of the issues fixed in #3193. I think this is ready for review!
@jameslamb Wow, impressive! We are so close to CRAN!
Just had a chance to give a first look at this PR. Please see some initial comments below:
From macOS logs:
I think we should check for LightGBM/.ci/test_r_package_windows.ps1 (lines 175 to 192 in 4f8c32d).
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
I think it makes sense to port the compiler version checks from our CMake configuration (lines 24 to 42 in 4f8c32d).
Some parts can be borrowed from
UPD: something better: https://github.com/duckmayr/gpirt/blob/6948bb0d482a23e32494b913ab911fa6d91c80da/configure.ac#L35-L42
Yeah, I think it is okay.
OK, I just uploaded a binary for the R package, built against R 4.0 on Windows! It can be installed like this:

url <- "https://github.com/microsoft/LightGBM/releases/download/v3.0.0rc1/lightgbm-3.0.0-1-r40.zip"
download.file(
    url = url
    , destfile = "lightgbm.zip"
)
install.packages(
    pkgs = "lightgbm.zip"
    , type = "binary"
    , repos = NULL
)

I'll upload Mac and Linux shortly. I will also open a PR with docs on how to make these and how to download them.

Why the code above doesn't use
@jaredlander binaries for the R package are now available, thanks for the nudge! This is great timing, because we just put up a major release candidate this week. You can change the

LGB_RELEASE <- "https://github.com/microsoft/LightGBM/releases/download/v3.0.0rc1"
pkg_urls <- c(
    "linux" = file.path(LGB_RELEASE, "lightgbm_3.0.0-1-r40-linux.tgz"),
    "mac" = file.path(LGB_RELEASE, "lightgbm_3.0.0-1-r40-macos.tgz"),
    "windows" = file.path(LGB_RELEASE, "lightgbm-3.0.0-1-r40-windows.zip")
)
download.file(
    url = pkg_urls["mac"]
    , destfile = "lightgbm.zip"
)
install.packages(
    pkgs = "lightgbm.zip"
    , type = "binary"
    , repos = NULL
)
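Since the three artifacts differ in file extension, and `install.packages(type = "binary")` is refused on Linux (see the error reported further down this thread), a small helper that picks the artifact for the running OS may cut down on copy-paste mistakes. This is a minimal sketch, assuming the v3.0.0rc1 artifact names listed above; `lgb_artifact()` is a hypothetical helper, not part of the lightgbm package, and the Linux `type = "source"` fallback is an assumption based on this thread rather than documented behavior.

```r
# Hypothetical helper: choose the v3.0.0rc1 artifact for the current OS.
# Artifact names are copied from the comment above and may change between releases.
# No trailing slash here: file.path() always inserts "/" between its arguments.
LGB_RELEASE <- "https://github.com/microsoft/LightGBM/releases/download/v3.0.0rc1"

lgb_artifact <- function(sysname = Sys.info()[["sysname"]]) {
  pkg_urls <- c(
    "linux" = file.path(LGB_RELEASE, "lightgbm_3.0.0-1-r40-linux.tgz"),
    "mac" = file.path(LGB_RELEASE, "lightgbm_3.0.0-1-r40-macos.tgz"),
    "windows" = file.path(LGB_RELEASE, "lightgbm-3.0.0-1-r40-windows.zip")
  )
  os <- switch(
    sysname,
    "Linux" = "linux",
    "Darwin" = "mac",
    "Windows" = "windows",
    stop("no pre-built artifact for OS: ", sysname)
  )
  list(
    url = pkg_urls[[os]],
    # keep the original file name so the extension (.zip vs .tgz) is preserved
    destfile = basename(pkg_urls[[os]]),
    # install.packages() rejects type = "binary" on Linux, so fall back to "source"
    type = if (os == "linux") "source" else "binary"
  )
}

# Usage (downloads and installs, so not run here):
# art <- lgb_artifact()
# download.file(url = art$url, destfile = art$destfile, mode = "wb")
# install.packages(pkgs = art$destfile, type = art$type, repos = NULL)
```

Keeping the download name equal to the artifact name also avoids saving the macOS `.tgz` as `lightgbm.zip`.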
Thanks for pulling this off. There are some issues with the instructions, though. First is a simple typo:

pkg_urls <- c(
    "linux" = file.path(LGB_RELEASE, "lightgbm_3.0.0-1-r40-linux.tgz")
    "mac" = file.path(LGB_RELEASE, "lightgbm_3.0.0-1-r40-macos.tgz")
    "windows" = file.path(LGB_RELEASE, "lightgbm-3.0.0-1-r40-windows.zip")
)

This needs commas:

pkg_urls <- c(
    "linux" = file.path(LGB_RELEASE, "lightgbm_3.0.0-1-r40-linux.tgz"),
    "mac" = file.path(LGB_RELEASE, "lightgbm_3.0.0-1-r40-macos.tgz"),
    "windows" = file.path(LGB_RELEASE, "lightgbm-3.0.0-1-r40-windows.zip")
)

While I installed successfully on Windows, I am having no such luck on Linux. When running

install.packages(
    pkgs = "lightgbm.zip"
    , type = "binary"
    , repos = NULL
)

I get this reasonable error message:

Error in (function (pkgs, lib, repos = getOption("repos"), contriburl = contrib.url(repos, :
  type 'binary' is not supported on this platform

So I change

Error: package or namespace load failed for ‘lightgbm’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/home/jared/consulting/talks/renv/library/R-4.0/x86_64-pc-linux-gnu/lightgbm/libs/lightgbm.so':
  /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by /home/jared/consulting/talks/renv/library/R-4.0/x86_64-pc-linux-gnu/lightgbm/libs/lightgbm.so)

I am running Ubuntu 18.04, and from what I've read, glibc 2.27 is the highest version you can get on 18.04, though I could be mistaken about this. Interestingly, when searching about this, I came across an Ubuntu help page relating specifically to R. Apparently there are ways to bump the version of glibc on Ubuntu 18.04, but since it is such a core library I am hesitant to make any changes to my machine.
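The `GLIBC_2.29' not found` message means the Linux binary was linked against a newer glibc than the machine provides. A minimal sketch of checking that up front from R, assuming a glibc-based Linux where `getconf GNU_LIBC_VERSION` is available; `required_glibc()`, `installed_glibc()` and `glibc_is_new_enough()` are hypothetical helpers, not part of lightgbm or base R.

```r
# Pull the required glibc version (e.g. "2.29") out of a dyn.load() error message.
required_glibc <- function(msg) {
  m <- regmatches(msg, regexpr("GLIBC_[0-9]+(\\.[0-9]+)+", msg))
  if (length(m) == 0L) return(NA_character_)
  sub("^GLIBC_", "", m)
}

# Installed glibc version. Assumes `getconf GNU_LIBC_VERSION` exists (glibc-based Linux);
# returns NA if the command is missing or fails.
installed_glibc <- function() {
  out <- tryCatch(
    system2("getconf", "GNU_LIBC_VERSION", stdout = TRUE, stderr = FALSE),
    error = function(e) character(0),
    warning = function(w) character(0)
  )
  if (length(out) == 0L) return(NA_character_)
  sub("^glibc\\s+", "", out[1L])
}

# TRUE if the installed version is at least the required one, NA if either is unknown.
glibc_is_new_enough <- function(installed, required) {
  if (is.na(installed) || is.na(required)) return(NA)
  utils::compareVersion(installed, required) >= 0L
}

# Usage on the affected machine:
# needed <- required_glibc("... version `GLIBC_2.29' not found ...")
# have <- installed_glibc()
# glibc_is_new_enough(have, needed)
```

With the versions from the report above (2.27 available, 2.29 required) the check returns `FALSE`, which points toward installing from source rather than upgrading glibc.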
More bad news. Testing it on Windows (due to the Linux issues) in RStudio:

library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
mod_light <- lightgbm(data = dtrain, nrounds = 100, obj = 'binary')

The code above came from the documentation. It hangs my session. However, it runs when using R from the terminal, in this case Git Bash. Both are running R 4.0.2. Here's my sessioninfo::session_info():
- Session info ------------------------------------------------------------------------------------
setting value
version R version 4.0.2 (2020-06-22)
os Windows 10 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/New_York
date 2020-08-09
- Packages ----------------------------------------------------------------------------------------
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
backports 1.1.8 2020-06-17 [1] CRAN (R 4.0.2)
BBmisc 1.11 2017-03-10 [1] CRAN (R 4.0.2)
checkmate 2.0.0 2020-02-06 [1] CRAN (R 4.0.2)
class 7.3-17 2020-04-26 [1] CRAN (R 4.0.2)
cli 2.0.2 2020-02-28 [1] CRAN (R 4.0.2)
clipr 0.7.0 2019-07-23 [1] CRAN (R 4.0.2)
codetools 0.2-16 2018-12-24 [1] CRAN (R 4.0.2)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 4.0.2)
crayon 1.3.4 2017-09-16 [1] CRAN (R 4.0.2)
data.table 1.12.8 2019-12-09 [1] CRAN (R 4.0.0)
DBI 1.1.0 2019-12-15 [1] CRAN (R 4.0.2)
desc 1.2.0 2018-05-01 [1] CRAN (R 4.0.2)
details 0.2.1 2020-01-12 [1] CRAN (R 4.0.2)
digest 0.6.25 2020-02-23 [1] CRAN (R 4.0.2)
doParallel 1.0.15 2019-08-02 [1] CRAN (R 4.0.2)
dplyr 1.0.0 2020-05-29 [1] CRAN (R 4.0.2)
DT 0.14 2020-06-24 [1] CRAN (R 4.0.2)
dygraphs 1.1.1.6 2018-07-11 [1] CRAN (R 4.0.2)
ellipsis 0.3.1 2020-05-15 [1] CRAN (R 4.0.2)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.2)
fansi 0.4.1 2020-01-08 [1] CRAN (R 4.0.2)
fastmatch 1.1-0 2017-01-28 [1] CRAN (R 4.0.0)
FNN 1.1.3 2019-02-15 [1] CRAN (R 4.0.2)
foreach 1.5.0 2020-03-30 [1] CRAN (R 4.0.2)
generics 0.0.2 2018-11-29 [1] CRAN (R 4.0.2)
ggplot2 3.3.2 2020-06-19 [1] CRAN (R 4.0.2)
ggthemes 4.2.0 2019-05-13 [1] CRAN (R 4.0.2)
glue 1.4.1 2020-05-13 [1] CRAN (R 4.0.2)
gower 0.2.2 2020-06-23 [1] CRAN (R 4.0.2)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.0.2)
here 0.1 2017-05-28 [1] CRAN (R 4.0.2)
hms 0.5.3 2020-01-08 [1] CRAN (R 4.0.2)
htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2)
htmlwidgets 1.5.1 2019-10-08 [1] CRAN (R 4.0.2)
httr 1.4.1 2019-08-05 [1] CRAN (R 4.0.0)
ipred 0.9-9 2019-04-28 [1] CRAN (R 4.0.2)
iterators 1.0.12 2019-07-26 [1] CRAN (R 4.0.2)
knitr 1.29 2020-06-23 [1] CRAN (R 4.0.2)
lattice 0.20-41 2020-04-02 [1] CRAN (R 4.0.2)
lava 1.6.7 2020-03-05 [1] CRAN (R 4.0.2)
lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.2)
lubridate 1.7.9 2020-06-08 [1] CRAN (R 4.0.2)
magrittr 1.5 2014-11-22 [1] CRAN (R 4.0.2)
MASS 7.3-51.6 2020-04-26 [1] CRAN (R 4.0.2)
Matrix 1.2-18 2019-11-27 [1] CRAN (R 4.0.2)
mlr 2.17.1 2020-03-24 [1] CRAN (R 4.0.2)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.0.2)
nnet 7.3-14 2020-04-26 [1] CRAN (R 4.0.2)
parallelMap 1.5.0 2020-03-26 [1] CRAN (R 4.0.2)
ParamHelpers 1.14 2020-03-24 [1] CRAN (R 4.0.2)
pillar 1.4.6 2020-07-10 [1] CRAN (R 4.0.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
plyr 1.8.6 2020-03-03 [1] CRAN (R 4.0.2)
png 0.1-7 2013-12-03 [1] CRAN (R 4.0.0)
pROC 1.16.2 2020-03-19 [1] CRAN (R 4.0.2)
prodlim 2019.11.13 2019-11-17 [1] CRAN (R 4.0.2)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.2)
R6 2.4.1 2019-11-12 [1] CRAN (R 4.0.2)
RANN 2.6.1 2019-01-08 [1] CRAN (R 4.0.2)
Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.2)
readr 1.3.1 2018-12-21 [1] CRAN (R 4.0.2)
recipes 0.1.13 2020-06-23 [1] CRAN (R 4.0.2)
rlang 0.4.7 2020-07-09 [1] CRAN (R 4.0.2)
rmarkdown 2.3 2020-06-18 [1] CRAN (R 4.0.2)
ROSE 0.0-3 2014-07-15 [1] CRAN (R 4.0.2)
rpart 4.1-15 2019-04-12 [1] CRAN (R 4.0.2)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 4.0.2)
rstudioapi 0.11 2020-02-07 [1] CRAN (R 4.0.2)
scales 1.1.1 2020-05-11 [1] CRAN (R 4.0.2)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
stringi 1.4.6 2020-02-17 [1] CRAN (R 4.0.0)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
survival 3.1-12 2020-04-10 [1] CRAN (R 4.0.2)
themis 0.1.1 2020-05-17 [1] CRAN (R 4.0.2)
tibble 3.0.3 2020-07-10 [1] CRAN (R 4.0.2)
tidyselect 1.1.0 2020-05-11 [1] CRAN (R 4.0.2)
timeDate 3043.102 2018-02-21 [1] CRAN (R 4.0.0)
unbalanced 2.0 2015-06-26 [1] CRAN (R 4.0.2)
vctrs 0.3.2 2020-07-15 [1] CRAN (R 4.0.2)
withr 2.2.0 2020-04-20 [1] CRAN (R 4.0.2)
xfun 0.15 2020-06-21 [1] CRAN (R 4.0.2)
xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.2)
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
yardstick 0.0.7 2020-07-13 [1] CRAN (R 4.0.2)
zoo 1.8-8 2020-05-02 [1] CRAN (R 4.0.2)
[1] C:/Users/jared/Documents/R/R-4.0.2/library
Hi @jaredlander, sorry, this is the first time I've ever built binaries myself (since you never do this when submitting to CRAN), so I don't know the gotchas. I just followed the instructions in "Writing R Extensions" as closely as I could. I can't comment on how our library might interact with

I also can't comment on the library breaking in RStudio but working from a Git Bash for Windows shell, other than to say that your effective PATH is almost certainly different in RStudio than it is in that shell, and maybe conflicting versions of some library are being linked in.

I did just update the names of the artifacts....two that had

Is there a reason you're opposed to installing from source? I understand that until recently LightGBM's R package had a reputation for being difficult to configure and install, but we've done a lot of work to make source installation smoother. I just put up a source distribution on our release, exactly the package we would submit to CRAN. The only issue I know it has (that makes it not-quite-CRAN-able yet) is for 32-bit Windows (#3187), but I'm guessing that won't be a problem for you or most others. You can install it like this:

lightgbm_source <- "https://github.com/microsoft/LightGBM/releases/download/v3.0.0rc1/lightgbm-3.0.0-1-cran.tar.gz"
remotes::install_url(lightgbm_source)

Unlike the binaries (which I just made for the first time and which we do not test), this source package is rigorously tested.
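One way to test the PATH explanation is to save the PATH entries seen by RStudio and by R started from Git Bash, then diff them. A minimal sketch; `path_entries()`, `path_diff()` and the two file names are hypothetical, and nothing here is specific to LightGBM.

```r
# Split PATH into its individual directories for the current R session.
# .Platform$path.sep is ";" on Windows and ":" elsewhere.
path_entries <- function(path = Sys.getenv("PATH")) {
  entries <- strsplit(path, .Platform$path.sep, fixed = TRUE)[[1L]]
  entries[nzchar(entries)]
}

# Directories present in one session's PATH but not in the other's.
path_diff <- function(a, b) {
  list(only_in_a = setdiff(a, b), only_in_b = setdiff(b, a))
}

# Usage: run the matching line in each environment, then compare the saved files.
# writeLines(path_entries(), "path_rstudio.txt")   # inside RStudio
# writeLines(path_entries(), "path_gitbash.txt")   # inside R started from Git Bash
# path_diff(readLines("path_rstudio.txt"), readLines("path_gitbash.txt"))
```

Entries that appear in only one list (for example a second toolchain or a different `bin` directory) would be the first candidates for a conflicting library.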
For the binaries, I'm guessing you built on the wrong machine? Did you use

Now that CMake isn't needed I'll try to install from source on Ubuntu, though I'm still worried about glibc. My main issue with installing from source is that if I find it complicated, then it'll most likely be even harder for other users, most of whom don't necessarily have a lot of terminal experience. And I like to show people tools that they can turn the key and run, so they can focus on data, not installation issues. Could you tell me more about what LightGBM does for paths? What is it looking for? Also, now that CMake is removed, perhaps it can be installed by

I built the Windows binary on a Windows machine, the Mac binary on a Mac, and the Linux binary in a Docker container running the

I did not use

Since this is the first time I've ever done this, I just opened a PR yesterday to document the process. You can see what was done there: #3285.

No, as I mentioned in #3188 (comment), I created them manually. We've opened an issue to track the work to automate building these artifacts: #3283.
Makes sense! That
I actually have no idea if that's an issue, sorry. My main role here is as an R maintainer and I don't have a full grasp of which things we link to dynamically vs. statically. Since this is the first time we've distributed a binary of the CRAN package, we also don't have any experience with users reporting issues on it...you're probably the first person other than me to try to use those artifacts 😬
Because we support these different installation paths, the code in
And sorry to change the names on you again @jaredlander, but I just had to change the name of that source distribution. I forgot to add a

lightgbm_source <- "https://github.com/microsoft/LightGBM/releases/download/v3.0.0rc1/lightgbm-3.0.0-1-r-cran.tar.gz"
remotes::install_url(lightgbm_source)
Good news! From your latest comment, installing from "https://github.com/microsoft/LightGBM/releases/download/v3.0.0rc1/lightgbm-3.0.0-1-r-cran.tar.gz" using

No such luck with Windows, though. This is what happens:

* installing *source* package 'lightgbm' ...
** using staged installation
checking whether MM_PREFETCH works...yes
checking whether MM_MALLOC works...yes
** libs
Error: (converted from warning) this package has a non-empty 'configure.win' file,
so building only the main architecture
* removing 'C:/Users/jared/Documents/R/R-4.0.2/library/lightgbm'
* restoring previous 'C:/Users/jared/Documents/R/R-4.0.2/library/lightgbm'
Error: Failed to install 'unknown package' from URL:
  (converted from warning) installation of package ‘C:/Users/jared/AppData/Local/Temp/Rtmp8yJ5px/file86e0277e32fe/lightgbm_3.0.0-1.tar.gz’ had non-zero exit status

Normally I write all my talks on Windows, but for this talk I've been using both Windows and Linux for the extra horsepower my server provides. So I should be able to work this into the content. It would be excellent to tell people they can recreate this on both OSes.
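The `(converted from warning)` prefix suggests that `remotes` is promoting an `R CMD INSTALL` warning to an error. Two knobs that, as far as I know, address this are the `R_REMOTES_NO_ERRORS_FROM_WARNINGS` environment variable (documented by remotes) and R's `--no-multiarch` install option, which builds only the main architecture so the `configure.win` warning should not be emitted. A hedged sketch: `install_lightgbm_source()` and `lgb_install_opts()` are hypothetical wrappers, `INSTALL_opts` is assumed to reach `install.packages()` through `remotes::install_url()`'s `...`, and none of this has been tested against this release.

```r
# Extra options for R CMD INSTALL. On Windows, "--no-multiarch" builds only the
# main architecture, which R falls back to anyway when configure.win is non-empty.
lgb_install_opts <- function(os_type = .Platform$OS.type) {
  if (identical(os_type, "windows")) "--no-multiarch" else character(0)
}

install_lightgbm_source <- function(url) {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    stop("please install the 'remotes' package first")
  }
  # Stop remotes from converting install warnings into errors, and restore the
  # previous setting when this function exits.
  old <- Sys.getenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS", unset = NA)
  on.exit({
    if (is.na(old)) {
      Sys.unsetenv("R_REMOTES_NO_ERRORS_FROM_WARNINGS")
    } else {
      Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = old)
    }
  }, add = TRUE)
  Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")
  remotes::install_url(url, INSTALL_opts = lgb_install_opts())
}

# Usage (network access required, so not run here):
# install_lightgbm_source(
#   "https://github.com/microsoft/LightGBM/releases/download/v3.0.0rc1/lightgbm-3.0.0-1-r-cran.tar.gz"
# )
```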
I stopped using the command line for building packages as
The
Thanks for trying it! I'm pretty confused by both of those results...I regularly develop LightGBM in RStudio without issue, and I do not have any special customizations in my local environment. I don't have a

We also test that source package on Windows for R 3.6 and 4.0, and it's passing.
Ugh, it's confusing that this is showing up as an

You've already been very patient with us, so feel free to say "I don't have time for this" at any point. But if you do have the time, could you try passing
In theory it shouldn't need it. That field is specifically for linking to other R packages that are used to distribute library headers. I don't think we should need a
Gonna give it another try on Windows. Then I might have to punt to the next time I give the talk, since the conference is this week and I'm the host!
The folks behind

Since I did get it working on Linux, could you point me to where I can find out about computing metrics? I can't seem to get
I am now seeing
Sure! So for metrics, you'll probably want to pass a validation set. You can see these tests as an example:
data(agaricus.train, package = "lightgbm")
data(agaricus.test, package = "lightgbm")
train <- agaricus.train
test <- agaricus.test
metrics <- list("binary_error", "auc", "binary_logloss")
bst <- lightgbm(
data = train$data
, label = train$label
, num_leaves = 4L
, learning_rate = 1.0
, nrounds = 10L
, objective = "binary"
, metric = metrics
)

If you train with You can also run See

For the last year, all of the energy going into our R package has been focused on getting to CRAN. It's taking a lot of work. I wish I could point you to beautiful vignettes (#1944) or tell you we have really compelling visualizations (#1222), but we're just not there yet.
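To get metrics on data the model has not seen, one option is `lgb.train()` with a `valids` list, followed by `lgb.get.eval.result()` to pull out the per-round values. A minimal sketch on the same agaricus data, assuming the lightgbm 3.0 R API (`lgb.Dataset.create.valid()`, `lgb.train()`, `lgb.get.eval.result()`); the name `test` in `valids` is arbitrary, and the block is guarded so it still runs where lightgbm is not installed.

```r
# Guarded so the sketch still runs where lightgbm is not installed.
if (requireNamespace("lightgbm", quietly = TRUE)) {
  library(lightgbm)
  data(agaricus.train, package = "lightgbm")
  data(agaricus.test, package = "lightgbm")
  train <- agaricus.train
  test <- agaricus.test

  dtrain <- lgb.Dataset(train$data, label = train$label)
  # The validation Dataset uses dtrain as its reference, so feature bins line up.
  dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)

  bst <- lgb.train(
    params = list(
      objective = "binary"
      , metric = c("auc", "binary_logloss")
      , num_leaves = 4L
      , learning_rate = 1.0
    )
    , data = dtrain
    , nrounds = 10L
    , valids = list(test = dtest)
  )

  # One AUC value per boosting round, evaluated on the held-out set.
  test_auc <- lgb.get.eval.result(bst, data_name = "test", eval_name = "auc")
  print(test_auc)
}
```

The same call with `eval_name = "binary_logloss"` would return the log loss history for the held-out set.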
This is what I have so far:

library(lightgbm)
library(dplyr)
library(recipes)
data(bank)
bank_char <- bank %>% select(-y) %>%
purrr::map_lgl(~is.character(.x)) %>%
which()
bank_rec <- bank %>%
    mutate(across(where(is.character), ~as.integer(factor(.x)) - 1)) %>%
recipe(formula=y ~ .) %>%
step_mutate(y_class=factor(y)) %>%
themis::step_upsample(y_class) %>%
step_rm(y_class) %>%
prep()
bank_x <- bank_rec %>% juice(all_predictors(), composition='dgCMatrix')
bank_y <- bank_rec %>% juice(all_outcomes(), composition='dgCMatrix')
bank_train <- lgb.Dataset(data=bank_x, label=bank_y, categorical_feature=bank_char)
mod2 <- lightgbm(data=bank_train, nrounds=100, obj='binary', metric=list('AUC'))
mod2 %>% lgb.importance() %>% lgb.plot.importance()

I can see how much work you're putting in from all the back and forth in the issues. Thanks for doing that. With The way categorical features are handled is interesting. Took some digging to figure out what to do. Have you seen how I'm a big fan of Back to getting
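LightGBM expects categorical columns as non-negative integer codes, which is what the `as.integer(factor(.x)) - 1` step in the pipeline above produces. A base-R sketch of the same idea that also returns the column names to pass to `categorical_feature`; `encode_categoricals()` is a hypothetical helper and the toy data frame is made up for illustration.

```r
# Hypothetical helper: replace character columns with zero-based integer codes
# and report which columns were encoded.
encode_categoricals <- function(df, exclude = character(0)) {
  is_cat <- vapply(df, is.character, logical(1L))
  is_cat[names(df) %in% exclude] <- FALSE
  cat_cols <- names(df)[is_cat]
  for (col in cat_cols) {
    # factor() sorts the levels, so codes are 0, 1, 2, ... in level order
    df[[col]] <- as.integer(factor(df[[col]])) - 1L
  }
  list(data = df, categorical_feature = cat_cols)
}

toy <- data.frame(
  age = c(30, 45, 52),
  job = c("admin", "services", "admin"),
  y = c("no", "yes", "no"),
  stringsAsFactors = FALSE
)
enc <- encode_categoricals(toy, exclude = "y")
print(enc$data)
print(enc$categorical_feature)
```

The returned names could then be given to `lgb.Dataset(..., categorical_feature = enc$categorical_feature)`, provided the feature matrix keeps the same column names.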
Thanks for all the background! Yes, we still have a long way to go, including in messaging the way LightGBM works compared to those other libraries. If it doesn't get into the talk, no worries at all. We appreciate the time and attention you've given to LightGBM already!
Thanks for all your work. The slides are at https://jaredlander.com/content/2020/08/TallestTree.html and the video will be at rstats.ai in a few weeks.
Exactly, I would like I'd calling
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
This pull request contains a proposal for the next step to get the LightGBM R package onto CRAN: building without CMake.
See conversation in #629 for some background.
Essentially, CRAN is very particular about how source packages with C++ code are built. It enforces a lot of checks to ensure portability, and will reject packages that require any of the following:
The R package does not currently comply with CRAN's preferred build toolchain. This PR fixes that 😀
Overview
As of this PR, LightGBM's R package gains a CRAN-compliant installation toolchain using autoconf. From "Writing R Extensions":

The details of how this is used are explained in the proposed changes to R-package/README.md added to this PR.

Notes for Reviewers
Thanks in advance for your time and thorough reviews!