[python-package] Allow to pass Arrow table with boolean columns to dataset #6353
@jameslamb can you help out with the failing R CI jobs? 👀 seems to be unrelated to this PR
Looks to me like RStudio has stopped hosting older R releases: https://cran.rstudio.com/bin/windows/base/old/ Could you put up a separate PR to fix the Windows R 3.6 jobs? Should hopefully be as simple as using this URL instead:
Also can you please create a branch here in LightGBM, instead of from your fork? That'd make it a little easier for me to push to it if necessary, and you should have permission to create branches here now.
Will do!
Ah yes, sure, will do from now on! I hadn't realized that I can do this now 😄
Depends on #6357
This should be ready for review. Could one of you, @guolinke @shiyu1994, have a look? 🙏🏼
The C++ part looks good to me!
@jameslamb can you have a look at the Python changes? 😄 I'd then merge after CI only shows known failures 🥴
can you have a look at the Python changes? 😄 I'd then merge after CI only shows known failures 🥴
Python changes and tests all look great to me, thanks!
But I think we should continue the practice of not admin-merging PRs when CI is failing. I don't want us to get in the habit of ignoring when CI fails.
In the cases where it's taking too long to resolve a difficult CI issue, I'd prefer that we make explicit decisions to temporarily turn off or make optional failing jobs, like was done in #6357.
I hope that all the known issues are now not blocking merge any more, and that if CI fails here it'll be only because of this PR's changes.
Absolutely! I wasn't intending to bypass any required status checks but the
Absolutely! I made it non-required in #6357 with the intent that it wouldn't block PRs. Sounds like we're thinking about it the same way 🤝
Thanks very much!
Motivation
Currently, LightGBM's interface only supports integer and floating point types in the columns of Arrow tables. As a result, columns that are represented as booleans in Arrow cannot be passed to LightGBM without converting them to uint8 and increasing the column's memory consumption eightfold (as Arrow bit-packs booleans).
The pandas interface already supports passing boolean columns to LightGBM although, due to the way pandas handles null values, these columns must be non-nullable. Hence, there is no reason not to include support for boolean columns in Arrow.
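To make the eightfold figure concrete, here is a minimal standard-library sketch of bit-packing. It is not LightGBM or Arrow code: `pack_bits` and `get_bit` are hypothetical helpers written for illustration, and the least-significant-bit-first layout is assumed to mirror how Arrow stores boolean buffers.

```python
# Illustrative only: hypothetical helpers, not part of LightGBM or pyarrow.

def pack_bits(values):
    """Pack booleans into bytes, eight values per byte, least-significant bit first."""
    packed = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def get_bit(packed, i):
    """Read the i-th boolean back out of a bit-packed buffer."""
    return bool((packed[i // 8] >> (i % 8)) & 1)


values = [i % 3 == 0 for i in range(1024)]

as_uint8 = bytes(values)         # one byte per value, like a uint8 column
bit_packed = pack_bits(values)   # one bit per value, like a boolean column

print(len(as_uint8), len(bit_packed))
```

For 1024 values the uint8 representation needs 1024 bytes while the bit-packed one needs 128, which is the overhead the conversion currently forces on users.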
Changes