-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support multiple train data on single machine #3900
Conversation
13d6629
to
ee5164d
Compare
@shiyu1994 can you help to review this PR? |
auto interval_time = std::chrono::duration<double, std::milli>(now - prev_time) * 1e-3; | ||
Log::Info("Read %.1f GBs in %.1f seconds (from %s), last interval speed %.1f MB/s", | ||
1.0 * (*bytes_read) / kGbs, total_time, | ||
1.0 * read_progress_interval_bytes_ / kGbs * 1024.0 / interval_time.count(), filename); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exchange the last two arguments in Log::Info?
|
||
void SetHeader(const char* filename); | ||
void SetHeader(const char* filenames); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change filenames
back to filename
? Since only the first file is used for SetHeader
.
std::vector<std::string> SampleTextDataFromFile(const char* filename, const Metadata& metadata, int rank, int num_machines, int* num_global_data, std::vector<data_size_t>* used_data_indices); | ||
std::vector<std::string> SampleTextDataFromFile(const std::vector<const char*>& filenames, const Metadata& metadata, int rank, int num_machines, int* num_global_data, std::vector<data_size_t>* used_data_indices); | ||
|
||
std::vector<std::string> SampleTextDataFromFile(const char* filename, const Metadata& metadata, int rank, int num_machines, int* num_global_data, std::vector<data_size_t>* used_data_indices) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this function (SampleTextDataFromFile
with single filename
) is now unused. Maybe we can remove this.
@shiyu1994 Thanks for taking time to review this PR. As I said this is a quick and dirty hack. As I ignored many use cases, I guess this PR is far from being acceptable to be merged. If that's the case, I'd rather close this PR instead of continue working on this feature. I'm now adding new features for The core idea is to support
|
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
This is a quick and dirty hack which I implemented a month ago.
As there are requests #2031 #2638 for this feature, I'd like to send a PR and see if you are interested in adding this feature.
I've tested correctness by running on higgs dataset with similary config from boosting_tree_benchmarks. With
deterministic=true
and compared the output model file and prediction result.