This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Does every trial need to load data? #3294

Closed
liouxy opened this issue Jan 12, 2021 · 3 comments

@liouxy

liouxy commented Jan 12, 2021

import nni  # NNI SDK, assumed installed

# This runs at the start of every trial, so the data is reloaded each time.
train_data, val_data, X_test, y_test = load_data()
default_params = {'min_data_in_leaf': 0, 'min_sum_hessian_in_leaf': 100}
received_params = nni.get_next_parameter()  # hyperparameters for this trial
default_params.update(received_params)
run(train_data, val_data, default_params, X_test, y_test)

I found that every trial reloads the data, which is time-consuming.
Is this normal, or is there a way to avoid reading the data on every trial?
Thank you very much!

@sharpe5

sharpe5 commented Jan 12, 2021

This is by design.

However, there is an experimental switch that avoids this; see https://nni.readthedocs.io/en/stable/Tutorial/ExperimentConfig.html and the reuse option. Note that if you use the Assessor to early-stop trials, every early-killed process will still trigger a reload of the data.

To speed things up, cache everything in Apache Parquet (.parquet) format, which loads very quickly compared to formats such as .csv.

Most GPUs are limited to somewhere around 10 GB of RAM. Loading enough data to fill that should take about 5 to 10 seconds from a fast SSD, and if training takes 20 minutes, that is a small overhead relative to the whole run. As long as loading costs only a few percent of the training time, this is less of a disadvantage than it first seems.

@liouxy
Author

liouxy commented Jan 13, 2021

Thanks for your response!

@kvartet
Contributor

kvartet commented Jun 10, 2021

We are discussing this feature and will support it in the future. Thanks again for raising the issue.
