Skip to content
This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Refactor code storage logic for trial #2403

Merged
merged 62 commits into from
May 19, 2020

Conversation

SparkSnail
Copy link
Contributor

@SparkSnail SparkSnail commented May 5, 2020

  1. in local platform, copy code files to trial's workding dir.
  2. in other platform, copy code files to remote machine firstly, and then copy data in remote machine to trial's working folder, not submitted from local every time.

SparkSnail and others added 30 commits August 6, 2019 18:58
Filter prune algo implementation (microsoft#1655)
Support monitor mode when creating or resuming a new experiment (microsoft#1933)
Add test for documentation build (microsoft#1924)
}
const azureKubeflowClusterConfig: FrameworkControllerClusterConfigAzure = <FrameworkControllerClusterConfigAzure>this.fcClusterConfig;
return await this.uploadFolderToAzureStorage(srcDirectory, destDirectory, azureKubeflowClusterConfig.uploadRetryCount);
} else if (this.fcClusterConfig.storage === 'nfs' || this.fcClusterConfig.storage === undefined) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or the storage is undefined? This should be an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In original design, the storage field is optional, if it is not set, the default value is nfs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but when storage is undefined, there is no config for storage, how do you mount it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is supposed to be check in NNICTL side, if users didn't set storage field, nnictl should check if usres set nfs configuration.

@QuanluZhang
Copy link
Contributor

@SparkSnail should also verify the logic of code dir in DLTS training service, and update the logic if needed.

@QuanluZhang
Copy link
Contributor

would also better to make sure IT is tested for this pr

@QuanluZhang
Copy link
Contributor

for different training services, there should be exactly the same directory structure for trial working directory. suggest to use a folder named code under the working directory to store user code. The advantage of this way is removing possible name conflict.

@SparkSnail
Copy link
Contributor Author

for different training services, there should be exactly the same directory structure for trial working directory. suggest to use a folder named code under the working directory to store user code. The advantage of this way is removing possible name conflict.

Sure, there are two place to store code for a trial, one path is the common folder for all trials under experiment folder, the path is {experimentId}/nni-code, another path is under trial's working folder, the path is {experimentId}/{trialId}. do you mean adding a code folder under {experimentId}/{trialId} path? do not store run.sh together with code files? then the real working directory will become {experimentId}/{trialId}/code, not {experimentId}/{trialId}.

@SparkSnail
Copy link
Contributor Author

@SparkSnail should also verify the logic of code dir in DLTS training service, and update the logic if needed.

Yes, the trial jobs in DLTS share same codeDir folder, and also need to be refactored.

@QuanluZhang QuanluZhang requested a review from squirrelsc May 19, 2020 00:37
@QuanluZhang
Copy link
Contributor

for different training services, there should be exactly the same directory structure for trial working directory. suggest to use a folder named code under the working directory to store user code. The advantage of this way is removing possible name conflict.

Sure, there are two place to store code for a trial, one path is the common folder for all trials under experiment folder, the path is {experimentId}/nni-code, another path is under trial's working folder, the path is {experimentId}/{trialId}. do you mean adding a code folder under {experimentId}/{trialId} path? do not store run.sh together with code files? then the real working directory will become {experimentId}/{trialId}/code, not {experimentId}/{trialId}.

We can leave this refactor to the next pr.

for (const [rmMeta, executorManager] of this.machineExecutorManagerMap.entries()) {
const executor: ShellExecutor = await executorManager.getAvailableExecutor();
if (executor !== undefined) {
await executor.createFolder(this.remoteExpCodeDir);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line can also put in async promise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@squirrelsc
Copy link
Member

Please explain what's changes on folders in PR description.

@SparkSnail SparkSnail merged commit d7456c1 into microsoft:master May 19, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants