Try to fix a bug where an executor is not found.
Increase the ShellExecutor count to reduce SSH connections.
Use the default GPU if there is one.
This reverts commit 9e240dc.
*/
public getFirstExecutor(): ShellExecutor {
return this.executorArray[0];
// init a new executor if no free one.
This comment does not match the code.
fixed
return this.executorArray[0];
// init a new executor if no free one.
if (executor === undefined) {
throw new Error("executor shouldn't be undefined before return!");
Is this block a duplicate of the one above?
fixed
public get getUsedConnectionNumber(): number {
return this.usedConnectionNumber;
return this.usedCount;
}

public addUsedConnectionNumber(): void {
Should this function, addUsedConnectionNumber, be removed?
fixed
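For context on the executor threads above, here is a minimal sketch of how a pool might hand out a shared executor and create a new one only when none has spare capacity, tracking use with a counter similar to usedCount. FakeExecutor, ExecutorPool, acquire, release, and maxUsesPerExecutor are hypothetical names for illustration, not NNI's ShellExecutor API.

```typescript
// Hypothetical stand-in for an SSH-backed executor; not NNI's ShellExecutor.
class FakeExecutor {
    public usedCount: number = 0;
    constructor(public readonly id: number) {}
}

// Minimal pool: each executor is shared by up to `maxUsesPerExecutor` callers,
// so fewer underlying connections have to be opened.
class ExecutorPool {
    private readonly executors: FakeExecutor[] = [];

    constructor(private readonly maxUsesPerExecutor: number) {}

    public get size(): number {
        return this.executors.length;
    }

    // Return an executor with spare capacity; create one only if none is free.
    public acquire(): FakeExecutor {
        let executor: FakeExecutor | undefined = undefined;
        for (const candidate of this.executors) {
            if (candidate.usedCount < this.maxUsesPerExecutor) {
                executor = candidate;
                break;
            }
        }
        if (executor === undefined) {
            executor = new FakeExecutor(this.executors.length);
            this.executors.push(executor);
        }
        executor.usedCount++;
        return executor;
    }

    // Give a slot back; the executor stays in the pool for later reuse.
    public release(executor: FakeExecutor): void {
        if (executor.usedCount <= 0) {
            throw new Error(`executor ${executor.id} released more often than acquired`);
        }
        executor.usedCount--;
    }
}
```

In this sketch the counter is changed only inside acquire and release, so callers never need a separate method to increment it.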
await executor.createFolder(trialWorkingFolder);
await executor.createFolder(unixPathJoin(trialWorkingFolder, '.nni'));
await executor.createFolder(trialJobDetail.workingDirectory);
await executor.createFolder(executor.joinPath(trialJobDetail.workingDirectory, '.nni'));
Why not use a function that supports creating folders recursively?
fixed
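A sketch of the recursive approach suggested above, assuming Unix-style paths. folderChain and createFolderRecursive are hypothetical helpers, not part of ShellExecutor; the createOne callback stands in for a single-level folder creation call and is assumed to tolerate folders that already exist.

```typescript
// Split a Unix-style path into the chain of folders that must exist, parent first.
function folderChain(unixPath: string): string[] {
    const isAbsolute: boolean = unixPath.charAt(0) === '/';
    const chain: string[] = [];
    let current: string = '';
    for (const part of unixPath.split('/')) {
        if (part === '') {
            continue; // skip empty segments from leading, trailing, or doubled slashes
        }
        if (current === '') {
            current = isAbsolute ? `/${part}` : part;
        } else {
            current = `${current}/${part}`;
        }
        chain.push(current);
    }
    return chain;
}

// Create every missing level by calling the single-level creator parent first.
// `createOne` is an assumption: it must not fail when the folder already exists.
function createFolderRecursive(createOne: (folder: string) => void, unixPath: string): void {
    for (const folder of folderChain(unixPath)) {
        createOne(folder);
    }
}
```

With a helper like this, one call for the '.nni' subfolder also creates the trial working directory, so a separate call for the parent is unnecessary. On a Unix host, `mkdir -p` gives the same effect in a single command.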
}
}

async function getRemoteFileContentLoop(executor: ShellExecutor): Promise<void> {
for (let i: number = 0; i < 10; i++) {
const remoteFullName = executor.joinPath(executor.getTempPath(), REMOTEFILE);
for (let i: number = 0; i < 3; i++) {
// console.log(i);
remove this line
fixed
@@ -41,14 +44,16 @@ describe('ShellExecutor test', () => {
rmMeta = JSON.parse(fs.readFileSync('../../.vscode/rminfo.json', 'utf8'));
console.log(rmMeta);
} catch (err) {
console.log(`Please configure rminfo.json to enable remote machine test.${err}`);
console.log(`Please configure rminfo.json to enable remote machine test. ${err}`);
Remove console.log() in TS code; use this.log() instead.
tried, it doesn't work in test code.
@@ -14,7 +14,6 @@ assessor:
trial:
  codeDir: ../../../examples/trials/cifar10_pytorch
  command: python3 main.py --epochs 1 --batches 1
  gpuNum: 1
Why remove gpuNum?
It looks like the test environment is sometimes used by two test suites at once. If gpuNum is set to 1, NNI waits for the other suite to finish, which causes a timeout error. Removing this setting lets the test cases use the default GPU without waiting for each other, which reduces the chance of failure.
We need examples that test gpuScheduler in NNI. If we remove this configuration, we cannot test GPU-related functions in the pipeline.
OK, I will add it back, but what it tests is very limited.
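The waiting behavior described in this thread can be pictured with a toy model. It assumes, as the comments above describe, that an explicit gpuNum request reserves GPUs exclusively and that trials without gpuNum are not queued. This is not NNI's gpuScheduler; TrialRequest and startableNow are hypothetical names.

```typescript
// Toy model only: this is not NNI's GPU scheduler.
interface TrialRequest {
    suite: string;
    gpuNum?: number; // undefined means no explicit GPU request
}

// Return the suites that can start immediately on a machine with `totalGpus` GPUs.
// Assumption: explicit requests reserve GPUs exclusively; requests without
// gpuNum are never queued.
function startableNow(requests: TrialRequest[], totalGpus: number): string[] {
    let freeGpus: number = totalGpus;
    const started: string[] = [];
    for (const request of requests) {
        if (request.gpuNum === undefined) {
            started.push(request.suite);
        } else if (request.gpuNum <= freeGpus) {
            freeGpus -= request.gpuNum;
            started.push(request.suite);
        }
        // Otherwise the request waits until a reservation is released.
    }
    return started;
}
```

Under these assumptions, two suites that each request one GPU on a single-GPU machine cannot both start, which matches the timeout described above, while the same suites without gpuNum both start but no longer exercise GPU scheduling.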
Fixed some bugs,
Small improvements
1
to reduce concurrency issues./
to\
in Windows, and versus in Linux.