This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

support tf2 NAS with non-weight-sharing mode #2541

Merged (25 commits) on Jun 27, 2020


@QuanluZhang (Contributor) commented Jun 9, 2020

  1. Support non-weight-sharing mode for TF2.
  2. Change the naive-tf example to run in non-weight-sharing mode.
  3. Add an integration test (IT) for classic NAS with PyTorch.
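To illustrate item 1: in non-weight-sharing (classic) NAS, every trial receives one complete, fixed architecture and trains it from scratch, instead of sampling sub-networks that share the weights of a single supernet. The sketch below is a framework-free illustration only; the search-space layout and the helpers `sample_architecture` and `run_trial` are hypothetical, not NNI's API.

```python
import random

# Hypothetical search space, loosely modeled on a classic-NAS search-space file:
# one entry per mutable, each listing its candidate ops. The key names and the
# "_type"/"_value" layout are assumptions for illustration.
SEARCH_SPACE = {
    "conv1": {"_type": "layer_choice", "_value": ["conv3x3", "conv5x5", "maxpool"]},
    "activation": {"_type": "layer_choice", "_value": ["relu", "tanh"]},
}


def sample_architecture(search_space, rng):
    """Pick exactly one candidate per mutable (what a tuner would hand to one trial)."""
    return {key: rng.choice(spec["_value"]) for key, spec in search_space.items()}


def run_trial(architecture):
    """Stand-in for one non-weight-sharing trial: build a fresh model containing
    only the chosen ops (training from scratch is omitted here)."""
    return [architecture[key] for key in sorted(architecture)]


if __name__ == "__main__":
    rng = random.Random(0)
    for trial in range(3):
        arch = sample_architecture(SEARCH_SPACE, rng)
        print(trial, run_trial(arch))
```

Because no weights carry over between trials, each trial is independent, which is what lets a standard tuner drive the search.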

@QuanluZhang QuanluZhang marked this pull request as ready for review June 12, 2020 07:33
@chicm-ms chicm-ms requested review from liuzhe-lz and ultmaster June 12, 2020 08:33
```python
if epoch % 1 == 0:
    print("Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}".format(epoch,
                                                              epoch_loss_avg.result(),
                                                              epoch_accuracy.result()))
```
Contributor

Should we report intermediate results here?

Contributor Author

No need to report intermediate results, because this example does not use them. Do you think this example should use an assessor?

Contributor

I think it's for demo purposes. It's fine without it.
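If the example did report per-epoch results, as asked above, a minimal sketch could look like the following. `nni.report_intermediate_result` and `nni.report_final_result` are NNI trial APIs; `train_with_reporting` and its `run_epoch` callback are hypothetical, and the reporters can be injected so the loop runs without NNI installed. Note that `epoch % 1 == 0` is always true, so the original condition prints on every epoch; the sketch simply logs unconditionally.

```python
def train_with_reporting(num_epochs, run_epoch, report_intermediate=None, report_final=None):
    """Train for num_epochs, reporting each epoch's accuracy as an intermediate
    result (which an assessor could use for early stopping) and the last
    accuracy as the final result."""
    if report_intermediate is None or report_final is None:
        import nni  # only required when running as an NNI trial

        report_intermediate = report_intermediate or nni.report_intermediate_result
        report_final = report_final or nni.report_final_result

    accuracy = None
    for epoch in range(num_epochs):
        loss, accuracy = run_epoch(epoch)  # hypothetical callback returning (loss, accuracy)
        print("Epoch {:03d}: Loss: {:.3f}, Accuracy: {:.3%}".format(epoch, loss, accuracy))
        report_intermediate(accuracy)
    report_final(accuracy)
    return accuracy


if __name__ == "__main__":
    reported = []
    train_with_reporting(3, lambda epoch: (1.0 / (epoch + 1), 0.8 + 0.05 * epoch),
                         reported.append, reported.append)
```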

@QuanluZhang QuanluZhang requested a review from chicm-ms June 18, 2020 04:45
@@ -39,6 +39,20 @@ def update_training_service_config(config, training_service):
```python
    deep_update(config, it_ts_config['all'])
    deep_update(config, it_ts_config[training_service])

def nnictl_generate_search_space(test_yml_config, test_case_config, args):
    trial_command = test_yml_config['trial']['command']
    code_dir = args.nni_source_dir + test_case_config['ssgenCodeDir']
```
Contributor

Use `os.path.join`.

```python
def nnictl_generate_search_space(test_yml_config, test_case_config, args):
    trial_command = test_yml_config['trial']['command']
    code_dir = args.nni_source_dir + test_case_config['ssgenCodeDir']
    ss_file_path = args.nni_source_dir + test_case_config['ssFilePath']
```
Contributor

Use `os.path.join`.

Contributor Author

Good suggestion. I just followed the existing code.
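The `os.path.join` suggestion matters because plain `+` concatenation only works if `nni_source_dir` happens to end with a separator. A minimal sketch, where `resolve_ssgen_paths` is a hypothetical helper and the config keys come from the diff above:

```python
import os


def resolve_ssgen_paths(nni_source_dir, test_case_config):
    """Build the ss_gen code directory and search-space file path with
    os.path.join, so a missing trailing separator on nni_source_dir is harmless."""
    code_dir = os.path.join(nni_source_dir, test_case_config['ssgenCodeDir'])
    ss_file_path = os.path.join(nni_source_dir, test_case_config['ssFilePath'])
    return code_dir, ss_file_path


if __name__ == "__main__":
    case = {
        'ssgenCodeDir': 'examples/nas/classic_nas',
        'ssFilePath': 'test/config/examples/nni-nas-search-space.json',
    }
    # Plain concatenation silently drops the separator when the directory
    # has no trailing slash:
    print('/home/user/nni' + case['ssgenCodeDir'])
    print(resolve_ssgen_paths('/home/user/nni', case))
```

One caveat: if a config value ever starts with `/`, `os.path.join` discards everything before it, so the paths in the test config should stay relative.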

@@ -51,6 +65,12 @@ def prepare_config_file(test_case_config, it_config, args):
```python
    if sys.platform == 'win32' and args.ts == 'local':
        test_yml_config['trial']['command'] = test_yml_config['trial']['command'].replace('python3', 'python')

    # generate search space file for classic nas
    if test_case_config.get('doSsgen') is not None:
```
Contributor

Should this be `doSsGen`?
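The `doSsgen` key gates search-space generation for classic NAS. Below is a minimal sketch of what such a step might do, assuming it calls `nnictl ss_gen` with the flags shown in the test-case suggestion later in this thread; `build_ss_gen_command` and `maybe_generate_search_space` are hypothetical helpers, not the PR's actual code.

```python
import os
import subprocess


def build_ss_gen_command(trial_command, code_dir, ss_file_path):
    """Return an argv list for `nnictl ss_gen` using the flags from this thread."""
    return [
        'nnictl', 'ss_gen',
        '--trial_command=' + trial_command,
        '--trial_dir=' + code_dir,
        '--file=' + ss_file_path,
    ]


def maybe_generate_search_space(test_case_config, trial_command, nni_source_dir, dry_run=True):
    """Hypothetical gate: only test cases that set doSsgen get a generated
    search-space file. Returns the command, or None if the case opts out."""
    if test_case_config.get('doSsgen') is None:
        return None
    cmd = build_ss_gen_command(
        trial_command,
        os.path.join(nni_source_dir, test_case_config['ssgenCodeDir']),
        os.path.join(nni_source_dir, test_case_config['ssFilePath']),
    )
    if not dry_run:
        subprocess.run(cmd, check=True)  # requires nnictl on PATH
    return cmd
```

Passing an argv list keeps `trial_command` (which contains spaces) as a single argument without any shell quoting.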

```yaml
ssgenCodeDir: examples/nas/classic_nas
# this file is automatically generated by nnictl ss_gen
ssFilePath: test/config/examples/nni-nas-search-space.json
```

@chicm-ms (Contributor) commented Jun 22, 2020

We can split this test case into two cases; then we won't need to handle special keys such as `doSsgen` just for this case:

```yaml
- name: classic-nas-gen-ss
  configFile: test/config/examples/classic-nas-pytorch.yml
  launchCommand: nnictl ss_gen --trial_command="python3 mnist.py --epochs 1" --trial_dir=../examples/nas/classic_nas --file=test/config/examples/nni-nas-search-space.json
  stopCommand:
  experimentStatusCheck: False

- name: classic-nas-pytorch
  configFile: test/config/examples/classic-nas-pytorch.yml
  # remove search space file
  stopCommand: nnictl stop; python3 -c 'import os; os.remove("test/config/examples/nni-nas-search-space.json")'
```

Contributor Author

Good point.

Contributor Author

By the way, are the two cases guaranteed to run sequentially, one after the other?

@chicm-ms (Contributor) commented Jun 23, 2020

Yes, the order is guaranteed. You can run the two cases on your local machine to check that they work:

```shell
python3 nni_test/nnitest/run_tests.py --config config/integration_tests.yml --case classic-nas
```
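Why a single `--case classic-nas` picks up both cases, and why the generation case runs first, can be sketched as follows. This assumes the runner iterates `testCases` in file order and matches `--case` as a substring of each case name; both points are assumptions, and `select_cases` and `run_all` are hypothetical helpers.

```python
def select_cases(test_cases, case_filter=None):
    """Return test cases in file order, keeping those whose name contains
    case_filter (assumed substring matching)."""
    if case_filter is None:
        return list(test_cases)
    return [case for case in test_cases if case_filter in case['name']]


def run_all(test_cases, case_filter=None, run_case=print):
    # Cases run one after another in list order, so a case that generates a
    # file always runs before a later case that consumes it.
    for case in select_cases(test_cases, case_filter):
        run_case(case['name'])


if __name__ == "__main__":
    cases = [{'name': 'nested-ss'}, {'name': 'classic-nas-gen-ss'}, {'name': 'classic-nas-pytorch'}]
    run_all(cases, 'classic-nas')
```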

Contributor Author

It seems `stopCommand` cannot combine two commands...

Contributor

Yes, combining two commands doesn't work for `stopCommand` because of `shlex.split`. It can probably be fixed by replacing `proc = subprocess.run(shlex.split(launch_command))` with:

```python
proc = subprocess.run(launch_command, shell=True)
```
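To see why `shlex.split` breaks a combined `stopCommand`, the sketch below tokenizes the `stopCommand` from the suggested test case and contrasts it with `shell=True`. It assumes a POSIX `/bin/sh`; `tokenize` and `run_with_shell` are hypothetical helpers.

```python
import shlex
import subprocess

STOP_COMMAND = ("nnictl stop; python3 -c "
                "'import os; os.remove(\"test/config/examples/nni-nas-search-space.json\")'")


def tokenize(command):
    """What subprocess.run(shlex.split(command)) would execute: a single argv list."""
    return shlex.split(command)


def run_with_shell(command):
    """Let the shell interpret ';' and run each command in turn."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    return proc.stdout


if __name__ == "__main__":
    # ';' stays glued to 'stop', so nnictl would receive 'stop;' and the whole
    # python3 invocation as extra arguments of one process.
    print(tokenize(STOP_COMMAND))
    print(run_with_shell("echo first; echo second"), end="")
```

`shlex.split` only tokenizes; it never treats `;` as a command separator. `shell=True` hands the string to the shell, which is fine for trusted test configs but should not be used with untrusted input.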

@@ -72,6 +72,14 @@ testCases:
```yaml
- name: nested-ss
  configFile: test/config/examples/mnist-nested-search-space.yml

- name: classic-nas-pytorch
```
@chicm-ms (Contributor) commented Jun 22, 2020

Is this a TF2 test case? If so, it needs to be moved into `test/config/integration_tests_tf2.yml`.

Contributor Author

This one can be tested with TF 1.x; I will add another one in `test/config/integration_tests_tf2.yml`.

@@ -75,7 +75,9 @@ def run_test_case(test_case_config, it_config, args):
```python
    stop_command = get_command(test_case_config, 'stopCommand')
    print('Stop command:', stop_command, flush=True)
    if stop_command:
        subprocess.run(shlex.split(stop_command))
        for command in stop_command.split('&'):
            print('Command:', command, flush=True)
```
Contributor

It would be better to fix it like this to support multiple commands:

```python
subprocess.run(launch_command, shell=True)
```
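The one-liner above names `launch_command`; applied to the stop step in `run_test_case`, it might look like the sketch below. `run_stop_command` is a hypothetical helper and a POSIX shell is assumed. The `__main__` part also shows why splitting on `'&'` by hand is fragile.

```python
import subprocess


def run_stop_command(stop_command):
    """Run the whole stopCommand through the shell so ';' or '&&' separate
    commands; returns the exit status of the last command the shell ran."""
    if not stop_command:
        return None
    print('Stop command:', stop_command, flush=True)
    proc = subprocess.run(stop_command, shell=True)
    return proc.returncode


if __name__ == "__main__":
    # '&&' yields an empty fragment, and ';'-separated commands are not split at all.
    print("echo a && echo b".split('&'))
    print("echo a; echo b".split('&'))
    print(run_stop_command("true; false"))
```

Delegating to the shell keeps the YAML free to use `;` or `&&` exactly as a user would type them.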

@QuanluZhang QuanluZhang merged commit 9a1fb17 into microsoft:master Jun 27, 2020