This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

Port trial examples' config file to v2 #3721

Merged (7 commits) on Jun 8, 2021

Conversation

liuzhe-lz
Contributor

@liuzhe-lz liuzhe-lz commented Jun 3, 2021

Notes:

  • config_pai and other non-local training service configs are removed from most examples. Instead, a comment asks users to check mnist-pytorch for training service examples.
  • The shared storage example is not ported. I don't have an environment to test it.
  • system_auto_tuning is not ported. I don't know how to test it.
  • mnist-tfv1 and -keras are not ported. They are deprecated.
  • For non-reusable k8s training services, the v1 config is recommended for now.
  • Added remote example for mnist-pytorch.
  • Added "config_detailed.yml" for mnist-pytorch and -tfv2.
  • mnist-distributed is renamed to mnist-distributed-tfv1, to match mnist-distributed-pytorch.
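
For readers porting their own config files, the move to the flat v2 layout can be sketched as a key-renaming pass. This is only an illustration, not code from this PR. The v2 names `experimentName`, `searchSpaceFile`, `trialCommand`, `trialCodeDirectory` and `trialGpuNumber` appear in the diffs in this conversation; the v1 names and the remaining v2 names below are assumptions recalled from the NNI config references and should be checked there. `port_v1_to_v2` is a hypothetical helper.

```python
from typing import Any, Dict

# Assumed v1 -> v2 renames for top-level fields (verify against the NNI docs).
TOP_LEVEL_RENAMES = {
    "experimentName": "experimentName",
    "searchSpacePath": "searchSpaceFile",
    "trialConcurrency": "trialConcurrency",
    "maxTrialNum": "maxTrialNumber",
    "maxExecDuration": "maxExperimentDuration",
}

# Assumed renames for fields nested under the v1 "trial" section.
TRIAL_RENAMES = {
    "command": "trialCommand",
    "codeDir": "trialCodeDirectory",
    "gpuNum": "trialGpuNumber",
}


def port_v1_to_v2(v1: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build a v2-style dict from a v1-style dict."""
    v2: Dict[str, Any] = {}

    for old, new in TOP_LEVEL_RENAMES.items():
        if old in v1:
            v2[new] = v1[old]

    # v1 nests trial settings; v2 flattens them to the top level.
    for old, new in TRIAL_RENAMES.items():
        if old in v1.get("trial", {}):
            v2[new] = v1["trial"][old]

    # Assumption: v1 "builtinTunerName" becomes v2 "name"; classArgs is kept.
    tuner = v1.get("tuner", {})
    if "builtinTunerName" in tuner:
        v2["tuner"] = {"name": tuner["builtinTunerName"]}
        if "classArgs" in tuner:
            v2["tuner"]["classArgs"] = dict(tuner["classArgs"])

    # Assumption: the v1 platform string moves under "trainingService".
    if "trainingServicePlatform" in v1:
        v2["trainingService"] = {"platform": v1["trainingServicePlatform"]}

    # Fields with no mapping here (for example "authorName") are dropped.
    return v2
```

The sketch copies only keys it knows about, so anything specific to a non-local training service would still need to be ported by hand.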

Fixed bugs:

  • When GPU indices contain only one index, the YAML field becomes an int instead of a string. This causes problems.
  • The custom tuner's codeDirectory field has a different name in the code and in the doc.
  • Expanding annotations only uses the v1 trial code directory.
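
The first bug comes from YAML typing: an unquoted scalar such as `0` is parsed as an integer, while `0,1` stays a string. A minimal sketch of one way to tolerate both forms is shown below. It is not the fix applied in this PR, and `normalize_gpu_indices` is a hypothetical helper name.

```python
from typing import List, Optional, Union


def normalize_gpu_indices(
    value: Union[int, str, List[int], None]
) -> Optional[List[int]]:
    """Hypothetical helper: coerce a parsed YAML value to a list of GPU indices."""
    if value is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError("GPU indices must be an int, a string, or a list")
    # A single unquoted index such as `0` arrives as an int.
    if isinstance(value, int):
        return [value]
    # A comma-separated string such as "0,1" (or a quoted "2").
    if isinstance(value, str):
        return [int(part) for part in value.split(",") if part.strip()]
    # Otherwise assume a sequence of indices.
    return [int(item) for item in value]
```

Quoting the value in the YAML file avoids the ambiguity on the writing side; a helper like this only makes the reading side more forgiving.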

@SparkSnail
Contributor

SparkSnail commented Jun 4, 2021

Add example for config_hybrid.yml and config_windows_v2.yml?

@liuzhe-lz
Contributor Author

> Add example for config_hybrid.yml and config_windows_v2.yml?

I added a comment line in every "basic" example to inform Windows users. I think a separate Windows example is a bad idea because it would make me think NNI behaves differently on Windows and Linux. Let's discuss it in the meeting.
A hybrid example does make sense. I'm testing it now.

@@ -0,0 +1,15 @@
searchSpaceFile: search_net.json
trialCodeDirectory: EfficientNet-PyTorch
trialCommand: python main.py /data/imagenet -j 12 -a efficientnet --batch-size 48 --lr 0.048 --wd 1e-5 --epochs 5 --request-from-nni
Contributor

python3?

@QuanluZhang QuanluZhang requested a review from scarlett2018 June 4, 2021 11:36
@@ -1,23 +1,14 @@
authorName: default
experimentName: example_pytorch_cifar10
searchSpaceFile: search_space.json
Contributor

I suggest that, at least for this simple example, we add "experimentName". Users who really run experiments with NNI want to give each experiment an easy-to-remember name (not an experiment ID). If this field is not in the example, users have to check the config reference, which is not friendly.

experimentName: example_mnist_pytorch
# This is the minimal config file for an NNI experiment.
# Use "nnictl create --config config.yml" to launch this experiment.
# Afterwards, you can check "config_detailed.yml" for more explaination.
Contributor

explaination -> explanation

@@ -0,0 +1,42 @@
# This example shows more configurable fields comparing to the minimal "config.yml"
# You can use "nnictl create --config config_detailed.yml" to launch this experiment.
# If you see an error message saying "port 8080 is used", use "nnictl stop --all" to stop previous experiment.
Contributor

-> use "nnictl stop --port 8080" to stop that experiment, or use "nnictl stop --all" to stop all the previous experiments.

Contributor Author

I prefer not to provide that much detail in the example. At this point the user does not need to know how to manage multiple experiments.
I'm afraid that "--port 8080" might intimidate new users.


trialCommand: python3 mnist.py # The command to launch a trial. NOTE: change "python3" to "python" if you are using Windows.
trialCodeDirectory: . # The path of trial code. By default it's ".", which means the same directory of this config file.
trialGpuNumber: 1 # How many GPUs should each trial use. CUDA is required when it's greator than zero.
Contributor

greator -> greater

momentum:
  _type: uniform
  _value: [0, 1]

Contributor

Yeah, "experimentName" should be put here; we can tell users that they can omit it if they don't want to set it.

@QuanluZhang
Contributor

looks great!

@ultmaster ultmaster merged commit eb65bc3 into microsoft:master Jun 8, 2021
@liuzhe-lz liuzhe-lz deleted the v2-example branch June 9, 2021 03:44