[AutoScheduler] Add a tutorial on auto-scheduling a network for x86 CPU #7019
Conversation
LGTM. Just nits.
# -------------------------------------------------
# |    0 |        0.010 |           0.40 |     64 |
# |    1 |        0.087 |          47.19 |     64 |
# |    2 |        0.008 |          -0.00 |     64 |
As @masahi pointed out in the forum, it would be better to explain why we got -0.00 for this task.
We then use the auto-scheduler to construct a search space of this DAG and search
for good schedules (low-level optimizations).

Different from the template-based :ref:`autotvm <tutorials-autotvm-sec>` which relies on
Sentence a little difficult to read; perhaps go with : "The autoscheduler does not require any schedule templates. Therefore it greatly improves upon the template-based autoTVM..."
# correctly with any layout, we found the best performance is typically
# achieved with NHWC layout. We also implemented more optimizations for
# NHWC layouts with the auto-scheduler.
# So it is recommended to convert your models to NHWC layout to use
Note: is it recommended or mandatory?
It is recommended. The auto-scheduler can work correctly with any layout, but performance with NCHW is just not guaranteed.
# the auto-scheduler.

def get_network(name, batch_size, layout="NHWC", dtype="float32"):
Note on the restriction: while the relay.testing library is really convenient, not all models offer the choice to change the layout (e.g., VGG). In addition, many importers are fixed-layout. It would greatly benefit this tutorial to show how to transform the layout of a whole graph that is NCHW, since many folks coming from MXNet, PyTorch, ONNX, etc. will hit this limitation.
Yeah. I will add a link to the convert layout pass
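A minimal sketch of such a layout conversion, assuming TVM's Relay `ConvertLayout` pass (the dict maps an op name to its desired data layout and kernel layout; `"default"` lets Relay pick a matching kernel layout). The helper name here is illustrative, not part of the tutorial:

```python
# Illustrative sketch (assumes TVM is installed): convert a whole NCHW
# graph to NHWC with Relay's ConvertLayout pass before task extraction.
desired_layouts = {"nn.conv2d": ["NHWC", "default"]}


def convert_to_nhwc(mod):
    """Apply ConvertLayout + InferType to a Relay module (hypothetical helper)."""
    from tvm import relay  # imported lazily so the sketch loads without TVM

    seq = relay.transform.Sequential(
        [
            relay.transform.ConvertLayout(desired_layouts),
            relay.transform.InferType(),
        ]
    )
    return seq(mod)
```

Importers that emit fixed NCHW graphs (MXNet, PyTorch, ONNX) could run this once before calling `extract_tasks`.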
# Extract tasks from the network
print("Extract tasks...")
mod, params, input_shape, output_shape = get_network(network, batch_size, layout, dtype=dtype)
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
Another quality-of-life improvement here would be to error out if the tasks are for NCHW layout, in which case no tasks would get extracted.
Auto-scheduler can work with any layout. For NCHW, it can correctly extract tasks and tune them.
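To make the role of `task_weights` concrete: each weight counts how many times a subgraph appears in the network, so a weighted sum of per-task latencies estimates end-to-end latency. This is a hedged, illustrative sketch (the function name is mine, not a TVM API):

```python
# Illustrative sketch: combine per-task latencies (seconds) with the
# weights returned by extract_tasks into an end-to-end estimate.
def estimated_network_latency(task_latencies, task_weights):
    # weight = number of occurrences of that subgraph in the network
    return sum(lat * w for lat, w in zip(task_latencies, task_weights))


# e.g. three tasks appearing 1, 4, and 2 times in the network:
estimated_network_latency([0.010, 0.087, 0.008], [1, 4, 2])  # -> 0.374
```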
Thank you @merrymercy for the tutorial, this is excellent! I left some comments / questions to address.
Overall LGTM. A few nitpicky comments.
@@ -170,11 +172,11 @@ def get_network(name, batch_size, layout="NHWC", dtype="float32"):
# Typically, we recommend a value >= 300 ms.
# * :code:`num_measure_trials` is the number of measurement trials we can use during the tuning.
# You can set it to a small number (e.g., 200) for a fast demonstrative run.
# In practice, we recommend setting it around :code:`1000 * len(tasks)`,
# In practice, we recommend setting it around :code:`900 * len(tasks)`,
What is the reason for changing 1000 to 900? Any experiment or principle behind it?
#
# * :code:`num_measure_trials` is the number of measurement trials we can use during the tuning.
#   You can set it to a small number (e.g., 200) for a fast demonstrative run.
#   In practice, we recommend setting it around :code:`800 * len(tasks)`,
Here it is 800 now, but GPU is 900. I think this is a bit confusing. If there is no special reason, should we unify them to 1000?
GPU has a larger search space so it should use a larger value.
1000 is typically too much.
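The rule of thumb settled on in this thread can be written as a tiny helper (an illustrative sketch, not a TVM API; the per-task budgets are the values from this discussion):

```python
# Illustrative sketch: per-task measurement budgets discussed above.
# GPU gets a larger budget than CPU because its search space is bigger.
PER_TASK_TRIALS = {"cpu": 800, "gpu": 900}


def recommended_num_measure_trials(num_tasks, device="cpu"):
    """Rule-of-thumb total for TuningOptions(num_measure_trials=...)."""
    return PER_TASK_TRIALS[device] * num_tasks


recommended_num_measure_trials(25, "cpu")  # -> 20000 for a 25-task network
```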
…PU (apache#7019)

* [AutoScheduler] Add tutorial on auto-scheduling a network for CPU
* update
* update
* update
* improve
* improve
* address comments
* add help on layout conversion
* add help for layout conversion
* update target string
* update cuda logs
Add a tutorial on auto-scheduling a network for CPU.
With #6987 and #6903, we can now get good performance and fast tuning speed for CNNs on CPU.
I will upstream more optimizations for Winograd conv2d, conv3d, and matmul in follow-up PRs.