[AUTOTVM] TOPI integration for ARM CPU #1487
Conversation
apps/benchmark/README.md (Outdated)
@@ -0,0 +1,123 @@
# Performance Benchmark

## ARM CPU
Consider putting the performance benchmark results in the wiki for now; later we can have a hosted website for the results, because they change over time.
How can I edit the wiki?
apps/benchmark/README.md (Outdated)
Note: If a board has big.LITTLE architecture, we will use all big cores.
Otherwise, we will use all cores.

- **Firefly-RK3399: 2 x Cortex-A73 1.8GHz + 4 x Cortex-A53 1.5GHz**
Only mark the cores being used (in this case, the big cores).
apps/benchmark/README.md (Outdated)
parameters in [this repo](https://github.com/uwsaml/tvm-distro).
During compilation, TVM will download these operator parameters automatically.

But we don't tune for other devices, so you can only run benchmark for these devices.
Remove this line, and after it, add a quick section on how to do tuning for a new device.
cc those who may be interested in this PR.
python/tvm/autotvm/record.py (Outdated)
@@ -239,7 +245,8 @@ def load(self, records):
                best_by_model[key] = (inp, res)
                break

-        logging.info("Finish loading %d records", counter)
+        if verbose:
+            logging.info("Finish loading %d records", counter)
Consider just using logging.debug?
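A minimal sketch of the suggestion, assuming the standard logging module (the record-loading loop here is a stand-in, not the actual implementation):

```python
import logging

def load(records):
    """Load tuning records, reporting progress at DEBUG level."""
    counter = 0
    for _ in records:
        counter += 1
    # DEBUG instead of INFO: callers opt in through the standard logging
    # configuration rather than a custom `verbose` flag.
    logging.debug("Finish loading %d records", counter)

# Caller side: enable the messages only when wanted.
logging.basicConfig(level=logging.DEBUG)
load([])
```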
def tune_tasks(tasks,
               rpc_device_key,

no empty line between arguments
               early_stopping=200,
               log_filename='tuning.log',

               mea_number=5,
mea-> measure
               mea_number=5,
               mea_parallel_num=1,
               mea_timeout=20,
               mea_use_ndk=False,
Is it possible to pass in a MeasureOption here? These options seem to duplicate MeasureOption.
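A sketch of the consolidation under discussion; the option fields are assumed from the measure_option signature shown later in this diff, and this tune_tasks body is a simplified stand-in:

```python
# Group the mea_* knobs into one option object instead of duplicating
# number/parallel_num/timeout as separate tune_tasks arguments.
def measure_option(mode, number=5, parallel_num=1, timeout=20):
    return {'mode': mode, 'number': number,
            'parallel_num': parallel_num, 'timeout': timeout}

def tune_tasks(tasks, rpc_device_key, measure_option):
    for task in tasks:
        # each tuning job reads its measurement settings from one place
        print(task, rpc_device_key, measure_option)

tune_tasks([], 'rk3399', measure_option('rpc'))
```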
                      len(xs) - np.sum(valid_index),
                      self.feature_cache.size(self.fea_type))
        if self.verbose:
            logging.info("train: %.2f\tobs: %d\terror: %d\tn_cache: %d",
Consider using logging.debug and allowing the user to set the logging level.
@@ -40,16 +40,21 @@ class XGBTuner(ModelBasedTuner):
    If it is not None, the tuner will first select
    top-(plan_size * diversity_filter_ratio) candidates according to the cost model
    and then pick batch_size of them according to the diversity metric.
verbose: int
Consider relying directly on the logging level for verbosity.
This is an int, not a bool, so we leave this.
`verbose` usually does not have this meaning, so the argument is confusing. A better name is `log_interval`.
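A sketch of the rename with illustrative names; `log_interval` reads as "emit a progress line every N trials" rather than a boolean switch:

```python
import logging

def tune(n_trial, log_interval=50):
    """Hypothetical tuner loop showing the renamed argument."""
    for i in range(n_trial):
        # an interval, not a flag: 0 disables periodic progress logging
        if log_interval and i % log_interval == 0:
            logging.info("trial %d/%d", i, n_trial)

tune(200)
```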
python/tvm/target.py (Outdated)
}
pre_defined_opt = opt_table.get(model, [])

if not os.path.isfile(os.path.join(AUTOTVM_PRETUNED_PARAM_ROOT_PATH, "arm_cpu.log")):
Consolidate all the logic for file-system manipulation and the autotvm cache into one file, say autotvm.tophub.
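A sketch of what such a consolidation might look like; the root path and the arm_cpu.log name come from this diff, while `check_package` is a hypothetical helper:

```python
# autotvm/tophub.py: single owner of the on-disk cache layout.
import os

AUTOTVM_TOPHUB_ROOT_PATH = os.path.join(
    os.path.expanduser('~'), ".tvm", "tophub")

def check_package(name, rootpath=AUTOTVM_TOPHUB_ROOT_PATH):
    """Return True if the pre-tuned log for `name` is already cached."""
    return os.path.isfile(os.path.join(rootpath, name + ".log"))

print(check_package("arm_cpu"))
```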
topi/python/topi/nn/conv2d.py (Outdated)
    The raw kernel tensor
tile_size: int
    Tile size of winograd transform. e.g. 2 for F(2x2, 3x3) and 4 for F(4x4, 3x3)
"""
Need to add the return arguments.
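A sketch of the requested addition, in the numpydoc style topi already uses; the output shape is an assumption based on typical winograd weight transforms, not taken from this diff:

```python
def conv2d_winograd_weight_transform(kernel, tile_size):
    """Weight transformation for winograd.

    Parameters
    ----------
    kernel: tvm.Tensor
        The raw kernel tensor
    tile_size: int
        Tile size of winograd transform. e.g. 2 for F(2x2, 3x3) and 4 for F(4x4, 3x3)

    Returns
    -------
    output : tvm.Tensor
        The transformed kernel (assumed 4-D, [alpha, alpha, CO, CI])
    """
```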
tutorials/autotvm/tune_nnvm_arm.py (Outdated)
these operators, it will query this log file to get the best knob values.

We also released pre-tuned parameters for some arm devices. You can go to
`ARM CPU Benchmark <https://github.com/merrymercy/tvm/blob/arm_cpu/apps/benchmark/README.md#arm-cpu>`_
link to the master version
Please also confirm the VTA CPU test cases, since these depend on the availability of the old rasp schedules, which are removed here.
apps/benchmark/README.md (Outdated)
E.g. For my RK3399, I use `python3 -m tvm.exec.rpc_sever --tracker=10.77.1.123:9190 --key=rk3399`

* For Andoird device
nit: Android
apps/benchmark/README.md (Outdated)
```

If you do not do tuning and run the benchmark for other devices directly,
the performance is not gauranteed (This is still doable, you can pick a most
nit: guaranteed
src/pass/vectorize_loop.cc (Outdated)
@@ -300,7 +300,6 @@ class Vectorizer : public IRMutator {
    CHECK(!op->condition.type().is_vector());
    Expr condition = this->Mutate(op->condition);
    if (condition.type().is_vector()) {
-      LOG(WARNING) << "Detect vector condition in Vectorized Loop, scalarizing...";
Why remove this?
Currently it seems to pollute the logged data; ideally we would just print this once?
This message is important, because it tells us when vectorization isn't working due to a mismatch between the vectorize axis length and the input shape.
I'd imagine this message would mess up the log during auto-tuning, though.
OK, I reverted this change.
It does not pollute logging data. It occurs when I use "llvm" as the target to build resnet-18.
https://github.com/dmlc/tvm/blob/f33fd5c03d8a2b3972e3b69a79a89d0c9754cd9e/topi/python/topi/x86/conv2d.py#L214-L218
@masahi Can I fix this by checking the length of `w` and only vectorizing it when the length of `w` is a multiple of 16?
@merrymercy Sure, I am aware of this issue. Probably 8 is a better default split factor than 16 for imagenet models.
I am planning to remove this old schedule completely and adapt the AVX schedules for the SSE target.
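A sketch of the proposed guard, assuming a TVM schedule `s` and an axis of known constant length; the names are illustrative:

```python
def split_and_vectorize(s, C, w_axis, w_length, factor=8):
    # Only vectorize when the axis length divides the factor evenly,
    # so the vectorizer never has to scalarize a vector condition.
    # (factor 8 rather than 16, per the suggestion above.)
    if w_length % factor == 0:
        wo, wi = s[C].split(w_axis, factor=factor)
        s[C].vectorize(wi)
```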
nnvm/src/top/nn/convolution.cc (Outdated)
//                          param.kernel_size[1]});
// wshape = ConvertLayout(wshape, kOIHW, kernel_layout);
// wshape[kernel_layout.indexof('O')] *= param.groups;
// NNVM_ASSIGN_INPUT_SHAPE(attrs, *in_shape, Conv2DParam::kWeight, wshape);
Instead of commenting these out, I'd suggest removing them and leaving a more informative comment on why you don't do weight shape inference here.
topi/python/topi/arm_cpu/conv2d.py (Outdated)
    pre_packed = False
    CO, _, KH, KW = get_const_tuple(kernel.shape)
else:
    pre_packed = True
I'd suggest pre_packed -> pre_computed, as this is not simply pre-packing.
topi/python/topi/arm_cpu/conv2d.py (Outdated)
    copy_inputs[1] = weight
    new_attrs['tile_size'] = tile_size
    return sym.contrib.conv2d_winograd_without_weight_transform(*copy_inputs, **new_attrs)
else:
No need for the else: block here. I think lint should catch this.
topi/python/topi/arm_cpu/conv2d.py (Outdated)
    return sym.contrib.conv2d_winograd_without_weight_transform(*copy_inputs, **new_attrs)
else:
    # do nothing for depthwise convolution
    return sym.conv2d(*copy_inputs, **new_attrs)
Better to return None here. When I was doing the CUDA winograd, returning a new conv2d symbol here caused a strange issue during InferShape. Returning None solved the issue for me.
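A sketch of the suggested control flow, with the surrounding function reduced to its shape; `sym`, `copy_inputs`, and `new_attrs` are as in the diff above:

```python
def alter_conv2d(copy_inputs, new_attrs, groups, sym):
    if groups == 1:
        # rewrite ordinary conv2d to the winograd variant
        return sym.contrib.conv2d_winograd_without_weight_transform(
            *copy_inputs, **new_attrs)
    # depthwise convolution: return None so the original operator is
    # kept unchanged (avoids the InferShape issue described above)
    return None
```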
@merrymercy For the winograd input/output transform, I was able to achieve a minimal amount of math, like this for F(2x2, 3x3), for example.
For F(2x2, 3x3), this reduces the number of add/sub operations for each 4x4 input tile from 64 to 32. A similar reduction exists for F(4x4, 3x3), and there it is even more effective. It also allows completely removing the matmul from the compute definition of the input/output transform. Check out here for a simple test case and here for how I integrated this reduction into my x86 F(4x4, 3x3) implementation.
topi/python/topi/arm_cpu/conv2d.py (Outdated)
s[V].unroll(r_nu)
s[V].parallel(b)
s[DD].compute_at(s[V], bb)
Can you add vectorize here somehow? I'm using a different layout from yours, but I can do vectorized input/output transforms. My implementation is here.
Functionally, would we expect vectorization coverage from this template already? E.g., if a configuration produces an easy-to-vectorize pattern here, would we expect llvm to vectorize it already?
I don't think llvm can auto-vectorize this.
topi/python/topi/arm_cpu/conv2d.py (Outdated)
    co, vc = cfg.define_split('tile_co', co, num_outputs=2)
    oh, vh = cfg.define_split('tile_oh', oh, num_outputs=2)
    ow, vw = cfg.define_split('tile_ow', ow, num_outputs=2)
elif num_tile == 3:  # for gpu
Seems irrelevant for arm cpu.
Yes, it is for the ARM Mali GPU. They can share this function, but I didn't include the Mali code in this PR.
topi/python/topi/generic/nn.py (Outdated)
Parameters
----------
outs: Array of Tensor
    The computation graph description of conv2d_nchw
conv2d_nchw -> conv2d_winograd_weight_transform
for network in networks:
    net, params, shape, out_shape = get_network(network, batch_size=1)

    with nnvm.compiler.build_config(opt_level=2, add_pass=['AlterOpLayout']):
Doesn't your case (the AlterOpLayout optimization) enter into conv2d_NCHWc and the x86 schedule, which is only suitable for x86 now? At least for me, it reports an error.
return s


@conv2d_alter_layout.register(["arm_cpu", "mali"])
@FrozenGene I registered alter_layout for arm_cpu here. I didn't get any error.
Got it.
@FrozenGene I added pre-tuned parameters for the pynq board, which has a Cortex-A9 CPU.
@masahi Did you test the performance difference between the non-minimal-math and minimal-math versions? I tried your compute declaration but cannot get a speedup. Has llvm handled this case already?
@merrymercy Yes, I have scripts to compare the minimal version vs the non-minimal version. You can run them yourself to see the difference. The scripts dump the total execution time as well as the time taken for the input transform, batched gemm, and output transform separately. Obviously, if your winograd kernel is completely bottlenecked by gemm, there should be no performance difference; I observed this with my GPU version and my x86 AVX2 version. For the x86 SSE target, my minimal version is consistently faster than the non-minimal one. The above two scripts benchmark with the SSE target. I have tested on recent CPUs (Coffee Lake) and old high-core-count Xeons (12-16 cores, Sandy Bridge and Nehalem). On recent CPUs the difference is small. On old Xeons, I don't think LLVM can do this non-trivial common subexpression elimination. Even if LLVM could detect the common subexpressions, it is not supposed to eliminate them, I believe, because these are float ops.
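For concreteness, a standalone NumPy sketch (not from the linked scripts) of the factored F(2x2, 3x3) input transform described above: because B contains only 0 and ±1, B^T d B can be computed as two 1-D passes of adds/subs (16 each, 32 total per 4x4 tile) instead of two dense matmuls:

```python
import numpy as np

def transform_1d(d):
    # One pass of B^T: every output row is a 2-term add/sub.
    return np.stack([d[0] - d[2],
                     d[1] + d[2],
                     d[2] - d[1],
                     d[1] - d[3]])

def input_transform(tile):
    # B^T @ tile @ B via two 1-D passes: 32 add/subs per 4x4 tile.
    return transform_1d(transform_1d(tile).T).T

# Cross-check against the dense definition.
B = np.array([[ 1, 0,  0,  0],
              [ 0, 1, -1,  1],
              [-1, 1,  1,  0],
              [ 0, 0,  0, -1]], dtype=float)
tile = np.random.randn(4, 4)
assert np.allclose(input_transform(tile), B.T @ tile @ B)
```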
Thanks for the explanation! We can keep the non-minimal version for ARM CPU in this PR, since it is more readable.
Yes, you can follow up with another PR if you find a way to improve performance later. Let's merge this first.
'ndk': use Android NDK to create shared library. Use this for android target.

callable: customized build function for other backends (e.g. VTA)
see measure/measure_methods.py default_build_func for example
'local-nofork': use local device for measure but does not use multiprocessing.
This mode is suitable for debug, but does not support timeout and parallel.
callable: It is a customized function for measurement.
see measure/measure_methods.py measure_rpc for example
apps/benchmark/README.md (Outdated)
If your device has a same SoC of the above device, you can reuse these parameters
(e.g. use `llvm -device=arm_cpu -mode=rk3399 -target=aarch64-linux-gnu` as target).
Otherwise, you need to tune for your own device, please follow this [tutorial](please_fix_this_later.html).
fix this or remove for now?
python/tvm/autotvm/tophub.py (Outdated)
AUTOTVM_TOPHUB_ROOT_PATH = os.path.join(os.path.expanduser('~'), ".tvm", "tophub")


def load_context(target, rootpath=AUTOTVM_TOPHUB_ROOT_PATH):
It is still better to allow the with block:

```python
with tophub.context(target):
    # my code
```
Is it possible to also allow the user to specify a customized location for their tuning logs?
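Something like the following could do it; `extra_files` is a hypothetical parameter name, not part of this PR, and the body is a toy stand-in:

```python
import contextlib

@contextlib.contextmanager
def context(target, extra_files=None):
    """Hypothetical API sketch: besides the downloaded tophub log for
    `target`, also apply user-specified tuning logs."""
    logs = ["%s.log" % target] + list(extra_files or [])
    # a real implementation would load these records and apply the
    # best configuration per task while the block is active
    yield logs

with context("arm_cpu", extra_files=["my_device_tuning.log"]) as logs:
    print(logs)
```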
python/tvm/autotvm/tophub.py (Outdated)
""" | ||
TopHub: Tensor Operator Hub | ||
To get the best performance, we typically need auto-tuning for the specific devices. | ||
TVM releases pre-tuned parameters in TopHub (https://github.com/uwsaml/tvm-distro) |
Since the tvm-distro location can change, do not use the URL for now.
python/tvm/autotvm/tophub.py (Outdated)
""" | ||
path = tempdir() | ||
filename = path.relpath("info.json") | ||
print("Download meta info for pre-tuned parameters") |
use logging instead of print
nnvm/src/top/nn/convolution.cc (Outdated)
@@ -130,11 +130,110 @@ inline bool Conv2DInferShape(const nnvm::NodeAttrs& attrs,
  return true;
}

inline bool WinogradConv2DInferShape(const nnvm::NodeAttrs& attrs,
                            std::vector<TShape>* in_shape,
argument alignment
@@ -101,20 +101,29 @@ def measure_option(mode,
        The number of measurement task that can run in parallel.
        Set this according to the number of cpu cores (for compilation) and
        the number of devices you have (for measuring generate code).
    do_fork: bool, optional
A principle of interface design is to simplify and hide options the user does not use; in this case, do_fork is only used in local mode. I think we should remove it and allow the user to pass in `measure_func = autotvm.measure.local_nofork(measure args)`.
Similarly, if pack_size, rpc_device_key, etc. are only arguments to rpc, I think we should have good defaults and allow the user to do `measure_func = autotvm.measure.rpc_(rpc_key=xxxx)`.
`do_fork` is used in the local_executor, not in measure_func. It can be used in any mode.
                   build_func='default',

                   replay_db=None,
                   save_to_replay_db=True):
Can save_to_replay_db become an optional callback function?
Need to rebase against the master.
I am doing some refactoring. Do not merge.
Is it worth preserving the function
@merrymercy can you add tvm.target.rasp() as per the comment by @ajtulloch?
I noticed that there are some things deleted; are we removing
@ajtulloch
@merrymercy Have we updated the related docs? I pulled your PR code into tvm/master and followed the tutorial https://docs.tvm.ai/tutorials/autotvm/tune_nnvm_arm.html, but I find that I cannot train and get this information: BTW, I registered one remote device named
Thanks @merrymercy @masahi @ajtulloch @eqy @FrozenGene. This is merged.
@FrozenGene can you open a discussion thread on https://discuss.tvm.ai/ so we can follow up there?
new_attrs = {k: attrs[k] for k in attrs.keys()}

assert attrs.get_int_tuple("dilation") == (1, 1), "Does not support dilation " \
I know we have merged it, but when I ran a model today I found we could have a better mechanism, @merrymercy. Move this line after line 491:

```python
assert attrs.get_int_tuple("dilation") == (1, 1), "Does not support dilation " \
    "when alter_op_layout is enabled"
```

(if groups == 1), because we will not change the kernel layout for depthwise conv2d.
Or we can support it in the compute_conv2d function using `topi.nn.dilate(inputs[1], (1, 1, dilate_h, dilate_w, 1))`.
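A sketch of that second option, assuming the 5-D packed kernel layout implied by the stride tuple above; the helper name is illustrative, while `topi.nn.dilate` is the existing topi op:

```python
import topi

def maybe_dilate_kernel(kernel, dilation_h, dilation_w):
    # Materialize dilation by inserting zeros into the kernel, so the
    # dense conv2d compute can ignore the dilation attribute.
    if (dilation_h, dilation_w) != (1, 1):
        kernel = topi.nn.dilate(kernel, (1, 1, dilation_h, dilation_w, 1))
    return kernel
```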
This PR includes:
benchmark results