
Support for macos_arm64 platform #664

Merged (1 commit, Jun 14, 2021)
Conversation

@simonmaurer (Contributor) commented on Jun 11, 2021

What do these changes do?

With the latest LCE release v0.6, the TensorFlow dependency has been upgraded to 2.5.0, which includes the changes needed to compile for the macos_arm64 platform (Apple's M1 ARM processor).
This upgrade also pulled in the upstream XNNPACK and pthreadpool dependencies required by the LCE (lce_benchmark_model/lce_minimal) build process.

With recent arm64 builds of Bazel, this small PR (given all the upstream changes in TF) allows building LCE for Apple M1.

Feel free to check out the related discussions in the XNNPACK, pthreadpool, TF pip package, and TFLite threads.

How Has This Been Tested?

bazel-3.7.2-arm64 build -c opt --config=macos_arm64 --macos_cpus=arm64 //larq_compute_engine/tflite/benchmark:lce_benchmark_model

or with a more recent version of bazel (as of bazel@492829)

bazel-4.0.0-arm64 build -c opt --macos_cpus=arm64 //larq_compute_engine/tflite/benchmark:lce_benchmark_model
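Once the binary is built, it can be run against a converted model. As a hypothetical sketch (the model path is a placeholder, and the `--graph`/`--num_threads`/`--use_xnnpack` flags are assumed from the upstream TFLite benchmark tool that lce_benchmark_model is based on), an invocation would look something like:

```shell
# Hypothetical benchmark invocation; quicknet.tflite is a placeholder path and
# the flags are assumed from the upstream TFLite benchmark_model tool.
MODEL=quicknet.tflite
THREADS=4

# Print the command rather than executing it, since the binary must be built first.
echo "bazel-bin/larq_compute_engine/tflite/benchmark/lce_benchmark_model" \
     "--graph=${MODEL} --num_threads=${THREADS} --use_xnnpack=true"
```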

Benchmark Results

The tables below present single- and multi-threaded performance of Larq Compute Engine v0.6 on
different versions of QuickNet (trained on the ImageNet dataset, released in Larq Zoo)
on an Apple Mac mini 2020 (M1):

| Model | Top-1 Accuracy | Mac mini M1, ms (1 thread) | Mac mini M1, ms (1 thread w/ XNNPACK) |
| --- | --- | --- | --- |
| QuickNetSmall (.h5) | 59.4 % | 4.0 | 3.5 |
| QuickNet (.h5) | 63.3 % | 5.8 | 5.4 |
| QuickNetLarge (.h5) | 66.9 % | 9.9 | 9.5 |

| Model | Top-1 Accuracy | Mac mini M1, ms (4 threads) | Mac mini M1, ms (4 threads w/ XNNPACK) |
| --- | --- | --- | --- |
| QuickNetSmall (.h5) | 59.4 % | 1.8 | 1.9 |
| QuickNet (.h5) | 63.3 % | 2.5 | 2.6 |
| QuickNetLarge (.h5) | 66.9 % | 3.9 | 4.1 |
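For context, the single-thread XNNPACK gains in the table above work out to roughly 4–13 %. A quick check with a small shell helper (assuming a POSIX awk is available):

```shell
# Relative improvement from enabling XNNPACK: (baseline - xnnpack) / baseline * 100.
speedup() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'; }

speedup 4.0 3.5   # QuickNetSmall: prints 12.5
speedup 5.8 5.4   # QuickNet: prints 6.9
speedup 9.9 9.5   # QuickNetLarge: prints 4.0
```

In the 4-thread case XNNPACK is slightly slower, so the same formula yields small negative values there.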

Related issue number

#604

@AdamHillier (Contributor) commented:
That's awesome, thanks @simonmaurer! Since it's not currently possible for our CI to test macos_arm64 binaries, could you possibly run our kernel tests on your M1 machine just to double check they pass?

bazel test larq_compute_engine/tests:cc_tests --copt=-O2 --distinct_host_configuration=false --test_output=all

@simonmaurer (Contributor, Author) commented:
@AdamHillier oh absolutely, my pleasure. The numbers are just crazy ;)

@simonmaurer (Contributor, Author) commented on Jun 11, 2021

...in the process of compiling:

bazel test --macos_cpus=arm64 larq_compute_engine/tests:cc_tests --copt=-O2 --distinct_host_configuration=false --test_output=all

and here's the result:
[screenshot: kernel_tests_macos_arm64]

@AdamHillier (Contributor) left a review comment:

This is great!

@AdamHillier AdamHillier requested a review from a team June 11, 2021 15:46
@simonmaurer (Contributor, Author) commented:

Updated the PR with benchmark results.

@lgeiger (Member) left a review comment:

🚀

@lgeiger (Member) commented on Jun 14, 2021

tensorflow/tensorflow#47639 (comment) and tensorflow/tensorflow#47639 (comment) mention that TFLite can make use of the Accelerate framework on macOS.
@simonmaurer do you think it makes sense to enable this for the LCE benchmarks as well?

@lgeiger lgeiger merged commit 72e5150 into larq:main Jun 14, 2021
@simonmaurer (Contributor, Author) commented:

> tensorflow/tensorflow#47639 (comment) and tensorflow/tensorflow#47639 (comment) mention that TFLite can make use of the Accelerate framework on macOS. @simonmaurer do you think it makes sense to enable this for the LCE benchmarks as well?

Yes, totally agree. According to these comments and evaluations, this can further improve latency (even more than XNNPACK), depending on the model. Could we control this via an additional command-line parameter when executing the lce_benchmark_model binary? If not, it can simply be activated with the mentioned build flags.

@lgeiger (Member) commented on Jun 14, 2021

> could we control this via an additional command-line parameter when executing the lce_benchmark_model binary

This seems to be a build flag, so I don't think we can have a command-line parameter to toggle this behaviour in the benchmark binary.

@lgeiger added the "feature" (New feature or request) label on Jun 14, 2021