Revamp Android Benchmarks #15452

Closed
mariecwhite opened this issue Nov 7, 2023 · 6 comments · Fixed by #18144
Labels
enhancement ➕ New feature or request

Comments

@mariecwhite
Contributor

Request description

Given recent changes to how cpu-features are getting picked up, we need to update the Android benchmarks.

  1. Use --iree-llvmcpu-target-cpu instead of --iree-llvmcpu-target-cpu-features. This removes the separation between quantized and non-quantized models, and we no longer have to manually add CPU features such as +dotprod.

This does complicate the pipeline. We'll need a mapping for each phone and core used. So using a (taskset, target-cpu) tuple:

Pixel 6: (c0, cortex-x1), (30, cortex-a76), (0f, cortex-a55)
Pixel 8 Pro: (100, cortex-x3), (f0, cortex-a715), (0f, cortex-a510)
Motorola Edge+: (100, cortex-x3), (c0, cortex-a715), (30, cortex-a710), (0f, cortex-a510)
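
A minimal sketch of how that mapping could be kept in the benchmark definitions, assuming a plain Python table. The names `CORE_GROUPS` and `compile_flag_for` are hypothetical and not part of the IREE benchmark suite; the masks and CPU names are copied from the list above.

```python
# (taskset mask, target CPU) pairs copied from the list above.
# CORE_GROUPS and compile_flag_for are hypothetical names for illustration.
CORE_GROUPS = {
    "Pixel 6": [("c0", "cortex-x1"), ("30", "cortex-a76"), ("0f", "cortex-a55")],
    "Pixel 8 Pro": [("100", "cortex-x3"), ("f0", "cortex-a715"), ("0f", "cortex-a510")],
    "Motorola Edge+": [
        ("100", "cortex-x3"),
        ("c0", "cortex-a715"),
        ("30", "cortex-a710"),
        ("0f", "cortex-a510"),
    ],
}


def compile_flag_for(device: str, taskset_mask: str) -> str:
    """Return the compile flag for the core group pinned by taskset_mask."""
    for mask, target_cpu in CORE_GROUPS[device]:
        if mask == taskset_mask:
            return f"--iree-llvmcpu-target-cpu={target_cpu}"
    raise KeyError(f"no core group {taskset_mask!r} for {device!r}")


print(compile_flag_for("Pixel 6", "c0"))
```

Keeping the mask and the target CPU in one tuple means a benchmark cannot be pinned to one core group while being compiled for another.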

  2. Change the number of threads and the CPU core group naming. At the moment we bundle cores into either big or little, but there's actually a spectrum. Another reason why GPT-2 is noisy is that it's actually running on 2 different core types (cortex-x1 at 2.8 GHz and cortex-a76 at 2.25 GHz).

To reduce noise, let's make sure the benchmarks run on homogeneous cores. So using a (number of threads, taskset) tuple:

Pixel 6: (1, 80), (2, c0)
Pixel 8 Pro: (1, 100), (4, f0)
Motorola Edge+: (1, 100), (2, c0)

It may not be the fastest configuration but should reduce noise. We can run the fastest configurations e.g. (5, 1f0) either under benchmark-large.yml or the openxla-benchmark repo.
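
To sanity-check a (number of threads, taskset) pair, something like the sketch below could be used. It assumes the usual taskset convention that bit N of the hex mask selects CPU N, and it treats a configuration as homogeneous when its mask fits inside a single core group. The helper names are hypothetical, and applying (5, 1f0) to the Pixel 8 Pro groups is my assumption, since the device is not named above.

```python
def mask_to_cpus(mask_hex: str) -> list[int]:
    """Decode a taskset hex mask into CPU indices (bit N selects CPU N)."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def is_homogeneous(run_mask: str, group_masks: list[str]) -> bool:
    """True if every CPU in run_mask belongs to one and the same core group."""
    run = int(run_mask, 16)
    return run != 0 and any((run & int(g, 16)) == run for g in group_masks)


# Core group masks copied from the (taskset, target-cpu) list in this issue.
PIXEL_6_GROUPS = ["c0", "30", "0f"]
PIXEL_8_PRO_GROUPS = ["100", "f0", "0f"]

print(mask_to_cpus("c0"))                         # CPUs used by (2, c0)
print(is_homogeneous("80", PIXEL_6_GROUPS))       # (1, 80) on Pixel 6
print(is_homogeneous("1f0", PIXEL_8_PRO_GROUPS))  # (5, 1f0) spans two groups
```

The same decode also confirms that each mask selects at least as many CPUs as the thread count asks for.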

What component(s) does this issue relate to?

Other

Additional context

No response

@mariecwhite
Contributor Author

mariecwhite commented Nov 7, 2023

@KoolJBlack and I did some debugging on Android and it is more stable to set the task_topology_cpu_ids instead of a combination of taskset and task_topology_group_count.

The mapping of cpu_id to physical core is non-intuitive and does not correspond to taskset semantics. For Pixel 8, the biggest core corresponds to cpu_id 0, and the 4 big cores have cpu_ids 1, 2, 3, 4 (whereas the taskset ordering is reversed).
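
A rough sketch of the conversion, under two assumptions that are not confirmed above: that the device exposes 9 logical CPUs (taskset bits 0 to 8, as the 100 mask suggests), and that the cpu_id order is an exact reversal of the taskset bit order. Only the two data points mentioned above (biggest core and the 4 big cores) are known to match. The helper names are hypothetical, and the comma-separated flag value is also an assumption.

```python
NUM_CORES = 9  # assumption: taskset bits 0..8 on this device


def taskset_mask_to_cpu_ids(mask_hex: str, num_cores: int = NUM_CORES) -> list[int]:
    """Map a taskset hex mask to topology cpu_ids, assuming a strict reversal."""
    mask = int(mask_hex, 16)
    bits = [b for b in range(num_cores) if (mask >> b) & 1]
    return sorted(num_cores - 1 - b for b in bits)


def topology_flag(mask_hex: str) -> str:
    """Build the runtime flag value as a comma-separated list (assumed format)."""
    ids = taskset_mask_to_cpu_ids(mask_hex)
    return "--task_topology_cpu_ids=" + ",".join(str(i) for i in ids)


print(topology_flag("100"))  # biggest core
print(topology_flag("f0"))   # the 4 big cores
```

If the real mapping turns out not to be a strict reversal on other phones, a per-device lookup table would be the safer choice.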

We'll also need to get the mappings for Pixel 6 and Motorola phones.

@pzread
Contributor

pzread commented Nov 9, 2023

I also found that --task_topology_cpu_ids seems to be local-task driver specific. For GPU and local-sync benchmarks, we might still need taskset to pin the thread.

@KoolJBlack
Contributor

KoolJBlack commented Nov 10, 2023

Can't speak to GPU as I haven't run any models on it as of late. I am using --task_topology_cpu_ids with the local-task driver. local-sync should be limited to a single serialized worker, in which case taskset might work.
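
The driver-dependent choice discussed here could look roughly like this. It is only a sketch: `pinning_args` is a hypothetical helper, and falling back to taskset for every driver other than local-task is an assumption based on the two comments above, not verified behavior.

```python
def pinning_args(driver: str, taskset_mask: str, cpu_ids: list[int]):
    """Return (command_prefix, extra_flags) for pinning a benchmark run.

    local-task: pin through --task_topology_cpu_ids.
    anything else (e.g. local-sync, GPU): fall back to a taskset prefix.
    """
    if driver == "local-task":
        ids = ",".join(str(i) for i in cpu_ids)
        return [], [f"--task_topology_cpu_ids={ids}"]
    return ["taskset", taskset_mask], []


prefix, flags = pinning_args("local-task", "f0", [1, 2, 3, 4])
print(prefix, flags)
prefix, flags = pinning_args("local-sync", "100", [0])
print(prefix, flags)
```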

pzread pushed a commit that referenced this issue Nov 13, 2023
This change adds the configurations of CPU pinning to the benchmark
definitions.

Also follows #15452 to remap multi-thread benchmarks on the 2
homogeneous big cores on Pixel 6
@mariecwhite
Contributor Author

We've also come to the conclusion that using --iree-llvmcpu-target-cpu for mobile CPUs is not feasible. Each device implements the ARM spec differently so it's impossible to know exactly which CPU features to use given a CPU target. We'll need to set the --iree-llvmcpu-target-cpu-features flag instead of --iree-llvmcpu-target-cpu for ARM CPU.

For Pixel 8, we've found that this set of flags is supported:
+v9a,+fullfp16,+fp-armv8,+neon,+aes,+sha2,+crc,+lse,+rdm,+complxnum,+rcpc,+sha3,+sm4,+dotprod,+fp16fml,+dit,+flagm,+ssbs,+sb,+altnzcv,+fptoint,+bf16,+i8mm,+bti
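
Since a list this long is easy to mistype, a small check before passing it to the compiler may help. LLVM-style feature lists prefix each entry with + (enable) or - (disable); the sketch below flags entries without a prefix. The helper names are hypothetical, and the sample string is a shortened, deliberately broken example rather than the Pixel 8 list.

```python
def split_features(feature_string: str) -> list[str]:
    """Split a comma-separated feature string, dropping empty entries."""
    return [f.strip() for f in feature_string.split(",") if f.strip()]


def missing_prefix(features: list[str]) -> list[str]:
    """Return the entries that do not start with '+' or '-'."""
    return [f for f in features if f[0] not in "+-"]


def to_flag(features: list[str]) -> str:
    """Build the compile flag, refusing lists with unprefixed entries."""
    bad = missing_prefix(features)
    if bad:
        raise ValueError(f"features missing a +/- prefix: {bad}")
    return "--iree-llvmcpu-target-cpu-features=" + ",".join(features)


sample = split_features("+v9a,+neon,dotprod")  # 'dotprod' lacks its '+'
print(missing_prefix(sample))
print(to_flag(split_features("+v9a,+neon,+dotprod")))
```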

We'll need to do a similar exercise on the Pixel 6 and Moto devices. @dcaballe are there steps to identify which flags to use for each device?

@ScottTodd
Member

> We've also come to the conclusion that using --iree-llvmcpu-target-cpu for mobile CPUs is not feasible. Each device implements the ARM spec differently so it's impossible to know exactly which CPU features to use given a CPU target. We'll need to set the --iree-llvmcpu-target-cpu-features flag instead of --iree-llvmcpu-target-cpu for ARM CPU.
>
> For Pixel 8, we've found that this set of flags is supported: +v9a,+fullfp16,+fp-armv8,+neon,+aes,+sha2,+crc,+lse,+rdm,+complxnum,+rcpc,+sha3,+sm4,+dotprod,+fp16fml,+dit,+flagm,+ssbs,+sb,+altnzcv,+fptoint,+bf16,+i8mm,+bti
>
> We'll need to do a similar exercise on the Pixel 6 and Moto devices. @dcaballe are there steps to identify which flags to use for each device?

This is good info! Can whoever is doing those investigations take notes along the way and contribute to documentation updates (tracked at #15487)?

@dcaballe
Contributor

> We'll need to do a similar exercise on the Pixel 6 and Moto devices. @dcaballe are there steps to identify which flags to use for each device?

I did this manually, unfortunately (cat /proc/cpuinfo and then map those features to LLVM target features). However, Quentin provided some instructions to dump the LLVM target features of a device. They're in an internal issue.
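
For anyone repeating the manual route, here is a rough sketch of the mapping step. It parses the `Features` lines of /proc/cpuinfo text and translates them through a small hand-written table. The table is a partial, illustrative subset of Linux arm64 hwcap names and the LLVM AArch64 features I believe they correspond to; it is not the internal procedure mentioned above, and the function name is hypothetical.

```python
# Partial, illustrative mapping from Linux arm64 hwcap names to LLVM features.
HWCAP_TO_LLVM = {
    "fp": "fp-armv8",
    "asimd": "neon",
    "aes": "aes",
    "sha2": "sha2",
    "crc32": "crc",
    "atomics": "lse",
    "asimdrdm": "rdm",
    "asimddp": "dotprod",
    "i8mm": "i8mm",
    "bf16": "bf16",
}


def llvm_features_from_cpuinfo(cpuinfo_text: str) -> list[str]:
    """Collect LLVM features for every hwcap found on any 'Features' line.

    Note: this takes the union over all cores; on a heterogeneous SoC the
    entries for the pinned cores should be checked individually.
    """
    seen: list[str] = []
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() != "Features":
            continue
        for cap in value.split():
            feature = HWCAP_TO_LLVM.get(cap)  # unknown hwcaps are skipped
            if feature and feature not in seen:
                seen.append(feature)
    return ["+" + f for f in seen]


# Made-up input for illustration, not output captured from a real device.
sample = "processor\t: 0\nFeatures\t: fp asimd aes crc32 atomics asimddp\n"
print(",".join(llvm_features_from_cpuinfo(sample)))
```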

We also talked about adding aliases to avoid carrying over a huge list of target features. Basically, we could add Pixel8, Pixel6, etc... target cpus that expand to the corresponding feature flag lists.
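
The alias idea could be prototyped on the benchmark-tooling side before touching the compiler. This is only a sketch: `TARGET_ALIASES` and `expand_target` are hypothetical names, and the feature list is an abbreviated subset of the Pixel 8 list posted earlier, shortened for readability.

```python
# Hypothetical alias table; the Pixel 8 entry is an abbreviated subset of the
# feature list from this thread, not the full set.
TARGET_ALIASES = {
    "pixel8": ["+v9a", "+fullfp16", "+neon", "+dotprod", "+i8mm"],
}


def expand_target(name: str) -> list[str]:
    """Expand a device alias into feature flags; pass other names through."""
    features = TARGET_ALIASES.get(name.lower())
    if features is None:
        # Not an alias: treat it as a regular target CPU name.
        return [f"--iree-llvmcpu-target-cpu={name}"]
    return ["--iree-llvmcpu-target-cpu-features=" + ",".join(features)]


print(expand_target("Pixel8"))
print(expand_target("cortex-a55"))
```

Doing the expansion in one place keeps the long feature strings out of individual benchmark definitions.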

ramiro050 pushed a commit to ramiro050/iree that referenced this issue Dec 19, 2023
This change adds the configurations of CPU pinning to the benchmark
definitions.

Also follows iree-org#15452 to remap multi-thread benchmarks on the 2
homogeneous big cores on Pixel 6
@pzread pzread assigned yuennancy and unassigned pzread Apr 23, 2024
@github-project-automation github-project-automation bot moved this from Inbox to Done in (Deprecated) IREE Aug 14, 2024