Add performance benchmark config: MPS 8da4w #8461

manuelcandales · 2025-02-13T14:49:44Z

Adds a new performance benchmark config to track performance on the MPS backend when running Llama 3.2 1B inference with 8da4w quantization.

pytorch-bot · 2025-02-13T14:49:49Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/8461

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Cancelled Job

As of commit b00fce1 with merge base 0222074:

NEW FAILURES - The following jobs have failed:

apple-perf / upload-benchmark-results (gh)
Process completed with exit code 1.
pull / unittest-arm / linux-job (gh)
RuntimeError: Command docker exec -t 6fc38b6187780f05f878e9e235427a1e6f91fab3a70968c18eb5451bb02a6a35 /exec failed with exit code 1
trunk / test-arm-reference-delegation / linux-job (gh)
RuntimeError: Command docker exec -t 40e1785a37258496680555b06d6512d6ffef05ddc6fca70fd4825ab0307bd4a4 /exec failed with exit code 1
trunk / test-huggingface-transformers (google/gemma-2-2b) / linux-job (gh)
RuntimeError: Command docker exec -t ab44777fbcd5854b66bc6e8f008f50aae04e52cc59639ebcd43a9030bf33134e /exec failed with exit code 1

CANCELLED JOB - The following job was cancelled. Please retry:

apple-perf / benchmark-on-device (meta-llama/Llama-3.2-1B, llama3_mps_8da4w, apple_iphone_15, arn:aws:devicefa... / mobile-job (ios) (gh)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangy10 · 2025-02-13T18:30:22Z

Added a link to the Benchmark project here: #8473

guangy10 · 2025-02-13T18:31:23Z

Looks good! Please schedule an on-demand benchmark job to test this new config on your PR before merging.

Bump up timeout threshold

guangy10

The job was cancelled due to a timeout (after running for 120 min). Temporarily bump the threshold to 240 min to see whether the run can actually finish. Debugging the slow run can be done later.

guangy10 · 2025-02-19T02:30:11Z

@huydhn @yangw-dev do you have any idea why this benchmark job keeps running indefinitely? The previous attempt timed out after 2 hours, and I can't find any information on why it's taking so long. It looks like a cancelled job doesn't have any logs? I temporarily bumped the timeout to 4 hours to see if it can finish successfully, but per @manuelcandales the model shouldn't run that slowly.

https://github.com/pytorch/executorch/actions/runs/13402426162/job/37438073261

huydhn · 2025-02-19T02:48:50Z

From what I see in the previous run https://us-west-2.console.aws.amazon.com/devicefarm/home#/mobile/projects/02a2cf0f-6d9b-45ee-ba1a-a086587469e6/runs/4f1fcf14-2a4c-4364-ad4a-b9c7ecc0a783 and the current run https://us-west-2.console.aws.amazon.com/devicefarm/home#/mobile/projects/02a2cf0f-6d9b-45ee-ba1a-a086587469e6/runs/255ff74c-ad69-43b1-9014-4317e949d9ed, it's always the iOS 18 device that hangs. So maybe this has something to do with the OS.

In other cases, the test failed with this error: Assertion failed: (0 && "unexpected MPSDataType"), function getMLIRElementType, file MPSGraphUtilities.mm, line 149.

[DEVICEFARM] ########### Entering phase test ###########
 
[DeviceFarm] xcodebuild test-without-building -destination id=$DEVICEFARM_DEVICE_UDID -xctestrun $DEVICEFARM_TEST_PACKAGE_PATH/*.xctestrun -derivedDataPath $DEVICEFARM_LOG_DIR
Command line invocation:
    /Applications/Xcode_15.app/Contents/Developer/usr/bin/xcodebuild test-without-building -destination id=00008120-001A43462E52201E -xctestrun /tmp/devicefarm-workspace/execution-sfquu6rh/test-package-o6owemum/Benchmark_Tests_iphoneos17.5-arm64.xctestrun -derivedDataPath /tmp/devicefarm-workspace/execution-sfquu6rh/logs-e3ncx7ui

User defaults from command line:
    IDEDerivedDataPathOverride = /tmp/devicefarm-workspace/execution-sfquu6rh/logs-e3ncx7ui
    IDEPackageSupportUseBuiltinSCM = YES

2025-02-13 11:45:39.157 xcodebuild[1142:14060]  DVTDevice: Error locating DeviceSupport directory using Optional("arm64e") or Optional("arm64e"): nilError
Test Suite 'All tests' started at 2025-02-13 11:45:40.111.
Test Suite 'Tests.xctest' started at 2025-02-13 11:45:40.111.
Test Suite 'GenericTests' started at 2025-02-13 11:45:40.111.
Test Case '-[GenericTests test_forward_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_17_2_1_iPhone15_4]' started.
Assertion failed: (0 && "unexpected MPSDataType"), function getMLIRElementType, file MPSGraphUtilities.mm, line 149.
2025-02-13 11:46:02.103 xcodebuild[1142:14065]  DVTDevice: Error locating DeviceSupport directory using Optional("arm64e") or Optional("arm64e"): nilError

Restarting after unexpected exit, crash, or test timeout in -[GenericTests test_forward_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_17_2_1_iPhone15_4]; summary will include totals from previous launches.

Test Suite 'Selected tests' started at 2025-02-13 11:46:02.478.
Test Suite 'Tests.xctest' started at 2025-02-13 11:46:02.479.
Test Suite 'GenericTests' started at 2025-02-13 11:46:02.479.
Test Case '-[GenericTests test_load_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_17_2_1_iPhone15_4]' started.
Assertion failed: (0 && "unexpected MPSDataType"), function getMLIRElementType, file MPSGraphUtilities.mm, line 149.
2025-02-13 11:46:24.513 xcodebuild[1142:14110]  DVTDevice: Error locating DeviceSupport directory using Optional("arm64e") or Optional("arm64e"): nilError

Restarting after unexpected exit, crash, or test timeout in -[GenericTests test_load_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_17_2_1_iPhone15_4]; summary will include totals from previous launches.

Test Suite 'Selected tests' started at 2025-02-13 11:46:24.981.
Test Suite 'Tests.xctest' started at 2025-02-13 11:46:24.981.
Test Suite 'GenericTests' started at 2025-02-13 11:46:24.981.
Test Suite 'GenericTests' failed at 2025-02-13 11:46:24.981.
	 Executed 2 tests, with 2 failures (0 unexpected) in 0.000 (0.000) seconds
Test Suite 'LLaMATests' started at 2025-02-13 11:46:24.982.
Test Case '-[LLaMATests test_generate_llama_3_2_1b_llama3_mps_8da4w_pte_tokenizer_model_iOS_17_2_1_iPhone15_4]' started.
Assertion failed: (0 && "unexpected MPSDataType"), function getMLIRElementType, file MPSGraphUtilities.mm, line 149.
2025-02-13 11:46:46.595 xcodebuild[1142:14109]  DVTDevice: Error locating DeviceSupport directory using Optional("arm64e") or Optional("arm64e"): nilError

Restarting after unexpected exit, crash, or test timeout in -[LLaMATests test_generate_llama_3_2_1b_llama3_mps_8da4w_pte_tokenizer_model_iOS_17_2_1_iPhone15_4]; summary will include totals from previous launches.

Test Suite 'Selected tests' started at 2025-02-13 11:46:46.995.
Test Suite 'Tests.xctest' started at 2025-02-13 11:46:46.996.
Test Suite 'LLaMATests' started at 2025-02-13 11:46:46.996.
Test Suite 'LLaMATests' failed at 2025-02-13 11:46:46.996.
	 Executed 1 test, with 1 failure (0 unexpected) in 0.000 (0.000) seconds
Test Suite 'Tests.xctest' failed at 2025-02-13 11:46:46.996.
	 Executed 3 tests, with 3 failures (0 unexpected) in 0.000 (0.000) seconds
Test Suite 'Selected tests' failed at 2025-02-13 11:46:46.996.
	 Executed 3 tests, with 3 failures (0 unexpected) in 0.000 (0.001) seconds
2025-02-13 11:46:53.135 xcodebuild[1142:13639] [MT] IDETestOperationsObserverDebug: 176.335 elapsed -- Testing started completed.
2025-02-13 11:46:53.135 xcodebuild[1142:13639] [MT] IDETestOperationsObserverDebug: 0.000 sec, +0.000 sec -- start
2025-02-13 11:46:53.135 xcodebuild[1142:13639] [MT] IDETestOperationsObserverDebug: 176.335 sec, +176.335 sec -- end

Test session results, code coverage, and logs:
	/tmp/devicefarm-workspace/execution-sfquu6rh/logs-e3ncx7ui/Logs/Test/Test-Benchmark-2025.02.13_11-43-56--0800.xcresult

Failing tests:
	-[GenericTests test_forward_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_17_2_1_iPhone15_4]
	-[GenericTests test_load_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_17_2_1_iPhone15_4]
	-[LLaMATests test_generate_llama_3_2_1b_llama3_mps_8da4w_pte_tokenizer_model_iOS_17_2_1_iPhone15_4]

** TEST EXECUTE FAILED **
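
When triaging a long Device Farm log like the one above, it can help to pair each crashed test with the assertion that killed it. The following is a minimal sketch, not part of the benchmark tooling: the helper name `crashes_by_test` is hypothetical, and the regular expressions simply follow the xcodebuild lines quoted in this log.

```python
import re

# Patterns follow the xcodebuild output quoted above; helper names are hypothetical.
TEST_STARTED = re.compile(r"Test Case '-\[(\w+) (\w+)\]' started\.")
ASSERTION = re.compile(r"Assertion failed: (.+)")


def crashes_by_test(log_text):
    """Return {"Suite.test_name": assertion message} for tests that hit an assertion."""
    crashes = {}
    current = None
    for line in log_text.splitlines():
        started = TEST_STARTED.search(line)
        if started:
            current = "{}.{}".format(*started.groups())
            continue
        failed = ASSERTION.match(line.strip())
        if failed and current is not None:
            crashes[current] = failed.group(1)
            current = None  # the test process restarts after the crash
    return crashes


SAMPLE_LOG = """\
Test Case '-[GenericTests test_forward_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_17_2_1_iPhone15_4]' started.
Assertion failed: (0 && "unexpected MPSDataType"), function getMLIRElementType, file MPSGraphUtilities.mm, line 149.
Test Case '-[GenericTests test_load_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_17_2_1_iPhone15_4]' started.
Assertion failed: (0 && "unexpected MPSDataType"), function getMLIRElementType, file MPSGraphUtilities.mm, line 149.
"""

for test, message in crashes_by_test(SAMPLE_LOG).items():
    print(test, "->", message)
```

On the iOS 17 log, all three tests map to the same assertion in getMLIRElementType, which suggests a single root cause rather than three separate failures.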

huydhn · 2025-02-19T03:08:38Z

Also, here is the test output from the iOS 18 run that hung:

[DEVICEFARM] ########### Entering phase test ###########
 
[DeviceFarm] xcodebuild test-without-building -destination id=$DEVICEFARM_DEVICE_UDID -xctestrun $DEVICEFARM_TEST_PACKAGE_PATH/*.xctestrun -derivedDataPath $DEVICEFARM_LOG_DIR
Command line invocation:
    /Applications/Xcode_16.app/Contents/Developer/usr/bin/xcodebuild test-without-building -destination id=00008120-00123D4E0CD1A01E -xctestrun /tmp/devicefarm-workspace/execution-_pntzxas/test-package-rgq2hkis/Benchmark_Tests_iphoneos17.5-arm64.xctestrun -derivedDataPath /tmp/devicefarm-workspace/execution-_pntzxas/logs-arz44yax

User defaults from command line:
    IDEDerivedDataPathOverride = /tmp/devicefarm-workspace/execution-_pntzxas/logs-arz44yax
    IDEPackageSupportUseBuiltinSCM = YES

2025-02-13 11:45:15.426 xcodebuild[1206:11246]  DVTDevice: Error locating DeviceSupport directory using Optional("arm64e") or Optional("arm64e"): nilError
Test Suite 'All tests' started at 2025-02-13 21:45:16.588.
Test Suite 'Tests.xctest' started at 2025-02-13 21:45:16.589.
Test Suite 'GenericTests' started at 2025-02-13 21:45:16.589.
Test Case '-[GenericTests test_forward_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_18_0_iPhone15_4]' started.
2025-02-13 21:45:17.981382+0200 Benchmark[528:12959] fopen failed for data file: errno = 2 (No such file or directory)
2025-02-13 21:45:17.981422+0200 Benchmark[528:12959] Errors found! Invalidating cache...
2025-02-13 21:45:18.102307+0200 Benchmark[528:12959] fopen failed for data file: errno = 2 (No such file or directory)
2025-02-13 21:45:18.102337+0200 Benchmark[528:12959] Errors found! Invalidating cache...
2025-02-13 21:45:18.403760+0200 Benchmark[528:13239] Invalid layer: Tensor dimensions N1D1C128256H1W2048 are not within supported range, N[1-65536]D[1-16384]C[1-65536]H[1-16384]W[1-16384].
/Users/runner/work/executorch/executorch/pytorch/executorch/extension/benchmark/apple/Benchmark/Tests/GenericTests.mm:90: Test Case '-[GenericTests test_forward_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_18_0_iPhone15_4]' measured [Memory Peak Physical, kB] average: 910610.813, relative standard deviation: 0.064%, values: [909624.496000, 909755.568000, 909903.024000, 910034.096000, 910099.632000, 910165.168000, 910197.936000, 910329.008000, 910394.544000, 910476.464000, 910623.920000, 910722.224000, 910853.296000, 911000.752000, 911115.440000, 911213.744000, 911262.896000, 911344.816000, 911492.272000, 911606.960000], performanceMetricID:com.apple.dt.XCTMetric_Memory.physical_peak, baselineName: "", baselineAverage: , polarity: prefers smaller, maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.000, maxStandardDeviation: 0.000
/Users/runner/work/executorch/executorch/pytorch/executorch/extension/benchmark/apple/Benchmark/Tests/GenericTests.mm:90: Test Case '-[GenericTests test_forward_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_18_0_iPhone15_4]' measured [Memory Physical, kB] average: 105.677, relative standard deviation: 40.570%, values: [131.072000, 114.688000, 163.840000, 131.072000, 49.152000, -16.384000, 114.688000, 131.072000, 49.152000, 114.688000, 147.456000, 81.920000, 147.456000, 131.072000, 114.688000, 49.152000, 81.920000, 114.688000, 131.072000, 131.072000], performanceMetricID:com.apple.dt.XCTMetric_Memory.physical, baselineName: "", baselineAverage: , polarity: prefers smaller, maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.000, maxStandardDeviation: 0.000
/Users/runner/work/executorch/executorch/pytorch/executorch/extension/benchmark/apple/Benchmark/Tests/GenericTests.mm:90: Test Case '-[GenericTests test_forward_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_18_0_iPhone15_4]' measured [Clock Monotonic Time, s] average: 0.133, relative standard deviation: 25.002%, values: [0.048478, 0.058375, 0.069154, 0.098666, 0.141264, 0.145176, 0.147966, 0.146008, 0.149075, 0.150631, 0.146958, 0.146468, 0.152349, 0.154769, 0.151167, 0.149614, 0.149809, 0.147247, 0.148472, 0.149282], performanceMetricID:com.apple.dt.XCTMetric_Clock.time.monotonic, baselineName: "", baselineAverage: , polarity: prefers smaller, maxPercentRegression: 10.000%, maxPercentRelativeStandardDeviation: 10.000%, maxRegression: 0.000, maxStandardDeviation: 0.000
Test Case '-[GenericTests test_forward_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_18_0_iPhone15_4]' passed (5.926 seconds).
Test Case '-[GenericTests test_load_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_18_0_iPhone15_4]' started.
2025-02-13 21:45:24.187354+0200 Benchmark[528:13246] Invalid layer: Tensor dimensions N1D1C128256H1W2048 are not within supported range, N[1-65536]D[1-16384]C[1-65536]H[1-16384]W[1-16384].
2025-02-13 21:45:25.706109+0200 Benchmark[528:13247] Invalid layer: Tensor dimensions N1D1C128256H1W2048 are not within supported range, N[1-65536]D[1-16384]C[1-65536]H[1-16384]W[1-16384].
2025-02-13 11:45:47.867 xcodebuild[1206:11246]  DVTDevice: Error locating DeviceSupport directory using Optional("arm64e") or Optional("arm64e"): nilError

Restarting after unexpected exit, crash, or test timeout; summary will include totals from previous launches.

Test Suite 'Selected tests' started at 2025-02-13 21:45:48.794.
Test Suite 'Tests.xctest' started at 2025-02-13 21:45:48.794.
Test Suite 'GenericTests' started at 2025-02-13 21:45:48.794.
Test Suite 'GenericTests' passed at 2025-02-13 21:45:48.794.
	 Executed 0 tests, with 0 failures (0 unexpected) in 0.000 (0.000) seconds
Test Suite 'LLaMATests' started at 2025-02-13 21:45:48.794.
Test Case '-[LLaMATests test_generate_llama_3_2_1b_llama3_mps_8da4w_pte_tokenizer_model_iOS_18_0_iPhone15_4]' started.
2025-02-13 21:45:49.959863+0200 Benchmark[533:14145] Invalid layer: Tensor dimensions N1D1C128256H1W2048 are not within supported range, N[1-65536]D[1-16384]C[1-65536]H[1-16384]W[1-16384].
, there was an old man who wanted to do some housework. He had two small sons, and they would not let him do the housework because he was old and they were small. One day, the old man sent
PyTorchObserver {"prompt_tokens":4,"generated_tokens":45,"model_load_start_ms":0,"model_load_end_ms":0,"inference_start_ms":1739475949665,"inference_end_ms":1739475956980,"prompt_eval_end_ms":1739475950188,"first_token_ms":1739475950188,"aggregate_sampling_time_ms":189,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
 a big fish was swallowed by a little fish and then it was swallowed by a little frog. This happened so many times that the little frog was very hungry. So he thought of a plan. He wanted to eat as many fishes
PyTorchObserver {"prompt_tokens":4,"generated_tokens":45,"model_load_start_ms":0,"model_load_end_ms":0,"inference_start_ms":1739475956999,"inference_end_ms":1739475964462,"prompt_eval_end_ms":1739475957591,"first_token_ms":1739475957591,"aggregate_sampling_time_ms":381,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
, Thomas Eggar wrote a book called The Black Book of the South Seas. The author was a journalist who set out to find the truth about the slave trade and the fate of the enslaved.
The book was published in 199
PyTorchObserver {"prompt_tokens":4,"generated_tokens":45,"model_load_start_ms":0,"model_load_end_ms":0,"inference_start_ms":1739475964481,"inference_end_ms":1739475971928,"prompt_eval_end_ms":1739475965067,"first_token_ms":1739475965067,"aggregate_sampling_time_ms":595,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
 there was a man who was rich and had two sons. When the man died the sons inherited his possessions and fortune. The older son left home and entered the church and was ordained. The younger son went to a monastery and lived
PyTorchObserver {"prompt_tokens":4,"generated_tokens":45,"model_load_start_ms":0,"model_load_end_ms":0,"inference_start_ms":1739475971946,"inference_end_ms":1739475979364,"prompt_eval_end_ms":1739475972539,"first_token_ms":1739475972539,"aggregate_sampling_time_ms":789,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
 in the Holy Land
Pil** BUILD INTERRUPTED **
Terminated: 15
[DEVICEFARM] ########### Stop received, exit testspec execution ###########
[DEVICEFARM] ########### Finish executing testspec ###########
 
[DEVICEFARM] ########### Setting upload permissions ###########
 
 
[DEVICEFARM] Tearing down your device. Your tests report will come shortly.

From what I see:

test_forward_llama_3_2_1b_llama3_mps_8da4w_pte_iOS_18_0_iPhone15_4 passed in 6s
test_generate_llama_3_2_1b_llama3_mps_8da4w_pte_tokenizer_model_iOS_18_0_iPhone15_4 was the one that hung?
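
The iOS 18 log also repeats an "Invalid layer" message whose shape string is easy to misread. As a rough sketch (the function name `out_of_range_dims` is hypothetical, and the parsing assumes the message format shown in the log), the following compares each reported dimension with the supported range printed in the same message:

```python
import re


def out_of_range_dims(message):
    """Return {dim: (value, (low, high))} for dimensions outside the printed range."""
    # Shape string such as "N1D1C128256H1W2048".
    shape = re.search(r"Tensor dimensions ((?:[A-Z]\d+)+)", message)
    dims = {name: int(value) for name, value in re.findall(r"([A-Z])(\d+)", shape.group(1))}
    # Range list such as "N[1-65536]D[1-16384]...".
    limits = {
        name: (int(low), int(high))
        for name, low, high in re.findall(r"([A-Z])\[(\d+)-(\d+)\]", message)
    }
    return {
        name: (value, limits[name])
        for name, value in dims.items()
        if name in limits and not limits[name][0] <= value <= limits[name][1]
    }


MESSAGE = (
    "Invalid layer: Tensor dimensions N1D1C128256H1W2048 are not within supported range, "
    "N[1-65536]D[1-16384]C[1-65536]H[1-16384]W[1-16384]."
)

print(out_of_range_dims(MESSAGE))
```

Only C = 128256 exceeds its limit of 65536. That value matches the Llama 3 vocabulary size, so the offending tensor may be the embedding or output projection, but that is a guess. Whether this message is related to the hang is not established by the log.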

guangy10 · 2025-02-19T21:18:50Z

Weird. When I checked the log, I didn't find the section for each device. Now I can see it.

@manuelcandales I think this pointer should help you debug the issue: #8461 (comment). Do you expect this benchmark config to run on both iOS 17 and 18? If only 17, we should disable the run for 18. Per what Huy pointed out above, generate can produce new tokens but somehow fails to terminate.
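
To put numbers on the generate observation, the PyTorchObserver lines from the iOS 18 log can be turned into rough timings. This is a sketch under assumptions: the helper name `summarize` is hypothetical, the field meanings are inferred from their names, and the decode rate treats all generated tokens as produced after prompt evaluation.

```python
import json

PREFIX = "PyTorchObserver "

# Two observer lines copied from the iOS 18 log above.
LINES = [
    'PyTorchObserver {"prompt_tokens":4,"generated_tokens":45,"model_load_start_ms":0,"model_load_end_ms":0,"inference_start_ms":1739475949665,"inference_end_ms":1739475956980,"prompt_eval_end_ms":1739475950188,"first_token_ms":1739475950188,"aggregate_sampling_time_ms":189,"SCALING_FACTOR_UNITS_PER_SECOND":1000}',
    'PyTorchObserver {"prompt_tokens":4,"generated_tokens":45,"model_load_start_ms":0,"model_load_end_ms":0,"inference_start_ms":1739475956999,"inference_end_ms":1739475964462,"prompt_eval_end_ms":1739475957591,"first_token_ms":1739475957591,"aggregate_sampling_time_ms":381,"SCALING_FACTOR_UNITS_PER_SECOND":1000}',
]


def summarize(line):
    """Convert one observer line into seconds and a rough decode rate."""
    stats = json.loads(line[len(PREFIX):])
    scale = stats["SCALING_FACTOR_UNITS_PER_SECOND"]
    total_s = (stats["inference_end_ms"] - stats["inference_start_ms"]) / scale
    prefill_s = (stats["prompt_eval_end_ms"] - stats["inference_start_ms"]) / scale
    decode_s = (stats["inference_end_ms"] - stats["prompt_eval_end_ms"]) / scale
    return {
        "total_s": total_s,
        "prefill_s": prefill_s,
        "decode_tok_per_s": stats["generated_tokens"] / decode_s,
    }


for line in LINES:
    print(summarize(line))
```

Each iteration in the log takes roughly 7.3 to 7.5 seconds for 45 generated tokens, so generation itself makes steady progress. That is consistent with the test repeating or failing to terminate rather than a single call stalling, though the log alone does not confirm which.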

facebook-github-bot added the CLA Signed label Feb 13, 2025

manuelcandales added the topic: not user facing label Feb 13, 2025

manuelcandales mentioned this pull request Feb 13, 2025

Add performance benchmark config: MPS 8da4w #8429

Closed

Add performance benchmark config: MPS 8da4w

df4b449

manuelcandales force-pushed the bench-mps-8da4w branch from 3b95dd2 to df4b449 Compare February 13, 2025 15:59

Merge branch 'main' into bench-mps-8da4w

0e46531

manuelcandales requested a review from guangy10 February 13, 2025 16:00

Merge branch 'main' into bench-mps-8da4w

1eb3bb1

manuelcandales temporarily deployed to upload-benchmark-results February 13, 2025 17:41 — with GitHub Actions Inactive

manuelcandales had a problem deploying to upload-benchmark-results February 13, 2025 21:39 — with GitHub Actions Failure

Update apple-perf.yml

b00fce1

Bump up timeout threshold

guangy10 reviewed Feb 18, 2025

guangy10 temporarily deployed to upload-benchmark-results February 19, 2025 01:10 — with GitHub Actions Inactive

guangy10 had a problem deploying to upload-benchmark-results February 19, 2025 04:57 — with GitHub Actions Failure
