Optimized binaries for all new architectures ? #26

NikosDi · 2024-09-23T06:54:54Z

Hello.
I see that for Intel AVX2 code you keep the same binary from the Haswell (2013) architecture, and the same goes for Intel AVX512 with the SkylakePurley (2017) executable. For AMD AVX2/AVX512 you have the same Zen (2017) executable, using version 3v2 (AVX512 added for Zen 4).

I know that the AVX2/AVX512 instruction sets haven't changed, especially regarding the instructions used for this benchmark app, but the implementation of AVX2/AVX512 has definitely changed, even within the same vendor (Intel or AMD).

I mean, Haswell (2013) executes AVX2 code differently from Raptor Lake Refresh (2023).
The same goes for SkylakePurley compared to Rocket Lake or Sapphire Rapids regarding AVX512, and for Zen 2 vs Zen 3 with AVX2 or Zen 4 vs Zen 5 with AVX512.

How come you keep the same executable for all these different architectures?

TIA

Mysticial · 2024-09-23T07:07:54Z

Zen gained new binaries because it has separate FADD and FMA hardware, which can be used simultaneously for greater throughput.

This is not the case for the vast majority of Intel CPUs. With the exception of client Golden Cove, all of them can be fully utilized with just FMA. So the existing binaries should already be optimal.

For client Golden Cove (Alder/Raptor Lake), I think it is similar to Zen where you need simultaneous FADD and FMA. So in theory, it might get higher flops with the Zen1 binary than the Haswell binary. But I don't have an Alder/Raptor Lake machine to test and confirm.

I suppose the upcoming Arrow Lake will also fit squarely in this category. But again, I need hardware to test.
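To illustrate why separate FADD and FMA hardware changes which instruction mix is optimal, here is a rough back-of-the-envelope model in Python. It is only a sketch: the pipe counts and vector width are hypothetical placeholders, not measured or documented figures for any specific CPU, and `peak_flops_per_cycle` is a name made up for this example.

```python
def peak_flops_per_cycle(lanes, fma_pipes, fadd_pipes, use_fadd):
    """Idealized peak floating-point operations per cycle.

    lanes      -- elements per vector register (e.g. 4 doubles for 256-bit)
    fma_pipes  -- pipes that can each issue one FMA per cycle
    fadd_pipes -- separate pipes that can each issue one FADD per cycle
    use_fadd   -- whether the instruction mix also feeds the FADD pipes
    """
    flops = 2 * lanes * fma_pipes  # one FMA counts as a multiply plus an add
    if use_fadd:
        flops += lanes * fadd_pipes  # one FADD counts as a single add
    return flops


# Hypothetical core A: FADD has no pipes of its own (assumed for illustration).
shared_fma_only = peak_flops_per_cycle(4, 2, 0, use_fadd=False)
shared_mixed = peak_flops_per_cycle(4, 2, 0, use_fadd=True)

# Hypothetical core B: two FMA pipes plus two separate FADD pipes (assumed).
split_fma_only = peak_flops_per_cycle(4, 2, 2, use_fadd=False)
split_mixed = peak_flops_per_cycle(4, 2, 2, use_fadd=True)

print(shared_fma_only, shared_mixed)  # 16 16
print(split_fma_only, split_mixed)    # 16 24
```

In this simplified model, an FMA-only kernel already saturates core A, while core B only reaches its peak when FADD and FMA are issued together. Real cores have additional limits (issue width, register file ports, clock behavior) that the model ignores.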

NikosDi · 2024-09-23T07:38:01Z

So it's not the kind, the number, or even the sequence of instructions that really matters, but the additional hardware blocks and the available ports, which probably exist because of that extra hardware.

So is the latest Zen executable as fully optimized for AVX512 on Zen 5 as it is on Zen 4?

Mysticial · 2024-09-23T07:46:46Z

Actually, now that I look at it, it looks like I did something different. I added the FADD+FMA test to all the binaries, and I didn't make any new binaries because they wouldn't be any different from the existing ones.

Zen1-3 -> 2017-Zen
Zen4-5 -> 2017-SkylakePurley
Broadwell -> 2013-Haswell
Skylake Client -> 2013-Haswell
Ice Lake -> 2017-SkylakePurley
Tiger Lake -> 2017-SkylakePurley
Rocket Lake -> 2017-SkylakePurley
Alder/Raptor Lake -> 2013-Haswell
Sapphire Rapids -> 2017-SkylakePurley
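
For anyone scripting the choice of binary, the list above can be written as a small lookup table. This is only a sketch: the dictionary and the helper `binary_for` are hypothetical names and not part of the benchmark itself, and the keys are copied verbatim from the list rather than derived from any CPU detection.

```python
# Mapping copied from the list above: microarchitecture -> binary to run.
BINARY_FOR_ARCH = {
    "Zen1-3": "2017-Zen",
    "Zen4-5": "2017-SkylakePurley",
    "Broadwell": "2013-Haswell",
    "Skylake Client": "2013-Haswell",
    "Ice Lake": "2017-SkylakePurley",
    "Tiger Lake": "2017-SkylakePurley",
    "Rocket Lake": "2017-SkylakePurley",
    "Alder/Raptor Lake": "2013-Haswell",
    "Sapphire Rapids": "2017-SkylakePurley",
}


def binary_for(arch):
    """Return the binary name for a listed architecture, or None otherwise."""
    return BINARY_FOR_ARCH.get(arch)


print(binary_for("Zen4-5"))
print(binary_for("Alder/Raptor Lake"))
```

Architectures that are not in the list, such as Arrow Lake, return `None` here, since the thread does not say which binary suits them.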

NikosDi · 2024-09-23T08:08:59Z

Yes, I can see the phrase "Add 512-bit FADD + FMA test for Zen4" next to all binaries.

Maybe you can add a folder named Version 4, Version 3.1, Version 3 (v2), or something similar to make this clearer.

And probably change the ReadMe file too :)
