Optimized binaries for all new architectures ? #26

Open
NikosDi opened this issue Sep 23, 2024 · 4 comments

Comments

@NikosDi

NikosDi commented Sep 23, 2024

Hello.
I see that the Intel AVX2 code still uses the binary built for Haswell (2013), and the Intel AVX512 code still uses the SkylakePurley (2017) executable. For AMD, AVX2 and AVX512 both use the same Zen (2017) executable, version 3v2 (with AVX512 added for Zen 4).

I know that the AVX2/AVX512 instruction sets haven't changed, especially for the instructions this benchmark app uses, but the implementation of AVX2/AVX512 has definitely changed, even within the same vendor (Intel or AMD).

For example, Haswell (2013) executes AVX2 code differently from Raptor Lake Refresh (2023).
The same goes for SkylakePurley versus Rocket Lake or Sapphire Rapids for AVX512, Zen 2 versus Zen 3 for AVX2, and Zen 4 versus Zen 5 for AVX512.

Why do you keep the same executable for all these different architectures?

TIA

@Mysticial
Owner

Mysticial commented Sep 23, 2024

The reason Zen gained new binaries is that it has separate FADD and FMA hardware, which can be used simultaneously for greater throughput.

This is not the case for the vast majority of Intel CPUs. With the exception of client Golden Cove, all of them can be fully utilized with just FMA. So the existing binaries should already be optimal.

For client Golden Cove (Alder/Raptor Lake), I think it is similar to Zen where you need simultaneous FADD and FMA. So in theory, it might get higher flops with the Zen1 binary than the Haswell binary. But I don't have an Alder/Raptor Lake machine to test and confirm.

I suppose the upcoming Arrow Lake will also fit squarely in this category. But again, I need hardware to test.
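To make the FADD+FMA argument concrete, here is a minimal back-of-the-envelope sketch. It assumes an idealized core where every FMA and FADD pipe can issue one vector instruction per cycle at the same time, counting an FMA as 2 flops per lane and an FADD as 1. The function name and the pipe counts in the usage lines are hypothetical illustrations, not measured figures for any specific CPU.

```python
def peak_flops_per_cycle(lanes: int, fma_pipes: int, fadd_pipes: int) -> int:
    """Idealized peak flops per cycle for one core (illustrative model only).

    lanes      -- floating-point elements per vector register
    fma_pipes  -- pipes that can issue an FMA each cycle (2 flops per lane)
    fadd_pipes -- separate pipes that can issue an FADD each cycle (1 flop per lane)
    """
    return lanes * (2 * fma_pipes + fadd_pipes)


# Hypothetical core with 4 lanes: FMA-only code versus a mixed FADD+FMA stream.
fma_only = peak_flops_per_cycle(lanes=4, fma_pipes=2, fadd_pipes=0)
mixed = peak_flops_per_cycle(lanes=4, fma_pipes=2, fadd_pipes=2)
print(fma_only, mixed)
```

Under this model, a core whose FADD work shares the FMA pipes gains nothing from a mixed instruction stream, while a core with separate FADD pipes only reaches its peak when both kinds of instruction are issued together.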

@NikosDi
Author

NikosDi commented Sep 23, 2024

So what really matters is not the kind or number of instructions, or even the instruction chain, but the additional hardware blocks and the available ports (probably a result of the extra hardware).

So the latest Zen executable is fully optimized for AVX512 on Zen 5, just as it is on Zen 4?

@Mysticial
Owner

Actually, now that I look at it, it looks like I did something different: I added the FADD+FMA test to all the binaries, and I didn't make any new binaries because they wouldn't be any different from the existing ones.

Zen1-3 -> 2017-Zen
Zen4-5 -> 2017-SkylakePurley
Broadwell -> 2013-Haswell
Skylake Client -> 2013-Haswell
Ice Lake -> 2017-SkylakePurley
Tiger Lake -> 2017-SkylakePurley
Rocket Lake -> 2017-SkylakePurley
Alder/Raptor Lake -> 2013-Haswell
Sapphire Rapids -> 2017-SkylakePurley
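
For anyone scripting benchmark runs, the mapping above could be kept as a small lookup table. This is only a sketch: the architecture labels and binary names are copied from the list in this comment, while `BINARY_FOR_ARCH` and `pick_binary` are hypothetical names that do not exist in the project.

```python
# Architecture -> recommended binary, transcribed from the list in this thread.
# The dict and function names are hypothetical helpers, not part of the project.
BINARY_FOR_ARCH = {
    "Zen1-3": "2017-Zen",
    "Zen4-5": "2017-SkylakePurley",
    "Broadwell": "2013-Haswell",
    "Skylake Client": "2013-Haswell",
    "Ice Lake": "2017-SkylakePurley",
    "Tiger Lake": "2017-SkylakePurley",
    "Rocket Lake": "2017-SkylakePurley",
    "Alder/Raptor Lake": "2013-Haswell",
    "Sapphire Rapids": "2017-SkylakePurley",
}


def pick_binary(arch: str) -> str:
    """Return the binary listed for `arch`, or raise KeyError if it is not listed."""
    try:
        return BINARY_FOR_ARCH[arch]
    except KeyError:
        raise KeyError(f"no binary listed for {arch!r}") from None


print(pick_binary("Zen4-5"))
```

Architectures not named in the list are deliberately rejected rather than guessed, since the thread does not say which binary they should use.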

@NikosDi
Author

NikosDi commented Sep 23, 2024

Yes, I can see the phrase "Add 512-bit FADD + FMA test for Zen4" next to all binaries.

Maybe you could add a folder named Version 4, Version 3.1, Version 3 (v2), or something similar to make this clearer.

And probably change the ReadMe file too :)
