add detection for zen 5 #56967

Open: wants to merge 3 commits into base: master
simeonschaub
Member

@@ -236,6 +237,7 @@ constexpr auto znver2 = znver1 | get_feature_masks(clwb, rdpid, wbnoinvd);
constexpr auto znver3 = znver2 | get_feature_masks(shstk, pku, vaes, vpclmulqdq);
constexpr auto znver4 = znver3 | get_feature_masks(avx512f, avx512cd, avx512dq, avx512bw, avx512vl, avx512ifma, avx512vbmi,
avx512vbmi2, avx512vnni, avx512bitalg, avx512vpopcntdq, avx512bf16, gfni, shstk, xsaves);
constexpr auto znver5 = znver4 | get_feature_masks(avxvnni, movdiri, movdir64b, avx512vp2intersect, /*prefetchi,*/ avxvnni);
Member Author

I assume prefetchi needs to be added to src/features_x86.h, but I didn't know how
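
For context on why `prefetchi` is commented out above: a minimal sketch, assuming each feature name is a constant bit index and that masks are combined with `|` as in the `znver5` line. `FeatureMask`, `make_mask`, and `kWords` are hypothetical stand-ins, not the definitions used in this PR; only the three indices are taken from the `JL_FEATURE_DEF` lines quoted in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical, simplified stand-ins for illustration only;
// these are not the definitions used in the PR.
constexpr unsigned kWords = 11;  // assumed number of 32-bit feature words

struct FeatureMask {
    std::uint32_t words[kWords] = {};

    // True if the bit for feature index `idx` is set.
    constexpr bool test(unsigned idx) const {
        return ((words[idx / 32] >> (idx % 32)) & 1u) != 0;
    }
};

// Union of two masks, as in `znver4 | get_feature_masks(...)`.
constexpr FeatureMask operator|(FeatureMask a, const FeatureMask &b) {
    for (unsigned i = 0; i < kWords; ++i)
        a.words[i] |= b.words[i];
    return a;
}

// Build a mask from feature indices (hypothetical analogue of get_feature_masks).
constexpr FeatureMask make_mask(std::initializer_list<unsigned> idxs) {
    FeatureMask m{};
    for (unsigned idx : idxs)
        m.words[idx / 32] |= std::uint32_t{1} << (idx % 32);
    return m;
}

// Indices copied from the JL_FEATURE_DEF lines quoted in this thread.
constexpr unsigned wbnoinvd = 32 * 8 + 9;
constexpr unsigned avxvnni = 32 * 9 + 4;
constexpr unsigned avx512bf16 = 32 * 9 + 5;

constexpr FeatureMask base = make_mask({wbnoinvd, avx512bf16});
constexpr FeatureMask extended = base | make_mask({avxvnni});

static_assert(extended.test(wbnoinvd) && extended.test(avxvnni), "union keeps both");
static_assert(!base.test(avxvnni), "base lacks the new feature");
```

Under these assumptions, a name with no index constant (such as `prefetchi` before it is defined) simply cannot be passed to the mask builder, which matches it being commented out here.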

Member

An entry for prefetchi needs to be added here, next to existing definitions such as:
JL_FEATURE_DEF(avxvnni, 32 * 9 + 4, 120000)

Now you need to look in the CPU docs for how prefetchi is encoded.

[image: screenshot from the "Processor Programming Reference", https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/57896.zip]

Member

https://github.com/llvm/llvm-project/blob/3edbe36c3eb01d1c35ac1761da108e3a493258ee/clang/lib/Headers/cpuid.h#L220 The bits are here, though you will need to add the

// EAX=7,ECX=1: EDX

branch, IIUC.

Member Author

@simeonschaub, Jan 6, 2025

Thanks for the hints! What I don't get is where the 32 * 8, 32 * 9, etc. come from.

Is this the correct patch or are the 32 * 9 bits incorrect?

diff --git a/src/features_x86.h b/src/features_x86.h
index 2ecc8fee32..b817781404 100644
--- a/src/features_x86.h
+++ b/src/features_x86.h
@@ -113,6 +113,9 @@ JL_FEATURE_DEF(wbnoinvd, 32 * 8 + 9, 0)
 JL_FEATURE_DEF(avxvnni, 32 * 9 + 4, 120000)
 JL_FEATURE_DEF(avx512bf16, 32 * 9 + 5, 0)
 
+// EAX=7,ECX=1: EDX
+JL_FEATURE_DEF(prefetchi, 32 * 9 + 20, 0)
+
 // EAX=0x14,ECX=0: EBX
 JL_FEATURE_DEF(ptwrite, 32 * 10 + 4, 0)
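
On the `32 * N` question, one plausible reading (an assumption inferred from the `// EAX=...` comments in the diff, not confirmed in this thread) is that each such comment names one 32-bit CPUID output register stored as word `N`, and the second argument of `JL_FEATURE_DEF` is `32 * N + bit`. The helper names `FeatureLoc`, `locate`, and `has_feature` below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; the "32 * word + bit" reading is an assumption
// inferred from the comments in the quoted diff, not confirmed in this thread.
struct FeatureLoc {
    unsigned word;  // which stored 32-bit CPUID output register
    unsigned bit;   // bit position inside that register
};

// Split a feature index into its word and bit parts.
constexpr FeatureLoc locate(unsigned idx) {
    return FeatureLoc{idx / 32, idx % 32};
}

// Check a feature index against an array of stored register values.
constexpr bool has_feature(const std::uint32_t *words, unsigned idx) {
    return ((words[idx / 32] >> (idx % 32)) & 1u) != 0;
}

static_assert(locate(32 * 9 + 4).word == 9 && locate(32 * 9 + 4).bit == 4, "avxvnni");
static_assert(locate(32 * 10 + 4).word == 10, "ptwrite");
```

If that reading holds, the entries directly above the new comment already occupy word 9, so a different register such as `EAX=7,ECX=1: EDX` would presumably need its own word index rather than reusing `32 * 9`.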

Member

I'm implementing it and maybe adding some comments

@imciner2
Contributor

imciner2 commented Jan 6, 2025

Won't we need to wait for #56130 to be merged before we can use Zen5 since that is only in LLVM 19?

@simeonschaub
Member Author

Yes, to take full advantage of Zen 5 features I believe LLVM 19 is needed, but this PR is still an improvement, since we now fall back to the znver4 target instead of the generic one.

Comment on lines 83 to 84
JL_FEATURE_DEF(avx512vnniw, 32 * 4 + 2, 0)
JL_FEATURE_DEF(avx512fmaps, 32 * 4 + 3, 0)
JL_FEATURE_DEF(uintr, 32 * 4 + 5, 140000)
Member

Isn't the last argument a note of which LLVM version introduced support?

Member

As it turns out those were never implemented :)

4 participants