
[Target][TOPI] Use LLVM for x86 CPU feature lookup #15685

Merged: 1 commit, Sep 14, 2023

Conversation

cbalint13
Contributor

@cbalint13 cbalint13 commented Sep 6, 2023

Hi folks,

This PR leverages LLVM itself for CPU feature lookup, replacing the hard-coded lists.
Relying on LLVM keeps the x86 family & feature tables maintainable.


Changes:

  • Introduce a single target_has_feature(XXX) replacing all target_has_XXX()
  • PY+FFI: expose new llvm_x86_get_archlist, llvm_x86_get_features & llvm_x86_has_feature
  • PY: expose new target_has_feature wrapper to _ffi.llvm_x86_has_feature

There is a unit test for a comprehensive check against the old behaviour.
For better reliability, this way of feature checking can also be implemented for other arches.
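To illustrate the consolidation, here is a simplified, self-contained sketch (not the actual TVM code; the arch list and feature table are tiny hand-picked excerpts standing in for what LLVM would report):

```python
# Illustrative sketch only: the real implementation queries LLVM via FFI.

# Old style: one hand-maintained arch list per predicate.
AVX512_ARCHES = {"skylake-avx512", "cascadelake", "icelake-server"}

def target_has_avx512(mcpu):
    # Static lookup: must be kept in sync with new CPUs by hand.
    return mcpu in AVX512_ARCHES

# New style: a single entry point backed by LLVM-derived feature sets.
LLVM_FEATURES = {
    "skylake": {"avx2", "fma"},
    "cascadelake": {"avx2", "avx512f", "avx512bw", "avx512vnni"},
}

def target_has_feature(feature, mcpu):
    # One query answers every "does this arch have X?" question.
    return feature in LLVM_FEATURES.get(mcpu, set())

print(target_has_feature("avx512vnni", "cascadelake"))  # True
print(target_has_feature("avx512vnni", "skylake"))      # False
```

The point is that the new style needs no per-feature function: adding a feature check is just a new string argument, and the table itself comes from LLVM rather than from hand maintenance.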

Thanks,
~Cristian.

Cc: @elvin-n , @vvchernov , @echuraev , @vinx13 , @jcf94 , @masahi

Copy link
Contributor

@kparzysz-quic kparzysz-quic left a comment


Looks good. Thanks!

Member

@junrushao junrushao left a comment


This is an awesome feature I've been thinking of. Thanks for the patch!

Contributor

@echuraev echuraev left a comment


LGTM! Thank you for your contribution. It is very useful!

@junrushao
Member

The CI fails because the LLVM version on CI is pretty low (==10). I'm curious if there's any variant of this API on LLVM 10? If not, we should bump LLVM to 15 or 16

@cbalint13
Contributor Author

cbalint13 commented Sep 8, 2023

The CI fails because the LLVM version on CI is pretty low (==10). I'm curious if there's any variant of this API on LLVM 10? If not, we should bump LLVM to 15 or 16

Folks,
@junrushao ,

Yes, I am aware of this llvm<=10 issue, so llvm==11 would be the minimum (tested).
I am looking at llvm<=10 for another way (at a small extra code cost) of tapping into their API differently.

I am strongly in favour of backward compatibility for this case.

The API fracture, at first glance:

Allow a little time (1-2 days) to investigate a way to go below llvm==11; then I'll be back with the results.

@cbalint13
Contributor Author

cbalint13 commented Sep 10, 2023

I am strongly in favour of backward compatibility for this case.
Allow a little time (1-2 days) to investigate a way to go below llvm==11; then I'll be back with the results.

Details of investigation:

For llvm>=11:

  • Direct access via the TargetParser public headers.
  • Immediate: no need for a target machine; details are fetched via the public getters.

The implementation here is slim, and the info coming from LLVM is precise and maintained.

For llvm<=10:

  • The useful data from the tablegen descriptor in class MCSubtargetInfo is private, with no useful accessor.
  • There is an unhelpful way of passing std::string("help") at the creation of MCSubtargetInfo().
  • Useful queries can be done only via a full llvm target machine; this compatibility path is implemented here.

There is the burden of LLVMint() and target-machine creation, but the final check/legalizer is precise w.r.t. the arch.
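The dual code path above can be sketched in Python as follows (the helper names are placeholders for the two C++ paths, not TVM or LLVM APIs, and the feature table is a stand-in for what either path would return):

```python
# Placeholder table standing in for what LLVM reports for an arch.
_DEMO_TABLE = {"skylake": {"avx2", "fma", "sse4.2"}}

def query_target_parser(mcpu):
    # llvm>=11 path: TargetParser public getters answer directly,
    # without creating a target machine.
    return _DEMO_TABLE.get(mcpu, set())

def query_target_machine(mcpu):
    # llvm<=10 path: build a full target machine and read the enabled
    # features off its MCSubtargetInfo.
    return _DEMO_TABLE.get(mcpu, set())

def get_features(mcpu, llvm_major):
    """Dispatch on the linked LLVM major version; both paths must agree."""
    if llvm_major >= 11:
        return query_target_parser(mcpu)
    return query_target_machine(mcpu)

print(get_features("skylake", 17) == get_features("skylake", 10))  # True
```

The design point is that callers see one function regardless of LLVM version; only the (slower) llvm<=10 path pays the target-machine creation cost.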


@junrushao , @kparzysz-quic, @echuraev

There are quite a few changes now; please help by re-reviewing them.

Thanks.

Contributor

@vvchernov vvchernov left a comment


Hello @cbalint13! Thank you for the big work and the good improvement and unification of the target check. It looks to me like "avx512bw" is not the best name to split plain avx512 (e.g. for skylake) from avx512 with VNNI (e.g. for cascadelake); maybe return "avx512"? I do not see places where it is used in another context.

@cbalint13
Contributor Author

cbalint13 commented Sep 11, 2023

Hello @cbalint13! Thank you for the big work and the good improvement and unification of the target check. It looks to me like "avx512bw" is not the best name to split plain avx512 (e.g. for skylake) from avx512 with VNNI (e.g. for cascadelake); maybe return "avx512"? I do not see places where it is used in another context.

@vvchernov ,

Good question regarding avx512bw!

Please help me double-check the statements below:

There are also other avx512 subsets, but none holding instructions for our topi/tir intrinsics; their names are:

"avx512vl", "avx512dq", "avx512cd", "avx512er", "avx512pf", "avx512vbmi", "avx512ifma",
"avx5124vnniw", "avx5124fmaps", "avx512vpopcntdq", "avx512vbmi2","avx512vnni", 
"avx512bitalg", "avx512bf16"

UPDATE: a much clearer view of what avx512bw provides: llvm/clang/Basic/BuiltinsX86.def#L1057-L1058 .


Some arches investigated here:

  • The "skylake" don't have it:
print( tvm.target.codegen.llvm_x86_get_features("skylake") )
["cmov", "mmx", "popcnt", "sse", "sse2", "sse3", "ssse3", "sse4.1", "sse4.2", 
"avx", "avx2", "fma", "bmi", "bmi2", "aes", "pclmul", "adx", "clflushopt", "cx16",
"cx8", "crc32", "f16c", "fsgsbase", "fxsr", "invpcid", "lzcnt", "movbe", "prfchw", 
"rdrnd", "rdseed", "sahf", "sgx", "x87", "xsave", "xsavec", "xsaveopt", "xsaves"]
  • The "skylake-avx512" have it:
print( tvm.target.codegen.llvm_x86_get_features("skylake-avx512") )
["cmov", "mmx", "popcnt", "sse", "sse2", "sse3", "ssse3", "sse4.1", "sse4.2", "avx", "avx2", "fma",
 "avx512f", "bmi", "bmi2", "aes", "pclmul", "avx512vl", "avx512bw", "avx512dq", "avx512cd", 
"adx", "clflushopt", "clwb", "cx16", "cx8", "crc32", "f16c", "fsgsbase", "fxsr", "invpcid", 
"lzcnt", "movbe", "pku", "prfchw", "rdrnd", "rdseed", "sahf", "x87", "xsave", "xsavec", 
"xsaveopt", "xsaves"]
  • The "cascadelake" have it:
print( tvm.target.codegen.llvm_x86_get_features("cascadelake") )
["cmov", "mmx", "popcnt", "sse", "sse2", "sse3", "ssse3", "sse4.1", "sse4.2", 
"avx", "avx2", "fma", "avx512f", "bmi", "bmi2", "aes", "pclmul",
 "avx512vl", "avx512bw", "avx512dq", "avx512cd", "avx512vnni",
 "adx", "clflushopt", "clwb", "cx16", "cx8", "crc32", "f16c", "fsgsbase", 
"fxsr", "invpcid", "lzcnt", "movbe", "pku", "prfchw", "rdrnd", "rdseed", 
"sahf", "x87", "xsave", "xsavec", "xsaveopt", "xsaves"]
  • The "knl" which have some avx512* but not the desired avx512bw:
print( tvm.target.codegen.llvm_x86_get_features("knl") )
["cmov", "mmx", "popcnt", "sse", "sse2", "sse3", "ssse3", "sse4.1", "sse4.2", 
"avx", "avx2", "fma", "avx512f", "bmi", "bmi2", "aes", "pclmul", 
"avx512cd", "avx512er", "avx512pf", "adx", "cx16", "cx8", "crc32", "f16c",
 "fsgsbase", "fxsr", "invpcid", "lzcnt", "movbe", "prefetchwt1", "prfchw", 
"rdrnd", "rdseed", "sahf", "x87", "xsave", "xsaveopt"]
  • The odd "alderlake" has avxvnni but neither avx512bw nor avx512vnni:
print( tvm.target.codegen.llvm_x86_get_features("alderlake") )
["cmov", "mmx", "popcnt", "sse", "sse2", "sse3", "ssse3", "sse4.1", "sse4.2", 
"fma", "bmi", "bmi2", "aes", "pclmul", "gfni", "vpclmulqdq", "adx", "cldemote", 
"clflushopt", "clwb", "cx16", "cx8", "crc32", "f16c", "fsgsbase", "fxsr", "invpcid", 
"widekl", "lzcnt", "movbe", "movdir64b", "movdiri", "pconfig", "pku", "prfchw", 
"ptwrite", "rdpid", "rdrnd", "rdseed", "sahf", "serialize", "sgx", "sha", "shstk", 
"vaes", "waitpkg", "x87", "xsave", "xsavec", "xsaveopt", "xsaves", "hreset", 
"avxvnni"]

BTW, "alderlake" here, is the only "strangeness" having avxvnni but not avx512vnni (intel's early one ?).

  • The "sapphirerapids" have avx512bw avx512vnni , and amx-int8 among with lot of new interesting things.
print( tvm.target.codegen.llvm_x86_get_features("sapphirerapids") )
["cmov", "mmx", "popcnt", "sse", "sse2", "sse3", "ssse3", "sse4.1", "sse4.2", "avx", "avx2", "fma", 
"avx512f", "bmi", "bmi2", "aes", "pclmul", "avx512vl", "avx512bw", "avx512dq", "avx512cd", 
"avx512vbmi", "avx512ifma", "avx512vpopcntdq", "avx512vbmi2", "gfni", "vpclmulqdq", 
"avx512vnni", "avx512bitalg", "avx512bf16", "adx", "amx-bf16", "amx-int8", "amx-tile", 
"cldemote", "clflushopt", "clwb", "cx16", "cx8", "crc32", "enqcmd", "f16c", "fsgsbase", 
"fxsr", "invpcid", "lzcnt", "movbe", "movdir64b", "movdiri", "pconfig", "pku", "prfchw", 
"ptwrite", "rdpid", "rdrnd", "rdseed", "sahf", "serialize", "sgx", "sha", "shstk", "tsxldtrk", 
"uintr", "vaes", "waitpkg", "wbnoinvd", "x87", "xsave", "xsavec", "xsaveopt", "xsaves", 
"avx512fp16", "avxvnni"]
  • The full archlist as of llvm=17:
print( tvm.target.codegen.llvm_x86_get_archlist() )
["i386", "i486", "winchip-c6", "winchip2", "c3", "i586", "pentium", "pentium-mmx", "pentiumpro", 
"i686", "pentium2", "pentium3", "pentium3m", "pentium-m", "c3-2", "yonah", "pentium4", "pentium4m", 
"prescott", "nocona", "core2", "penryn", "bonnell", "atom", "silvermont", "slm", "goldmont", 
"goldmont-plus", "tremont", "nehalem", "corei7", "westmere", "sandybridge", "corei7-avx", 
"ivybridge", "core-avx-i", "haswell", "core-avx2", "broadwell", "skylake", "skylake-avx512", 
"skx", "cascadelake", "cooperlake", "cannonlake", "icelake-client", "rocketlake", "icelake-server", 
"tigerlake", "sapphirerapids", "alderlake", "raptorlake", "meteorlake", "sierraforest", "grandridge", 
"graniterapids", "graniterapids-d", "emeraldrapids", "knl", "knm", "lakemont", "k6", "k6-2", "k6-3", 
"athlon", "athlon-tbird", "athlon-xp", "athlon-mp", "athlon-4", "k8", "athlon64", "athlon-fx", "opteron", 
"k8-sse3", "athlon64-sse3", "opteron-sse3", "amdfam10", "barcelona", "btver1", "btver2", "bdver1", 
"bdver2", "bdver3", "bdver4", "znver1", "znver2", "znver3", "znver4", 
"x86-64", "x86-64-v2", "x86-64-v3", "x86-64-v4", "geode"]

I put Cc @Qianshui-Jiang (author of #13642); he might help us with a second opinion on this too.

@cbalint13 cbalint13 requested a review from vvchernov September 11, 2023 12:20
@vvchernov
Contributor

Hello @cbalint13! Very hard work! One note I should make: TVM needs three avx512 intrinsics: vpmaddwd, vpmaddubsw and vpaddd. For the latter, "+" is used, but it is automatically replaced by the intrinsic by llvm, and it is in the avx512f set (I checked it here).

@cbalint13
Contributor Author

cbalint13 commented Sep 11, 2023

Hello @cbalint13! Very hard work! One note I should make: TVM needs three avx512 intrinsics: vpmaddwd, vpmaddubsw and vpaddd. For the latter, "+" is used, but it is automatically replaced by the intrinsic by llvm, and it is in the avx512f set (I checked it here).

@vvchernov ,

I see now, very good point!

Let's ask for both; I will change it with these comments:

// avx512f:  llvm.x86.avx512.addpd.w.512 (LLVM auto, added)
// avx512bw: llvm.x86.avx512.pmaddubs.w.512 (TVM required)
//           + llvm.x86.avx512.pmaddw.d.512
if target_has_features(["avx512f", "avx512bw"]):

Just a side note (curiosity) on this topic:

"F" would stand for foundation, probably avx512bw implies that CPU also have avx512f (but not vice-versa)
One example of partial avx512 is the "knl" having the avx512f (foundation) part but not having the avx512bw part.
Maybe one day someone can add separate _compute(), _update() for topi (more precise control) not letting LLVM itself.
I would be curious of outcome, in the first (i. case) LLVM would ommit addpd.w.512 by adding something suboptimal.

  • i. "llvm -mcpu=x86-64 -mattr=+avx512bw"
  • ii. "llvm -mcpu=x86-64 -mattr=+avx512bw,+avx512f"
    As i. versus ii., "x86-64" is the plainest configuration for llvm, allowing only explicit flags.

Anyway, let's go now with avx512bw && avx512f.
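The agreed gate (require both flags, logical AND) can be sketched like this. This is a self-contained mock, not the TVM `target_has_features` function; the feature sets are excerpts from the LLVM dumps earlier in this thread:

```python
# Feature-set excerpts from the llvm_x86_get_features dumps above.
FEATURES = {
    "cascadelake": {"avx2", "avx512f", "avx512bw", "avx512vnni"},
    "knl": {"avx2", "avx512f", "avx512cd", "avx512er", "avx512pf"},
}

def target_has_features(feats, mcpu):
    """True only if *every* requested feature is present (logical AND)."""
    if isinstance(feats, str):
        feats = [feats]
    return all(f in FEATURES.get(mcpu, set()) for f in feats)

# cascadelake carries both flags; knl has only the foundation part,
# so the AND gate correctly rejects it.
print(target_has_features(["avx512f", "avx512bw"], "cascadelake"))  # True
print(target_has_features(["avx512f", "avx512bw"], "knl"))          # False
```

Requiring the conjunction is exactly what protects against partial-avx512 arches like "knl" and the odd "x86-64-v4" case discussed below.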

@vvchernov
Contributor

Hello @elvin-n! Maybe a second opinion from you to double-check?

@cbalint13
Contributor Author

cbalint13 commented Sep 11, 2023

"F" would stand for foundation, probably avx512bw implies that CPU also have avx512f (but not vice-versa)
One example of partial avx512 is the "knl" having the avx512f (foundation) part but not having the avx512bw part.

@vvchernov ,

One more informal experiment:

  • I made the experiment below, and the results disprove that avx512bw implies avx512f (aka the "foundation" part).
  • The x86-64-v4 popping up is kind of "arch-generic", so let's enforce avx512bw && avx512f presence.
  • For this ODD case, something like llvm -mcpu=x86-64-v4 -mattr=+avx512f passes the test below (checked).
$ cat ./tvm-check-avx512bw.py 
#!/usr/bin/python3

import tvm
from tvm.target import codegen
from tvm.target.x86 import target_has_features

for mcpu in codegen.llvm_x86_get_archlist():
  with tvm.target.Target("llvm -mcpu=%s" % mcpu):
    if target_has_features("avx512bw"):
      has_avx512f = target_has_features("avx512f")
      print("ARCH [%s] having `avx512bw` has avx512f=[%i]" % (mcpu, has_avx512f))
  • With LLVM=17:
$ ./tvm-check-avx512bw.py
ARCH [skylake-avx512] having `avx512bw` has avx512f=[1]
ARCH [skx] having `avx512bw` has avx512f=[1]
ARCH [cascadelake] having `avx512bw` has avx512f=[1]
ARCH [cooperlake] having `avx512bw` has avx512f=[1]
ARCH [cannonlake] having `avx512bw` has avx512f=[1]
ARCH [icelake-client] having `avx512bw` has avx512f=[1]
ARCH [rocketlake] having `avx512bw` has avx512f=[1]
ARCH [icelake-server] having `avx512bw` has avx512f=[1]
ARCH [tigerlake] having `avx512bw` has avx512f=[1]
ARCH [sapphirerapids] having `avx512bw` has avx512f=[1]
ARCH [graniterapids] having `avx512bw` has avx512f=[1]
ARCH [graniterapids-d] having `avx512bw` has avx512f=[1]
ARCH [emeraldrapids] having `avx512bw` has avx512f=[1]
ARCH [znver4] having `avx512bw` has avx512f=[1]
ARCH [x86-64-v4] having `avx512bw` has avx512f=[0] <--- ODD !!! (but pass with -mattr=+avx512f).
  • With LLVM=10:
$ ./tvm-check-avx512bw.py 
ARCH [cannonlake] having `avx512bw` has avx512f=[1]
ARCH [cascadelake] having `avx512bw` has avx512f=[1]
ARCH [cooperlake] having `avx512bw` has avx512f=[1]
ARCH [icelake-client] having `avx512bw` has avx512f=[1]
ARCH [icelake-server] having `avx512bw` has avx512f=[1]
ARCH [skx] having `avx512bw` has avx512f=[1]
ARCH [skylake-avx512] having `avx512bw` has avx512f=[1]
ARCH [tigerlake] having `avx512bw` has avx512f=[1]

So once again, let's enforce avx512bw && avx512f.

Contributor

@vvchernov vvchernov left a comment


LGTM

@Qianshui-Jiang
Contributor

Qianshui-Jiang commented Sep 13, 2023

@cbalint13 @vvchernov big thanks for your hard work and discussion! Here are a few comments.
Actually, before CascadeLake we used avx512 to handle 8-bit and 16-bit integers, so pmaddubs used here is for SkyLake.

And starting from CascadeLake we have avx512vnni, so there are more instructions like vpdpbusd and vpdpwssd; they fuse some of the pmadd instructions we used before.

Now, moving to SapphireRapids, we have amx-vnni, which uses amx instructions to handle 8-bit integers.

Yes, you're right, avxvnni in AlderLake is due to the lack of the avx512 instruction set in the 12th-gen client CPUs.

I support letting TVM use feature names to decide the schedule method; it seems much clearer than using arch names.

@cbalint13
Contributor Author

@Qianshui-Jiang ,

Thank you very much for your time & clarifications!

@cbalint13
Contributor Author

cbalint13 commented Sep 14, 2023

To sum it up:

  • compliant across all versions; the llvm<=10 issue is solved (Cc @junrushao); tested versions: llvm={10,14,16,17}
  • the implementation here exposes a single target_has_features(XXX) replacing the old target_has_XXX() functions
  • it also exposes the friendly llvm_x86_get_archlist() and llvm_x86_get_features(arch) to obtain full arch-related info

There is a comprehensive unit test that also checks against the old behaviour (by incorporating the old static lookup table).
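In the spirit of that cross-check, a toy version might look like this (the legacy arch list and feature table are truncated, hand-copied excerpts for illustration, not the real test data):

```python
# Excerpt of an old static avx512 arch list (hand-maintained).
LEGACY_AVX512_ARCHES = {"skylake-avx512", "cascadelake"}

# Excerpt of LLVM-derived feature sets for the same arches.
FEATURES = {
    "skylake": {"avx2", "fma"},
    "skylake-avx512": {"avx2", "avx512f", "avx512bw"},
    "cascadelake": {"avx2", "avx512f", "avx512bw", "avx512vnni"},
}

# The new feature-based check must reproduce the old verdict on every arch.
for arch, feats in FEATURES.items():
    legacy = arch in LEGACY_AVX512_ARCHES
    new = {"avx512f", "avx512bw"} <= feats
    assert legacy == new, arch
print("old and new checks agree")
```

Running the same comparison over the full archlist is what gives confidence that dropping the static tables changes no behaviour.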

This is the final state, passing CI, pending any explicit review requests.

@junrushao
Member

I am very excited about this feature and cannot wait to try it out myself! Thank you @cbalint13 for this super well-documented and well-tested PR, and it's going to be super useful for downstream applications!

@junrushao junrushao merged commit 67df20f into apache:main Sep 14, 2023
@cbalint13
Contributor Author

I am very excited about this feature and cannot wait to try it out myself! Thank you @cbalint13 for this super well-documented and well-tested PR, and it's going to be super useful for downstream applications!

It was just a simple idea for a utility, crediting the work already done before by @vvchernov, @Qianshui-Jiang and @elvin-n (the original lookup).

Thanks folks !
