-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable logf128 constant folding for hosts with 128bit long double #96287
Conversation
AArch64 has a long double bit length of 128. Therefore, it can benefit from constant fp128 folding. GCC versions more recent than version 12 must use the _Float128 type for logf128.
c42e7e4
to
4cc6905
Compare
@llvm/pr-subscribers-llvm-adt @llvm/pr-subscribers-llvm-analysis Author: Matthew Devereau (MDevereau) ChangesAArch64 has a long double bit length of 128. Therefore, it can benefit from constant fp128 folding. GCC versions more recent than version 12 must use the _Float128 type for logf128. Full diff: https://github.com/llvm/llvm-project/pull/96287.diff 2 Files Affected:
diff --git a/llvm/include/llvm/Support/float128.h b/llvm/include/llvm/Support/float128.h
index e15a98dc5a677..f1931c145f0fc 100644
--- a/llvm/include/llvm/Support/float128.h
+++ b/llvm/include/llvm/Support/float128.h
@@ -11,7 +11,14 @@
namespace llvm {
-#if defined(__clang__) && defined(__FLOAT128__) && \
+#if defined(__aarch64__)
+#define HAS_IEE754_FLOAT128
+#if (defined(__GNUC__) && __GNUC__ > 12)
+typedef _Float128 float128;
+#else
+typedef long double float128;
+#endif
+#elif defined(__clang__) && defined(__FLOAT128__) && \
defined(__SIZEOF_INT128__) && !defined(__LONG_DOUBLE_IBM128__)
#define HAS_IEE754_FLOAT128
typedef __float128 float128;
diff --git a/llvm/lib/Analysis/ConstantFolding.cpp b/llvm/lib/Analysis/ConstantFolding.cpp
index 512d1aadd5346..4329d2e9a0556 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -1784,8 +1784,8 @@ Constant *ConstantFoldFP(double (*NativeFP)(double), const APFloat &V,
}
#if defined(HAS_IEE754_FLOAT128) && defined(HAS_LOGF128)
-Constant *ConstantFoldFP128(long double (*NativeFP)(long double),
- const APFloat &V, Type *Ty) {
+Constant *ConstantFoldFP128(float128 (*NativeFP)(float128), const APFloat &V,
+ Type *Ty) {
llvm_fenv_clearexcept();
float128 Result = NativeFP(V.convertToQuad());
if (llvm_fenv_testexcept()) {
|
I don't understand how this works. logf128 is defined to take a _Float128, by the C standard. long double is not _Float128, and is not compatible with _Float128. Therefore, it should not be possible to implicitly convert a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logf128 is defined to take a _Float128, by the C standard. long double is not _Float128, and is not compatible with _Float128. Therefore, it should not be possible to implicitly convert a _Float128()(_Float128) to long double ()(long double).
Hi - Can you explain more about how they are not compatible? Do you just mean at a type/C level? I thought that if they were the same size they would operate in the same way (i.e logl is just an alias to logf128 in libm on the system I have).
llvm/include/llvm/Support/float128.h
Outdated
@@ -11,7 +11,14 @@ | |||
|
|||
namespace llvm { | |||
|
|||
#if defined(__clang__) && defined(__FLOAT128__) && \ | |||
#if defined(__aarch64__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect we might need to be careful of systems that have long-double==double, such as windows and darwin targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a check for the mantissa size now which should help with this.
This was the rationale for the typedef of _Float128 to float128 for gcc versions higher than 12 and why the changes in ConstantFolding.cpp were warranted. It appears pre gcc13, logf128 did not use the _Float128 type but rather an alias of long double or __float128. What this typedef does is remove the implicit conversion which gives an error message. You can see this was not an issue in gcc12 |
Oh, I see, gcc 13 has _Float128, and on older versions glibc typedefs it to long double. Can we make this not AArch64-specific? It seems like it affects any target where long double is 128 its. |
I'm not opposed to the idea of making this non AArch64-specific, but on further inspection it seems logf128 using _Float128/__float128/long double isn't consistent with GCC/Clang versions across hosts. While trying to figure out which host/compiler combinations were appropriate to use, it quickly got quite confusing. Including math.h seems to typedef long double to _Float128 under certain circumstances, but support for the original untypedef'd types seems to vary. AArch64:
x86:
Where it gets interesting is when using these types as parameters to logf128 from math.h. In GCC-13 and over logf128 appears to be a builtin which takes _Float128 and does not allow implicit conversions, but on x86 logf128 seems to throw when given long double. I then ran this logf128 function pointer test (https://godbolt.org/z/GvM78hbbh) for the same host/target combinations and got these results: AArch64:
x86:
So while x86 has a long double length of 128 bits, I don't think it's ok to enable this based on long double information alone. It looks like the only way to stay sane in this scenario is to do it on a per host and per compiler basis. |
I'd still prefer to use some sort of detection. If you can't figure out a way to do it in the header itself, please use CMake tests. |
Also enable optimisation without cmake opt-in
I've added detection of long double size. Confusingly some x86 targets defined long double as 16 bytes but are actually fp80 types, so I've checked the size of the mantissa too. I've also removed the CMake opt-in and detect logf128 by default now. |
llvm/include/llvm/Support/float128.h
Outdated
#if !defined(__LONG_DOUBLE_IBM128__) && (__SIZEOF_LONG_DOUBLE__ == 16) && \ | ||
(__SIZEOF_INT128__ == 16) && (__LDBL_MANT_DIG__ == 113) | ||
#define HAS_IEE754_FLOAT128 | ||
#if (defined(__GNUC__) && __GNUC__ > 12) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of explicitly checking the gcc version, can we typedef decltype(logf128(0.)) float128;
or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would require including math.h in this file (which isn't a problem but worth mentioning). It's also possible for hosts to satisfy the condition and fall through to this but not have logf128 (I've seen x86 with clang do this). It's possible to add HAS_LOGF128 to the check, which is set at CMake time in order to get around this, but this isn't really about logf128 and more about the 128 bit floating-point types themselves. Granted, my logf128 patches are the only thing using this, but it still seems incorrect.
Apart from this, it does function well though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mostly concerned that checking for __GNUC__
is overly specific, and might break on compilers you haven't seen, or compilers which don't exist yet (e.g. if clang's support for _Float128 changes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've instead decided to move to a CMake test which checks the availability of logf128, the availability of a true 128bit float type and that type's compatibility with the logf128 symbol to receive accurate answers instead of trying to do this inside a header file.
A problem I faced when trying to run this on an x86 machine was that the size of long doubles were defined to be 16 bytes (when they are actually fp80) however calling logf128 with this type caused error codes to be returned from the function while compiling without issue. Using __float128 which is guaranteed to perform a 128 bit calculation even though the machine uses fp80 works OK though. I'm seeing both x86 with fp80 and aarch64 compile and run this without issues now. Instead of explicitly checking the GCC version, the _Float128 type which is available in GCC12+ is now asserted for availability and correctness and used if it is available.
llvm/cmake/config-ix.cmake
Outdated
file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/logf128_long_double.cpp" | ||
" | ||
extern \"C\" { | ||
long double logf128(long double); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer to have CMake tests which #include <cmath>
, to verify that the header actually works.
Along those lines, I don't think float128.h should be declaring logf128.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I ended up doing things this way is because Clang on x86 doesn't pick up the logf128 symbol when including cmath. Clang just doesn't seem to have the magic definitions to unlock it like GCC does. It's probably okay to include cmath for long double and _Float128, but for clang I think we do need to explicitly declare the function prototype to get it working on x86.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the initial patch, let's just handle the case where logf128 is actually declared in math.h. The case where glibc actually provides logf128, but doesn't declare it, is a weird edge case, and I'd prefer to handle it in a followup patch.
llvm/cmake/config-ix.cmake
Outdated
unset(LOGF128_TEST_RUN CACHE) | ||
unset(LOGF128_TEST_COMPILE CACHE) | ||
try_run( | ||
LOGF128_TEST_RUN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using try_run tends to lead to strange results with cross-compiling; do we really need it here? _Float128/__float128 should always mean what we want it to, and we can use __LDBL_MANT_DIG__
for long double.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x86 hosts with 80bit long doubles can compile logf128 but will return an error code. __LDBL_MANT_DIG__
does work fine for hosts with 128bit long doubles, but this will also disable the fold for x86 targets with fp80 that can still perform an accurate fold with __float128 types. I'm struggling to think of an alternative way to select the most valid and best type at compile time to use without try_run based on the following criteria:
Logf128 symbol availability
Logf128 function prototype availability
Long double total size (Needs to be 16 bytes)
Long double mantissa size (Rules out fp80)
GCC version(12+ won't accept anything other than _Float128 for logf128)
If we're compiling with GCC or Clang (Clang needs to declare the logf128 prototype)
__float128 availability
_Float128 availability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You only need to check __LDBL_MANT_DIG__
if you're trying to use the type long double
. You can assume __float128
and _Float128
have the right format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Firstly, thanks for your patience with this review.
I'm still unsure how to differentiate in a header file between _Float128
and long double
for GCC12+, where trying to pass in anything other than GCC 12+'s dedicated _Float128 type will cause a compilation error. A condition along the lines of
#if (__LDBL_MANT_DIG__ == 113) && !defined(__LONG_DOUBLE_IBM128__) && (__SIZEOF_INT128__ == 16)
typedef long double float128;
#else
will be valid for GCC12 and GCC11 while it is actually only valid for GCC11: long double isn't valid for logf128 anymore after GCC12, but the host still has a long double mantissa size of 113.
I think you're suggesting I do typedef decltype(logf128(0.)) float128;
only when long double is known to have a mantissa of 113, and worry about hosts that have access to float128 but have a manstissa of not 113 in a following patch. I just wanted to double check.
llvm/include/llvm/Support/float128.h
Outdated
!defined(__LONG_DOUBLE_IBM128__) && \ | ||
(defined(__GNUC__) || defined(__GNUG__)) | ||
#ifdef HAS_LOGF128 | ||
#if (__LDBL_MANT_DIG__ == 113) && !defined(__LONG_DOUBLE_IBM128__) && \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the check for __LDBL_MANT_DIG__
necessary with the current version?
Otherwise, this approach seems fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I was suggesting LDBL_MANT_DIG with the old version that explicitly wrote out the type "long double". It should be safe to assume that if logf128 returns a long double, then long double is actually an IEEE 128-bit float.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late update, I've been away for a few weeks. I believe its unnecessary in this version, if HAS_LOGF128
is true the logf128 function prototype should return a valid IEEE 128-bit float.
I believe this will expose a problem I've seen where x86 targets with fp80 will return a slightly different result from AArch64. I.e., AArch64's 0xL00000000000000007FFF800000000000
vs x86's 0xL0000000000000000FFFF800000000000
for the Constant folding test log_e_negative_infinity and a few others. I'm not sure if this is due to a target specific implementation of logf128 currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/76/builds/1703 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/2673 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/35/builds/1741 Here is the relevant piece of the build log for the reference:
|
Reverted to fix the buildbot failures. Before reapplying this, please also make sure that you fix the 1.5% regression to clang build time this causes. Presumably this is due to including |
Thanks. |
Hosts which support a float size of 128 bits can benefit from constant fp128 folding.
* 'main' of https://github.com/llvm/llvm-project: (700 commits) [SandboxIR][NFC] SingleLLVMInstructionImpl class (llvm#102687) [ThinLTO]Clean up 'import-assume-unique-local' flag. (llvm#102424) [nsan] Make #include more conventional [SandboxIR][NFC] Use Tracker.emplaceIfTracking() [libc] Moved range_reduction_double ifdef statement (llvm#102659) [libc] Fix CFP long double and add tests (llvm#102660) [TargetLowering] Handle vector types in expandFixedPointMul (llvm#102635) [compiler-rt][NFC] Replace environment variable with %t (llvm#102197) [UnitTests] Convert a test to use opaque pointers (llvm#102668) [CodeGen][NFCI] Don't re-implement parts of ASTContext::getIntWidth (llvm#101765) [SandboxIR] Clean up tracking code with the help of emplaceIfTracking() (llvm#102406) [mlir][bazel] remove extra blanks in mlir-tblgen test [NVPTX][NFC] Update tests to use bfloat type (llvm#101493) [mlir] Add support for parsing nested PassPipelineOptions (llvm#101118) [mlir][bazel] add missing td dependency in mlir-tblgen test [flang][cuda] Fix lib dependency [libc] Clean up remaining use of *_WIDTH macros in printf (llvm#102679) [flang][cuda] Convert cuf.alloc for box to fir.alloca in device context (llvm#102662) [SandboxIR] Implement the InsertElementInst class (llvm#102404) [libc] Fix use of cpp::numeric_limits<...>::digits (llvm#102674) [mlir][ODS] Verify type constraints in Types and Attributes (llvm#102326) [LTO] enable `ObjCARCContractPass` only on optimized build (llvm#101114) [mlir][ODS] Consistent `cppType` / `cppClassName` usage (llvm#102657) [lldb] Move definition of SBSaveCoreOptions dtor out of header (llvm#102539) [libc] Use cpp::numeric_limits in preference to C23 <limits.h> macros (llvm#102665) [clang] Implement -fptrauth-auth-traps. (llvm#102417) [LLVM][rtsan] rtsan transform to preserve CFGAnalyses (llvm#102651) Revert "[AMDGPU] Move `AMDGPUAttributorPass` to full LTO post link stage (llvm#102086)" [RISCV][GISel] Add missing tests for G_CTLZ/CTTZ instruction selection. NFC Return available function types for BindingDecls. (llvm#102196) [clang] Wire -fptrauth-returns to "ptrauth-returns" fn attribute. (llvm#102416) [RISCV] Remove riscv-experimental-rv64-legal-i32. (llvm#102509) [RISCV] Move PseudoVSET(I)VLI expansion to use PseudoInstExpansion. (llvm#102496) [NVPTX] support switch statement with brx.idx (reland) (llvm#102550) [libc][newhdrgen]sorted function names in yaml (llvm#102544) [GlobalIsel] Combine G_ADD and G_SUB with constants (llvm#97771) Suppress spurious warnings due to R_RISCV_SET_ULEB128 [scudo] Separated committed and decommitted entries. (llvm#101409) [MIPS] Fix missing ANDI optimization (llvm#97689) [Clang] Add env var for nvptx-arch/amdgpu-arch timeout (llvm#102521) [asan] Switch allocator to dynamic base address (llvm#98511) [AMDGPU] Move `AMDGPUAttributorPass` to full LTO post link stage (llvm#102086) [libc][math][c23] Add fadd{l,f128} C23 math functions (llvm#102531) [mlir][bazel] revert bazel rule change for DLTITransformOps [msan] Support vst{2,3,4}_lane instructions (llvm#101215) Revert "[MLIR][DLTI][Transform] Introduce transform.dlti.query (llvm#101561)" [X86] pr57673.ll - generate MIR test checks [mlir][vector][test] Split tests from vector-transfer-flatten.mlir (llvm#102584) [mlir][bazel] add bazel rule for DLTITransformOps OpenMPOpt: Remove dead include [IR] Add method to GlobalVariable to change type of initializer. (llvm#102553) [flang][cuda] Force default allocator in device code (llvm#102238) [llvm] Construct SmallVector<SDValue> with ArrayRef (NFC) (llvm#102578) [MLIR][DLTI][Transform] Introduce transform.dlti.query (llvm#101561) [AMDGPU][AsmParser][NFC] Remove a misleading comment. (llvm#102604) [Arm][AArch64][Clang] Respect function's branch protection attributes. (llvm#101978) [mlir] Verifier: steal bit to track seen instead of set. (llvm#102626) [Clang] Fix Handling of Init Capture with Parameter Packs in LambdaScopeForCallOperatorInstantiationRAII (llvm#100766) [X86] Convert truncsat clamping patterns to use SDPatternMatch. NFC. [gn] Give two scripts argparse.RawDescriptionHelpFormatter [bazel] Add missing dep for the SPIRVToLLVM target [Clang] Simplify specifying passes via -Xoffload-linker (llvm#102483) [bazel] Port for d45de80 [SelectionDAG] Use unaligned store/load to move AVX registers onto stack for `insertelement` (llvm#82130) [Clang][OMPX] Add the code generation for multi-dim `num_teams` (llvm#101407) [ARM] Regenerate big-endian-vmov.ll. NFC [AMDGPU][AsmParser][NFCI] All NamedIntOperands to be of the i32 type. (llvm#102616) [libc][math][c23] Add totalorderl function. (llvm#102564) [mlir][spirv] Support `memref` in `convert-to-spirv` pass (llvm#102534) [MLIR][GPU-LLVM] Convert `gpu.func` to `llvm.func` (llvm#101664) Fix a unit test input file (llvm#102567) [llvm-readobj][COFF] Dump hybrid objects for ARM64X files. (llvm#102245) AMDGPU/NewPM: Port SIFixSGPRCopies to new pass manager (llvm#102614) [MemoryBuiltins] Simplify getCalledFunction() helper (NFC) [AArch64] Add invalid 1 x vscale costs for reductions and reduction-operations. (llvm#102105) [MemoryBuiltins] Handle allocator attributes on call-site LSV/test/AArch64: add missing lit.local.cfg; fix build (llvm#102607) Revert "Enable logf128 constant folding for hosts with 128bit floats (llvm#96287)" [RISCV] Add Syntacore SCR5 RV32/64 processors definition (llvm#102285) [InstCombine] Remove unnecessary RUN line from test (NFC) [flang][OpenMP] Handle multiple ranges in `num_teams` clause (llvm#102535) [mlir][vector] Add tests for scalable vectors in one-shot-bufferize.mlir (llvm#102361) [mlir][vector] Disable `vector.matrix_multiply` for scalable vectors (llvm#102573) [clang] Implement CWG2627 Bit-fields and narrowing conversions (llvm#78112) [NFC] Use references to avoid copying (llvm#99863) Revert "[mlir][ArmSME] Pattern to swap shape_cast(tranpose) with transpose(shape_cast) (llvm#100731)" (llvm#102457) [IRBuilder] Generate nuw GEPs for struct member accesses (llvm#99538) [bazel] Port for 9b06e25 [CodeGen][NewPM] Improve start/stop pass error message CodeGenPassBuilder (llvm#102591) [AArch64] Implement TRBMPAM_EL1 system register (llvm#102485) [InstCombine] Fixing wrong select folding in vectors with undef elements (llvm#102244) [AArch64] Sink operands to fmuladd. (llvm#102297) LSV: document hang reported in llvm#37865 (llvm#102479) Enable logf128 constant folding for hosts with 128bit floats (llvm#96287) [RISCV][clang] Remove bfloat base type in non-zvfbfmin vcreate (llvm#102146) [RISCV][clang] Add missing `zvfbfmin` to `vget_v` intrinsic (llvm#102149) [mlir][vector] Add mask elimination transform (llvm#99314) [Clang][Interp] Fix display of syntactically-invalid note for member function calls (llvm#102170) [bazel] Port for 3fffa6d [DebugInfo][RemoveDIs] Use iterator-inserters in clang (llvm#102006) ... Signed-off-by: Edwiin Kusuma Jaya <kutemeikito0905@gmail.com>
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/127/builds/501 Here is the relevant piece of the build log for the reference:
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/49/builds/302 Here is the relevant piece of the build log for the reference:
|
This is a reland of (llvm#96287). This change makes tests in logf128.ll ignore the sign of NaNs for negative value tests and moves an #include <cmath> to be blocked behind #ifndef _GLIBCXX_MATH_H.
This is a reland of #96287. This change makes tests in logf128.ll ignore the sign of NaNs for negative value tests and moves an #include <cmath> to be blocked behind #ifndef _GLIBCXX_MATH_H.
This is a reland of (llvm#96287). This patch attempts to reduce clang's compile time by removing #includes of float128.h and inlining convertToQuad functions instead.
This is a reland of (llvm#96287). This patch attempts to reduce clang's compile time by removing #includes of float128.h and inlining convertToQuad functions instead. Use Api.extractBitsAsZExtValue Remove stray semi-colon
…vm#104929) This is a reland of (llvm#96287). This patch attempts to reduce the reverted patch's clang compile time by removing #includes of float128.h and inlining convertToQuad functions instead.
@MDevereau sadly the relanded patch has caused regression while compiling gawk on aarch64, I am attaching the failing test file and script to run
|
Thanks for the report. It looks like there is an unguarded use of TLI that might have been nullptr from certain passes. I've added 83a5c7c that hopefully addresses it. |
thats for quick turn around. this patch fixed the problem |
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/27/builds/215 Here is the relevant piece of the build log for the reference
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/10/builds/33 Here is the relevant piece of the build log for the reference
|
Hosts with a long double size of 128 bits can benefit from constant fp128 folding. GCC versions more recent than version 12 must use the _Float128 type for logf128.