Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Add support for -foffload-fp32-prec-div/sqrt options. #15836

Open
wants to merge 25 commits into
base: sycl
Choose a base branch
from

Conversation

zahiraam
Copy link
Contributor

@zahiraam zahiraam commented Oct 23, 2024

Add support for options -f[no]-offload-fp32-prec-div and -f[no-]-offload-fp32-prec-sqrt.
These options are added to allow users to control whether fdiv and sqrt operations in offload device code are required to return correctly rounded results. In order to communicate this to the device code, we need the front end to generate IR that reflects the choice.

When the correctly rounded setting is used, we can just generate the fdiv instruction and llvm.sqrt intrinsic, because these operations are required to be correctly rounded by default in LLVM IR.

When the result is not required to be correctly rounded, the front end should generate a call to the llvm.fpbuiltin.fdiv or llvm.fpbuiltin.sqrt intrinsic with the fpbuiltin-max-error attribute set. For single precision fdiv, the setting should be 2.5. For single-precision sqrt, the setting should be 3.0.

If the -ffp-accuracy option is used, we should issue warnings if the settings conflict with an explicitly set -foffload-fp32-prec-div or -foffload-fp32-prec-sqrt option.

@zahiraam zahiraam changed the title Add support for -ftarget-prec-div/sqrt options. Add support for -foffload-fp32-prec-div/sqrt options. Oct 24, 2024
@zahiraam zahiraam marked this pull request as ready for review October 28, 2024 17:25
@zahiraam zahiraam requested review from a team as code owners October 28, 2024 17:25
@zahiraam zahiraam changed the title Add support for -foffload-fp32-prec-div/sqrt options. [SYCL] Add support for -foffload-fp32-prec-div/sqrt options. Oct 29, 2024
clang/lib/Driver/ToolChains/Clang.cpp Outdated Show resolved Hide resolved
Comment on lines 1747 to 1750
if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
CmdArgs.push_back("-fno-offload-fp32-prec-div");
else
CmdArgs.push_back("-foffload-fp32-prec-div");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_div))
CmdArgs.push_back("-fno-offload-fp32-prec-div");
else
CmdArgs.push_back("-foffload-fp32-prec-div");
if (!Args.hasFlag(option::OPT_foffload_fp32_prec_div,
option::OPT_fno_offload_fp32_prec_div, true))
CmdArgs.push_back("-fno-offload-fp32-prec-div");

Since -foffload-fp32-prec-div is default

if (Args.getLastArg(options::OPT_fno_offload_fp32_prec_sqrt))
CmdArgs.push_back("-fno-offload-fp32-prec-sqrt");
else
CmdArgs.push_back("-foffload-fp32-prec-sqrt");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar comment to above.

Copy link
Contributor

@elizabethandrews elizabethandrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@premanandrao can you review this please?

function instead of adding a JobAction to handle it.
Copy link
Contributor

@MrSidims MrSidims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has to withdraw my review as have 2 questions.

(FuncName == "sqrt" && !getLangOpts().OffloadFP32PrecSqrt &&
IsFloat32Type);
bool isFP32FdivFunction =
(FuncName == "fdiv" && !getLangOpts().OffloadFP32PrecDiv &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually though, that the request is done to replace fdiv instruction with the intrinsic, not fdiv function. Do we know if users actually use such function? I don't see any mentioning of it in SYCL or OpenCL specifications.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmlueck could you please comment on that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent of -foffload-fp32-prev-div is to affect the native divide operation (i.e. /). There is no SYCL function named fdiv. Is there a standard C / C++ function with that name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK there is no standard function float FP division. There is std::div, but it works only on integers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no C/C++ fdiv function.

bool hasFPAccuracyFuncMap = hasAccuracyRequirement(FuncName);
bool hasFPAccuracyVal = !getLangOpts().FPAccuracyVal.empty();
bool isFp32SqrtFunction =
(FuncName == "sqrt" && !getLangOpts().OffloadFP32PrecSqrt &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we compare with un-mangled sqrt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FuncName is the output of FD->getName() which returns a simple identifier. https://github.com/intel/llvm/blob/sycl/clang/include/clang/AST/Decl.h#L280

Copy link
Contributor

@MrSidims MrSidims Nov 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So clang/test/CodeGenSYCL/offload-fp32-div-sqrt.cpp will pass even with extern "C" removed from sqrt function declaration?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the user has a function in their own namespace that happens to be named "sqrt"?

Comment on lines 3023 to 3025
bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
!JA.isDeviceOffloading(Action::OFK_Cuda) &&
!JA.isOffloading(Action::OFK_HIP);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offlne, something like:

Suggested change
bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
!JA.isDeviceOffloading(Action::OFK_Cuda) &&
!JA.isOffloading(Action::OFK_HIP);
bool IsFp32PrecDivSqrtAllowed = JA.isDeviceOffloading(Action::OFK_SYCL) &&
TC.getTriple().isSPIROrSPIRV();

instruction gets the precision set instead of the fdiv function.
@zahiraam
Copy link
Contributor Author

@MrSidims and @gmlueck I have removed the restriction for CUDA/HIP. My understanding is that @MrSidims will make changes so that the precision for 3.0 is allowed for the sqrt function. Is that the case?
I have also changed the code so that / has precision set with the options instead of fdiv.
Please let me know if these are the changes you expected.

@MrSidims
Copy link
Contributor

that @MrSidims will make changes so that the precision for 3.0 is allowed for the sqrt function

In email thread I've replied, that I'm planning to take care of the precise option propagating to CUDA and HIP drivers. I can take a look what should be done for the implementation of non-precise intrinsics.

Copy link
Contributor

@MrSidims MrSidims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming that the code doesn't affect C stdlib's div function, see the comment above.

// ROUNDED-SQRT-PREC-DIV: call reassoc nnan ninf nsz arcp afn float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT:[0-9]+]]
// ROUNDED-DIV-PREC-SQRT: call reassoc nnan ninf nsz arcp afn spir_func nofpclass(nan inf) float @sqrt(float noundef nofpclass(nan inf) {{.*}})
// ROUNDED-DIV-ROUNDED-SQRT-FAST: call reassoc nnan ninf nsz arcp afn float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT:[0-9]+]]
// LOW-PREC-DIV: call float @llvm.fpbuiltin.sqrt.f32(float {{.*}}) #[[ATTR_SQRT_LOW:[0-9]+]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to check if I understand this correctly. In this case we pass: -fno-offload-fp32-prec-div -ffp-builtin-accuracy=high flags. And 1.0 ULP fpbuiltin-error attribute for llvm.fpbuiltin.sqrt.f32 was generated in response of ffp-builtin-accuracy=high flag and we don't expect here precise calculations as high != precise, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct.

StringRef FPAccuracyVal = llvm::fp::getAccuracyForFPBuiltin(
ID, FuncType, convertFPAccuracy(getLangOpts().FPAccuracyVal));
StringRef FPAccuracyVal;
if (!getLangOpts().OffloadFP32PrecDiv && Name == "div")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not that familiar with the code, so would like to double check - "div" is just an alias to "llvm.fpbuiltin.fdiv" or is it some function call @div(...). If it's an intrinsic, then why renaming from fdiv to div was required?

Basically I'm asking if these lines may affect C stdlib's div function that have 'long integer' operands. If you say that this code won't affect it - that would be enough explanation for me :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes div is just an alias. You can look at the code in ScalarExprEmitter::EmitDiv at about line #3788 that the intrinsic's ID is set to llvm::Intrinsic::fpbuiltin_fdiv. The call to CreateBuiltinCallWithAttr is made with the hard code name div.
In order to remove this confusion. I will name it back to fdiv as it was. I thought that it would be clearer to rename it div so that we know that this is the / function instead of the fdiv function that responds to these options.

@zahiraam zahiraam requested a review from a team as a code owner November 25, 2024 16:10
@zahiraam
Copy link
Contributor Author

The FE work has been approved. But this is an attempt to fix the LIT test DeviceLib/cmath_test.cpp. I am not really sure the change is correct. Will revert it LIT fail persists.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants