[SYCL][Driver] fsycl-device-only does not offload, target triple is used to define programming model #1814
Comments
I'm not sure if it's the same issue, but I also notice a difference in how device code is emitted in regular mode and in "device-only" mode. I wrote a simple SYCL app that relies on CTAD; it doesn't compile in "device-only" mode, but compiles fine in regular mode.
#include <CL/sycl.hpp>
using namespace sycl;
#define SIZE 1024

int main() {
  queue q;
  buffer<int, 1> result{SIZE};
  q.submit([&](handler &cgh) {
    auto Accessor = result.get_access<access::mode::write>(cgh);
    cgh.parallel_for<class test_kernel>(range{SIZE}, [=](id<1> WIid) {
      Accessor[WIid] = 42;
    });
  });
  auto hostAccessor = result.get_access<access::mode::read>();
}

clang++ -fsycl -c /tmp/test.cpp -o /tmp/test.o
; okay
clang++ -fsycl -fsycl-device-only /tmp/test.cpp -c -o /tmp/test.o
/tmp/test.cpp:10:41: error: use of class template 'range' requires template arguments
    cgh.parallel_for<class test_kernel>(range{SIZE}, [=](id<1> WIid) {
                                        ^
include/sycl/CL/sycl/item.hpp:26:33: note: template is declared here
template <int dimensions> class range;
~~~~~~~~~~~~~~~~~~~~~~~~~       ^
1 error generated.

When I dump the compiler commands, I see a significant difference in how the device compiler is invoked:
clang++ -fsycl -c /tmp/test.cpp -o /tmp/test.o -###
clang++ -fsycl -fsycl-device-only /tmp/test.cpp -c -o /tmp/test.o -###
In "device-only" mode, the driver does not set the C++ and SYCL standard versions. @mdtoguchi, is this caused by the same issue reported by @Naghasan here, or should I open another one?
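For context on the error above: range{SIZE} relies on class template argument deduction (CTAD), which is a C++17 feature, so a device compilation that runs without a C++17 (or later) standard being set would be expected to reject it. The sketch below uses a hypothetical toy_range class (not the real sycl::range) to show the two spellings. The explicit toy_range<1> form also compiles under older standards, so writing range<1>{SIZE} may serve as a workaround, though that is an assumption and has not been checked against -fsycl-device-only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for sycl::range, used only to illustrate CTAD.
template <int Dimensions> class toy_range {
  std::size_t dims_[Dimensions];

public:
  // One-dimensional constructor; only valid when Dimensions == 1.
  toy_range(std::size_t d0) : dims_{d0} {
    static_assert(Dimensions == 1, "one-argument constructor is 1-D only");
  }
  std::size_t get(int i) const { return dims_[i]; }
};

#if __cplusplus >= 201703L
// Deduction guide: lets `toy_range{1024}` deduce toy_range<1> (C++17 and later).
toy_range(std::size_t) -> toy_range<1>;
#endif

// Returns the first extent, using CTAD when the language standard allows it.
std::size_t first_extent() {
#if __cplusplus >= 201703L
  toy_range r{1024}; // CTAD spelling, rejected before C++17
#else
  toy_range<1> r{1024}; // explicit spelling, accepted by older standards
#endif
  return r.get(0);
}

// The explicit spelling is valid regardless of the selected standard.
std::size_t first_extent_explicit() {
  toy_range<1> r{1024};
  return r.get(0);
}
```

Both functions return the same extent; only the CTAD spelling depends on which standard the driver passes to the device compiler.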
To add a little bit, because the [...] For instance, here: https://github.com/intel/llvm/blob/sycl/clang/lib/Driver/ToolChains/Clang.cpp#L1225
if (JA.isOffloading(Action::OFK_SYCL) ||
    Args.hasArg(options::OPT_fsycl_device_only))
IMO this should really just be
if (JA.isOffloading(Action::OFK_SYCL))
But what is really odd is that the result of
bool IsSYCLOffloadDevice = JA.isDeviceOffloading(Action::OFK_SYCL);
(in https://github.com/intel/llvm/blob/sycl/clang/lib/Driver/ToolChains/Clang.cpp#L3945)
bool IsSYCLDevice = (RawTriple.getEnvironment() == llvm::Triple::SYCLDevice);
// Using just the sycldevice environment is not enough to determine usage
// of the device triple when considering fat static archives. The
// compilation path requires the host object to be fed into the partial link
// step, and being part of the SYCL tool chain causes the incorrect target.
// FIXME - Is it possible to retain host environment when on a target
// device toolchain.
bool UseSYCLTriple = IsSYCLDevice && (!IsSYCL || IsSYCLOffloadDevice);

And I think a more stable and general approach would be to follow the CUDA approach. Last time I looked into the code, the CUDA mode always builds host and device actions, and if only the device actions are required (e.g. using
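To make the UseSYCLTriple expression quoted above easier to reason about, here is a minimal sketch that evaluates it for a few cases. JobFlags and the named cases are hypothetical and are not part of the clang driver. The sketch also assumes that IsSYCL means "the job is marked as SYCL offloading", which the quoted snippet does not show.

```cpp
#include <cassert>

// Hypothetical bundle of the three booleans used in the quoted snippet.
struct JobFlags {
  bool IsSYCL;              // assumed: job is marked as SYCL offloading
  bool IsSYCLOffloadDevice; // job is the device side of SYCL offloading
  bool IsSYCLDevice;        // triple environment is sycldevice
};

// Same boolean expression as the quoted UseSYCLTriple line.
constexpr bool useSYCLTriple(JobFlags F) {
  return F.IsSYCLDevice && (!F.IsSYCL || F.IsSYCLOffloadDevice);
}

// Device-only compile as described in this issue: nothing is marked as
// offloading, so only the triple says "this is SYCL device code".
constexpr JobFlags DeviceOnly{false, false, true};
// Device side of a regular offloading compile.
constexpr JobFlags OffloadDevice{true, true, true};
// Host side of a regular offloading compile.
constexpr JobFlags OffloadHost{true, false, false};
// Assumed reading of the fat-static-archive comment: a SYCL job that is not
// device offloading but still sees a sycldevice triple.
constexpr JobFlags PartialLinkHost{true, false, true};

static_assert(useSYCLTriple(DeviceOnly), "triple alone selects SYCL");
static_assert(useSYCLTriple(OffloadDevice), "offload device uses SYCL triple");
static_assert(!useSYCLTriple(OffloadHost), "host job keeps host triple");
static_assert(!useSYCLTriple(PartialLinkHost), "partial link keeps host");
```

In the DeviceOnly case, the result is true purely because of the triple, which is the coupling between target triple and programming model that this issue is about.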
@Naghasan, I largely agree with the suggestions you've made in your latest comment; I believe unifying the "offloading"/"device offloading" semantics is both doable and useful. @pvchupin mentioned that your team's contributions are blocked by the existing approach; could you please elaborate on the particular difficulties it causes and the desired timeline? I should be able to work on this starting next week, so the end of next week could be a realistic target for completion.
Thanks for looking at the problem. The binding with libclc is causing difficulties, as we can't really trigger behavior specific to the SYCL ABI (image lowering, type management, extensions, etc.). So not being able to specify the [...] The other part is that we can't compile the device code only to extract the PTX (or the LLVM IR for the PTX target). This is slowing down many investigations, as people have to rely on hacks to be able to inspect the output. Here, moving to a clean host/device SYCL offloading action pipeline should, by construction, enable any target for device-only compilation (now limited to SPIR targets), or at least make this much simpler to implement for any target.
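As a rough illustration of the "always build host and device actions, then keep only what is needed" idea discussed in this thread, here is a minimal sketch. All names (OffloadKind, Action, buildActions, selectActions) are hypothetical and are not the clang driver's classes. It only shows the shape of the approach, in which device-only becomes a filter over an offloading pipeline rather than a separate code path.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of an action tagged with its offloading side.
enum class OffloadKind { Host, Device };

struct Action {
  std::string Name;
  OffloadKind Kind;
};

// Always build both the host and the device action for an input,
// so every device action is marked as offloading from the start.
std::vector<Action> buildActions(const std::string &Input) {
  return {{"compile-host(" + Input + ")", OffloadKind::Host},
          {"compile-device(" + Input + ")", OffloadKind::Device}};
}

// In device-only mode, drop the host actions afterwards; the device
// actions keep their offloading tag, whatever the target is.
std::vector<Action> selectActions(std::vector<Action> All, bool DeviceOnly) {
  if (!DeviceOnly)
    return All;
  std::vector<Action> Kept;
  for (const Action &A : All)
    if (A.Kind == OffloadKind::Device)
      Kept.push_back(A);
  return Kept;
}
```

With this shape, later stages can keep asking "is this a device offloading action?" instead of inspecting the target triple.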
#2713 by @mdtoguchi addresses the issues described here.
#2713 has merged, closing.
The -fsycl-device-only flag does not mark the driver actions as offloading.
As shown by -ccc-print-phases, -fsycl-device-only dismisses SYCL offloading.
Whereas with -fsycl (note the device-sycl not present before).
The problem it raises is that this makes Clang::ConstructJob kind of convoluted and leads to the target triple being used to define the source language/programming model of the input file (due to UseSYCLTriple).
For instance,