Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] How OpenCL device mode is being reused for SYCL #59

Open
AnastasiaStulova opened this issue Apr 3, 2019 · 13 comments
Open

[SYCL] How OpenCL device mode is being reused for SYCL #59

AnastasiaStulova opened this issue Apr 3, 2019 · 13 comments
Labels
upstream This change is related to upstreaming SYCL support to llorg.

Comments

@AnastasiaStulova
Copy link
Contributor

This is to track the discussion about the device compilation flow in the Clang frontend.

There are multiple ways OpenCL features can be reused:

  • At source level by implementing SYCL libraries using C++ for OpenCL code and enabling OpenCL in LangOpts for SYCL device.
  • At AST level by mapping SYCL constructs onto OpenCL AST nodes.
  • At IR CodeGen level by re-using OpenCL CodeGen functionality.

The objective is to avoid code duplication as much as possible and to keep code base simpler/easier to maintain.

@AnastasiaStulova
Copy link
Contributor Author

AnastasiaStulova commented May 2, 2019

Just to list a couple of points that should allow us to get going:

1. Since SYCL device model is the same as OpenCL, how much diagnostics from OpenCL do you plan to reuse? I think it would be important for Clang users if SYCL allows similar functionality to be compiled/rejected, so that vendors that supported OpenCL in Clang (not necessarily just upstream!) can have a safe path to support SYCL.

Also just a general note on your prototype implementation, that has very little diagnostics functionality. It is perfectly reasonable that the prototype doesn't provide full functionality however it is important to select the directions that wouldn't prevent implementing important functionality. Clang has a reputation of being high quality compiler frontend and one of its main strengths has always been helpful diagnostics. It is important therefore that new implementations adhere to the same philosophy. I would therefore advise to look at this topic even for the host compilation mode - especially in the space of interfacing between host and device or things like restrictions to extern functions.

2. OpenCL has a number of special opaque types - sampler, image, events, etc... that I believe appear in SYCL too. Does SYCL implementation plan to reuse those? If yes how?

3. OpenCL provides support for explicit address spaces. SYCL doesn't have explicit address spaces but the memory hierarchy is identical to OpenCL and the behavior as described indicates that compiler support is required. What will be your strategy for implementing address space support for SYCL?

4. How do you implement vector functionality - in particular swizzles or conversion ranking? Is any OpenCL functionality required?

5. Do you plan to reuse functionality of OpenCL kernel? If yes how?

Not directly related to OpenCL functionality but how do you plan to solve outlining of data structures? The proposed patch https://reviews.llvm.org/D60455 suggest solution for the device functions but don't you need to mark data structures that are to be compiled for the device too i.e. classes that can't use virtual functions on the device? Also should template function also be outlined?

6. Do you plan to reuse OpenCL extensions, OpenCL command line options or builtin functions functionality?

7. Do you plan to provide implementation of SYCL_EXTERNAL?

8. Since the prototype implementation uses 2 separate compilation phases for host and device with different target information - how do you plan to ensure the data structures are the same on the host and device?

9. The prototype implementation makes use of various AST visitors. Is there any information why they are required?

10. There is some functionality specific to Intel FPGAs - do you plan to add this to Clang too?

@bader
Copy link
Contributor

bader commented May 21, 2019

@AnastasiaStulova, thanks for posting these questions. I'll try to address them ASAP.

Meanwhile, I've made a small experiment with re-using OpenCL sampler type in recently added SYCL class.
NOTE: image class is not implemented yet, so there no sampler users (e.g. image built-in functions) - just one synthetic test.

Here is few early observations:

  • sampler_t type name is replaced with __ocl_sampler_t to avoid potential collisions with user types (OpenCL C++ name is compatible with OpenCL C, but not with regular C++)
  • selectively enabled a couple of OpenCL diagnostics for this type
  • errors are reported in the SYCL headers in wrapper class rather than in the user's code
  • some diagnostics trigger on SYCL use cases. E.g. SYCL kernel assigns lambda members the values from kernel arguments. This initialization code for sampler emits errors because OpenCL disallows sampler on the right hand side of the binary operators - these are supposed to be used only by built-in functions. Another potential issue is the lambda object itself - captured sampler is a member of the lambda object and OpenCL doesn't allow composite types with samplers. SPIR-V produced from SYCL should be okay as lambda object can be removed by standard LLVM transformation passes.

Diff is here: bader#1

I would really like to know your feedback on that.

@AnastasiaStulova
Copy link
Contributor Author

Ok, it looks very reasonable.

sampler_t type name is replaced with __ocl_sampler_t to avoid potential collisions with user types (OpenCL C++ name is compatible with OpenCL C, but not with regular C++)

I think it's very reasonable to pick a name prefixed with __. However would __sycl_sampler_t make more sense? Note that C++ for OpenCL will use the same types as OpenCL C so __ocl_sampler_t won't appear anywhere in Clang.

errors are reported in the SYCL headers in wrapper class rather than in the user's code

That's perfectly understandable but considering that SYCL language is designed as a library it should hopefully not be counter intuitive to the developers.

some diagnostics trigger on SYCL use cases. E.g. SYCL kernel assigns lambda members the values from kernel arguments. This initialization code for sampler emits errors because OpenCL disallows sampler on the right hand side of the binary operators - these are supposed to be used only by built-in functions. Another potential issue is the lambda object itself - captured sampler is a member of the lambda object and OpenCL doesn't allow composite types with samplers. SPIR-V produced from SYCL should be okay as lambda object can be removed by standard LLVM transformation passes.

Would lambda still be removed in non optimized mode?

@bader
Copy link
Contributor

bader commented May 24, 2019

I think it's very reasonable to pick a name prefixed with __. However would __sycl_sampler_t make more sense? Note that C++ for OpenCL will use the same types as OpenCL C so __ocl_sampler_t won't appear anywhere in Clang.

I picked that have for a few reasons:

  • I thought that I would be able to re-use all the restrictions for OpenCL type, basically making this type an alias to sampler_t.
  • I re-used CodeGen part as is - i.e. this type represented as opencl.sampler_t in LLVM IR.

Do you want me to update the LLVM IR opaque type name to align it with __sycl_sampler_t?

errors are reported in the SYCL headers in wrapper class rather than in the user's code

That's perfectly understandable but considering that SYCL language is designed as a library it should hopefully not be counter intuitive to the developers.

If I understand the idea behind SYCL API design correctly, SYCL types restrictions are supposed to be enforced by the C++ wrapper class design and do not require non-standard C++ diagnostics.
For instance,

sampler s;
sampler t = s; // error

Assignment can be disallowed by assignment operator overloading. This approach should minimize the changes in the compiler.
Does it sounds reasonable to you?

Would lambda still be removed in non optimized mode?

No. It would look like we allocate a structure on the stack, copy kernel arguments into this structure and pass it to other function. I don't think this breaks anything as it's quite similar to how "device enqueue" feature in implemented in OpenCL.

@bader
Copy link
Contributor

bader commented May 24, 2019

I've opened new PR in this repository: #167.
List of changes compared to the bader#1:

  • enabled more OpenCL diagnostics for (not only) sampler type in SYCL mode
  • updated documentation to describe the way OpenCL types are re-used in SYCL mode

@AnastasiaStulova
Copy link
Contributor Author

Do you want me to update the LLVM IR opaque type name to align it with __sycl_sampler_t?

I think opencl.sampler_t is reasonable. It will probably allow to reuse implementation in the middle end and backend.

Assignment can be disallowed by assignment operator overloading. This approach should minimize the changes in the compiler.
Does it sounds reasonable to you?

Yep, makes sense to use API logic wherever applicable. I am guessing it's mainly restrictions on the declaration scopes or the uses inside structs you might not be able to represent in APIs?

@bader
Copy link
Contributor

bader commented May 28, 2019

Just to list a couple of points that should allow us to get going:

1. Since SYCL device model is the same as OpenCL, how much diagnostics from OpenCL do you plan to reuse? I think it would be important for Clang users if SYCL allows similar functionality to be compiled/rejected, so that vendors that supported OpenCL in Clang (not necessarily just upstream!) can have a safe path to support SYCL.

It's hard to say "how much". Reviewing the OpenCL diagnostics I noted that there are just a few restrictions for standard C code and the rest of the rules are applied to new "C extensions" like new data types (like images, events, etc.), function attributes and address spaces.
SYCL exposes these extensions though standard C++ API and tries to restrict invalid via standard C++ constructs like C++ class methods, template functions/classes.
Another difference is that restrictions on standard C++ features (e.g. exceptions are not supported in the device code) can be applied only to the offloaded part of the application, but OpenCL diagnostics is applied to the whole translation unit.

This doesn't mean that using OpenCL diagnostics can't be used.

I think once we complete SYCL 1.2.1 functionality implementation (i.e. add support for images, streams and hierarchical parallelism), we will go over OpenCL diagnostics more thoroughly and see if it can be applied in SYCL mode.
@AnastasiaStulova, does it sound good to you?

Anyway we will have to use deferred diagnostics engine to avoid applying diagnostics to the host part. NOTE: there are a few diagnostics implemented for CUDA/OpenMP, which use this infrastructure and can be re-used in SYCL mode (e.g. restricting exceptions in "device part").

Also just a general note on your prototype implementation, that has very little diagnostics functionality. It is perfectly reasonable that the prototype doesn't provide full functionality however it is important to select the directions that wouldn't prevent implementing important functionality. Clang has a reputation of being high quality compiler frontend and one of its main strengths has always been helpful diagnostics. It is important therefore that new implementations adhere to the same philosophy. I would therefore advise to look at this topic even for the host compilation mode - especially in the space of interfacing between host and device or things like restrictions to extern functions.

2. OpenCL has a number of special opaque types - sampler, image, events, etc... that I believe appear in SYCL too. Does SYCL implementation plan to reuse those? If yes how?

#167 and #171 demonstrate how OpenCL types can be re-used in SYCL.

3. OpenCL provides support for explicit address spaces. SYCL doesn't have explicit address spaces but the memory hierarchy is identical to OpenCL and the behavior as described indicates that compiler support is required. What will be your strategy for implementing address space support for SYCL?

"Raw" C++ pointers are supposed to be mapped to "generic OpenCL pointers". SYCL pointer classes are using corresponding OpenCL address space qualifiers to enable correct "inference". For inference, current plan is to re-use InferAddressSpaces llvm pass - https://llvm.org/doxygen/InferAddressSpaces_8cpp_source.html.
NOTE: @asavonic is actively working on the solution for address spaces.

4. How do you implement vector functionality - in particular swizzles or conversion ranking? Is any OpenCL functionality required?

vector functionality is implemented as C++ template classes. Conversion operators/swizzles follow standard C++ semantics. One interesting note is that some expressions with swizzles can be evaluated at compile time thanks to expression templates feature.
AFAIK, no additional compiler changes are required to support vector functionality.
Here is a link to the implementation of the cl::sycl::vec class: https://github.com/intel/llvm/blob/sycl/sycl/include/CL/sycl/types.hpp#L242.

5. Do you plan to reuse functionality of OpenCL kernel? If yes how?

Yes. SYCL device compiler generates OpenCL kernel function, which calls "lambda body" method.
https://github.com/intel/llvm/blob/sycl/clang/lib/Sema/SemaSYCL.cpp#L412

Not directly related to OpenCL functionality but how do you plan to solve outlining of data structures? The proposed patch https://reviews.llvm.org/D60455 suggest solution for the device functions but don't you need to mark data structures that are to be compiled for the device too i.e. classes that can't use virtual functions on the device?

Yes. It will be added later. We are trying to make patches smaller, so it's easier to review. Current state of the implementation is available on the GitHub.

Also should template function also be outlined?

Yes. AFAIK, outlining of template functions is supported. https://reviews.llvm.org/D60455 has a test for that case: clang/test/CodeGenSYCL/device-functions.cpp.

6. Do you plan to reuse OpenCL extensions, OpenCL command line options or builtin functions functionality?

I think re-using some OpenCL command line options might be useful for SYCL mode (e.g. -cl-fast-relaxed-math even mentioned in the SYCL spec).
Regarding extensions: I don't think we have implemented any extensions requiring compiler support. E.g. we implemented Intel sub-groups extension, but it adds only a set of built-ins.
Once we start working on extensions requiring compiler support (e.g. new image types), we will investigate how to re-use OpenCL extension. I think it's should be quite similar to the core part (i.e. #167)

7. Do you plan to provide implementation of SYCL_EXTERNAL?

Yes, but it's not done yet. I'll make sure we review the implementation design as soon as it's ready.

8. Since the prototype implementation uses 2 separate compilation phases for host and device with different target information - how do you plan to ensure the data structures are the same on the host and device?

I think the answer is "the same as OpenCL", but single source provides additional guarantee that both compilers see the same data types definition (if user do not screw it up with pre-processor).

  • OpenCL clSetKernelArg API validates the kernel parameter sizes.
  • OpenCL types has fixed sizes
  • Testing. :-)

Somewhat related article from the Codeplay's blog: https://www.codeplay.com/portal/01-30-18-getting-sycl-to-interact-with-opencl-code

9. The prototype implementation makes use of various AST visitors. Is there any information why they are required?

Are you referring to the SemaSYCL.cpp code?
I think they some of these visitors are used for SYCL device code diagnostics. Currently we are working on re-using deferred diagnostics engine and I think these visitors will be removed from SemaSYCL. We will create a PR with the prototype soon.

10. There is some functionality specific to Intel FPGAs - do you plan to add this to Clang too?

Yes.
Some of that functionality might interested to other vendors too (e.g. Xilinx).
#55 - is related to this question. I think we should discuss whether "attributes" is the right way to expose new functionality to user.

@Fznamznon
Copy link
Contributor

5. Do you plan to reuse functionality of OpenCL kernel? If yes how?

Yes. SYCL device compiler generates OpenCL kernel function, which calls "lambda body" method.
https://github.com/intel/llvm/blob/sycl/clang/lib/Sema/SemaSYCL.cpp#L412

A little bit more details about it you can find here - https://reviews.llvm.org/D60455#1472705 .

@AnastasiaStulova
Copy link
Contributor Author

"Raw" C++ pointers are supposed to be mapped to "generic OpenCL pointers". SYCL pointer classes are using corresponding OpenCL address space qualifiers to enable correct "inference". For inference, current plan is to re-use InferAddressSpaces llvm pass - https://llvm.org/doxygen/InferAddressSpaces_8cpp_source.html.
NOTE: @asavonic is actively working on the solution for address spaces.

Generic address space is not supported in CL1.2 devices. InferAS pass doesn't deduce address spaces but only eliminates address space conversions.

AFAIK, no additional compiler changes are required to support vector functionality.
Here is a link to the implementation of the cl::sycl::vec class: https://github.com/intel/llvm/blob/sycl/sycl/include/CL/sycl/types.hpp#L242.

typedef char __char2_vec_t __attribute__((ext_vector_type(2)))

Hmm, I can see from your link that you are using Clang vector extension that is the same as OpenCL vectors!

I think they some of these visitors are used for SYCL device code diagnostics. Currently we are working on re-using deferred diagnostics engine and I think these visitors will be removed from SemaSYCL. We will create a PR with the prototype soon.

Ok. Let me know once it's done!

@bader
Copy link
Contributor

bader commented May 30, 2019

"Raw" C++ pointers are supposed to be mapped to "generic OpenCL pointers". SYCL pointer classes are using corresponding OpenCL address space qualifiers to enable correct "inference". For inference, current plan is to re-use InferAddressSpaces llvm pass - https://llvm.org/doxygen/InferAddressSpaces_8cpp_source.html.
NOTE: @asavonic is actively working on the solution for address spaces.

Generic address space is not supported in CL1.2 devices. InferAS pass doesn't deduce address spaces but only eliminates address space conversions.

@asavonic, please, correct me if I wrong, but I think you are working on improving InferAS pass to do the "inference" of address spaces. In cases when CL1.2 environment requirements are enforced the plan is to run the inference pass and if it failed (i.e. there "generic" pointers in resulting IR), we fail compilation. For OpenCL 2.0+ environment, SPIR-V module is allowed to have "generic" pointers. We can enforce OpenCL 1.2 restrictions with the compiler option.

AFAIK, no additional compiler changes are required to support vector functionality.
Here is a link to the implementation of the cl::sycl::vec class: https://github.com/intel/llvm/blob/sycl/sycl/include/CL/sycl/types.hpp#L242.

typedef char __char2_vec_t __attribute__((ext_vector_type(2)))

Hmm, I can see from your link that you are using Clang vector extension that is the same as OpenCL vectors!

Right. If compiler supports "vector extension", cl::sycl::vec class uses this extensions and don't use it otherwise. E.g. host-side implementation which is compiled by GCC/MSVC is functionally equivalent, but doesn't use this extension.

Anyway, they point is there are no compiler changes required to support SYCL vector class.

I think they some of these visitors are used for SYCL device code diagnostics. Currently we are working on re-using deferred diagnostics engine and I think these visitors will be removed from SemaSYCL. We will create a PR with the prototype soon.

Ok. Let me know once it's done!

Here is the first version: #181.
I think there is a room for improvement, but this PR demonstrates the concept.

@asavonic
Copy link
Contributor

"Raw" C++ pointers are supposed to be mapped to "generic OpenCL pointers". SYCL pointer classes are using corresponding OpenCL address space qualifiers to enable correct "inference". For inference, current plan is to re-use InferAddressSpaces llvm pass - https://llvm.org/doxygen/InferAddressSpaces_8cpp_source.html.
NOTE: @asavonic is actively working on the solution for address spaces.

Generic address space is not supported in CL1.2 devices. InferAS pass doesn't deduce address spaces but only eliminates address space conversions.

@asavonic, please, correct me if I wrong, but I think you are working on improving InferAS pass to do the "inference" of address spaces.

"Inference" is a bit vague term. I suspect that you and Anastasia are talking about the same thing.
IAS pass follows a chain of instructions that lead from a pointer in a concrete address space to a particular pointer in the generic (flat) address space. Then it replaces generic AS with the concrete one, if it can prove correctness of this replacement.

One thing to note here: we're not relying on IAS pass - this is just an performance optimization that may not work in some cases (e.g. when loads/stores to memory which could not be removed by Mem2Reg are involved).

In cases when CL1.2 environment requirements are enforced the plan is to run the inference pass and if it failed (i.e. there "generic" pointers in resulting IR), we fail compilation. For OpenCL 2.0+ environment, SPIR-V module is allowed to have "generic" pointers. We can enforce OpenCL 1.2 restrictions with the compiler option.

Right.

@AnastasiaStulova
Copy link
Contributor Author

    I think they some of these visitors are used for SYCL device code diagnostics. Currently we are working on re-using deferred diagnostics engine and I think these visitors will be removed from SemaSYCL. We will create a PR with the prototype soon.

Ok. Let me know once it's done!

Here is the first version: #181.
I think there is a room for improvement, but this PR demonstrates the concept.

Perhaps I am missing something but I don't think it demonstrates reusing diagnostics from OpenCL. Can you provide some more information please.

@bader
Copy link
Contributor

bader commented Jun 4, 2019

Sorry, I was meant to say "the concept of deferring the diagnostics".
Existing patch do not use OpenCL diagnostics, but rather adds new ones.
Instead, I think we can unify some existing error messages and share in multiple offloading modes.
E.g. https://github.com/intel/llvm/pull/181/files#diff-6cd458144c039b4a707e2306f83cc189R758.
NOTE: code restricting a throwing an exception in CUDA mode is almost the same as in SYCL mode.

I'll leave some ideas on re-using existing diagnostics in PR comment.

@bader bader added the upstream This change is related to upstreaming SYCL support to llorg. label Jun 18, 2019
vladimirlaz pushed a commit that referenced this issue Feb 18, 2020
  CONFLICT (content): Merge conflict in clang/lib/CodeGen/CGOpenMPRuntime.cpp
vladimirlaz pushed a commit to vladimirlaz/llvm that referenced this issue Mar 31, 2020
  CONFLICT (content): Merge conflict in clang/docs/LanguageExtensions.rst
vladimirlaz pushed a commit that referenced this issue Apr 28, 2020
  CONFLICT (content): Merge conflict in README.md
iclsrc pushed a commit that referenced this issue Sep 21, 2023
…… (#67069)

We noticed some performance issue while in lldb-vscode for grabing the
name of the SBValue. Profiling shows SBValue::GetName() can cause
synthetic children provider of shared/unique_ptr to deference underlying
object and complete it type.

This patch lazily moves the dereference from synthetic child provider's
Update() method to GetChildAtIndex() so that SBValue::GetName() won't
trigger the slow code path.

Here is the culprit slow code path:
```
...
frame #59: 0x00007ff4102e0660 liblldb.so.15`SymbolFileDWARF::CompleteType(this=<unavailable>, compiler_type=0x00007ffdd9829450) at SymbolFileDWARF.cpp:1567:25 [opt]
...
frame #67: 0x00007ff40fdf9bd4 liblldb.so.15`lldb_private::ValueObject::Dereference(this=0x0000022bb5dfe980, error=0x00007ffdd9829970) at ValueObject.cpp:2672:41 [opt]
frame #68: 0x00007ff41011bb0a liblldb.so.15`(anonymous namespace)::LibStdcppSharedPtrSyntheticFrontEnd::Update(this=0x000002298fb94380) at LibStdcpp.cpp:403:40 [opt]
frame #69: 0x00007ff41011af9a liblldb.so.15`lldb_private::formatters::LibStdcppSharedPtrSyntheticFrontEndCreator(lldb_private::CXXSyntheticChildren*, std::shared_ptr<lldb_private::ValueObject>) [inlined] (anonymous namespace)::LibStdcppSharedPtrSyntheticFrontEnd::LibStdcppSharedPtrSyntheticFrontEnd(this=0x000002298fb94380, valobj_sp=<unavailable>) at LibStdcpp.cpp:371:5 [opt]
...
frame #78: 0x00007ff40fdf6e42 liblldb.so.15`lldb_private::ValueObject::CalculateSyntheticValue(this=0x000002296c66a500) at ValueObject.cpp:1836:27 [opt]
frame #79: 0x00007ff40fdf1939 liblldb.so.15`lldb_private::ValueObject::GetSyntheticValue(this=<unavailable>) at ValueObject.cpp:1867:3 [opt]
frame #80: 0x00007ff40fc89008 liblldb.so.15`ValueImpl::GetSP(this=0x0000022c71b90de0, stop_locker=0x00007ffdd9829d00, lock=0x00007ffdd9829d08, error=0x00007ffdd9829d18) at SBValue.cpp:141:46 [opt]
frame #81: 0x00007ff40fc7d82a liblldb.so.15`lldb::SBValue::GetSP(ValueLocker&) const [inlined] ValueLocker::GetLockedSP(this=0x00007ffdd9829d00, in_value=<unavailable>) at SBValue.cpp:208:21 [opt]
frame #82: 0x00007ff40fc7d817 liblldb.so.15`lldb::SBValue::GetSP(this=0x00007ffdd9829d90, locker=0x00007ffdd9829d00) const at SBValue.cpp:1047:17 [opt]
frame #83: 0x00007ff40fc7da6f liblldb.so.15`lldb::SBValue::GetName(this=0x00007ffdd9829d90) at SBValue.cpp:294:32 [opt]
...
```

Differential Revision: https://reviews.llvm.org/D159542
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
upstream This change is related to upstreaming SYCL support to llorg.
Projects
None yet
Development

No branches or pull requests

4 participants