diff --git a/adoc/chapters/acknowledgements.adoc b/adoc/chapters/acknowledgements.adoc index f84b0fd9..53fbddd2 100644 --- a/adoc/chapters/acknowledgements.adoc +++ b/adoc/chapters/acknowledgements.adoc @@ -48,8 +48,10 @@ * Michael Wong, Codeplay * Peter Žužek, Codeplay * Matt Newport, EA - * Rasool Maghareh, Huawei Technologies Co. Ltd. - * Guansong Zhang, Huawei Technologies Co. Ltd. + * Rasool Maghareh, Huawei Technologies Co. + Ltd. + * Guansong Zhang, Huawei Technologies Co. + Ltd. * Ruslan Arutyunyan, Intel * Alexey Bader, Intel * James Brodman, Intel @@ -67,7 +69,7 @@ * Jon Leech, Luna Princeps LLC * Kathleen Mattson, Miller & Mattson, LLC * Dave Miller, Miller & Mattson, LLC - * Stéphanie Even, Mercedes-Benz Research and Development NA + * Stéphanie Even, Mercedes-Benz Research and Development NA * Chris Gearing, Mobileye * Seiji Nishimura, NSITEXE, Inc. * Neil Trevett, NVIDIA diff --git a/adoc/chapters/architecture.adoc b/adoc/chapters/architecture.adoc index 58a6297c..5abba33e 100644 --- a/adoc/chapters/architecture.adoc +++ b/adoc/chapters/architecture.adoc @@ -3,62 +3,66 @@ [[architecture]] = SYCL architecture -This chapter describes the structure of a SYCL application, and how the -SYCL generic programming model lays out on top of a number of <>s. +This chapter describes the structure of a SYCL application, and how the SYCL +generic programming model lays out on top of a number of <>s. == Overview -SYCL is an open industry standard for programming a heterogeneous system. The -design of SYCL allows standard {cpp} source code to be written such that it can -run on either an heterogeneous device or on the <>. +SYCL is an open industry standard for programming a heterogeneous system. +The design of SYCL allows standard {cpp} source code to be written such that +it can run on either an heterogeneous device or on the <>. The terminology used for SYCL inherits historically from OpenCL with some -SYCL-specific additions. 
However SYCL is a generic {cpp} programming model -that can be laid out on top of other heterogeneous APIs apart from OpenCL. -SYCL implementations can provide <>s for various heterogeneous APIs, -implementing the SYCL general specification on top of them. We refer to this -heterogeneous API as the <>. The SYCL general specification -defines the behavior that all SYCL implementations must expose to SYCL users -for a SYCL application to behave as expected. - -A function object that can execute on a <> exposed by a <> -is called a <>. +SYCL-specific additions. +However SYCL is a generic {cpp} programming model that can be laid out on +top of other heterogeneous APIs apart from OpenCL. +SYCL implementations can provide <>s for various heterogeneous +APIs, implementing the SYCL general specification on top of them. +We refer to this heterogeneous API as the <>. +The SYCL general specification defines the behavior that all SYCL +implementations must expose to SYCL users for a SYCL application to behave +as expected. + +A function object that can execute on a <> exposed by a +<> is called a <>. To ensure maximum interoperability with different <>s, software developers can access the <> alongside the SYCL general API whenever they include the <> interoperability headers. However, interoperability is a <>-specific feature. -An application that uses interoperability does not conform to the -SYCL general application model, since it is not portable across backends. +An application that uses interoperability does not conform to the SYCL +general application model, since it is not portable across backends. // Note below I leave the reference to OpenCL intentionally -The target users of SYCL are {cpp} programmers who want all the performance and -portability features of a standard like OpenCL, but with the flexibility to use -higher-level {cpp} abstractions across the host/device code boundary. 
+The target users of SYCL are {cpp} programmers who want all the performance +and portability features of a standard like OpenCL, but with the flexibility +to use higher-level {cpp} abstractions across the host/device code boundary. Developers can use most of the abstraction features of {cpp}, such as templates, classes and operator overloading. -However, some {cpp} language features are not permitted inside -kernels, due to the limitations imposed by the capabilities of the underlying +However, some {cpp} language features are not permitted inside kernels, due +to the limitations imposed by the capabilities of the underlying heterogeneous platforms. These features include virtual functions, virtual inheritance, -throwing/catching exceptions, and run-time type-information. These features are -available outside kernels as normal. Within these constraints, developers can -use abstractions defined by SYCL, or they can develop their own on top. These -capabilities make SYCL ideal for library developers, middleware providers and -application developers who want to separate low-level highly-tuned algorithms -or data structures that work on heterogeneous systems from higher-level software -development. Software developers can produce templated algorithms that are easily -usable by developers in other fields. +throwing/catching exceptions, and run-time type-information. +These features are available outside kernels as normal. +Within these constraints, developers can use abstractions defined by SYCL, +or they can develop their own on top. +These capabilities make SYCL ideal for library developers, middleware +providers and application developers who want to separate low-level +highly-tuned algorithms or data structures that work on heterogeneous +systems from higher-level software development. +Software developers can produce templated algorithms that are easily usable +by developers in other fields. 
[[sec:anatomy]] == Anatomy of a SYCL application -Below is an example of a typical <> which schedules a job to run -in parallel on any heterogeneous device available. +Below is an example of a typical <> which schedules a job +to run in parallel on any heterogeneous device available. // An AsciiDoctor "feature", the language is specified as the second // parameter of this attribute, even if we do not want it. So add a @@ -69,70 +73,71 @@ in parallel on any heterogeneous device available. include::{code_dir}/anatomy.cpp[lines=4..-1] ---- -At line 1, we [code]#{hash}include# the SYCL header files, which -provide all of the SYCL features that will be used. +At line 1, we [code]#{hash}include# the SYCL header files, which provide all +of the SYCL features that will be used. A SYCL application runs on a <>. -The application is structured in three scopes which specify the different sections; -<>, <> and <>. -The <> specifies a single kernel function that will -be, or has been, compiled by a <> and executed on a -<>. In this example <> is defined by lines -25 to 26. The <> specifies a unit of work which is -comprised of a <> and <>. In this -example <> is defined by lines 20 to 28. The -<> specifies all other code outside of a +The application is structured in three scopes which specify the different +sections; <>, <> and +<>. +The <> specifies a single kernel function that will be, or has +been, compiled by a <> and executed on a <>. +In this example <> is defined by lines 25 to 26. +The <> specifies a unit of work which is comprised of a +<> and <>. +In this example <> is defined by lines 20 to 28. +The <> specifies all other code outside of a <>. These three scopes are used to control the application flow and the construction and lifetimes of the various objects used within SYCL, as explained in <>. -A <> is the scoped block of code that will be -compiled using a device compiler. 
This code may be defined by the -body of a lambda function or by the [code]#operator()# function of -a function object. Each instance of the -<> will be executed as a single, though not -necessarily entirely independent, flow of execution and has to adhere -to restrictions on what operations may be allowed to enable device +A <> is the scoped block of code that will be compiled +using a device compiler. +This code may be defined by the body of a lambda function or by the +[code]#operator()# function of a function object. +Each instance of the <> will be executed as a single, +though not necessarily entirely independent, flow of execution and has to +adhere to restrictions on what operations may be allowed to enable device compilers to safely compile it to a range of underlying devices. The [code]#parallel_for# member function can be templated with a class. -This class is used to manually name the -kernel when desired, such as to avoid a compiler-generated name when debugging -a kernel defined through a lambda, to provide a known name with which to apply -build options to a kernel, or to ensure compatibility with multiple -compiler-pass implementations. - -The [code]#parallel_for# member function creates an instance of a <>, -which is the entity that will be enqueued within a -command group. In the case of [code]#parallel_for# the -<> will be executed over the given range from 0 to 1023. -The different member functions to -execute kernels can be found in <>. +This class is used to manually name the kernel when desired, such as to +avoid a compiler-generated name when debugging a kernel defined through a +lambda, to provide a known name with which to apply build options to a +kernel, or to ensure compatibility with multiple compiler-pass +implementations. + +The [code]#parallel_for# member function creates an instance of a +<>, which is the entity that will be enqueued within a command +group. 
+In the case of [code]#parallel_for# the <> will be +executed over the given range from 0 to 1023. +The different member functions to execute kernels can be found in +<>. A <> is the syntactic scope wrapped by the construction -of a <> as seen on line 19. The -<> may invoke only a single +of a <> as seen on line 19. +The <> may invoke only a single <>, and it takes a parameter of type command group [code]#handler#, which is constructed by the runtime. -All the requirements for a kernel to execute are -defined in this <>, as described in -<>. In this case the constructor used -for [code]#myQueue# on line 9 is the default constructor, which allows -the queue to select the best underlying device to execute on, leaving the -decision up to the runtime. - -In SYCL, data that is required within a <> must -be contained within a <>, <>, or <> allocation, as described in -<>. We -construct a buffer on line 16. Access to the <> is controlled via -an <> which is constructed on line 21. -The <> is used to -keep track of access to the data and the <> is used to request -access to the data on a queue, as well as to track the dependencies between -<>. In this example the <> is used to -write to the data buffer on line 26. +All the requirements for a kernel to execute are defined in this +<>, as described in <>. +In this case the constructor used for [code]#myQueue# on line 9 is the +default constructor, which allows the queue to select the best underlying +device to execute on, leaving the decision up to the runtime. + +In SYCL, data that is required within a <> must be +contained within a <>, <>, or <> allocation, as +described in <>. +We construct a buffer on line 16. +Access to the <> is controlled via an <> which is +constructed on line 21. +The <> is used to keep track of access to the data and the +<> is used to request access to the data on a queue, as well as to +track the dependencies between <>. +In this example the <> is used to write to the data buffer on line 26. 
[[sec:normativerefs]] @@ -145,31 +150,35 @@ write to the data buffer on line 26. The documents in the following list are referred to within this SYCL specification, and their content is a requirement for this document. - . *{cpp17}:* <>, referred to in this - specification as the {cpp} core language. The SYCL specification refers to - language in the following {cpp} defect reports and assumes a compiler that - implements them: <>. + . *{cpp17}:* <>, referred to in + this specification as the {cpp} core language. + The SYCL specification refers to language in the following {cpp} defect + reports and assumes a compiler that implements them: <>. . *{cpp20}:* <>, referred to in this specification as the next {cpp} specification. +Programming languages — {cpp}>>, referred to in this specification as the +next {cpp} specification. [[sec:nonnormativerefs]] == Non-normative notes and examples -Unless stated otherwise, text within this SYCL specification is normative and defines -the required behavior of a SYCL implementation. Non-normative / informational notes -are included within this specification using a "`note`" callout, of the form: +Unless stated otherwise, text within this SYCL specification is normative +and defines the required behavior of a SYCL implementation. +Non-normative / informational notes are included within this specification +using a "`note`" callout, of the form: [NOTE] ==== -Information within a note callout, such as this text, is for informational purposes -and does not impose requirements on or specify behavior of a SYCL implementation. +Information within a note callout, such as this text, is for informational +purposes and does not impose requirements on or specify behavior of a SYCL +implementation. ==== -Source code examples within the specification are provided to aid with understanding, -and are non-normative. +Source code examples within the specification are provided to aid with +understanding, and are non-normative. 
-In case of any conflict between a non-normative note or source example, and normative -text within the specification, the normative text must be taken to be correct. +In case of any conflict between a non-normative note or source example, and +normative text within the specification, the normative text must be taken to +be correct. [[sec:platformmodel]] == The SYCL platform model @@ -180,148 +189,163 @@ called <>. A SYCL <> is constructed, either directly by the user or implicitly when creating a <>, to hold all the runtime information required by -the SYCL runtime and the <> to operate on a device, or group of devices. -When a group of devices can be grouped together on the same context, they have -some visibility of each other's memory objects. The SYCL runtime can assume that memory -is visible across all devices in the same <>. +the SYCL runtime and the <> to operate on a device, or group of +devices. +When a group of devices can be grouped together on the same context, they +have some visibility of each other's memory objects. +The SYCL runtime can assume that memory is visible across all devices in the +same <>. Not all devices exposed from the same <> can be grouped together in the same <>. A SYCL application executes on the host as a standard {cpp} program. -<> are exposed through different <> to the SYCL application. -The SYCL application submits <> to <>. +<> are exposed through different <> +to the SYCL application. +The SYCL application submits <> to <>. Each <> enables execution on a given device. The <> then extracts operations from the <>, e.g. an explicit copy operation or a -<>. When the operation is a -<>, the <> uses a -<>-specific mechanism to extract the device binary from the SYCL +<>. +When the operation is a <>, the <> uses +a <>-specific mechanism to extract the device binary from the SYCL application and pass it to the heterogeneous API for execution on the <>. 
-A SYCL <> is divided into one or more compute units (CUs) which are each divided -into one or more processing elements (PEs). Computations on a device occur -within the processing elements. +A SYCL <> is divided into one or more compute units (CUs) which are +each divided into one or more processing elements (PEs). +Computations on a device occur within the processing elements. How computation is mapped to PEs is <> and <> specific. -Two devices exposed via two different backends can map computations differently to the -same device. +Two devices exposed via two different backends can map computations +differently to the same device. When a SYCL application contains <> objects, the SYCL -implementation must provide an offline compilation mechanism that enables the -integration of the device binaries into the SYCL application. -The output of the offline compiler can be an intermediate representation, such as -SPIR-V, that will be finalized during execution or a final device ISA. +implementation must provide an offline compilation mechanism that enables +the integration of the device binaries into the SYCL application. +The output of the offline compiler can be an intermediate representation, +such as SPIR-V, that will be finalized during execution or a final device +ISA. A device may expose special purpose functionality as a _built-in_ function. -The SYCL API exposes functions to query and dispatch said _built-in_ functions. -Some <> and <> may not support programmable kernels, and only support -_built-in_ functions. +The SYCL API exposes functions to query and dispatch said _built-in_ +functions. +Some <> and <> may not support +programmable kernels, and only support _built-in_ functions. // TODO: Conformance of these custom-devices? == The SYCL backend model -SYCL is a generic programming model for the {cpp} language that can target multiple -heterogeneous APIs, such as OpenCL. 
+SYCL is a generic programming model for the {cpp} language that can target +multiple heterogeneous APIs, such as OpenCL. -SYCL implementations enable these target APIs by implementing <>. -For a SYCL implementation to be conformant on said <>, it must execute -the SYCL generic programming model on the backend. All SYCL implementations must -provide at least one backend. +SYCL implementations enable these target APIs by implementing <>. +For a SYCL implementation to be conformant on said <>, it must +execute the SYCL generic programming model on the backend. +All SYCL implementations must provide at least one backend. -The present document covers the SYCL generic interface available to -all <>. How the SYCL generic interface maps to a particular -<> is defined either by a separate <> specification -document, provided by the Khronos SYCL group, or by the SYCL -implementation documentation. Whenever there is a <> -specification document, this takes precedence over SYCL implementation -documentation. +The present document covers the SYCL generic interface available to all +<>. +How the SYCL generic interface maps to a particular <> is defined +either by a separate <> specification document, provided by the +Khronos SYCL group, or by the SYCL implementation documentation. +Whenever there is a <> specification document, this takes +precedence over SYCL implementation documentation. When a SYCL user builds their SYCL application, she decides which of the -<> will be used to build the SYCL application. This is called the set -of _active backends_. Implementations must ensure that the active -backends selected by the user can be used simultaneously by the SYCL -implementation at runtime. If two backends are available at compile time but -will produce an invalid SYCL application at runtime, the SYCL implementation -must emit a compilation error. 
- -A SYCL application built with a number of active backends does not necessarily -guarantee that said backends can be executed at runtime. -The subset of active backends available at runtime is called -_available backends_. -A backend is said to be _available_ if the host platform where the -SYCL application is executed exposes support for the heterogeneous API -required for the <>. +<> will be used to build the SYCL application. +This is called the set of _active backends_. +Implementations must ensure that the active backends selected by the user +can be used simultaneously by the SYCL implementation at runtime. +If two backends are available at compile time but will produce an invalid +SYCL application at runtime, the SYCL implementation must emit a compilation +error. + +A SYCL application built with a number of active backends does not +necessarily guarantee that said backends can be executed at runtime. +The subset of active backends available at runtime is called _available +backends_. +A backend is said to be _available_ if the host platform where the SYCL +application is executed exposes support for the heterogeneous API required +for the <>. It is implementation dependent whether certain backends require third-party -libraries to be available in the system. Failure to have all dependencies -required for all active backends at runtime will cause the SYCL application to -not run. - -Once the application is running, users can query what SYCL platforms are available. -SYCL implementations will expose the devices provided by each backend grouped -into platforms. A backend must expose at least one platform. - -Under the <> model, SYCL objects can contain one or multiple references -to a certain <> native type. +libraries to be available in the system. +Failure to have all dependencies required for all active backends at runtime +will cause the SYCL application to not run. + +Once the application is running, users can query what SYCL platforms are +available. 
+SYCL implementations will expose the devices provided by each backend +grouped into platforms. +A backend must expose at least one platform. + +Under the <> model, SYCL objects can contain one or multiple +references to a certain <> native type. Not all SYCL objects will map directly to a <> native type. The mapping of SYCL objects to <> native types is defined by the -<> specification document when available, or by the SYCL implementation -otherwise. +<> specification document when available, or by the SYCL +implementation otherwise. -To guarantee that multiple <> objects can interoperate with -each other, SYCL memory objects are not bound to a particular <>. +To guarantee that multiple <> objects can interoperate with each +other, SYCL memory objects are not bound to a particular <>. SYCL memory objects can be accessed from any device exposed by an _available_ backend. -SYCL Implementations can potentially map SYCL memory objects to -multiple native types in different <>. +SYCL Implementations can potentially map SYCL memory objects to multiple +native types in different <>. Since SYCL memory objects are independent of any particular <>, -SYCL <> can request access to memory objects allocated -by any <>, and execute it on the backend associated with the <>. +SYCL <> can request access to memory objects +allocated by any <>, and execute it on the backend associated with +the <>. This requires the SYCL implementation to be able to transfer memory objects across <>. -USM allocations are subject to the limitations -described in <>. +USM allocations are subject to the limitations described in <>. -When a SYCL application runs on any number of <> without relying on -any <>-specific behavior or interoperability, it is said to be a -SYCL general application, and it is expected to run in any SYCL-conformant -implementation that supports the required features for the application. 
+When a SYCL application runs on any number of <> +without relying on any <>-specific behavior or interoperability, it +is said to be a SYCL general application, and it is expected to run in any +SYCL-conformant implementation that supports the required features for the +application. === Platform mixed version support -The SYCL generic programming model exposes a number of <>, each of -them exposing a number of <>. Each <> is bound -to a certain <>. SYCL <> associated with said <> -are associated with that <>. +The SYCL generic programming model exposes a number of +<>, each of them exposing a number of +<>. +Each <> is bound to a certain <>. +SYCL <> associated with said <> are associated +with that <>. -Although the APIs in the SYCL generic programming model are defined according -to this specification and their version is indicated by the macro +Although the APIs in the SYCL generic programming model are defined +according to this specification and their version is indicated by the macro [code]#SYCL_LANGUAGE_VERSION#, this does not apply to APIs exposed by the -<>. Each <> provides its own document that defines its APIs, -and that document tells how to query for the device and platform versions. +<>. +Each <> provides its own document that defines its APIs, and that +document tells how to query for the device and platform versions. == SYCL execution model -As described in <>, a <> is comprised -of three scopes: <>, <>, and -<>. Code in the <> and -<> runs on the host and is governed by the -_SYCL application execution model_. Code in the kernel scope runs on a -device and is governed by the _SYCL kernel execution model_. +As described in <>, a <> is comprised of +three scopes: <>, <>, and +<>. +Code in the <> and <> runs on the +host and is governed by the _SYCL application execution model_. +Code in the kernel scope runs on a device and is governed by the _SYCL +kernel execution model_. 
[NOTE] ==== A SYCL device does not necessarily correspond to a physical accelerator. A SYCL implementation may choose to expose some or all of the host's -resources as a SYCL device; such an implementation would execute -code in <> on the host, but that code would still be governed by -the _SYCL kernel execution model_. +resources as a SYCL device; such an implementation would execute code in +<> on the host, but that code would still be governed by the +_SYCL kernel execution model_. ==== @@ -330,19 +354,22 @@ the _SYCL kernel execution model_. The SYCL application defines the execution order of the kernels by grouping each kernel with its requirements into a <>. -<> are submitted -for execution via a <> object, which defines the device where the kernel -will run. This specification sometimes refers to this as "`submitting the -kernel to a device`". The same <> object can be submitted to -different queues. When a <> is submitted to a SYCL <>, -the requirements of the kernel execution are captured. The implementation can -start executing a kernel as soon as its requirements have been satisfied. +<> are +submitted for execution via a <> object, which defines the device +where the kernel will run. +This specification sometimes refers to this as "`submitting the kernel to a +device`". +The same <> object can be submitted to different queues. +When a <> is submitted to a SYCL <>, the requirements +of the kernel execution are captured. +The implementation can start executing a kernel as soon as its requirements +have been satisfied. ==== <> resources managed by the SYCL application -The SYCL runtime integrated with the SYCL application will manage -the resources required by the <> -to manage the heterogeneous devices it is providing access to. +The SYCL runtime integrated with the SYCL application will manage the +resources required by the <> to manage the heterogeneous +devices it is providing access to. 
This includes, but is not limited to, resource handlers, memory pools, dispatch queues and other temporary handler objects. @@ -353,12 +380,13 @@ Construction of a SYCL object will typically entail the creation of multiple SYCL object. The overall rules for construction and destruction are detailed in <>. -Those <> with a <> document will detail how the resource -management from SYCL objects map down to the <> objects. +Those <> with a <> document will detail how +the resource management from SYCL objects map down to the <> +objects. -In SYCL, the minimum required object for submitting work to devices is -the <>, which contains references to a <>, <> -and a <> internally. +In SYCL, the minimum required object for submitting work to devices is the +<>, which contains references to a <>, <> and a +<> internally. The resources managed by SYCL are: @@ -368,133 +396,141 @@ The resources managed by SYCL are: // Also, references to the SYCL API are removed to make text independent // from changes in the programming - . <>: all features of <>s are implemented by - platforms. A platform can be viewed as a given vendor's runtime and the - devices accessible through it. Some devices will only be accessible to - one vendor's runtime and hence multiple platforms may be present. SYCL - manages the different platforms for the user which are accessible through a - [code]#sycl::platform# object. - . <>: any <> resource that is acquired by the user is - attached to a context. A context contains a collection of devices that - the host can use and manages memory objects that can be shared between - the devices. Devices belonging to the same <> must be able to - access each other's global memory using some implementation-specific - mechanism. A given context can only wrap devices owned by a single - platform. A context is exposed to the user with a - [code]#sycl::context# object. - . <>: platforms provide one or more devices for executing SYCL - kernels. 
In SYCL, a device is accessible through a - [code]#sycl::device# object. - . <>: the SYCL functions that run on SYCL devices are defined - as {cpp} function objects (a named function object type or a lambda - function). A kernel can be introspected through a - [code]#sycl::kernel# object. + . <>: all features of <>s are implemented + by platforms. + A platform can be viewed as a given vendor's runtime and the devices + accessible through it. + Some devices will only be accessible to one vendor's runtime and hence + multiple platforms may be present. + SYCL manages the different platforms for the user which are accessible + through a [code]#sycl::platform# object. + . <>: any <> resource that is acquired by the + user is attached to a context. + A context contains a collection of devices that the host can use and + manages memory objects that can be shared between the devices. + Devices belonging to the same <> must be able to access each + other's global memory using some implementation-specific mechanism. + A given context can only wrap devices owned by a single platform. + A context is exposed to the user with a [code]#sycl::context# object. + . <>: platforms provide one or more devices for executing + SYCL kernels. + In SYCL, a device is accessible through a [code]#sycl::device# object. + . <>: the SYCL functions that run on SYCL devices are + defined as {cpp} function objects (a named function object type or a + lambda function). + A kernel can be introspected through a [code]#sycl::kernel# object. + -- -Note that some <> may expose non-programmable functionality as -pre-defined kernels. +Note that some <> may expose non-programmable +functionality as pre-defined kernels. -- - . <>: Kernels are stored internally in the SYCL - application as device images, and these device images can be grouped into a - [code]#sycl::kernel_bundle# object. These objects provide a way for the - application to control the online compilation of kernels for devices. - . 
<>: SYCL kernels execute in command queues. The user must - create a [code]#sycl::queue# object, - which references an associated context, platform and - device. The context, platform and device may be chosen automatically, or + . <>: Kernels are stored internally in the + SYCL application as device images, and these device images can be + grouped into a [code]#sycl::kernel_bundle# object. + These objects provide a way for the application to control the online + compilation of kernels for devices. + . <>: SYCL kernels execute in command queues. + The user must create a [code]#sycl::queue# object, which references an + associated context, platform and device. + The context, platform and device may be chosen automatically, or specified by the user. SYCL queues execute <> on a particular device of a particular context, but can have dependencies from any device on any available <>. The SYCL implementation guarantees the correct initialization and -destruction of any resource handled by the underlying <>, except -for those the user has obtained manually via the SYCL interoperability API. +destruction of any resource handled by the underlying <>, +except for those the user has obtained manually via the SYCL +interoperability API. [[sec:command-groups-exec-order]] ==== SYCL command groups and execution order By default, SYCL queues execute kernel functions in an out-of-order fashion based on dependency information. -Developers only need to specify what data is required to execute a particular -kernel. The SYCL runtime will guarantee that kernels are executed in an order -that guarantees correctness. -By specifying access modes and types of memory, a directed acyclic dependency -graph (DAG) of kernels is built at runtime. This is achieved via the usage of -<> objects. A SYCL <> object defines a set -of requisites (_R_) and a kernel function (_k_). 
A <> is -_submitted_ to a queue when using the +Developers only need to specify what data is required to execute a +particular kernel. +The SYCL runtime will guarantee that kernels are executed in an order that +guarantees correctness. +By specifying access modes and types of memory, a directed acyclic +dependency graph (DAG) of kernels is built at runtime. +This is achieved via the usage of <> objects. +A SYCL <> object defines a set of requisites (_R_) and a +kernel function (_k_). +A <> is _submitted_ to a queue when using the [code]#sycl::queue::submit# member function. -A *requisite* (_r~i~_) is a requirement that must be fulfilled for -a kernel-function (_k_) to be executed on a particular device. +A *requisite* (_r~i~_) is a requirement that must be fulfilled for a +kernel-function (_k_) to be executed on a particular device. For example, a requirement may be that certain data is available on a device, or that another command group has finished execution. An implementation may evaluate the requirements of a command group at any point after it has been submitted. -The _processing of a command group_ is the process by which a SYCL -runtime evaluates all the requirements in a given _R_. +The _processing of a command group_ is the process by which a SYCL runtime +evaluates all the requirements in a given _R_. The SYCL runtime will execute _k_ only when all _r~i~_ are satisfied (i.e., when all requirements are satisfied). To simplify the notation, in the specification we refer to the set of -requirements of a command group named _foo_ as -_CG~foo~ = r~1~, {ldots}, r~n~_. +requirements of a command group named _foo_ as _CG~foo~ = r~1~, {ldots}, +r~n~_. The _evaluation of a requisite_ ({SYCLeval}(_r~i~_)) returns the status of the requisite, which can be _True_ or _False_. A _satisfied_ requisite implies the requirement is met. -{SYCLeval}(_r~i~_) never alters the requisite, only observes the current status. 
+{SYCLeval}(_r~i~_) never alters the requisite, only observes the current +status. The implementation may not block to check the requisite, and the same check can be performed multiple times. -An *action* (_a~i~_) is a collection of implementation-defined -operations that must be performed in order to satisfy a requisite. -The set of actions for a given <> _A_ is permitted -to be empty if no operation is required to satisfy the requirement. +An *action* (_a~i~_) is a collection of implementation-defined operations +that must be performed in order to satisfy a requisite. +The set of actions for a given <> _A_ is permitted to be +empty if no operation is required to satisfy the requirement. The notation _a~i~_ represents the action required to satisfy _r~i~_. -Actions of different requisites can be satisfied in any order with -respect to -each other without side effects (i.e., given two requirements _r~j~_ and _r~k~_, -_(r~j~, r~k~)_ {equiv} _(r~k~, r~j~)_). The intersection of two -actions is not necessarily empty. +Actions of different requisites can be satisfied in any order with respect +to each other without side effects (i.e., given two requirements _r~j~_ and +_r~k~_, _(r~j~, r~k~)_ {equiv} _(r~k~, r~j~)_). +The intersection of two actions is not necessarily empty. *Actions* can include (but are not limited to): memory copy operations, mapping operations, host side synchronization, or implementation-specific behavior. -Finally, _Performing an action_ ({SYCLperform}(_a~i~_)) executes the -action operations required to satisfy the requisite _r~j~_. Note that, after -{SYCLperform}(_a~i~_), the evaluation {SYCLeval}(_r~j~_) will return _True_ -until the kernel is executed. 
After the kernel execution, it is not defined -whether a different <> with the same requirements needs to -perform the action again, where actions of different requisites inside the -same <> object can be satisfied in any order with -respect to each +Finally, _Performing an action_ ({SYCLperform}(_a~i~_)) executes the action +operations required to satisfy the requisite _r~j~_. +Note that, after {SYCLperform}(_a~i~_), the evaluation {SYCLeval}(_r~j~_) +will return _True_ until the kernel is executed. +After the kernel execution, it is not defined whether a different +<> with the same requirements needs to perform the action +again, where actions of different requisites inside the same +<> object can be satisfied in any order with respect to each other without side effects: Given two requirements _r~j~_ and _r~k~_, {SYCLperform}(_a~j~_) followed by {SYCLperform}(_a~k~_) is equivalent to {SYCLperform}(_a~k~_) followed by {SYCLperform}(_a~j~_). -The requirements of different <> submitted to the same -or different queues are evaluated in the relative order of submission. -<> objects whose intersection of requirement sets is -not empty are said to depend on each other. +The requirements of different <> submitted to +the same or different queues are evaluated in the relative order of +submission. +<> objects whose intersection of requirement sets is not +empty are said to depend on each other. They are executed in order of submission to the queue. -If <> are submitted to different queues or by multiple -threads, the order of execution is determined by the SYCL runtime. +If <> are submitted to different queues or by +multiple threads, the order of execution is determined by the SYCL runtime. Note that independent <> objects can be submitted simultaneously without affecting dependencies. <> illustrates the execution order of three <> objects (_CG~a~,CG~b~,CG~c~_) with certain requirements submitted to the same queue. 
-Both _CG~a~_ and _CG~b~_ only have one requirement, _r~1~_ and _r~2~_ respectively. +Both _CG~a~_ and _CG~b~_ only have one requirement, _r~1~_ and _r~2~_ +respectively. _CG~c~_ requires both _r~1~_ and _r~2~_. This enables the SYCL runtime to potentially execute _CG~a~_ and _CG~b~_ -simultaneously, whereas _CG~c~_ cannot be executed until both _CG~a~_ and _CG~b~_ -have been completed. -The SYCL runtime evaluates the *requisites* and performs the -*actions* required (if any) for the _CG~a~_ and _CG~b~_. -When evaluating the *requisites* of _CG~c~_, they will be satisfied -once the _CG~a~_ and _CG~b~_ have finished. +simultaneously, whereas _CG~c~_ cannot be executed until both _CG~a~_ and +_CG~b~_ have been completed. +The SYCL runtime evaluates the *requisites* and performs the *actions* +required (if any) for the _CG~a~_ and _CG~b~_. +When evaluating the *requisites* of _CG~c~_, they will be satisfied once the +_CG~a~_ and _CG~b~_ have finished. // Formerly in three_cg_one_queue.tex @@ -519,16 +555,17 @@ syclQueue.submit(_CG~c~(r~1~,r~2~)_); image::{images}/three-cg-one-queue.svg[align="center",opts="{imageopts}"] |==== -<> uses three separate SYCL queue objects -to submit the same <> objects as before. -Regardless of using three different queues, the execution order -of the different <> objects is the same. -When different threads enqueue to different queues, the execution order -of the command group will be the order in which the submit member functions are executed. +<> uses three separate SYCL queue objects to +submit the same <> objects as before. +Regardless of using three different queues, the execution order of the +different <> objects is the same. +When different threads enqueue to different queues, the execution order of +the command group will be the order in which the submit member functions are +executed. 
In this case, since the different <> objects execute on -different devices, the *actions* required to satisfy the -*requirements* may be different (e.g, the SYCL runtime may -need to copy data to a different device in a separate context). +different devices, the *actions* required to satisfy the *requirements* may +be different (e.g, the SYCL runtime may need to copy data to a different +device in a separate context). // Formerly in three_cg_three_queue.tex @@ -558,159 +595,178 @@ image::{images}/three-cg-three-queue.svg[align="center",opts="{imageopts}"] ==== Controlling execution order with events -Submitting an action for execution returns an [code]#event# object. Programmers -may use these events to explicitly synchronize programs. Host code can wait for an -event to complete, which will block execution on the host until the action represented -by the event has completed. The [code]#event# class is described in greater detail -in <>. - -Events may also be used to explicitly order the execution of kernels. Host code may -wait for the completion of specific event, which blocks execution on the host until -that event's action has completed. Events may also define requisites between -<>. Using events in this manner informs the runtime -that one or more <> must complete before another -<> may begin executing. See <> for -greater detail. +Submitting an action for execution returns an [code]#event# object. +Programmers may use these events to explicitly synchronize programs. +Host code can wait for an event to complete, which will block execution on +the host until the action represented by the event has completed. +The [code]#event# class is described in greater detail in +<>. + +Events may also be used to explicitly order the execution of kernels. +Host code may wait for the completion of specific event, which blocks +execution on the host until that event's action has completed. +Events may also define requisites between <>. 
+Using events in this manner informs the runtime that one or more +<> must complete before another +<> may begin executing. +See <> for greater detail. === SYCL kernel execution model When a kernel is submitted for execution, an index space is defined. An instance of the kernel body executes for each point in this index space. This kernel instance is called a <> and is identified by its -point in the index space, which provides a <> for the work-item. Each -work-item executes the same code but the specific execution pathway through the -code and the data operated upon can vary by using the work-item global id to -specialize the computation. - -An index space of size zero is allowed. All aspects of kernel execution proceed -as normal with the exception that the kernel function itself is not executed. -Note this means the command queue will still schedule this kernel after satisfying -the requirements and this satisfies requirements of any dependent enqueued kernels. +point in the index space, which provides a <> for the work-item. +Each work-item executes the same code but the specific execution pathway +through the code and the data operated upon can vary by using the work-item +global id to specialize the computation. + +An index space of size zero is allowed. +All aspects of kernel execution proceed as normal with the exception that +the kernel function itself is not executed. +Note this means the command queue will still schedule this kernel after +satisfying the requirements and this satisfies requirements of any dependent +enqueued kernels. ==== Basic kernels SYCL allows a simple execution model in which a kernel is invoked over an -_N_-dimensional index space defined by [code]#range#, where _N_ is one, two -or three. Each work-item in such a kernel executes independently. +_N_-dimensional index space defined by [code]#range#, where _N_ is one, +two or three. +Each work-item in such a kernel executes independently. 
-Each work-item is identified by a value of type [code]#item#. The type -[code]#item# encapsulates a work-item identifier of type [code]#id# and -a [code]#range# representing the number of work-items executing the kernel. +Each work-item is identified by a value of type [code]#item#. +The type [code]#item# encapsulates a work-item identifier of type +[code]#id# and a [code]#range# representing the number of work-items +executing the kernel. ==== ND-range kernels -Work-items can be organized into <>, providing a more -coarse-grained decomposition of the index space. Each work-group is assigned a -unique <> with the same dimensionality as the index space used for -the work-items. Work-items are each assigned a <>, unique within the -work-group, so that a single work-item can be uniquely identified by its global -id or by a combination of its local id and work-group id. The work-items in a -given work-group execute on the processing elements of a single compute unit. - -When work-groups are used in SYCL, the index space is called an <>. -An ND-range is an -_N_-dimensional index space, where _N_ is one, two or three. In -SYCL, the ND-range is represented via the [code]#nd_range# class. An -[code]#nd_range# is made up of a global range and a local range, each +Work-items can be organized into <>, providing a +more coarse-grained decomposition of the index space. +Each work-group is assigned a unique <> with the same +dimensionality as the index space used for the work-items. +Work-items are each assigned a <>, unique within the work-group, +so that a single work-item can be uniquely identified by its global id or by +a combination of its local id and work-group id. +The work-items in a given work-group execute on the processing elements of a +single compute unit. + +When work-groups are used in SYCL, the index space is called an +<>. +An ND-range is an _N_-dimensional index space, where _N_ is one, two or +three. 
+In SYCL, the ND-range is represented via the [code]#nd_range# class. +An [code]#nd_range# is made up of a global range and a local range, each represented via values of type [code]#range#. -Additionally, there can be a global offset, represented via a value of type [code]#id#; this is deprecated in SYCL 2020. The types -[code]#range# and [code]#id# are each _N_-element -arrays of integers. The iteration space defined via an [code]#nd_range# -is an _N_-dimensional index space starting at the ND-range's global -offset whose size is its global range, split into work-groups of the -size of its local range. +Additionally, there can be a global offset, represented via a value of type +[code]#id#; this is deprecated in SYCL 2020. +The types [code]#range# and [code]#id# are each _N_-element arrays of +integers. +The iteration space defined via an [code]#nd_range# is an _N_-dimensional +index space starting at the ND-range's global offset whose size is its +global range, split into work-groups of the size of its local range. Each work-item in the ND-range is identified by a value of type -[code]#nd_item#. The type [code]#nd_item# encapsulates a -global id, local id and work-group id, all of type [code]#id# -(the iteration space offset also of type [code]#id#, but this is deprecated in SYCL 2020), as well as -global and local ranges and synchronization operations necessary to -make work-groups useful. Work-groups are assigned ids using a similar -approach to that used for work-item global ids. Work-items are -assigned to a work-group and given a local id with components in the -range from zero to the size of the work-group in that dimension minus -one. Hence, the combination of a work-group id and the local id -within a work-group uniquely defines a work-item. +[code]#nd_item#. 
+The type [code]#nd_item# encapsulates a global id, local id and +work-group id, all of type [code]#id# (the iteration space offset also of +type [code]#id#, but this is deprecated in SYCL 2020), as well as global +and local ranges and synchronization operations necessary to make +work-groups useful. +Work-groups are assigned ids using a similar approach to that used for +work-item global ids. +Work-items are assigned to a work-group and given a local id with components +in the range from zero to the size of the work-group in that dimension minus +one. +Hence, the combination of a work-group id and the local id within a +work-group uniquely defines a work-item. ==== Backend-specific kernels SYCL allows a <> to expose fixed functionality as -non-programmable built-in kernels. The availability and behavior of these -built-in kernels are <>-specific, and are not required to follow the -SYCL execution and memory models. Furthermore the interface exposed utilize -these built-in kernels is also <>-specific. +non-programmable built-in kernels. +The availability and behavior of these built-in kernels are +<>-specific, and are not required to follow the SYCL execution and +memory models. +Furthermore the interface exposed utilize these built-in kernels is also +<>-specific. See the relevant backend specification for details. [[sec:memory.model]] == Memory model -Since SYCL is a single-source programming model, the memory model affects both -the application and the device kernel parts of a program. +Since SYCL is a single-source programming model, the memory model affects +both the application and the device kernel parts of a program. On the SYCL application, the SYCL runtime will make sure data is available for execution of the kernels. On the SYCL device kernel, the <> rules describing how the memory -behaves on a specific device are mapped to SYCL {cpp} constructs. Thus it is -possible to program kernels efficiently in pure {cpp}. 
+behaves on a specific device are mapped to SYCL {cpp} constructs. +Thus it is possible to program kernels efficiently in pure {cpp}. [[sub.section.memmodel.app]] === SYCL application memory model -The application running on the host uses SYCL <> objects using instances of -the [code]#sycl::buffer# class or <> allocation functions -to allocate memory in the global address -space, or can allocate specialized image memory using the -[code]#sycl::unsampled_image# and [code]#sycl::sampled_image# classes. +The application running on the host uses SYCL <> objects using +instances of the [code]#sycl::buffer# class or <> allocation functions +to allocate memory in the global address space, or can allocate specialized +image memory using the [code]#sycl::unsampled_image# and +[code]#sycl::sampled_image# classes. In the SYCL application, memory objects are bound to all devices in which they are used, regardless of the SYCL context where they reside. -SYCL memory objects (namely, <> and <> objects) -can encapsulate multiple underlying <> memory objects together with +SYCL memory objects (namely, <> and <> objects) can +encapsulate multiple underlying <> memory objects together with multiple host memory allocations to enable the same object to be shared -between devices in different contexts, platforms or backends. <> -allocations uniquely identify a memory allocation and are bound to a SYCL context. +between devices in different contexts, platforms or backends. +<> allocations uniquely identify a memory allocation and are bound to a +SYCL context. They are only valid on the backend used by the context. The order of execution of <> objects ensures a sequentially consistent access to the memory from the different devices to the memory -objects. Accessing a USM allocation does not alter the order of execution. +objects. +Accessing a USM allocation does not alter the order of execution. 
Users must explicitly inform the SYCL runtime of any requirements necessary for a legal execution. -To access a memory object, the user must create an <> object -which parameterizes the type of access to the memory object that a kernel or -the host requires. The <> object defines a requirement to access -a memory object, and this requirement is defined by construction of an -accessor, regardless of whether there are any uses in a kernel or by the -host. An accessor object specifies whether the -access is via global memory, constant memory or image samplers and their -associated access functions. The <> also specifies whether the -access is read-only (RO), write-only (WO) or read-write (RW). An optional -[code]#no_init# property can be added to an accessor to tell the system to -discard any previous contents of the data the accessor refers to, so there -are two additional requirement types: no-init-write-only (NWO) and -no-init-read-write (NRW). For simplicity, when a *requisite* represents an -accessor object in a certain access mode, we represent it as -MemoryObject~AccessMode~. For example, an accessor that -accesses memory object *buf1* in *RW* mode is represented as -_buf1~RW~_. A <> object that uses such an accessor is -represented as _CG(buf1~RW~)_. The *action* required to satisfy a -requisite and the location of the latest copy of a memory object will vary -depending on the implementation. - -<> illustrates an example where -<> objects are enqueued to two separate SYCL queues -executing in devices in different contexts. The *requisites* for the -<> execution are the same, but the *actions* to -satisfy them are different. For example, if the data is on the host before -execution, _A(b1~RW~)_ and _A(b2~RW~)_ can potentially be implemented as -copy operations from the host memory to [code]#context1# or -[code]#context2# respectively. 
After _CG~a~_ and _CG~b~_ are executed, -_A'(b1~RW~)_ will likely be an empty operation, since the result of the -kernel can stay on the device. On the other hand, the results of _CG~b~_ are -now on a different context than _CG~c~_ is executing, therefore _A'(b2~RW~)_ -will need to copy data across two separate contexts using an -implementation specific mechanism. +To access a memory object, the user must create an <> object which +parameterizes the type of access to the memory object that a kernel or the +host requires. +The <> object defines a requirement to access a memory object, and +this requirement is defined by construction of an accessor, regardless of +whether there are any uses in a kernel or by the host. +An accessor object specifies whether the access is via global memory, +constant memory or image samplers and their associated access functions. +The <> also specifies whether the access is read-only (RO), +write-only (WO) or read-write (RW). +An optional [code]#no_init# property can be added to an accessor to tell the +system to discard any previous contents of the data the accessor refers to, +so there are two additional requirement types: no-init-write-only (NWO) and +no-init-read-write (NRW). +For simplicity, when a *requisite* represents an accessor object in a +certain access mode, we represent it as MemoryObject~AccessMode~. +For example, an accessor that accesses memory object *buf1* in *RW* mode is +represented as _buf1~RW~_. +A <> object that uses such an accessor is represented as +_CG(buf1~RW~)_. +The *action* required to satisfy a requisite and the location of the latest +copy of a memory object will vary depending on the implementation. + +<> illustrates an example where <> +objects are enqueued to two separate SYCL queues executing in devices in +different contexts. +The *requisites* for the <> execution are the same, but the +*actions* to satisfy them are different. 
+For example, if the data is on the host before execution, _A(b1~RW~)_ and +_A(b2~RW~)_ can potentially be implemented as copy operations from the host +memory to [code]#context1# or [code]#context2# respectively. +After _CG~a~_ and _CG~b~_ are executed, _A'(b1~RW~)_ will likely be an empty +operation, since the result of the kernel can stay on the device. +On the other hand, the results of _CG~b~_ are now on a different context +than _CG~c~_ is executing, therefore _A'(b2~RW~)_ will need to copy data +across two separate contexts using an implementation specific mechanism. // TODO : The example below mentions OpenCL but I think is illustrative of a // potential implementation and behavior so I am inclined to leave it there @@ -748,23 +804,25 @@ image::{images}/device_to_device2.svg[align="center",opts="{imageopts}"] <> shows actions performed when three command groups are submitted to two distinct queues, and potential implementation in an OpenCL -<> by a SYCL runtime. Note that in this example, each SYCL buffer -(_b2,b2_) is implemented as separate [code]#cl_mem# objects per -context. +<> by a SYCL runtime. +Note that in this example, each SYCL buffer (_b2,b2_) is implemented as +separate [code]#cl_mem# objects per context. Note that the order of the definition of the accessors within the <> is irrelevant to the requirements they define. -All accessors always apply to the entire <> object where -they are defined. +All accessors always apply to the entire <> object where they +are defined. -When multiple <> in the same <> define different -requisites to the same memory object these requisites must be resolved. +When multiple <> in the same <> define +different requisites to the same memory object these requisites must be +resolved. -Firstly, any requisites with different access modes but the same access target -are resolved into a single requisite with the union of the different access -modes according to <>. 
The atomic access mode acts -as if it was read-write (RW) when determining the combined requirement. The -rules in <> are commutative and associative. +Firstly, any requisites with different access modes but the same access +target are resolved into a single requisite with the union of the different +access modes according to <>. +The atomic access mode acts as if it was read-write (RW) when determining +the combined requirement. +The rules in <> are commutative and associative. [[table.access.mode.union]] .Combined requirement from two different accessor access modes within the same <>. The rules are commutative and associative @@ -783,16 +841,16 @@ rules in <> are commutative and associative. | no-init-read-write (NRW) | read-write (RW) | read-write (RW) |==== -The result of this should be that there should not be any requisites with the -same access target. +The result of this should be that there should not be any requisites with +the same access target. -Secondly, the remaining requisites must adhere to the following rule. Only -one of the requisites may have write access (_W_ or _RW_), otherwise the -<> must throw an exception. All requisites create a -requirement for the data they represent to be made available in the specified -access target, however only the requisite with write access determines the side -effects of the <>, i.e. only the data which that requisite -represents will be updated. +Secondly, the remaining requisites must adhere to the following rule. +Only one of the requisites may have write access (_W_ or _RW_), otherwise +the <> must throw an exception. +All requisites create a requirement for the data they represent to be made +available in the specified access target, however only the requisite with +write access determines the side effects of the <>, i.e. only +the data which that requisite represents will be updated. 
For example: @@ -804,21 +862,21 @@ Where _G_ and _C_ correspond to a [code]#target::device# and [code]#target::constant_buffer# accessor and _H_ corresponds to a host accessor. -A buffer created from a range of an existing buffer is called -a [keyword]#sub-buffer#. +A buffer created from a range of an existing buffer is called a +[keyword]#sub-buffer#. A buffer may be overlaid with any number of sub-buffers. Accessors can be created to operate on these [keyword]#sub-buffers#. -Refer to <> for details on [keyword]#sub-buffer# -creation and restrictions. -A requirement to access a sub-buffer is represented by specifying its -range, e.g. _CG(b1~RW,[0,5)~)_ represents the requirement of accessing -the range _[0,5)_ buffer _b1_ in read write mode. - -If two accessors are constructed to -access the same buffer, but both are to non-overlapping sub-buffers of the -buffer, then the two accessors are said to not [keyword]#overlap#, otherwise the -accessors do overlap. Overlapping is the test that is used to determine the -scheduling order of command groups. +Refer to <> for details on [keyword]#sub-buffer# creation +and restrictions. +A requirement to access a sub-buffer is represented by specifying its range, +e.g. _CG(b1~RW,[0,5)~)_ represents the requirement of accessing the range +_[0,5)_ buffer _b1_ in read write mode. + +If two accessors are constructed to access the same buffer, but both are to +non-overlapping sub-buffers of the buffer, then the two accessors are said +to not [keyword]#overlap#, otherwise the accessors do overlap. +Overlapping is the test that is used to determine the scheduling order of +command groups. Command-groups with non-overlapping requirements may execute concurrently. 
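The overlap test described above can be modelled in plain {cpp} for the one-dimensional case.
This is only a minimal sketch: it assumes non-empty, half-open element ranges such as _[0,5)_ and _[5,15)_, and the names `SubRange` and `overlaps` are hypothetical helpers introduced for illustration, not part of the SYCL API.
How a SYCL runtime actually tracks overlap is left to the implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a SYCL type): a half-open element range
// [begin, end) inside a one-dimensional buffer. Assumed non-empty.
struct SubRange {
  std::size_t begin;  // index of the first element covered
  std::size_t end;    // one past the last element covered
};

// Two non-empty half-open ranges overlap exactly when each one starts
// before the other one ends.
constexpr bool overlaps(SubRange a, SubRange b) {
  return a.begin < b.end && b.begin < a.end;
}

// [0,5) and [5,15) share no element, so requirements on them do not overlap
// and the corresponding command groups may execute concurrently.
static_assert(!overlaps(SubRange{0, 5}, SubRange{5, 15}),
              "adjacent half-open ranges do not overlap");

// [0,10) and [5,15) share elements 5..9, so the command groups that
// require them must be ordered.
static_assert(overlaps(SubRange{0, 10}, SubRange{5, 15}),
              "ranges sharing elements overlap");
```

Because the ranges are half-open, two sub-buffers that merely touch at a boundary element index, as _[0,5)_ and _[5,15)_ do, are treated as non-overlapping.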
// Formerly in overlap.tex @@ -846,29 +904,29 @@ q1.submit(_CG~c~(b1~RW,[5,15)~)_); image::{images}/overlap.svg[align="center",opts="{imageopts}"] |==== -It is permissible for command groups that only read data to not copy that data -back to the host or other devices after reading and for the runtime to maintain -multiple read-only copies of the data on multiple devices. +It is permissible for command groups that only read data to not copy that +data back to the host or other devices after reading and for the runtime to +maintain multiple read-only copies of the data on multiple devices. A special case of requirement is the one defined by a *host accessor*. -Host accessors are represented with -_H(MemoryObject~AccessMode~)_, e.g, +Host accessors are represented with _H(MemoryObject~AccessMode~)_, e.g, _H(b1~RW~)_ represents a host accessor to _b1_ in read-write mode. Host accessors are a special type of accessor constructed from a memory object outside a command group, and require that the data associated with the given memory object is available on the host in the given pointer. This causes the runtime to block on construction of this object until the requirement has been satisfied. -*Host accessor* objects are effectively barriers on all accesses to -a certain memory object. -<> shows an example of multiple command groups -enqueued to the same queue. Once the host accessor _H(b1~RW~)_ is reached, -the execution cannot proceed until _CG~a~_ is finished. +*Host accessor* objects are effectively barriers on all accesses to a +certain memory object. +<> shows an example of multiple command groups enqueued to the +same queue. +Once the host accessor _H(b1~RW~)_ is reached, the execution cannot proceed +until _CG~a~_ is finished. However, _CG~b~_ does not have any requirements on _b1_, therefore, it can execute concurrently with the barrier. 
-Finally, _CG~c~_ will be enqueued after _H(b1~RW~)_ is finished, -but still has to wait for _CG~b~_ to conclude for all its requirements to -be satisfied. +Finally, _CG~c~_ will be enqueued after _H(b1~RW~)_ is finished, but still +has to wait for _CG~b~_ to conclude for all its requirements to be +satisfied. See <> for details on synchronization rules. // Formerly in host_acc.tex @@ -903,116 +961,132 @@ image::{images}/host-acc.svg[align="center",opts="{imageopts}"] === SYCL device memory model The memory model for SYCL devices is based on the OpenCL 1.2 memory model. -Work-items executing in a kernel have access to three distinct address spaces -(memory regions) and a virtual address space overlapping some concrete address spaces: +Work-items executing in a kernel have access to three distinct address +spaces (memory regions) and a virtual address space overlapping some +concrete address spaces: - * <> is accessible to all work-items in all work-groups. + * <> is accessible to all work-items in all + work-groups. Work-items can read from or write to any element of a global memory - object. Reads and writes to global memory may be cached depending on the - capabilities of the device. Global memory is persistent across kernel - invocations. Concurrent access to a location in an USM allocation by two or more executing - kernels where at least one kernel modifies that location is a data race; there is no guarantee - of correct results unless <> and atomic operations are used. - * <> is accessible to all work-items in a single - work-group. Attempting to access local memory in one work-group from - another work-group results in undefined behavior. This memory region can be - used to allocate variables that are shared by all work-items in a - work-group. Work-group-level visibility allows local memory to be - implemented as dedicated regions of the device memory where this is - appropriate. - * <> is a region of memory private to a work-item. 
- Attempting to access private memory in one work-item from another work-item - results in undefined behavior. - * <> is a virtual address space which overlaps the - global, local and private address spaces. Therefore, an object that resides - in the global, local, or private address space can also be accessed through - the generic address space. + object. + Reads and writes to global memory may be cached depending on the + capabilities of the device. + Global memory is persistent across kernel invocations. + Concurrent access to a location in an USM allocation by two or more + executing kernels where at least one kernel modifies that location is a + data race; there is no guarantee of correct results unless <> + and atomic operations are used. + * <> is accessible to all work-items in a + single work-group. + Attempting to access local memory in one work-group from another + work-group results in undefined behavior. + This memory region can be used to allocate variables that are shared by + all work-items in a work-group. + Work-group-level visibility allows local memory to be implemented as + dedicated regions of the device memory where this is appropriate. + * <> is a region of memory private to a + work-item. + Attempting to access private memory in one work-item from another + work-item results in undefined behavior. + * <> is a virtual address space which + overlaps the global, local and private address spaces. + Therefore, an object that resides in the global, local, or private + address space can also be accessed through the generic address space. ==== Access to memory -Accessors in the device kernels provide access to the memory objects, -acting as pointers to the corresponding address space. +Accessors in the device kernels provide access to the memory objects, acting +as pointers to the corresponding address space. Pointers can be passed directly as kernel arguments if an implementation -supports <>. 
See <> for information on when it is legal -to dereference pointers passed from the host inside kernels. +supports <>. +See <> for information on when it is legal to dereference pointers +passed from the host inside kernels. -To allocate local memory within a kernel, the user can either pass -a [code]#sycl::local_accessor# object as a argument to an ND-range -kernel (that has a user-defined work-group size), or -can define a variable in work-group scope inside -[code]#sycl::parallel_for_work_group#. +To allocate local memory within a kernel, the user can either pass a +[code]#sycl::local_accessor# object as a argument to an ND-range kernel +(that has a user-defined work-group size), or can define a variable in +work-group scope inside [code]#sycl::parallel_for_work_group#. Any variable defined inside a [code]#sycl::parallel_for# scope or [code]#sycl::parallel_for_work_item# scope will be allocated in private -memory. Any variable defined inside a [code]#sycl::parallel_for_work_group# -scope will be allocated in local memory. +memory. +Any variable defined inside a [code]#sycl::parallel_for_work_group# scope +will be allocated in local memory. -Users can create accessors that reference sub-buffers as well as entire buffers. +Users can create accessors that reference sub-buffers as well as entire +buffers. Within kernels, the underlying {cpp} pointer types can be obtained from an -accessor. The pointer types will contain a compile-time deduced address space. -So, for example, if a {cpp} pointer is obtained from an accessor to global memory, -the {cpp} pointer type will have a global address space attribute attached to it. +accessor. +The pointer types will contain a compile-time deduced address space. +So, for example, if a {cpp} pointer is obtained from an accessor to global +memory, the {cpp} pointer type will have a global address space attribute +attached to it. 
The address space attribute will be compile-time propagated to other pointer -values when one pointer is initialized to another pointer value using a defined -algorithm. - -When developers need to explicitly state the address space of a pointer value, -one of the explicit pointer classes can be used. There is a different explicit -pointer class for each address space: [code]#sycl::raw_local_ptr#, -[code]#sycl::raw_global_ptr#, [code]#sycl::raw_private_ptr#, -[code]#sycl::raw_generic_ptr#, -[code]#sycl::decorated_local_ptr#, -[code]#sycl::decorated_global_ptr#, [code]#sycl::decorated_private_ptr#, -or [code]#sycl::decorated_generic_ptr#. +values when one pointer is initialized to another pointer value using a +defined algorithm. + +When developers need to explicitly state the address space of a pointer +value, one of the explicit pointer classes can be used. +There is a different explicit pointer class for each address space: +[code]#sycl::raw_local_ptr#, [code]#sycl::raw_global_ptr#, +[code]#sycl::raw_private_ptr#, [code]#sycl::raw_generic_ptr#, +[code]#sycl::decorated_local_ptr#, [code]#sycl::decorated_global_ptr#, +[code]#sycl::decorated_private_ptr#, or [code]#sycl::decorated_generic_ptr#. The classes with the [code]#decorated# prefix expose pointers that use an implementation-defined address space decoration, while the classes with the -[code]#raw# prefix do not. Buffer accessors with an access target -[code]#target::device# or [code]#target::constant_buffer# and local accessors -can be converted into explicit pointer classes ([code]#multi_ptr#). +[code]#raw# prefix do not. +Buffer accessors with an access target [code]#target::device# or +[code]#target::constant_buffer# and local accessors can be converted into +explicit pointer classes ([code]#multi_ptr#). For templates that need to adapt to different address spaces, a -[code]#sycl::multi_ptr# class is defined which is templated -via a compile-time constant enumerator value to specify the address space. 
+[code]#sycl::multi_ptr# class is defined which is templated via a +compile-time constant enumerator value to specify the address space. [[sec:memoryconsistency]] === SYCL memory consistency model -The SYCL memory consistency model is based upon the memory consistency -model of the {cpp} core language. Where SYCL offers extensions to classes and -functions that may affect memory consistency, the default behavior when these -extensions are not used always matches the behavior of standard {cpp}. - -A SYCL implementation must guarantee that the same memory consistency model is -used across host and device code. Every <> must support the -memory model defined by the minimum version of {cpp} described in -<>; SYCL implementations supporting -additional versions of {cpp} must also support the corresponding memory models. - -Within a work-item, operations are ordered according to the _sequenced before_ -relation defined by the {cpp} core language. - -Ensuring memory consistency across different work-items requires careful usage -of <> operations, <> operations and atomic -operations. The ordering of operations across different work-items is -determined by the _happens before_ relation defined by the {cpp} core language, -with a single relation governing all address spaces (memory regions). - -On any SYCL device, local and global memory may be made consistent -across work-items in a single <> through use of a <> -operation. On SYCL devices supporting acquire-release or sequentially -consistent memory orderings, all memory visible to a set of work-items may be -made consistent across the work-items in that set through the use of -<> and atomic operations. +The SYCL memory consistency model is based upon the memory consistency model +of the {cpp} core language. +Where SYCL offers extensions to classes and functions that may affect memory +consistency, the default behavior when these extensions are not used always +matches the behavior of standard {cpp}. 
+ +A SYCL implementation must guarantee that the same memory consistency model +is used across host and device code. +Every <> must support the memory model defined by the +minimum version of {cpp} described in <>; +SYCL implementations supporting additional versions of {cpp} must also +support the corresponding memory models. + +Within a work-item, operations are ordered according to the _sequenced +before_ relation defined by the {cpp} core language. + +Ensuring memory consistency across different work-items requires careful +usage of <> operations, <> operations and atomic +operations. +The ordering of operations across different work-items is determined by the +_happens before_ relation defined by the {cpp} core language, with a single +relation governing all address spaces (memory regions). + +On any SYCL device, local and global memory may be made consistent across +work-items in a single <> through use of a <> +operation. +On SYCL devices supporting acquire-release or sequentially consistent memory +orderings, all memory visible to a set of work-items may be made consistent +across the work-items in that set through the use of <> and +atomic operations. Memory consistency between the host and SYCL device(s), or different SYCL devices in the same context, can be guaranteed through synchronization in -the host application as defined in <>. On SYCL devices -supporting concurrent atomic accesses to USM allocations and acquire-release or sequentially -consistent memory orderings, cross-device memory consistency can be -enforced through the use of <> and atomic operations. +the host application as defined in <>. +On SYCL devices supporting concurrent atomic accesses to USM allocations and +acquire-release or sequentially consistent memory orderings, cross-device +memory consistency can be enforced through the use of <> and +atomic operations. [[sec:memory-ordering]] ==== Memory ordering @@ -1022,9 +1096,9 @@ enforced through the use of <> and atomic operations. 
include::{header_dir}/memoryOrder.h[lines=4..-1] ---- -The memory synchronization order of a given atomic operation is controlled by a -[code]#sycl::memory_order# parameter, which can take one of the following -values: +The memory synchronization order of a given atomic operation is controlled +by a [code]#sycl::memory_order# parameter, which can take one of the +following values: * [code]#sycl::memory_order::relaxed#; * [code]#sycl::memory_order::acquire#; @@ -1032,23 +1106,25 @@ values: * [code]#sycl::memory_order::acq_rel#; * [code]#sycl::memory_order::seq_cst#. -The meanings of these values are identical to those defined in the {cpp} core -language. +The meanings of these values are identical to those defined in the {cpp} +core language. These memory orders are listed above from weakest -([code]#memory_order::relaxed#) to strongest ([code]#memory_order::seq_cst#). +([code]#memory_order::relaxed#) to strongest +([code]#memory_order::seq_cst#). The complete set of memory orders is not guaranteed to be supported by every -device, nor across all combinations of devices within a platform. The set of -supported memory orders can be queried via the information descriptors for the -[code]#sycl::device# and [code]#sycl::context# classes. +device, nor across all combinations of devices within a platform. +The set of supported memory orders can be queried via the information +descriptors for the [code]#sycl::device# and [code]#sycl::context# classes. [NOTE] ==== SYCL implementations are not required to support a memory order equivalent to [code]#std::memory_order::consume#, and using this ordering within a SYCL -device kernel results in undefined behavior. Developers are encouraged to use -[code]#sycl::memory_order::acquire# instead. +device kernel results in undefined behavior. +Developers are encouraged to use [code]#sycl::memory_order::acquire# +instead. 
==== [[sec:memory-scope]] @@ -1069,92 +1145,95 @@ values: * [code]#sycl::memory_scope::sub_group# The ordering constraint applies only to work-items in the same <> as the calling work-item; * [code]#sycl::memory_scope::work_group# The ordering constraint applies - only to work-items in the same <> as the calling - work-item; + only to work-items in the same <> as the calling work-item; * [code]#sycl::memory_scope::device# The ordering constraint applies only to work-items executing on the same device as the calling work-item; - * [code]#sycl::memory_scope::system# The ordering constraint applies to any - work-item or host thread in the system that is currently permitted to - access the memory allocation containing the referenced object, as + * [code]#sycl::memory_scope::system# The ordering constraint applies to + any work-item or host thread in the system that is currently permitted + to access the memory allocation containing the referenced object, as defined by the capabilities of <> and <>. The memory scopes are listed above from narrowest ([code]#memory_scope::work_item#) to widest ([code]#memory_scope::system#). The complete set of memory scopes is not guaranteed to be supported by every -device. The set of supported memory scopes can be queried via the information +device. +The set of supported memory scopes can be queried via the information descriptors for the [code]#sycl::device# and [code]#sycl::context# classes. The widest scope that can be applied to an atomic operation corresponds to -the set of work-items which can access the associated memory location. For -example, the widest scope that can be applied to atomic operations in -work-group local memory is [code]#sycl::memory_scope::work_group#. If a -wider scope is supplied, the behavior is as-if the narrowest scope containing -all work-items which can access the associated memory location was supplied. +the set of work-items which can access the associated memory location. 
+For example, the widest scope that can be applied to atomic operations in +work-group local memory is [code]#sycl::memory_scope::work_group#. +If a wider scope is supplied, the behavior is as-if the narrowest scope +containing all work-items which can access the associated memory location +was supplied. [NOTE] ==== The addition of memory scopes to the {cpp} memory model modifies the -definition of some concepts from the {cpp} core language. For example: -data races, the synchronizes-with relationship and sequential -consistency must be defined in a way that accounts for atomic -operations with differing (but compatible) scopes, in a manner -similar to the <>. Efforts to -formalize the memory model of SYCL are ongoing, and a formal memory model -will be included in a future version of the SYCL specification. +definition of some concepts from the {cpp} core language. +For example: data races, the synchronizes-with relationship and sequential +consistency must be defined in a way that accounts for atomic operations +with differing (but compatible) scopes, in a manner similar to the +<>. +Efforts to formalize the memory model of SYCL are ongoing, and a formal +memory model will be included in a future version of the SYCL specification. ==== ==== Atomic operations -Atomic operations can be performed on memory in buffers and USM. The -[code]#sycl::atomic_ref# class must be used to provide safe atomic access -to the buffer or USM allocation from device code. +Atomic operations can be performed on memory in buffers and USM. +The [code]#sycl::atomic_ref# class must be used to provide safe atomic +access to the buffer or USM allocation from device code. 
==== Forward progress -This section, and any subsequent section referring to progress guarantees, uses -the following terms as defined in the {cpp} core language: thread of +This section, and any subsequent section referring to progress guarantees, +uses the following terms as defined in the {cpp} core language: thread of execution; weakly parallel forward progress guarantees; parallel forward progress guarantees; concurrent forward progress guarantees; and block with forward progress guarantee delegation. Each work-item in SYCL is a separate thread of execution, providing at least -weakly parallel forward progress guarantees. Whether work-items provide -stronger forward progress guarantees is implementation-defined. +weakly parallel forward progress guarantees. +Whether work-items provide stronger forward progress guarantees is +implementation-defined. All implementations must additionally ensure that a work-item arriving at a -<> does not prevent other work-items in the same -group from making progress. When a work-item arrives at a group barrier acting -on group _G_, implementations must eventually select and potentially strengthen -another work-item in group _G_ that has not yet arrived at the barrier. +<> does not prevent other work-items in the +same group from making progress. +When a work-item arrives at a group barrier acting on group _G_, +implementations must eventually select and potentially strengthen another +work-item in group _G_ that has not yet arrived at the barrier. -When a host thread blocks on the completion of a command previously submitted -to a SYCL queue (for example, via the [code]#sycl::queue::wait# function), it -blocks with forward progress guarantee delegation. +When a host thread blocks on the completion of a command previously +submitted to a SYCL queue (for example, via the [code]#sycl::queue::wait# +function), it blocks with forward progress guarantee delegation. 
[NOTE] ==== -SYCL commands submitted to a queue are not guaranteed to begin executing until -a host thread blocks on their completion. In the absence of multiple host -threads, there is no guarantee that host and device code will execute -concurrently. +SYCL commands submitted to a queue are not guaranteed to begin executing +until a host thread blocks on their completion. +In the absence of multiple host threads, there is no guarantee that host and +device code will execute concurrently. ==== // Later, this label will move onto a new subsection - see below [[sec:progmodel.cpp]] == The SYCL programming model -A SYCL program is written in standard {cpp}. Host code and device code is -written in the same {cpp} source file, enabling instantiation of templated -kernels from host code and also enabling kernel source code to be shared -between host and device. +A SYCL program is written in standard {cpp}. +Host code and device code is written in the same {cpp} source file, enabling +instantiation of templated kernels from host code and also enabling kernel +source code to be shared between host and device. The device kernels are encapsulated {cpp} callable types (a function object -with [code]#operator()# or a lambda function), which have -been designated to be compiled as SYCL kernels. +with [code]#operator()# or a lambda function), which have been designated to +be compiled as SYCL kernels. -SYCL programs target heterogeneous systems. The kernels may be compiled and -optimized for multiple different processor architectures with very different -binary representations. +SYCL programs target heterogeneous systems. +The kernels may be compiled and optimized for multiple different processor +architectures with very different binary representations. // TODO: Add \subsection{SYCL {cpp} language requirements} before merging @@ -1165,49 +1244,56 @@ binary representations. === Minimum version of {cpp} The {cpp} features used in SYCL are based on a specific version of {cpp}. 
-Implementations of SYCL must support this minimum {cpp} version, which defines the -{cpp} constructs that can consequently be used by SYCL feature definitions -(for example, lambdas). - -The minimum {cpp} version of this SYCL specification is determined by the normative {cpp} -core language defined in <>. All implementations -of this specification must support at least this core language, and features within this -specification are defined using features of the core language. Note that not all -core language constructs are supported within <> or code -invoked by a <>, as detailed by +Implementations of SYCL must support this minimum {cpp} version, which +defines the {cpp} constructs that can consequently be used by SYCL feature +definitions (for example, lambdas). + +The minimum {cpp} version of this SYCL specification is determined by the +normative {cpp} core language defined in <>. +All implementations of this specification must support at least this core +language, and features within this specification are defined using features +of the core language. +Note that not all core language constructs are supported within +<> or code invoked by a +<>, as detailed by <>. -Implementations may support newer {cpp} versions than the minimum required by SYCL. -Code written using newer features than the SYCL requirement, though, may -not be portable to other implementations that don't support the same {cpp} version. +Implementations may support newer {cpp} versions than the minimum required +by SYCL. +Code written using newer features than the SYCL requirement, though, may not +be portable to other implementations that don't support the same {cpp} +version. [[sec:progmodel.futurecppversion]] === Alignment with future versions of {cpp} -Some features of SYCL are aligned with the next {cpp} specification, as defined -in <>. +Some features of SYCL are aligned with the next {cpp} specification, as +defined in <>. 
-The following features are pre-adopted by SYCL 2020 and made available in the -[code]#sycl::# namespace: [code]#std::span#, [code]#std::dynamic_extent#, -[code]#std::bit_cast#. The implementations of pre-adopted features are -compliant with the next {cpp} specification, and are expected to forward directly -to standard {cpp} features in a future version of SYCL. +The following features are pre-adopted by SYCL 2020 and made available in +the [code]#sycl::# namespace: [code]#std::span#, +[code]#std::dynamic_extent#, [code]#std::bit_cast#. +The implementations of pre-adopted features are compliant with the next +{cpp} specification, and are expected to forward directly to standard {cpp} +features in a future version of SYCL. The following features of SYCL 2020 use syntax based on the next {cpp} -specification: [code]#sycl::atomic_ref#. These features behave as -described in the next {cpp} specification, barring modifications to ensure -compatibility with other SYCL 2020 features and heterogeneous -programming. Any such modifications are documented in the corresponding -sections of this specification. +specification: [code]#sycl::atomic_ref#. +These features behave as described in the next {cpp} specification, barring +modifications to ensure compatibility with other SYCL 2020 features and +heterogeneous programming. +Any such modifications are documented in the corresponding sections of this +specification. === Basic data parallel kernels -Data-parallel <> that execute as -multiple <> and where no local synchronization is required are enqueued -with the [code]#sycl::parallel_for# function parameterized by a -[code]#sycl::range# parameter. These kernels will execute the kernel -function body once for each work-item in the specified <>. +Data-parallel <> that execute as multiple +<> and where no local synchronization is required are +enqueued with the [code]#sycl::parallel_for# function parameterized by a +[code]#sycl::range# parameter. 
+These kernels will execute the kernel function body once for each work-item +in the specified <>. Functionality tied to <> of work-items, including <> and <>, must not be used @@ -1220,36 +1306,40 @@ kernels using the features described in <>. Data parallel <> can also execute in a mode where the set of <> is divided into <> of -user-defined dimensions. The user specifies the global <> and local -work-group size as parameters to the [code]#sycl::parallel_for# function with a -[code]#sycl::nd_range# parameter. In this mode of execution, -kernels execute over the <> in work-groups of the specified -size. It is possible to share data among work-items within the same -work-group in <> or <> and to synchronize between -work-items in the same work-group by calling the -[code]#group_barrier# function. All work-groups in a given -[code]#parallel_for# will be the same size, and the global size -defined in the nd-range must either be a multiple of the work-group size in -each dimension, or the global size must be zero. When the global size -is zero, the kernel function is not executed, the local size is ignored, and -any dependencies are satisfied. - -Work-groups may be further subdivided into <>. The -work-items that compose a sub-group are selected in an implementation-defined -way, and therefore the size and number of sub-groups may differ for each -kernel. Moreover, different devices may make different guarantees with respect -to how sub-groups within a work-group are scheduled. The maximum number of -work-items in any sub-group in a kernel is based on a combination of the kernel -and its dispatch dimensions. The size of any sub-group in the dispatch is -between 1 and this maximum sub-group size, and the size of an individual -sub-group is invariant for the duration of a kernel's execution. Similarly -to work-groups, the work-items within the same sub-group can be synchronized -by calling the [code]#group_barrier# function. 
- -Portable device code must not assume that work-items within a sub-group execute -in any particular order, that work-groups are subdivided into sub-groups in a -specific way, nor that the work-items within a sub-group provide specific -forward progress guarantees. +user-defined dimensions. +The user specifies the global <> and local work-group size as +parameters to the [code]#sycl::parallel_for# function with a +[code]#sycl::nd_range# parameter. +In this mode of execution, kernels execute over the <> in +work-groups of the specified size. +It is possible to share data among work-items within the same work-group in +<> or <> and to synchronize between +work-items in the same work-group by calling the [code]#group_barrier# +function. +All work-groups in a given [code]#parallel_for# will be the same size, and +the global size defined in the nd-range must either be a multiple of the +work-group size in each dimension, or the global size must be zero. +When the global size is zero, the kernel function is not executed, the local +size is ignored, and any dependencies are satisfied. + +Work-groups may be further subdivided into <>. +The work-items that compose a sub-group are selected in an +implementation-defined way, and therefore the size and number of sub-groups +may differ for each kernel. +Moreover, different devices may make different guarantees with respect to +how sub-groups within a work-group are scheduled. +The maximum number of work-items in any sub-group in a kernel is based on a +combination of the kernel and its dispatch dimensions. +The size of any sub-group in the dispatch is between 1 and this maximum +sub-group size, and the size of an individual sub-group is invariant for the +duration of a kernel's execution. +Similarly to work-groups, the work-items within the same sub-group can be +synchronized by calling the [code]#group_barrier# function. 
+ +Portable device code must not assume that work-items within a sub-group +execute in any particular order, that work-groups are subdivided into +sub-groups in a specific way, nor that the work-items within a sub-group +provide specific forward progress guarantees. Variables with <> semantics can be added to work-group data parallel kernels using the features described in <>. @@ -1259,58 +1349,60 @@ parallel kernels using the features described in <>. [NOTE] ==== -Based on developer and implementation feedback, the hierarchical -data parallel kernel feature described next is undergoing -improvements to better align with the frameworks and patterns -prevalent in modern programming. As this is a key part of the SYCL -API and we expect to make changes to it, we temporarily recommend -that new codes refrain from using this feature until the new API -is finished in a near-future version of the SYCL specification, -when full use of the updated feature will be recommended for use -in new code. Existing codes using this feature will of course be -supported by conformant implementations of this specification. +Based on developer and implementation feedback, the hierarchical data +parallel kernel feature described next is undergoing improvements to better +align with the frameworks and patterns prevalent in modern programming. +As this is a key part of the SYCL API and we expect to make changes to it, +we temporarily recommend that new codes refrain from using this feature +until the new API is finished in a near-future version of the SYCL +specification, when full use of the updated feature will be recommended for +use in new code. +Existing codes using this feature will of course be supported by conformant +implementations of this specification. ==== -The SYCL compiler provides a way of specifying data parallel kernels -that execute within work-groups via a different syntax which -highlights the hierarchical nature of the parallelism. 
This mode is -purely a compiler feature and does not change the execution model of -the kernel. Instead of calling [code]#sycl::parallel_for# the -user calls [code]#sycl::parallel_for_work_group# with a -[code]#sycl::range# value representing the number of -work-groups to launch and optionally a second -[code]#sycl::range# representing the size of each work-group -for performance tuning. All code within the -[code]#parallel_for_work_group# scope effectively executes once -per work-group. Within the [code]#parallel_for_work_group# scope, -it is possible to call [code]#parallel_for_work_item# which -creates a new scope in which all work-items within the current -work-group execute. This enables a programmer to write code that looks -like there is an inner work-item loop inside an outer work-group loop, -which closely matches the effect of the execution model. All variables -declared inside the [code]#parallel_for_work_group# scope are -allocated in work-group local memory, whereas all variables declared -inside the [code]#parallel_for_work_item# scope are declared in -private memory. All [code]#parallel_for_work_item# calls within a -given [code]#parallel_for_work_group# execution must have the -same dimensions. +The SYCL compiler provides a way of specifying data parallel kernels that +execute within work-groups via a different syntax which highlights the +hierarchical nature of the parallelism. +This mode is purely a compiler feature and does not change the execution +model of the kernel. +Instead of calling [code]#sycl::parallel_for# the user calls +[code]#sycl::parallel_for_work_group# with a [code]#sycl::range# value +representing the number of work-groups to launch and optionally a second +[code]#sycl::range# representing the size of each work-group for performance +tuning. +All code within the [code]#parallel_for_work_group# scope effectively +executes once per work-group. 
+Within the [code]#parallel_for_work_group# scope, it is possible to call +[code]#parallel_for_work_item# which creates a new scope in which all +work-items within the current work-group execute. +This enables a programmer to write code that looks like there is an inner +work-item loop inside an outer work-group loop, which closely matches the +effect of the execution model. +All variables declared inside the [code]#parallel_for_work_group# scope are +allocated in work-group local memory, whereas all variables declared inside +the [code]#parallel_for_work_item# scope are declared in private memory. +All [code]#parallel_for_work_item# calls within a given +[code]#parallel_for_work_group# execution must have the same dimensions. === Kernels that are not launched over parallel instances -Simple kernels for which only a single instance of the kernel function will be -executed are enqueued with the [code]#sycl::single_task# function. The -kernel enqueued takes no "`work-item id`" parameter and will only execute once. -The behavior is logically equivalent to executing a kernel on a single compute -unit with a single work-group comprising only one work-item. Such kernels may be -enqueued on multiple queues and devices and as a result may be executed in -task-parallel fashion. +Simple kernels for which only a single instance of the kernel function will +be executed are enqueued with the [code]#sycl::single_task# function. +The kernel enqueued takes no "`work-item id`" parameter and will only +execute once. +The behavior is logically equivalent to executing a kernel on a single +compute unit with a single work-group comprising only one work-item. +Such kernels may be enqueued on multiple queues and devices and as a result +may be executed in task-parallel fashion. [[sec:pre-defined-kernels]] === Pre-defined kernels -Some <> may expose pre-defined functionality to users as kernels. +Some <> may expose pre-defined functionality to +users as kernels. 
These kernels are not programmable, hence they are not bound by the SYCL {cpp} programming model restrictions, and how they are written is implementation-defined. @@ -1321,92 +1413,96 @@ implementation-defined. Synchronization of processing elements executing inside a device is handled by the SYCL device kernel following the SYCL kernel execution model. -The synchronization of the different SYCL device kernels executing with -the host memory is handled by the SYCL application via the SYCL runtime. +The synchronization of the different SYCL device kernels executing with the +host memory is handled by the SYCL application via the SYCL runtime. ==== Synchronization in the SYCL application -Synchronization points between host and device(s) are exposed through -the following operations: +Synchronization points between host and device(s) are exposed through the +following operations: - * _Buffer destruction_: The destructors for - [code]#sycl::buffer#, [code]#sycl::unsampled_image# and - [code]#sycl::sampled_image# objects wait for all submitted work on - those objects to complete and to copy the data back to host memory - before returning. These destructors only wait if the object was - constructed with attached host memory and if data needs to be copied - back to the host. + * _Buffer destruction_: The destructors for [code]#sycl::buffer#, + [code]#sycl::unsampled_image# and [code]#sycl::sampled_image# objects + wait for all submitted work on those objects to complete and to copy the + data back to host memory before returning. + These destructors only wait if the object was constructed with attached + host memory and if data needs to be copied back to the host. + -- -More complex forms of synchronization on buffer destruction -can be specified by the user by constructing buffers with other kinds of -references to memory, such as [code]#shared_ptr# and [code]#unique_ptr#. 
+More complex forms of synchronization on buffer destruction can be specified +by the user by constructing buffers with other kinds of references to +memory, such as [code]#shared_ptr# and [code]#unique_ptr#. -- * _Host Accessors_: The constructor for a host accessor waits for all kernels that modify the same buffer (or image) in any queues to complete and then copies data back to host memory before the constructor returns. Any command groups with requirements to the same memory object cannot - execute until the host accessor is destroyed as shown on <>. - * _Command group enqueue_: The <> internally ensures - that any command groups added to queues have the correct event - dependencies added to those queues to ensure correct operation. Adding - command groups to queues never blocks. Instead any required - synchronization is added to the queue and events of type - [code]#sycl::event# are returned by the queue's submit function + execute until the host accessor is destroyed as shown on + <>. + * _Command group enqueue_: The <> internally ensures that + any command groups added to queues have the correct event dependencies + added to those queues to ensure correct operation. + Adding command groups to queues never blocks. + Instead any required synchronization is added to the queue and events of + type [code]#sycl::event# are returned by the queue's submit function that contain event information related to the specific command group. - * _Queue operations_: The user can manually use queue operations, - such as [code]#sycl::queue::wait()# to block execution of the calling thread until all - the command groups submitted to the queue have finished execution. Note - that this will also affect the dependencies of those command groups in - other queues. - * _SYCL event objects_: SYCL provides [code]#sycl::event# - objects which can be used for synchronization. 
If synchronization is - required across SYCL contexts from different <>, then the - <> ensures that extra host-based synchronization is - added to enable the SYCL event objects to operate between contexts - correctly. - -Note that the destructors of other SYCL objects -([code]#sycl::queue#, [code]#sycl::context#,{ldots}) do -not block. Only a [code]#sycl::buffer#, [code]#sycl::sampled_image# or -[code]#sycl::unsampled_image# destructor might block. The rationale is -that an object without any side effect on the host does not need to -block on destruction as it would impact the performance. So it is up -to the programmer to use a member function to wait for completion in some -cases if this does not fit the goal. -See <> for more information -on object life time. + * _Queue operations_: The user can manually use queue operations, such as + [code]#sycl::queue::wait()# to block execution of the calling thread + until all the command groups submitted to the queue have finished + execution. + Note that this will also affect the dependencies of those command groups + in other queues. + * _SYCL event objects_: SYCL provides [code]#sycl::event# objects which + can be used for synchronization. + If synchronization is required across SYCL contexts from different + <>, then the <> ensures that extra + host-based synchronization is added to enable the SYCL event objects to + operate between contexts correctly. + +Note that the destructors of other SYCL objects ([code]#sycl::queue#, +[code]#sycl::context#,{ldots}) do not block. +Only a [code]#sycl::buffer#, [code]#sycl::sampled_image# or +[code]#sycl::unsampled_image# destructor might block. +The rationale is that an object without any side effect on the host does not +need to block on destruction as it would impact the performance. +So it is up to the programmer to use a member function to wait for +completion in some cases if this does not fit the goal. +See <> for more information on object life +time. 
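The buffer-destruction rule above can be pictured with a toy model in plain standard C++. This is only a sketch under stated assumptions: `ToyBuffer` and its `submit` member are hypothetical and are not the `sycl::buffer` or `sycl::queue` API, and "submitted work" is modelled as closures that run when the destructor "waits" for them. It shows the ordering described in the text: enqueueing does not block or touch host memory, and a destructor with attached host memory finishes the outstanding work and copies the data back before it returns.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a memory object with attached host memory.
// Hypothetical type for illustration only; NOT the sycl::buffer API.
class ToyBuffer {
 public:
  // Attach host memory and take a separate "device" copy of it.
  ToyBuffer(int* host, std::size_t n)
      : host_(host), device_copy_(host, host + n) {}

  ToyBuffer(const ToyBuffer&) = delete;
  ToyBuffer& operator=(const ToyBuffer&) = delete;

  // Record work to run later, mimicking a non-blocking enqueue.
  void submit(std::function<void(std::vector<int>&)> work) {
    pending_.push_back(std::move(work));
  }

  // Mimics the blocking destructor: finish all submitted work,
  // then copy the result back to the attached host memory.
  ~ToyBuffer() {
    for (auto& work : pending_) work(device_copy_);
    for (std::size_t i = 0; i < device_copy_.size(); ++i)
      host_[i] = device_copy_[i];
  }

 private:
  int* host_;
  std::vector<int> device_copy_;
  std::vector<std::function<void(std::vector<int>&)>> pending_;
};
```

In this model the host array keeps its old contents while the `ToyBuffer` is alive and only reflects the submitted work once the object has gone out of scope.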
==== Synchronization in SYCL kernels -In SYCL, synchronization can be either global or local within a -group of work-items. Synchronization between work-items in a single -group is achieved using a <>. +In SYCL, synchronization can be either global or local within a group of +work-items. +Synchronization between work-items in a single group is achieved using a +<>. All the work-items of a group must execute the barrier before any are -allowed to continue execution beyond the barrier. Note that the -group barrier must be encountered by all work-items of a -group executing the kernel or by none at all. In SYCL, -<> and <> functionality is +allowed to continue execution beyond the barrier. +Note that the group barrier must be encountered by all work-items of a group +executing the kernel or by none at all. +In SYCL, <> and <> functionality is exposed via the [code]#group_barrier# function. Synchronization between work-items in different work-groups via atomic -operations is possible only on SYCL devices with certain capabilities, -as described in <>. +operations is possible only on SYCL devices with certain capabilities, as +described in <>. === Error handling In SYCL, there are two types of errors: synchronous errors that can be -detected immediately when an API call is made, and <> -that can only be detected later after an API call has returned. -Synchronous errors, such as failure to construct an -object, are reported immediately by the runtime throwing an -exception. <>, such as an error occurring during +detected immediately when an API call is made, and +<> that can only be detected later after an +API call has returned. +Synchronous errors, such as failure to construct an object, are reported +immediately by the runtime throwing an exception. +<>, such as an error occurring during execution of a kernel on a device, are reported via an asynchronous error-handler mechanism. -<> are not reported immediately as they occur. 
The -asynchronous error handler for a context or queue is called with a +<> are not reported immediately as they +occur. +The asynchronous error handler for a context or queue is called with a [code]#sycl::exception_list# object, which contains a list of asynchronously-generated exception objects, on the conditions described by <> and <>. @@ -1415,9 +1511,9 @@ Asynchronous errors may be generated regardless of whether the user has specified any asynchronous error handler(s), as described in <>. -Some <> can report errors that are specific to the platform -they are targeting, or that are more concrete than the errors provided -by the SYCL API. +Some <> can report errors that are specific to the +platform they are targeting, or that are more concrete than the errors +provided by the SYCL API. Any error reported by a <> must derive from the base [code]#sycl::exception#. When a user wishes to capture specifically an error thrown by a <>, @@ -1425,107 +1521,115 @@ she must include the <>-specific headers for said <>. === Fallback mechanism -A <> can be submitted either to a single queue -to be executed on, or to a secondary queue. If a -<> fails to be enqueued to the primary queue, then -the system will attempt to enqueue it to the secondary queue, if given as a -parameter to the submit function. If the <> fails to be -queued to both of these queues, then a synchronous SYCL exception will be thrown. - -It is possible that a command group may be successfully enqueued, -but then asynchronously fail to run, for some reason. In this case, it may be -possible for the runtime system to execute the <> -on the secondary queue, instead of the primary queue. The situations where a SYCL -runtime may be able to achieve this asynchronous fall-back is implementation-defined. +A <> can be submitted either to a single +queue to be executed on, or to a secondary queue. 
+If a <> fails to be enqueued to the primary +queue, then the system will attempt to enqueue it to the secondary queue, if +given as a parameter to the submit function. +If the <> fails to be queued to both of these +queues, then a synchronous SYCL exception will be thrown. + +It is possible that a command group may be successfully enqueued, but then +asynchronously fail to run, for some reason. +In this case, it may be possible for the runtime system to execute the +<> on the secondary queue, instead of the +primary queue. +The situations where a SYCL runtime may be able to achieve this asynchronous +fall-back is implementation-defined. === Scheduling of kernels and data movement A <> takes a reference to a command group -[code]#handler# as a parameter and anything within that scope is -immediately executed and takes the handler object as a parameter. The -intention is that a user will perform calls to SYCL functions, member functions, -destructors and constructors inside that scope. These calls will be non-blocking -on the host, but enqueue operations to the queue that the command group is submitted -to. All user functions within the command group scope will be called on the host +[code]#handler# as a parameter and anything within that scope is immediately +executed and takes the handler object as a parameter. +The intention is that a user will perform calls to SYCL functions, member +functions, destructors and constructors inside that scope. +These calls will be non-blocking on the host, but enqueue operations to the +queue that the command group is submitted to. +All user functions within the command group scope will be called on the host as the <> is executed, but any <> it invokes will be added to the SYCL <>. All commands added -to the <> will be executed out-of-order from each other, according to -their data dependencies. +commands>> it invokes will be added to the SYCL <>. 
+All commands added to the <> will be executed out-of-order from each +other, according to their data dependencies. [[sec:managing-object-lifetimes]] === Managing object lifetimes A SYCL application does not initialize any <> features until a -[code]#sycl::context# object is created. A user does not need to -explicitly create a [code]#sycl::context# object, but they do need to -explicitly create a [code]#sycl::queue# object, for which a -[code]#sycl::context# object will be implicitly created if not provided -by the user. - -All <> objects encapsulated in SYCL objects are reference-counted and will -be destroyed once all references have been released. This means that a user needs -only create a SYCL <> (which will automatically create an SYCL context) for -the lifetime of their application to initialize and release any <> objects -safely. - -There is no global state specified to be required in SYCL implementations. This -means, for example, that if the user creates two queues without explicitly -constructing a common context, then a SYCL implementation does not have to -create a shared context for the two queues. Implementations are free to share or -cache state globally for performance, but it is not required. - -Memory objects can be constructed with or without attached host memory. If no -host memory is attached at the point of construction, then destruction of that -memory object is non-blocking. The user may use {cpp} standard pointer classes -for sharing the host data with the user application and for defining blocking, -or non-blocking behavior of the buffers and images. -If host memory is attached by using a raw pointer, then the default behavior is -followed, which is that the destructor will block until any command groups -operating on the memory object have completed, then, if the contents of the -memory object is modified on a device those contents are copied back to host and -only then does the destructor return. 
- -In the case where host memory is shared -between the user application and the <> with a -[code]#std::shared_ptr#, then the reference counter +[code]#sycl::context# object is created. +A user does not need to explicitly create a [code]#sycl::context# object, +but they do need to explicitly create a [code]#sycl::queue# object, for +which a [code]#sycl::context# object will be implicitly created if not +provided by the user. + +All <> objects encapsulated in SYCL objects are reference-counted +and will be destroyed once all references have been released. +This means that a user needs only create a SYCL <> (which will +automatically create an SYCL context) for the lifetime of their application +to initialize and release any <> objects safely. + +There is no global state specified to be required in SYCL implementations. +This means, for example, that if the user creates two queues without +explicitly constructing a common context, then a SYCL implementation does +not have to create a shared context for the two queues. +Implementations are free to share or cache state globally for performance, +but it is not required. + +Memory objects can be constructed with or without attached host memory. +If no host memory is attached at the point of construction, then destruction +of that memory object is non-blocking. +The user may use {cpp} standard pointer classes for sharing the host data +with the user application and for defining blocking, or non-blocking +behavior of the buffers and images. +If host memory is attached by using a raw pointer, then the default behavior +is followed, which is that the destructor will block until any command +groups operating on the memory object have completed, then, if the contents +of the memory object is modified on a device those contents are copied back +to host and only then does the destructor return. 
+ +In the case where host memory is shared between the user application and the +<> with a [code]#std::shared_ptr#, then the reference counter of the [code]#std::shared_ptr# determines whether the buffer needs to copy -data back on destruction, and in that case the blocking or non-blocking behavior -depends on the user application. +data back on destruction, and in that case the blocking or non-blocking +behavior depends on the user application. Instead of a [code]#std::shared_ptr#, a [code]#std::unique_ptr# may be provided, which uses move semantics for initializing and using the -associated host memory. In this case, the behavior of the buffer in -relation to the user application will be non-blocking on destruction. +associated host memory. +In this case, the behavior of the buffer in relation to the user application +will be non-blocking on destruction. -As said in <>, the only blocking -operations in SYCL (apart from explicit wait operations) are: +As said in <>, the only blocking operations in SYCL +(apart from explicit wait operations) are: * host accessor constructor, which waits for any kernels enqueued before its creation that write to the corresponding object to finish and be - copied back to host memory before it starts processing. The host - accessor does not necessarily copy back to the same host memory as - initially given by the user; + copied back to host memory before it starts processing. + The host accessor does not necessarily copy back to the same host memory + as initially given by the user; * memory object destruction, in the case where copies back to host memory have to be done or when the host memory is used as a backing-store. === Device discovery and selection -A user specifies which queue to submit a -<> and each <> is -targeted to run on a specific <> (and <>). A user -can specify the actual device on queue creation, or they can specify a -<> which causes the <> to choose a -device based on the user's provided preferences. 
Specifying a -<> causes the <> to perform device -discovery. No device discovery is performed until a SYCL -<> is passed to a queue constructor. Device -topology may be cached by the <>, but this is not +A user specifies which queue to submit a <> +and each <> is targeted to run on a specific <> (and +<>). +A user can specify the actual device on queue creation, or they can specify +a <> which causes the <> to choose a device +based on the user's provided preferences. +Specifying a <> causes the <> to perform +device discovery. +No device discovery is performed until a SYCL <> is passed +to a queue constructor. +Device topology may be cached by the <>, but this is not required. -Device discovery will return all <> from all <> exposed -by all the supported <>. +Device discovery will return all <> from all +<> exposed by all the supported <>. === Interfacing with the SYCL backend API @@ -1534,116 +1638,126 @@ There are two styles of developing a SYCL application: . writing a pure SYCL generic application; . writing a SYCL application that relies on some <> specific behavior. -When users follow 1., there is no assumption about what <> will be used during -compilation or execution of the SYCL application. Therefore, the <> -is not assumed to be available to the developer. -Only standard {cpp} types and interfaces are assumed to be available, -as described in <>. +When users follow 1., there is no assumption about what <> will be +used during compilation or execution of the SYCL application. +Therefore, the <> is not assumed to be available to the +developer. +Only standard {cpp} types and interfaces are assumed to be available, as +described in <>. Users only need to include the [code]## header to write a SYCL generic application. -On the other hand, when users follow 2., they must know what <>s -they are using. In this case, any header required for the normal -programmability of the <> is assumed to be available to the user. 
-In addition to the [code]## header, users must also -include the <>-specific header as defined in -<>. The <>-specific header -provides the interoperability interface for the SYCL API to interact with -<>. +On the other hand, when users follow 2., they must know what +<>s they are using. +In this case, any header required for the normal programmability of the +<> is assumed to be available to the user. +In addition to the [code]## header, users must also include +the <>-specific header as defined in +<>. +The <>-specific header provides the interoperability interface for +the SYCL API to interact with <>. The interoperability API is defined in <>. == Memory objects -SYCL memory objects represent data that is handled by the <> and -can represent allocations in one or multiple <> at any time. +SYCL memory objects represent data that is handled by the <> +and can represent allocations in one or multiple <> at any +time. Memory objects, both buffers and images, may have one or more underlying -<> to ensure that <> objects -can use data in any device. A SYCL implementation may have multiple -<> for the same device. -The <> is responsible for ensuring the different copies are up-to-date -whenever necessary, using whatever mechanism is available in the system -to update the copies of the underlying <>. +<> to ensure that +<> objects can use data in any device. +A SYCL implementation may have multiple <> for the same device. +The <> is responsible for ensuring the different copies are +up-to-date whenever necessary, using whatever mechanism is available in the +system to update the copies of the underlying <>. [NOTE] .Implementation note ==== A valid mechanism for this update is to transfer the data from one -<> into the system memory using the <>-specific -mechanism available, and then transfer it to a different device -using the mechanism exposed by the new <>. 
+<> into the system memory using the <>-specific mechanism +available, and then transfer it to a different device using the mechanism +exposed by the new <>. ==== Memory objects in SYCL fall into one of two categories: <> objects -and <> objects. A buffer object stores a one-, two- or -three-dimensional collection of elements that are stored linearly directly back -to back in the same way C or {cpp} stores arrays. An image object is used to store -a one-, two- or three-dimensional texture, frame-buffer or image data that may be -stored in an optimized and device-specific format in memory and must be accessed -through specialized operations. +and <> objects. +A buffer object stores a one-, two- or three-dimensional collection of +elements that are stored linearly directly back to back in the same way C or +{cpp} stores arrays. +An image object is used to store a one-, two- or three-dimensional texture, +frame-buffer or image data that may be stored in an optimized and +device-specific format in memory and must be accessed through specialized +operations. Elements of a buffer object can be a scalar data type (such as an [code]#int# or [code]#float#), vector data type, or a user-defined -structure. In SYCL, a <> object is a templated type -([code]#sycl::buffer#), parameterized by the element type and number of -dimensions. An <> object is stored in one of a limited number of -formats. The elements of an image object are selected from a list of -predefined image formats which are provided by an underlying <> -implementation. Images are encapsulated in the -[code]#sycl::unsampled_image# or [code]#sycl::sampled_image# -types, which are templated by the number of dimensions in the image. The -minimum number of elements in an image object is one. The minimum number -of elements in a buffer object is zero. +structure. +In SYCL, a <> object is a templated type ([code]#sycl::buffer#), +parameterized by the element type and number of dimensions. 
+An <> object is stored in one of a limited number of formats. +The elements of an image object are selected from a list of predefined image +formats which are provided by an underlying <> implementation. +Images are encapsulated in the [code]#sycl::unsampled_image# or +[code]#sycl::sampled_image# types, which are templated by the number of +dimensions in the image. +The minimum number of elements in an image object is one. +The minimum number of elements in a buffer object is zero. The fundamental differences between a buffer and an image object are: * elements in a buffer are stored in an array of 1, 2 or 3 dimensions and - can be accessed using an accessor by a kernel executing on a device. The - accessors for kernels provide a member function to get {cpp} pointer types, or the - [code]#sycl::global_ptr# class; + can be accessed using an accessor by a kernel executing on a device. + The accessors for kernels provide a member function to get {cpp} pointer + types, or the [code]#sycl::global_ptr# class; * elements of an image are stored in a format that is opaque to the user - and cannot be directly accessed using a pointer. SYCL provides image - accessors and samplers to allow a kernel to read from or write to an - image; + and cannot be directly accessed using a pointer. + SYCL provides image accessors and samplers to allow a kernel to read + from or write to an image; * for a buffer object the data is accessed within a kernel in the same format as it is stored in memory, but in the case of an image object the data is not necessarily accessed within a kernel in the same format as it is stored in memory; * image elements are always a 4-component vector (each component can be a - float or signed/unsigned integer) in a kernel. Accessors that read an - image convert image elements from their storage format into a 4-component - vector. + float or signed/unsigned integer) in a kernel. 
+ Accessors that read an image convert image elements from their storage + format into a 4-component vector. + -- -Similarly, the SYCL accessor member functions provided to write to an -image convert the image element from a 4-component vector to -the appropriate image format specified such as four 8-bit -elements, for example. +Similarly, the SYCL accessor member functions provided to write to an image +convert the image element from a 4-component vector to the appropriate image +format specified such as four 8-bit elements, for example. -- -Users may want fine-grained control of the synchronization, memory management -and storage semantics of SYCL image or buffer objects. For example, a user may -wish to specify the host memory for a memory object to use, but may not want the -memory object to block on destruction. +Users may want fine-grained control of the synchronization, memory +management and storage semantics of SYCL image or buffer objects. +For example, a user may wish to specify the host memory for a memory object +to use, but may not want the memory object to block on destruction. -Depending on the control and the use cases of the SYCL applications, -well established {cpp} classes and patterns can be used for reference counting and -sharing data between user applications and the <>. For control over -memory allocation on the host and mapping between host and device memory, -pre-defined or user-defined {cpp} [code]#std::allocator# classes are -used. For better control of synchronization between a SYCL and a non SYCL -application that share data, [code]#std::shared_ptr# and -[code]#std::mutex# classes are used. +Depending on the control and the use cases of the SYCL applications, well +established {cpp} classes and patterns can be used for reference counting +and sharing data between user applications and the <>. 
+For control over memory allocation on the host and mapping between host and +device memory, pre-defined or user-defined {cpp} [code]#std::allocator# +classes are used. +For better control of synchronization between a SYCL and a non SYCL +application that share data, [code]#std::shared_ptr# and [code]#std::mutex# +classes are used. == Multi-dimensional objects and linearization SYCL defines a number of multi-dimensional objects such as buffers and -accessors. The iteration space of work-items in a kernel may also be -multi-dimensional. The size of each dimension is defined by a [code]#range# -object of one, two or three dimensions, and an element in the multi-dimensional -space can be identified using an [code]#id# object with the same number of -dimensions as the corresponding [code]#range#. +accessors. +The iteration space of work-items in a kernel may also be multi-dimensional. +The size of each dimension is defined by a [code]#range# object of one, two +or three dimensions, and an element in the multi-dimensional space can be +identified using an [code]#id# object with the same number of dimensions as +the corresponding [code]#range#. If the size of any dimension is zero, there are zero elements in the multi-dimensional range. @@ -1651,9 +1765,9 @@ multi-dimensional range. [[sec:multi-dim-linearization]] === Linearization -Some multi-dimensional objects can be viewed in a linear form. When this -happens, the right-most term in the object's range varies fastest in the -linearization. +Some multi-dimensional objects can be viewed in a linear form. +When this happens, the right-most term in the object's range varies fastest +in the linearization. 
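As a small illustration of "the right-most term varies fastest", the following plain C++ sketch computes a linear position from a multi-dimensional index. The helper names `linearize3` and `linearize2` are hypothetical and not part of the SYCL API, and `std::array` stands in for `id` and `range` objects. It assumes the three-dimensional position is `id2 + id1*r2 + id0*r1*r2`, which is what a right-most-fastest layout implies.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a SYCL function): linear position of a 3D index
// inside a 3D range, with the right-most dimension varying fastest.
inline std::size_t linearize3(const std::array<std::size_t, 3>& id,
                              const std::array<std::size_t, 3>& r) {
  return id[2] + id[1] * r[2] + id[0] * r[1] * r[2];
}

// Same idea for two dimensions: id1 varies fastest.
inline std::size_t linearize2(const std::array<std::size_t, 2>& id,
                              const std::array<std::size_t, 2>& r) {
  return id[1] + id[0] * r[1];
}
```

For a range of `{2, 3, 4}`, stepping the right-most index by one moves one element in the linear view, while stepping the left-most index by one moves `3 * 4 = 12` elements.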
A three-dimensional element [code]#id{id0, id1, id2}# within a three-dimensional object of range [code]#range{r0, r1, r2}# has a linear @@ -1678,63 +1792,71 @@ A one-dimensional element [code]#id{id0}# within a one-dimensional range === Multi-dimensional subscript operators Some multi-dimensional objects can be indexed using the subscript operator -where consecutive subscript operators correspond to each dimension. The -right-most operator varies fastest, as with standard {cpp} arrays. Formally, a -three-dimensional subscript access [code]#a[id0][id1][id2]# references the element -at [code]#id{id0, id1, id2}#. A two-dimensional subscript access -[code]#a[id0][id1]# references the element at [code]#id{id0, id1}#. A -one-dimensional subscript access [code]#a[id0]# references the element at +where consecutive subscript operators correspond to each dimension. +The right-most operator varies fastest, as with standard {cpp} arrays. +Formally, a three-dimensional subscript access [code]#a[id0][id1][id2]# +references the element at [code]#id{id0, id1, id2}#. +A two-dimensional subscript access [code]#a[id0][id1]# references the +element at [code]#id{id0, id1}#. +A one-dimensional subscript access [code]#a[id0]# references the element at [code]#id{id0}#. == Implementation options The SYCL language is designed to allow several different possible -implementations. The contents of this section are non-normative, so -implementations need not follow the guidelines listed here. However, this -section is intended to help readers understand the possible strategies that can -be used to implement SYCL. +implementations. +The contents of this section are non-normative, so implementations need not +follow the guidelines listed here. +However, this section is intended to help readers understand the possible +strategies that can be used to implement SYCL. 
[[subsec:smcp]] === Single source multiple compiler passes With this technique, known as <>, there are separate host and device -compilers. Each SYCL source file is compiled two times: once by the host -compiler and once by the device compiler. An implementation could support more -than one device compiler, in which case each SYCL source file is compiled -more than two times. The host compiler in this technique could be an -off-the-shelf compiler with no special knowledge of SYCL, but the device -compiler must be SYCL aware. The device compiler parses the source file to -identify each <> and any <> it calls. SYCL is designed so that this analysis can be -done statically. The device compiler then generates code only for the -<> and the <>. - -Typically, the device compilers generate header files which interface between -the host compiler and the <>. Therefore, the device compiler -runs first, and then the host compiler consumes these header files when -generating the host code. +compilers. +Each SYCL source file is compiled two times: once by the host compiler and +once by the device compiler. +An implementation could support more than one device compiler, in which case +each SYCL source file is compiled more than two times. +The host compiler in this technique could be an off-the-shelf compiler with +no special knowledge of SYCL, but the device compiler must be SYCL aware. +The device compiler parses the source file to identify each +<> and any <> it +calls. +SYCL is designed so that this analysis can be done statically. +The device compiler then generates code only for the <> and the <>. + +Typically, the device compilers generate header files which interface +between the host compiler and the <>. +Therefore, the device compiler runs first, and then the host compiler +consumes these header files when generating the host code. The device compilers in this technique generate one or more <> for the <>, which -can be read by the <>. 
Each <> could either -contain native ISA for a device or it could contain an intermediate language -such as SPIR-V. In the latter case, the <> must translate the -intermediate language into native device ISA when the <> -is submitted to a device. - -Since this technique has separate host and device compilers, there needs to be -some way to associate a <> (which is compiled by the -device compiler) with the code that invokes it (which is compiled by the host -compiler). Implementations conformant to the reduced feature set +device images>> for the <>, +which can be read by the <>. +Each <> could either contain native ISA for a device or it +could contain an intermediate language such as SPIR-V. +In the latter case, the <> must translate the intermediate +language into native device ISA when the <> is +submitted to a device. + +Since this technique has separate host and device compilers, there needs to +be some way to associate a <> (which is compiled by +the device compiler) with the code that invokes it (which is compiled by the +host compiler). +Implementations conformant to the reduced feature set (<>) can do this by using the {cpp} type of the -<>. This type is specified via the <> -template parameter if the <> is a lambda function, or it -is obtained from the class type if the <> is an object. -Implementations conformant to the full feature set (<>) -do not require a <> at the invocation site, so they must implement -some other way to make the association. +<>. +This type is specified via the <> template parameter if the +<> is a lambda function, or it is obtained from the +class type if the <> is an object. +Implementations conformant to the full feature set +(<>) do not require a <> at the +invocation site, so they must implement some other way to make the +association. [[subsec:sscp]] @@ -1742,72 +1864,76 @@ some other way to make the association.
With this technique, known as <>, the vendor implements a custom compiler that reads each SYCL source file only once, and that compiler -generates the host code as well as the <> -for the <>. As in the -<> case, each <> could either contain native +generates the host code as well as the <> for +the <>. +As in the <> case, each <> could either contain native device ISA or an intermediate language. === Library-only implementation It is also possible to implement SYCL purely as a library, using an -off-the-shelf host compiler with no special support for SYCL. In such an -implementation, each <> may run on the host system. +off-the-shelf host compiler with no special support for SYCL. +In such an implementation, each <> may run on the host system. == Language restrictions in kernels The SYCL <> are executed on SYCL devices and all of the functions called from a SYCL kernel are going to be compiled for the device -by a SYCL <>. Due to restrictions of the heterogeneous -devices where the SYCL kernel will execute, there are certain restrictions -on the base {cpp} language features that can be used inside kernel code. For -details on language restrictions please refer -to <>. +by a SYCL <>. +Due to restrictions of the heterogeneous devices where the SYCL kernel will +execute, there are certain restrictions on the base {cpp} language features +that can be used inside kernel code. +For details on language restrictions please refer to +<>. SYCL kernels use arguments that are captured by value in the <> or are passed from the host to the device using -<>. Sharing data structures between host and device code -imposes certain restrictions, such as using only objects that are -<>, and in general, no pointers -initialized for the host can be used on the device. SYCL memory objects, -such as [code]#sycl::buffer#, [code]#sycl::unsampled_image#, and -[code]#sycl::sampled_image#, cannot be passed to a kernel. Instead, a kernel -must interact with these objects through <>. 
-No hierarchical structures of -these memory object classes are supported and any other data containers need to be -converted to the SYCL data management classes using the SYCL interface. For -more details on the rules for kernel parameter passing, please refer -to <>. - -Pointers to <> allocations -may be passed to a kernel either directly as arguments or indirectly -inside of other objects. Pointers to <> allocations that are -passed as kernel arguments are treated as being in the global -address space. +<>. +Sharing data structures between host and device code imposes certain +restrictions, such as using only objects that are <>, and +in general, no pointers initialized for the host can be used on the device. +SYCL memory objects, such as [code]#sycl::buffer#, +[code]#sycl::unsampled_image#, and [code]#sycl::sampled_image#, cannot be +passed to a kernel. +Instead, a kernel must interact with these objects through +<>. +No hierarchical structures of these memory object classes are supported and +any other data containers need to be converted to the SYCL data management +classes using the SYCL interface. +For more details on the rules for kernel parameter passing, please refer to +<>. + +Pointers to <> allocations may be passed to a kernel either directly as +arguments or indirectly inside of other objects. +Pointers to <> allocations that are passed as kernel arguments are +treated as being in the global address space. [[sec::device.copyable]] === Device copyable The SYCL implementation may need to copy data between the host and a device -or between two devices. For example, this may occur when a <> -has a requirement for the contents of a buffer or when the application passes -certain arguments to a <> (as described in -<>). Such data must have a type that is -<> as defined below. +or between two devices. 
+For example, this may occur when a <> has a requirement for +the contents of a buffer or when the application passes certain arguments to +a <> (as described in +<>). +Such data must have a type that is <> as defined below. -Any type that is trivially copyable (as defined by the {cpp} core language) is -implicitly device copyable. +Any type that is trivially copyable (as defined by the {cpp} core language) +is implicitly device copyable. Although implementations are not required to support device code that calls library functions from the {cpp} core language, some implementations may -provide device support for some of these functions. If the implementation -provides device support for one of the following classes, that type is also -implicitly device copyable: +provide device support for some of these functions. +If the implementation provides device support for one of the following +classes, that type is also implicitly device copyable: * [code]#std::array#; * [code]#std::array# if [code]#T# is device copyable; * [code]#std::optional# if [code]#T# is device copyable; - * [code]#std::pair# if [code]#T1# and [code]#T2# are device copyable; + * [code]#std::pair# if [code]#T1# and [code]#T2# are device + copyable; * [code]#std::tuple<>#; * [code]#+std::tuple+# if all the types in the parameter pack [code]#Types# are device copyable; @@ -1827,46 +1953,49 @@ device copyable. ==== The types [code]#std::basic_string_view# and [code]#std::span# are both view types, which reference -underlying data that is not contained within their type. Although these view -types are device copyable, the implementation copies just the view and not -the contained data when doing an inter-device copy. In order to reference the -contained data after such a copy, the application must allocate the contained -data in unified shared memory (USM) that is accessible on both the host and -device (or on both devices in the case of a device-to-device copy). 
+underlying data that is not contained within their type. +Although these view types are device copyable, the implementation copies +just the view and not the contained data when doing an inter-device copy. +In order to reference the contained data after such a copy, the application +must allocate the contained data in unified shared memory (USM) that is +accessible on both the host and device (or on both devices in the case of a +device-to-device copy). ==== -In addition, the implementation may allow the application to explicitly declare -certain class types as device copyable. If the implementation has this support, -it must predefine the preprocessor macro [code]#SYCL_DEVICE_COPYABLE# to -[code]#1#, and it must not predefine this preprocessor macro if it does not -have this support. When the implementation has this support, a class type -[code]#T# is device copyable if all of the following statements are true: +In addition, the implementation may allow the application to explicitly +declare certain class types as device copyable. +If the implementation has this support, it must predefine the preprocessor +macro [code]#SYCL_DEVICE_COPYABLE# to [code]#1#, and it must not predefine +this preprocessor macro if it does not have this support. 
+When the implementation has this support, a class type [code]#T# is device +copyable if all of the following statements are true: * The application defines the trait [code]#is_device_copyable_v# to [code]#true#; * Type [code]#T# has at least one eligible copy constructor, move constructor, copy assignment operator, or move assignment operator; - * Each eligible copy constructor, move constructor, copy assignment operator, - and move assignment operator is [code]#public#; + * Each eligible copy constructor, move constructor, copy assignment + operator, and move assignment operator is [code]#public#; * When doing an inter-device transfer of an object of type [code]#T#, the - effect of each eligible copy constructor, move constructor, copy assignment - operator, and move assignment operator is the same as a bitwise copy of the - object; + effect of each eligible copy constructor, move constructor, copy + assignment operator, and move assignment operator is the same as a + bitwise copy of the object; * Type [code]#T# has a [code]#public# non-deleted destructor; * The destructor has no effect when executed on the device. When the application explicitly declares a class type to be device copyable, arrays of that type and cv-qualified versions of that type are also device -copyable, and the implementation sets the [code]#is_device_copyable_v# trait to -[code]#true# for these array and cv-qualified types. +copyable, and the implementation sets the [code]#is_device_copyable_v# trait +to [code]#true# for these array and cv-qualified types. [NOTE] ==== It is unspecified whether the implementation actually calls the copy constructor, move constructor, copy assignment operator, or move assignment operator of a class declared as [code]#is_device_copyable_v# when doing an -inter-device copy. Since these operations must all be the same as a bitwise -copy, the implementation may simply copy the memory where the object resides. +inter-device copy. 
+Since these operations must all be the same as a bitwise copy, the +implementation may simply copy the memory where the object resides. Likewise, it is unspecified whether the implementation actually calls the destructor for such a class on the device since the destructor must have no effect on the device. @@ -1876,8 +2005,9 @@ effect on the device. == Endianness support SYCL does not mandate any particular byte order, but the byte order of the -host always matches the byte order of the devices. This allows data to be -copied between the host and the devices without any byte swapping. +host always matches the byte order of the devices. +This allows data to be copied between the host and the devices without any +byte swapping. == Example SYCL application diff --git a/adoc/chapters/copyright-spec.adoc b/adoc/chapters/copyright-spec.adoc index 2a679aa9..0bab460d 100644 --- a/adoc/chapters/copyright-spec.adoc +++ b/adoc/chapters/copyright-spec.adoc @@ -1,10 +1,11 @@ Copyright (c) 2011-2023 The Khronos Group, Inc. This Specification is protected by copyright laws and contains material -proprietary to Khronos. Except as described by these terms, it or any -components may not be reproduced, republished, distributed, transmitted, -displayed, broadcast or otherwise exploited in any manner without the -express prior written permission of Khronos. +proprietary to Khronos. +Except as described by these terms, it or any components may not be +reproduced, republished, distributed, transmitted, displayed, broadcast or +otherwise exploited in any manner without the express prior written +permission of Khronos. Khronos grants a conditional copyright license to use and reproduce the unmodified Specification for any purpose, without fee or royalty, EXCEPT no licenses to any patent, trademark or other intellectual property rights are @@ -23,27 +24,28 @@ or otherwise, arising from or in connection with these materials. 
This Specification has been created under the Khronos Intellectual Property Rights Policy, which is Attachment A of the Khronos Group Membership -Agreement available at https://www.khronos.org/files/member_agreement.pdf, and which -defines the terms 'Scope', 'Compliant Portion', and 'Necessary Patent Claims'. -Parties desiring to implement the Specification and make use of Khronos trademarks -in relation to that implementation, and receive reciprocal patent license protection -under the Khronos Intellectual Property Rights Policy must become Adopters and -confirm the implementation as conformant under the process defined by Khronos for -this Specification; see https://www.khronos.org/adopters. +Agreement available at https://www.khronos.org/files/member_agreement.pdf, +and which defines the terms 'Scope', 'Compliant Portion', and 'Necessary +Patent Claims'. +Parties desiring to implement the Specification and make use of Khronos +trademarks in relation to that implementation, and receive reciprocal patent +license protection under the Khronos Intellectual Property Rights Policy +must become Adopters and confirm the implementation as conformant under the +process defined by Khronos for this Specification; see +https://www.khronos.org/adopters. -Some parts of this Specification are purely informative and so are EXCLUDED from -the Scope of this Specification. +Some parts of this Specification are purely informative and so are EXCLUDED +from the Scope of this Specification. // Jon: how much do we want to say about Informative spec sections? No // convention in use at present. Could also add a "technical terminology" // section and link from the following paragraph. // The <> section of the // <> defines how these parts of the Specification are identified. 
-Where this Specification uses technical -terminology, defined in the <> or otherwise, that refer to -enabling technologies that are not expressly set forth in this -Specification, those enabling technologies are EXCLUDED from the Scope of -this Specification. +Where this Specification uses technical terminology, defined in the +<> or otherwise, that refer to enabling technologies that are not +expressly set forth in this Specification, those enabling technologies are +EXCLUDED from the Scope of this Specification. For clarity, enabling technologies not disclosed with particularity in this Specification (e.g. semiconductor manufacturing technology, hardware architecture, processor architecture or microarchitecture, memory @@ -53,28 +55,31 @@ are NOT to be considered expressly set forth; only those application program interfaces and data structures disclosed with particularity are included in the Scope of this Specification. -For purposes of the Khronos Intellectual Property Rights Policy as it relates -to the definition of Necessary Patent Claims, all recommended or optional -features, behaviors and functionality set forth in this Specification, if -implemented, are considered to be included as Compliant Portions. +For purposes of the Khronos Intellectual Property Rights Policy as it +relates to the definition of Necessary Patent Claims, all recommended or +optional features, behaviors and functionality set forth in this +Specification, if implemented, are considered to be included as Compliant +Portions. -Where this Specification includes -normative references to external documents, only the specifically -identified sections of those external documents are INCLUDED in the Scope of -this Specification. 
If not created by Khronos, those external documents may -contain contributions from non-members of Khronos not covered by the Khronos +Where this Specification includes normative references to external +documents, only the specifically identified sections of those external +documents are INCLUDED in the Scope of this Specification. +If not created by Khronos, those external documents may contain +contributions from non-members of Khronos not covered by the Khronos Intellectual Property Rights Policy. ifndef::ratified_core_spec[] This document contains extensions which are not ratified by Khronos, and as such is not a ratified Specification, though it contains text from (and is a -superset of) the ratified SYCL Specification. The ratified version of the -SYCL Specification can be found at +superset of) the ratified SYCL Specification. +The ratified version of the SYCL Specification can be found at https://www.khronos.org/registry/SYCL . endif::ratified_core_spec[] Khronos and Vulkan are registered trademarks, and SPIR-V is a trademark of -The Khronos Group Inc. OpenCL is a trademark of Apple Inc. and OpenGL is a -registered trademark of Hewlett Packard Enterprise, all used under license -by Khronos. All other product names, trademarks, and/or company names are -used solely for identification and belong to their respective owners. +The Khronos Group Inc. +OpenCL is a trademark of Apple Inc. +and OpenGL is a registered trademark of Hewlett Packard Enterprise, all +used under license by Khronos. +All other product names, trademarks, and/or company names are used solely +for identification and belong to their respective owners. diff --git a/adoc/chapters/device_compiler.adoc b/adoc/chapters/device_compiler.adoc index 36b3b562..c9c6eae3 100644 --- a/adoc/chapters/device_compiler.adoc +++ b/adoc/chapters/device_compiler.adoc @@ -5,8 +5,8 @@ This section specifies the requirements of the SYCL device compiler.
Most features described in this section relate to underlying <> -capabilities of target devices and limiting the requirements of device -code to ensure portability. +capabilities of target devices and limiting the requirements of device code +to ensure portability. == Offline compilation of SYCL source files @@ -17,41 +17,45 @@ the technique of <>. A SYCL device compiler takes in a {cpp} source file, extracts only the SYCL kernels and outputs the device code in a form that can be enqueued from host -code by the associated <>. How the <> -invokes the kernels is implementation-defined, but a typical approach is for -a device compiler to produce a header file with the compiled kernel -contained within it. By providing a command-line option to the host -compiler, it would cause the implementation's SYCL header files to -[code]#{hash}include# the generated header file. The SYCL specification has -been written to allow this as an implementation approach in order to allow -<>. However, any of the mechanisms needed from the SYCL compiler, -the <> and build system are implementation-defined, as they -can vary depending on the platform and approach. - -A SYCL single-source device compiler takes in a {cpp} source file and compiles -both host and device code at the same time. This specification specifies how -a SYCL single-source device compiler sees and outputs device code for kernels, -but does not specify the host compilation. +code by the associated <>. +How the <> invokes the kernels is implementation-defined, but +a typical approach is for a device compiler to produce a header file with +the compiled kernel contained within it. +By providing a command-line option to the host compiler, it would cause the +implementation's SYCL header files to [code]#{hash}include# the generated +header file. +The SYCL specification has been written to allow this as an implementation +approach in order to allow <>. 
+However, any of the mechanisms needed from the SYCL compiler, the +<> and build system are implementation-defined, as they can +vary depending on the platform and approach. + +A SYCL single-source device compiler takes in a {cpp} source file and +compiles both host and device code at the same time. +This specification specifies how a SYCL single-source device compiler sees +and outputs device code for kernels, but does not specify the host +compilation. [[sec:naming.kernels]] == Naming of kernels SYCL kernels are extracted from {cpp} source files and stored in an -implementation-defined format. In the case of the shared-source compilation model, the kernels -have to be uniquely identified by both host and device compiler. This is -required in order for the host runtime to be able to load the kernel by using -a backend-specific host runtime interface. +implementation-defined format. +In the case of the shared-source compilation model, the kernels have to be +uniquely identified by both host and device compiler. +This is required in order for the host runtime to be able to load the kernel +by using a backend-specific host runtime interface. From this requirement the following rules apply for naming the kernels: * The kernel name is a [keyword]#{cpp} typename#. - * The kernel name must be forward declarable at namespace scope - (including global namespace scope) and may not be forward declared other - than at namespace scope. If it isn't forward declared - but is specified as a template argument in a kernel invoking interface, - as described in <>, then it may not conflict - with a name in any enclosing namespace scope. + * The kernel name must be forward declarable at namespace scope (including + global namespace scope) and may not be forward declared other than at + namespace scope. 
+ If it isn't forward declared but is specified as a template argument in + a kernel invoking interface, as described in <>, + then it may not conflict with a name in any enclosing namespace scope. [NOTE] ==== @@ -66,44 +70,47 @@ behavior). at namespace scope, or does not conflict with any name in an enclosing namespace scope. * If the kernel is defined as a lambda, a typename can optionally be - provided to the kernel invoking interface as described - in <>, so that the developer can control the - kernel name for purposes such as debugging or referring to the kernel - when applying build options. + provided to the kernel invoking interface as described in + <>, so that the developer can control the kernel + name for purposes such as debugging or referring to the kernel when + applying build options. * If a kernel function relies on template parameters, then those template - parameters must be contained by the kernel name. If such a kernel name - is specified as a template argument in a kernel invoking interface, then - the template parameters on which the kernel depends must be forward - declarable at namespace scope. + parameters must be contained by the kernel name. + If such a kernel name is specified as a template argument in a kernel + invoking interface, then the template parameters on which the kernel + depends must be forward declarable at namespace scope. -In both single-source and shared-source implementations, a device compiler should -detect the kernel invocations (e.g. [code]#parallel_for)# +In both single-source and shared-source implementations, a device compiler +should detect the kernel invocations (e.g. [code]#parallel_for)# in the source code and compile the enclosed kernels, storing them with their associated type name. The format of the kernel and the compilation techniques are details of an -implementation and not specified. 
The interface between the compiler and the -runtime for extracting and executing SYCL kernels on the device is a detail of -an implementation and not specified. +implementation and not specified. +The interface between the compiler and the runtime for extracting and +executing SYCL kernels on the device is a detail of an implementation and +not specified. == Compilation of functions The SYCL device compiler parses an entire {cpp} source file supplied by the user, including any header files referenced via [code]#{hash}include# -directives. From this source file, the SYCL device compiler must compile -kernels for the device, as well as any functions that the kernels call. +directives. +From this source file, the SYCL device compiler must compile kernels for the +device, as well as any functions that the kernels call. The device compiler identifies kernels by looking for calls to -<> such as [code]#parallel_for#. One of -the parameters is a function object which is known as a -<>, and this function must always return -[code]#void#. Any function called by the <> is -also compiled for the device, and these functions together with the -<> are known as <>. The device -compiler searches recursively for any functions called from a -<>, and these functions are also compiled for the device and -known as <>. +<> such as +[code]#parallel_for#. +One of the parameters is a function object which is known as a +<>, and this function must always return [code]#void#. +Any function called by the <> is also compiled for the +device, and these functions together with the <> are known as <>. +The device compiler searches recursively for any functions called from a +<>, and these functions are also compiled for the device +and known as <>. 
To illustrate, the following source code shows three functions and a kernel invoke with comments explaining which functions need to be compiled for the @@ -129,96 +136,98 @@ void h() { } ---- -In order for the SYCL device compiler to correctly compile <>, all -functions in the source file, whether <> or not, must be -syntactically correct functions according to this specification. A syntactically -correct function adheres to at least the minimum required {cpp} version -defined in <>. +In order for the SYCL device compiler to correctly compile <>, all functions in the source file, whether +<> or not, must be syntactically correct +functions according to this specification. +A syntactically correct function adheres to at least the minimum required +{cpp} version defined in <>. [[sec:language.restrictions.kernels]] == Language restrictions for device functions -<> must abide by certain restrictions. The full set of -{cpp} features are not available to these functions. Following is a list of -these restrictions: - - * Pointers and objects containing pointers may be shared. However, when a pointer is - passed between SYCL devices or between the host and a SYCL device, - dereferencing that pointer on the device produces undefined behavior unless - the device supports <> and the pointer is an address within a - <> memory region (see <>). - * Memory storage allocation is not allowed in kernels. All memory allocation - for the device is done on the host using accessor classes or using - <> as explained in <>. - Consequently, the default allocation [code]#operator new# overloads - that allocate storage are disallowed in a SYCL kernel. The placement - [code]#new# operator and any user-defined overloads that do not - allocate storage are permitted. - * Kernel functions must always have a [code]#void# return type. A - kernel lambda trailing-return-type that is not [code]#void# is +<> must abide by certain restrictions. 
+The full set of {cpp} features are not available to these functions. +Following is a list of these restrictions: + + * Pointers and objects containing pointers may be shared. + However, when a pointer is passed between SYCL devices or between the + host and a SYCL device, dereferencing that pointer on the device + produces undefined behavior unless the device supports <> and the + pointer is an address within a <> memory region (see <>). + * Memory storage allocation is not allowed in kernels. + All memory allocation for the device is done on the host using accessor + classes or using <> as explained in <>. + Consequently, the default allocation [code]#operator new# overloads that + allocate storage are disallowed in a SYCL kernel. + The placement [code]#new# operator and any user-defined overloads that + do not allocate storage are permitted. + * Kernel functions must always have a [code]#void# return type. + A kernel lambda trailing-return-type that is not [code]#void# is therefore illegal, as is a return statement (that would return from the kernel function) with an expression that does not convert to [code]#void#. - * The odr-use of polymorphic classes and classes with virtual - inheritance is allowed. However, no virtual member functions are allowed - to be called in a <>. + * The odr-use of polymorphic classes and classes with virtual inheritance + is allowed. + However, no virtual member functions are allowed to be called in a + <>. * No function pointers or references are allowed to be called in a <>. * RTTI is disabled inside <>. - * No variadic functions are allowed to be called in a - <>. - * Exception-handling cannot be used inside a - <>. + * No variadic functions are allowed to be called in a <>. + * Exception-handling cannot be used inside a <>. [code]#noexcept# is allowed. * Recursion is not allowed in a <>. 
- * Variables with thread storage duration ([code]#thread_local# - storage class specifier) are not allowed to be odr-used in a + * Variables with thread storage duration ([code]#thread_local# storage + class specifier) are not allowed to be odr-used in a <>. * Variables with static storage duration that are odr-used inside a - <>, must be either [code]#const# - or [code]#constexpr#, and must also be either - zero-initialized or constant-initialized. + <>, must be either [code]#const# or [code]#constexpr#, + and must also be either zero-initialized or constant-initialized. [NOTE] ==== Amongst other things, this restriction makes it illegal for a -<> to access a global variable that isn't [code]#const# -or [code]#constexpr#. +<> to access a global variable that isn't [code]#const# or +[code]#constexpr#. ==== * The rules for kernels apply to both the kernel function objects themselves and all functions, operators, member functions, constructors - and destructors called by the kernel. This means that kernels can only - use library functions that have been adapted to work with SYCL. + and destructors called by the kernel. + This means that kernels can only use library functions that have been + adapted to work with SYCL. Implementations are not required to support any library routines in kernels beyond those explicitly mentioned as usable in kernels in this - spec. Developers should refer to the SYCL built-in functions - in <> to find functions that are specified to be usable - in kernels. + spec. + Developers should refer to the SYCL built-in functions in + <> to find functions that are specified to be usable in + kernels. * Interacting with a special <> class (e.g. SYCL [code]#accessor# or [code]#stream#) that is stored within a {cpp} union is undefined behavior. - * Any variable or function that is odr-used from a <> must - be defined in the same translation unit as that use. 
However, a function - may be defined in another translation unit if the implementation defines - the [code]#SYCL_EXTERNAL# macro as described in <>. + * Any variable or function that is odr-used from a <> + must be defined in the same translation unit as that use. + However, a function may be defined in another translation unit if the + implementation defines the [code]#SYCL_EXTERNAL# macro as described in + <>. [[subsec:scalartypes]] == Built-in scalar data types In a SYCL device compiler, the device definition of all standard {cpp} -fundamental types from <> must match the -host definition of those types, in both size and alignment. A device -compiler may have this preconfigured so that it can match them based on the -definitions of those types on the platform, or there may be a necessity for -a device compiler command-line option to ensure the types are the same. - -The standard {cpp} fixed width types, e.g. [code]#int8_t#, -[code]#int16_t#, [code]#int32_t#, [code]#int64_t#, -should have the same size as defined by the {cpp} standard for host and -device. +fundamental types from <> must match the host +definition of those types, in both size and alignment. +A device compiler may have this preconfigured so that it can match them +based on the definitions of those types on the platform, or there may be a +necessity for a device compiler command-line option to ensure the types are +the same. + +The standard {cpp} fixed width types, e.g. [code]#int8_t#, [code]#int16_t#, +[code]#int32_t#, [code]#int64_t#, should have the same size as defined by the +{cpp} standard for host and device. [[table.types.fundamental]] @@ -339,54 +348,61 @@ The standard {cpp} preprocessing directives and macros are supported. The following preprocessor macros must be defined by all conformant implementations: - * [code]#SYCL_LANGUAGE_VERSION# substitutes an integer reflecting - the version number and revision of the SYCL language being supported - by the implementation.
The version of SYCL defined in this document - will have [code]#SYCL_LANGUAGE_VERSION# substitute the integer + * [code]#SYCL_LANGUAGE_VERSION# substitutes an integer reflecting the + version number and revision of the SYCL language being supported by the + implementation. + The version of SYCL defined in this document will have + [code]#SYCL_LANGUAGE_VERSION# substitute the integer [code]#{SYCL_LANGUAGE_VERSION}#, composed with the general SYCL version followed by 2 digits representing the revision number; * [code]#SYCL_DEVICE_COPYABLE# is defined to 1 if the implementation supports explicitly specified <> types as described in - <>. Otherwise, the implementation's definition of - device copyable falls back to {cpp} trivially copyable and - [code]#sycl::is_device_copyable# is ignored; + <>. + Otherwise, the implementation's definition of device copyable falls back + to {cpp} trivially copyable and [code]#sycl::is_device_copyable# is + ignored; * [code]#+__SYCL_DEVICE_ONLY__+# is defined to 1 if the source file is being compiled with a SYCL device compiler which does not produce host binary; - * [code]#+__SYCL_SINGLE_SOURCE__+# is defined to 1 if the source file - is being compiled with a SYCL single-source compiler which produces host - as well as device binary; + * [code]#+__SYCL_SINGLE_SOURCE__+# is defined to 1 if the source file is + being compiled with a SYCL single-source compiler which produces host as + well as device binary; * [code]#SYCL_FEATURE_SET_FULL# is defined to 1 if the SYCL implementation - supports the full feature set and is not defined otherwise. For more details - see <>; - * [code]#SYCL_FEATURE_SET_REDUCED# is defined to 1 if the SYCL implementation - supports the reduced feature set and not the full feature set, otherwise it - is not defined. For more details see <>; + supports the full feature set and is not defined otherwise. 
+ For more details see <>; + * [code]#SYCL_FEATURE_SET_REDUCED# is defined to 1 if the SYCL + implementation supports the reduced feature set and not the full feature + set, otherwise it is not defined. + For more details see <>; * [code]#SYCL_EXTERNAL# is an optional macro which enables external - linkage of SYCL functions and member functions to be included in a SYCL kernel. - The macro is only defined if the implementation supports external linkage. + linkage of SYCL functions and member functions to be included in a SYCL + kernel. + The macro is only defined if the implementation supports external + linkage. For more details see <>. -In addition, for each <> supported, the preprocessor macros described -in <> must be defined by all conformant implementations. +In addition, for each <> supported, the preprocessor macros +described in <> must be defined by all conformant +implementations. [[sec:optional-kernel-features]] == Optional kernel features A number of kernel features defined by this SYCL specification are optional; -they may be supported on some devices but not on other devices. As described -in <>, an application can test whether a device supports -these features by testing whether the device has an associated aspect. The -following aspects are those that correspond to optional kernel features: +they may be supported on some devices but not on other devices. +As described in <>, an application can test whether a +device supports these features by testing whether the device has an +associated aspect. 
+The following aspects are those that correspond to optional kernel features: * [code]#fp16# * [code]#fp64# * [code]#atomic64# -In addition, the following {cpp} attributes from <> also -correspond to optional kernel features because they force the kernel to be -compiled in a way that might not run on all devices: +In addition, the following {cpp} attributes from <> +also correspond to optional kernel features because they force the kernel to +be compiled in a way that might not run on all devices: * [code]#reqd_work_group_size()# * [code]#reqd_sub_group_size()# @@ -396,19 +412,20 @@ optional kernel features, all SYCL implementations must be able to compile device code that uses these optional features regardless of whether the implementation supports the features on any of its devices. -Of course, applications that make use of optional kernel features should ensure -that a kernel using such a feature is submitted only to a device that supports -the feature. If the application submits a <> using a secondary -queue, then any kernel submitted from the <> should use only -features that are supported by both the primary queue's device and the -secondary queue's device. If an application fails to do this, the -implementation must throw a synchronous exception with the -[code]#errc::kernel_not_supported# error code from the -<> (e.g. [code]#parallel_for()#). +Of course, applications that make use of optional kernel features should +ensure that a kernel using such a feature is submitted only to a device that +supports the feature. +If the application submits a <> using a secondary queue, then +any kernel submitted from the <> should use only features +that are supported by both the primary queue's device and the secondary +queue's device. +If an application fails to do this, the implementation must throw a +synchronous exception with the [code]#errc::kernel_not_supported# error code +from the <> (e.g. [code]#parallel_for()#). 
 It is legal for a SYCL application to define several kernels in the same -translation unit even if they use different optional features, as shown in the -following example: +translation unit even if they use different optional features, as shown in +the following example: [source,,linenums] ---- @@ -416,84 +433,92 @@ include::{code_dir}/twoOptionalFeatures.cpp[lines=4..-1] ---- An implementation may not raise a compile time diagnostic or a run time -exception merely due to speculative compilation of a kernel for a device when -the application does not actually submit the kernel to that device. To -illustrate using the example above, assume that device [code]#dev1# does not -have [code]#aspect::atomic64# and device [code]#dev2# does not have -[code]#aspect::fp16#. An implementation cannot raise a diagnostic due to -compilation of [code]#KernelA# for device [code]#dev2# or for compilation of -[code]#KernelB# for device [code]#dev1# because the application does not submit -these kernels to those devices. +exception merely due to speculative compilation of a kernel for a device +when the application does not actually submit the kernel to that device. +To illustrate using the example above, assume that device [code]#dev1# does +not have [code]#aspect::atomic64# and device [code]#dev2# does not have +[code]#aspect::fp16#. +An implementation cannot raise a diagnostic due to compilation of +[code]#KernelA# for device [code]#dev2# or for compilation of +[code]#KernelB# for device [code]#dev1# because the application does not +submit these kernels to those devices. [NOTE] ==== It is expected that this requirement will have an impact on the way an -implementation bundles kernels into device images. For example, naively -bundling [code]#KernelA# and [code]#KernelB# into the same device image could -run afoul of this requirement if the implementation compiles the entire device -image when [code]#KernelA# is submitted to device [code]#dev1#.
+implementation bundles kernels into device images. +For example, naively bundling [code]#KernelA# and [code]#KernelB# into the +same device image could run afoul of this requirement if the implementation +compiles the entire device image when [code]#KernelA# is submitted to device +[code]#dev1#. ==== [[sec:device.attributes]] == Attributes for device code -{cpp} attributes may be used to decorate kernels and device functions in order -to influence the code generated by the device compiler. These attributes are -all defined in the [code]#+[[sycl::]]+# namespace. +{cpp} attributes may be used to decorate kernels and device functions in +order to influence the code generated by the device compiler. +These attributes are all defined in the [code]#+[[sycl::]]+# namespace. If one of the attributes defined in this section is applied to a kernel or device function, it must be applied to the first declaration of that kernel -or device function in the translation unit. Programs which fail to do this are -ill formed and the compiler must issue a diagnostic. Redeclarations of the -kernel or device function in the same translation unit may optionally have the -same attribute applied (so long as the attribute arguments are the same between -the declarations), but this is not required. The attribute remains in effect -regardless of whether it appears in the redeclaration. +or device function in the translation unit. +Programs which fail to do this are ill formed and the compiler must issue a +diagnostic. +Redeclarations of the kernel or device function in the same translation unit +may optionally have the same attribute applied (so long as the attribute +arguments are the same between the declarations), but this is not required. +The attribute remains in effect regardless of whether it appears in the +redeclaration. 
Unless an attribute's description specifically allows it, a kernel or device function may not be declared with more than one instance of the same -attribute unless all instances have the same attribute arguments. The compiler -must issue a diagnostic for programs which violate this requirement. When two -or more instances of the same attribute appear on the declaration of a kernel -or device function, the effect is as though a single instance appeared -(assuming that all instances have the same attribute arguments). - -If a kernel or device function is declared with an attribute in one translation -unit and the same kernel or device function is declared without the same -attribute (and its same attribute arguments) in another translation unit, the -program is ill formed and no diagnostic is required. +attribute unless all instances have the same attribute arguments. +The compiler must issue a diagnostic for programs which violate this +requirement. +When two or more instances of the same attribute appear on the declaration +of a kernel or device function, the effect is as though a single instance +appeared (assuming that all instances have the same attribute arguments). + +If a kernel or device function is declared with an attribute in one +translation unit and the same kernel or device function is declared without +the same attribute (and its same attribute arguments) in another translation +unit, the program is ill formed and no diagnostic is required. If any of these attributes are applied to a device function that is also compiled for the host, they have no effect when the function is compiled for the host. -Applying these attributes to any language construct other than those specified -in this section has implementation-defined effect. +Applying these attributes to any language construct other than those +specified in this section has implementation-defined effect.
[[sec:kernel.attributes]] === Kernel attributes -The attributes listed in <> have a different position -depending on whether the kernel is defined as a lambda function or as a named -function object. If the kernel is a named function object, the attribute is -applied to the declarator-id in the function declaration. However, if the -kernel is a lambda function, the attribute is applied to the lambda declarator. +The attributes listed in <> have a different +position depending on whether the kernel is defined as a lambda function or +as a named function object. +If the kernel is a named function object, the attribute is applied to the +declarator-id in the function declaration. +However, if the kernel is a lambda function, the attribute is applied to the +lambda declarator. [NOTE] ==== -The reason for the different positions is because the {cpp} core language does -not currently define a position for attributes to appertain to the lambda's -corresponding function operator or operator template, only to the corresponding -_type_ of the function operator or operator template. This is expected to be -remedied in a future version of the {cpp} core language specification. +The reason for the different positions is because the {cpp} core language +does not currently define a position for attributes to appertain to the +lambda's corresponding function operator or operator template, only to the +corresponding _type_ of the function operator or operator template. +This is expected to be remedied in a future version of the {cpp} core +language specification. ==== The example below demonstrates these attribute positions using the -[code]#[[sycl::reqd_work_group_size(16)]]# attribute. Note that the {cpp} core -language allows two possible positions for kernels that are defined as a named -function object. +[code]#[[sycl::reqd_work_group_size(16)]]# attribute. 
+Note that the {cpp} core language allows two possible positions for kernels +that are defined as a named function object. [source,,linenums] ---- @@ -621,9 +646,9 @@ include::{code_dir}/deviceHas.cpp[lines=4..-1] === Device function attributes The attributes in <> are applied to the declaration -of a non-kernel device function. The position of the attribute is the same -as for the kernel function attributes defined above in -<>. +of a non-kernel device function. +The position of the attribute is the same as for the kernel function +attributes defined above in <>. [[table.device.attributes]] .Attributes for non-kernel device functions @@ -667,80 +692,85 @@ associated with an aspect that is not listed in the attribute. == Address-space deduction -{cpp} has no type-level support to represent address spaces. As a consequence, -the SYCL generic programming model does not directly affect the {cpp} type of -unannotated pointers and references. +{cpp} has no type-level support to represent address spaces. +As a consequence, the SYCL generic programming model does not directly +affect the {cpp} type of unannotated pointers and references. -Source level guarantees about address spaces in the SYCL generic -programming model can only be achieved using pointer classes (instances of -[code]#multi_ptr#), which are regular classes that represent pointers -to data stored in the corresponding address spaces. +Source level guarantees about address spaces in the SYCL generic programming +model can only be achieved using pointer classes (instances of +[code]#multi_ptr#), which are regular classes that represent pointers to +data stored in the corresponding address spaces. In SYCL, the address space of pointer and references are derived from: - * Accessors that give access to shared data. They can be bound to a memory - object in a command group and passed into a kernel. Accessors are used - in scheduling of kernels to define ordering. 
Accessors to buffers have a - compile-time address space based on their access mode. - * Explicit pointer classes (e.g. [code]#global_ptr#) hold a pointer - which is known to be addressing the address space represented by the - [code]#access::address_space#. This allows the compiler to - determine whether the pointer references global, local, constant or - private memory and generate code accordingly. + * Accessors that give access to shared data. + They can be bound to a memory object in a command group and passed into + a kernel. + Accessors are used in scheduling of kernels to define ordering. + Accessors to buffers have a compile-time address space based on their + access mode. + * Explicit pointer classes (e.g. [code]#global_ptr#) hold a pointer which + is known to be addressing the address space represented by the + [code]#access::address_space#. + This allows the compiler to determine whether the pointer references + global, local, constant or private memory and generate code accordingly. * Raw {cpp} pointer and reference types (e.g. [code]#int*#) are allowed - within SYCL kernels. They can be constructed from the address of local - variables, explicit pointer classes, or accessors. + within SYCL kernels. + They can be constructed from the address of local variables, explicit + pointer classes, or accessors. [[subsec:addrspaceAssignment]] === Address space assignment -In order to understand where data lives, the device compiler is -expected to assign address spaces while lowering types for the -underlying target based on the context. Depending on the <> -and mode, address space deducing rules differ slightly. +In order to understand where data lives, the device compiler is expected to +assign address spaces while lowering types for the underlying target based +on the context. +Depending on the <> and mode, address space deducing +rules differ slightly.
If the target of the SYCL backend can represent the generic address space, then the "common address space deduction rules" in -<> and the "generic as default address space rules" -in <> apply. If the target of the SYCL backend -cannot represent the generic address space, then the "common address space -deduction rules" in <> and the "inferred address -space rules" in <> apply. +<> and the "generic as default address space +rules" in <> apply. +If the target of the SYCL backend cannot represent the generic address +space, then the "common address space deduction rules" in +<> and the "inferred address space rules" in +<> apply. [NOTE] ==== SYCL address space does not affect the type, address space shall be -understood as memory segment in which data is allocated. For -instance, if [code]#int i;# is allocated to the global address -space, then [code]#decltype(&i)# shall evaluate to -[code]#int*#. +understood as memory segment in which data is allocated. +For instance, if [code]#int i;# is allocated to the global address space, +then [code]#decltype(&i)# shall evaluate to [code]#int*#. ==== [[subsec:commonAddressSpace]] === Common address space deduction rules -The variable declarations get assigned to an address space depending on their -scope and storage class: +The variable declarations get assigned to an address space depending on +their scope and storage class: * Namespace scope - ** If the type is [code]#const#, the address space the declaration is assigned to - is implementation-defined. If the target of the SYCL backend can represent the - generic address space, then the assigned address space must be compatible with - the generic address space. + ** If the type is [code]#const#, the address space the declaration is + assigned to is implementation-defined. + If the target of the SYCL backend can represent the generic address + space, then the assigned address space must be compatible with the + generic address space. 
[NOTE] ==== -Namespace scope non-[code]#const# declarations cannot be used within a kernel, -as restricted in <>. This means that -non-[code]#const# global variables cannot be accessed by any device kernel or -code called by the device kernel. +Namespace scope non-[code]#const# declarations cannot be used within a +kernel, as restricted in <>. +This means that non-[code]#const# global variables cannot be accessed by any +device kernel or code called by the device kernel. ==== * Block scope and function parameter scope - ** Declarations with static storage duration are treated the same way as variables - in namespace scope + ** Declarations with static storage duration are treated the same way as + variables in namespace scope ** Otherwise the declaration is assigned to the local address space if declared in a hierarchical context ** Otherwise the declaration is assigned to the private address space @@ -756,9 +786,9 @@ address space otherwise. [[subsec:genericAddressSpace]] === Generic as default address space -For SYCL backends that can represent the generic address space -(see <>), unannotated pointers and -references are considered to be pointing to the generic address space. +For SYCL backends that can represent the generic address space (see +<>), unannotated pointers and references are +considered to be pointing to the generic address space. [[subsec:inferredAddressSpace]] @@ -767,23 +797,24 @@ references are considered to be pointing to the generic address space. [NOTE] .Note for this version ==== -The address space deduction feature described next is inherited from -the SYCL 1.2.1 specifications. This section will be changed in a future version -to better align with addition of generic address space and generic -as default address space. +The address space deduction feature described next is inherited from the +SYCL 1.2.1 specifications. 
+This section will be changed in a future version to better align with +addition of generic address space and generic as default address space. ==== -For SYCL backends that cannot represent the generic address space -(see <>), inside kernels the SYCL device -compiler will need to auto-deduce the memory region -of unannotated pointer and reference types during the lowering of types -from {cpp} to the underlying representation. +For SYCL backends that cannot represent the generic address space (see +<>), inside kernels the SYCL device compiler +will need to auto-deduce the memory region of unannotated pointer and +reference types during the lowering of types from {cpp} to the underlying +representation. -If a kernel function or device function contains a pointer or reference type, -then the address space deduction must be attempted using the following rules: +If a kernel function or device function contains a pointer or reference +type, then the address space deduction must be attempted using the following +rules: - * If an explicit pointer class is converted into a {cpp} pointer value, then - the {cpp} pointer value will point to the same address space as the one + * If an explicit pointer class is converted into a {cpp} pointer value, + then the {cpp} pointer value will point to the same address space as the one represented by the explicit pointer class. * If a variable is declared as a pointer type, but initialized in its declaration to a pointer value with an already-deduced address space, @@ -791,24 +822,26 @@ then the address space deduction must be attempted using the following rules: * If a function parameter is declared as a pointer type, and the argument is a pointer value with a deduced address space, then the function will be compiled as if the parameter had the same address space as its - argument.
It is legal for a function to be called in different places - with different address spaces for its arguments: in this case the - function is said to be "`duplicated`" and compiled multiple times. Each - duplicated instance of the function must compile legally in order to - have defined behavior. + argument. + It is legal for a function to be called in different places with + different address spaces for its arguments: in this case the function is + said to be "`duplicated`" and compiled multiple times. + Each duplicated instance of the function must compile legally in order + to have defined behavior. * If a function return type is declared as a pointer type and return statements use address space deduced expressions, then the function will - be compiled as if the return type had the same address space. To compile - legally, all return expressions must deduce to the same address space. - * The rules for pointer types also apply to reference types. i.e. a - reference variable takes its address space from its initializer. A - function with a reference parameter takes its address space from its + be compiled as if the return type had the same address space. + To compile legally, all return expressions must deduce to the same + address space. + * The rules for pointer types also apply to reference types. + i.e. a reference variable takes its address space from its initializer. + A function with a reference parameter takes its address space from its argument. * If no other rule above can be applied to a declaration of a pointer, then it is assumed to be in the private address space. -It is illegal to assign a pointer value addressing one address space to a pointer -variable addressing a different address space. +It is illegal to assign a pointer value addressing one address space to a +pointer variable addressing a different address space. == SYCL offline linking @@ -820,35 +853,36 @@ variable addressing a different address space. 
=== SYCL functions and member functions linkage By default, any function that is odr-used from a <> must be -defined in the same translation unit as that use. However, this restriction is -relaxed if both of the following conditions are met: +defined in the same translation unit as that use. +However, this restriction is relaxed if both of the following conditions are +met: * The implementation defines the [code]#SYCL_EXTERNAL# macro; * The translation unit that calls the function declares the function with [code]#SYCL_EXTERNAL# as described below. -When a function is declared with [code]#SYCL_EXTERNAL#, that macro must be used -on the first declaration of that function in the translation unit. -Redeclarations of the function in the same translation unit may optionally use -[code]#SYCL_EXTERNAL#, but this is not required. +When a function is declared with [code]#SYCL_EXTERNAL#, that macro must be +used on the first declaration of that function in the translation unit. +Redeclarations of the function in the same translation unit may optionally +use [code]#SYCL_EXTERNAL#, but this is not required. -When a function is declared with [code]#SYCL_EXTERNAL#, that function must also -be defined in some translation unit, where the function is declared with -[code]#SYCL_EXTERNAL#. +When a function is declared with [code]#SYCL_EXTERNAL#, that function must +also be defined in some translation unit, where the function is declared +with [code]#SYCL_EXTERNAL#. -A function may only be declared with [code]#SYCL_EXTERNAL# if it has external -linkage by normal C++ rules. +A function may only be declared with [code]#SYCL_EXTERNAL# if it has +external linkage by normal C++ rules. -A function declared with [code]#SYCL_EXTERNAL# may be called from both host and -device code. The macro has no effect when the function is called from host -code. +A function declared with [code]#SYCL_EXTERNAL# may be called from both host +and device code. 
+The macro has no effect when the function is called from host code. In order to declare a function with [code]#SYCL_EXTERNAL#, the macro name -[code]#SYCL_EXTERNAL# must appear before the function declaration. If the -function is also decorated with {cpp} attributes that appear before the -declaration, the [code]#SYCL_EXTERNAL# may appear before, after, or between -these attributes. The following example demonstrates the use of -[code]#SYCL_EXTERNAL#. +[code]#SYCL_EXTERNAL# must appear before the function declaration. +If the function is also decorated with {cpp} attributes that appear before +the declaration, the [code]#SYCL_EXTERNAL# may appear before, after, or +between these attributes. +The following example demonstrates the use of [code]#SYCL_EXTERNAL#. [source,,linenums] ---- @@ -859,11 +893,12 @@ Functions that are declared using [code]#SYCL_EXTERNAL# have the following additional restrictions beyond those imposed on other device functions: * If the SYCL backend does not support the generic address space then the - function cannot use raw pointers as parameter or return types. Explicit - pointer classes must be used instead; + function cannot use raw pointers as parameter or return types. + Explicit pointer classes must be used instead; * The function cannot call [code]#group::parallel_for_work_item#; -* The function cannot be called from a [code]#parallel_for_work_group# scope. +* The function cannot be called from a [code]#parallel_for_work_group# + scope. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end compiler_abi %%%%%%%%%%%%%%%%%%%%%%%%%%%% diff --git a/adoc/chapters/extensions.adoc b/adoc/chapters/extensions.adoc index 846c7237..cb1d4aa7 100644 --- a/adoc/chapters/extensions.adoc +++ b/adoc/chapters/extensions.adoc @@ -4,43 +4,50 @@ = SYCL Extensions This chapter describes the mechanism by which the <> can be -extended. 
Some parts of this chapter are requirements that all implementations -must follow if they extend the <>, while other parts of the chapter -are merely guidelines. Unless a requirement is specifically stated as -normative, all content in this chapter is a non-normative guideline. - -An extension can be either of two flavors: an extension ratified by the Khronos -SYCL group or a vendor supplied extension. In both cases, an extension is an -optional feature set which an implementation need not implement in order to be -conformant with the <>. - -Vendors may choose to define extensions in order to expose custom features or -to gather feedback on an API that is not yet ready for inclusion in the -<>. Once a vendor extension has stabilized, the vendor is -encouraged to promote it to a future version of the <> or to a -ratified Khronos extension. Thus, vendor extensions can be viewed as a -pipeline of features for consideration in future SYCL versions. +extended. +Some parts of this chapter are requirements that all implementations must +follow if they extend the <>, while other parts of the chapter +are merely guidelines. +Unless a requirement is specifically stated as normative, all content in +this chapter is a non-normative guideline. + +An extension can be either of two flavors: an extension ratified by the +Khronos SYCL group or a vendor supplied extension. +In both cases, an extension is an optional feature set which an +implementation need not implement in order to be conformant with the +<>. + +Vendors may choose to define extensions in order to expose custom features +or to gather feedback on an API that is not yet ready for inclusion in the +<>. +Once a vendor extension has stabilized, the vendor is encouraged to promote +it to a future version of the <> or to a ratified Khronos +extension. +Thus, vendor extensions can be viewed as a pipeline of features for +consideration in future SYCL versions. 
The Khronos SYCL group may define extensions for features that are not yet ready for the <> but are implemented by more than one vendor. These extensions also may be considered for inclusion in a future version of the <>. -This chapter does not describe any particular extension to SYCL. Rather, it -describes the _mechanism_ for defining an extension. Each extension is defined -by its own separate document. If an extension is ratified by the Khronos SYCL -group, that group will release a document describing the extension. If a -vendor defines an extension, the vendor is responsible for releasing its -documentation. +This chapter does not describe any particular extension to SYCL. +Rather, it describes the _mechanism_ for defining an extension. +Each extension is defined by its own separate document. +If an extension is ratified by the Khronos SYCL group, that group will +release a document describing the extension. +If a vendor defines an extension, the vendor is responsible for releasing +its documentation. == Definition of an extension -An extension can take many possible forms. Some examples include: +An extension can take many possible forms. +Some examples include: * adding new types or free functions to the SYCL runtime; - * modifying existing SYCL classes, structs, or enumeration types by - adding new members, member functions, or enumerated values; + * modifying existing SYCL classes, structs, or enumeration types by adding + new members, member functions, or enumerated values; * adding new overloads for existing free functions or member functions; * defining new specializations for existing SYCL templates; * adding new {cpp} attributes; @@ -48,157 +55,176 @@ An extension can take many possible forms. Some examples include: * adding new keywords to the language; * adding a new backend. -An extension may also broaden the definition of existing functions defined in -the <> by defining semantics for cases that are left unspecified by -the <>. 
+An extension may also broaden the definition of existing functions defined +in the <> by defining semantics for cases that are left +unspecified by the <>. == Requirements for an extension -This section is normative. All vendors which provide an extension must abide -by the requirements described here. +This section is normative. +All vendors which provide an extension must abide by the requirements +described here. -An extension may not change the definition of existing functions defined by the -<> in a way that changes their specified behavior. Also, an -extension may not remove any feature defined by the <>. +An extension may not change the definition of existing functions defined by +the <> in a way that changes their specified behavior. +Also, an extension may not remove any feature defined by the <>. The vendor must choose at least one [code]## which uniquely -identifies the vendor's SYCL implementation. The Khronos SYCL group does not -provide any registry of the strings, so each vendor is responsible for choosing -its own. One way to choose a unique string is to use the vendor's company name -or a marketing name that is associated with the vendor's implementation. +identifies the vendor's SYCL implementation. +The Khronos SYCL group does not provide any registry of the strings, so each +vendor is responsible for choosing its own. +One way to choose a unique string is to use the vendor's company name or a +marketing name that is associated with the vendor's implementation. Ultimately, it is each vendor's responsibility to choose a string that is -unique. The strings "khr" and "KHR" are reserved for the Khronos SYCL group -for its own extensions, so vendors may not use these as a -[code]##. +unique. +The strings "khr" and "KHR" are reserved for the Khronos SYCL group for its +own extensions, so vendors may not use these as a [code]##. 
The implementation must predefine at least one macro of the form [code]#SYCL_IMPLEMENTATION_# which allows applications to test -whether they are being compiled with that vendor's implementation. For -example, the Acme vendor could predefine a macro whose name is +whether they are being compiled with that vendor's implementation. +For example, the Acme vendor could predefine a macro whose name is [code]#SYCL_IMPLEMENTATION_ACME#. == Guidelines for portable extensions Vendors who want to ensure that their extension does not collide with other -vendors' extensions or with future versions of the <> should follow -the additional rules specified in this section. However, this is not a -requirement for conformance. +vendors' extensions or with future versions of the <> should +follow the additional rules specified in this section. +However, this is not a requirement for conformance. === Extension namespace -If an extension adds new types or free functions, it should avoid adding these -directly in the [code]#sycl::# namespace since future versions of the -<> may also add new identifiers in this namespace. The namespace -[code]#sycl::ext::# is reserved for use by extensions. For -example, the Acme vendor could define extended types and free functions in the -namespace [code]#sycl::ext::acme#, and this would guarantee that they will not -collide with definitions in other vendors' extensions or with future versions -of the <>. +If an extension adds new types or free functions, it should avoid adding +these directly in the [code]#sycl::# namespace since future versions of the +<> may also add new identifiers in this namespace. +The namespace [code]#sycl::ext::# is reserved for use by +extensions. +For example, the Acme vendor could define extended types and free functions +in the namespace [code]#sycl::ext::acme#, and this would guarantee that they +will not collide with definitions in other vendors' extensions or with +future versions of the <>. 
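
The following is a minimal, non-normative sketch of how the implementation macro and the extension namespace could fit together for the Acme vendor.
It is an illustration only: the [code]#frobber# type's [code]#level# member, the [code]#frob# function and the [code]#describe_implementation# helper are hypothetical names, and the [code]#sycl# namespace below is a stand-in declared locally so that the example compiles without any real SYCL headers.
In practice the implementation would predefine the macro itself and provide the declarations through its own headers.

```cpp
#include <string>

// Assumption: stand-in for what the hypothetical Acme implementation would
// predefine. Application code would normally never define this macro.
#define SYCL_IMPLEMENTATION_ACME 1

// Assumption: mock of the vendor's extension namespace, declared here only
// so the sketch is self-contained. Not part of any real SYCL implementation.
namespace sycl {
namespace ext {
namespace acme {

// Hypothetical extended type living in the vendor's namespace.
struct frobber {
  int level;
};

// Hypothetical extended free function living in the vendor's namespace.
inline int frob(const frobber& f) { return f.level + 1; }

}  // namespace acme
}  // namespace ext
}  // namespace sycl

// Application code: only touch the Acme extension when compiled with the
// Acme implementation, and fall back to a portable path otherwise.
inline std::string describe_implementation() {
#ifdef SYCL_IMPLEMENTATION_ACME
  sycl::ext::acme::frobber f{41};
  return "acme:" + std::to_string(sycl::ext::acme::frob(f));
#else
  return "generic";
#endif
}
```

Because the extended names live under [code]#sycl::ext::acme#, an application that guards their use with the implementation macro should still compile with implementations that do not provide them.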
=== Names for extensions to existing classes or enumerations -An extension may add new members or member functions to existing SYCL classes -or new values to existing SYCL enumeration types. To ensure these extensions -do not collide, vendors are encouraged to name them with the prefix -[code]#ext__#. For example, the Acme vendor could add a new -member function to the [code]#sycl::device# class named -[code]#device::ext_acme_fancy()# or a new value to the [code]#sycl::aspect# -enumeration named [code]#aspect::ext_acme_fancier#. +An extension may add new members or member functions to existing SYCL +classes or new values to existing SYCL enumeration types. +To ensure these extensions do not collide, vendors are encouraged to name +them with the prefix [code]#ext__#. +For example, the Acme vendor could add a new member function to the +[code]#sycl::device# class named [code]#device::ext_acme_fancy()# or a new +value to the [code]#sycl::aspect# enumeration named +[code]#aspect::ext_acme_fancier#. In some cases, an extension does not have the freedom to choose a specific -function name. For example, this could happen if the extension adds a new -constructor overload for an existing SYCL class. In cases like this, the -extension should ensure that one of the function parameters has a type that is -defined in the extension's namespace. For example, the Acme vendor could add -a new constructor for [code]#sycl::context# with the signature +function name. +For example, this could happen if the extension adds a new constructor +overload for an existing SYCL class. +In cases like this, the extension should ensure that one of the function +parameters has a type that is defined in the extension's namespace. +For example, the Acme vendor could add a new constructor for +[code]#sycl::context# with the signature [code]#context(ext::acme::frobber&)#. -A similar situation can occur if an existing SYCL template is specialized with -an extended enumerated value. 
-Obviously, the extension cannot rename the template in this case. Instead, -it is sufficient that the template is specialized with an extended enumerated -value, and this guarantees that the extended specialization will not collide. +A similar situation can occur if an existing SYCL template is specialized +with an extended enumerated value. +Obviously, the extension cannot rename the template in this case. +Instead, it is sufficient that the template is specialized with an extended +enumerated value, and this guarantees that the extended specialization will +not collide. [NOTE] ==== -Vendors are encouraged to use the [code]#ext__# prefix form when -possible for additions to existing SYCL classes because this form makes the -extension's vendor name apparent. People reading application code will -immediately know that a member function is an extension, and they will -immediately know which vendor's documentation to consult. +Vendors are encouraged to use the [code]#ext__# prefix form +when possible for additions to existing SYCL classes because this form makes +the extension's vendor name apparent. +People reading application code will immediately know that a member function +is an extension, and they will immediately know which vendor's documentation +to consult. ==== === Feature test macros Vendors are encouraged to group a related set of extensions together into a "feature" and to predefine a feature-test macro when the implementation -supports the extensions in that feature. The feature-test macro should have -the following form to ensure it is unique: -[code]#SYCL_EXT__#. For example, the Acme vendor -might define a feature-test macro named [code]#SYCL_EXT_ACME_FANCYFEATURE#. +supports the extensions in that feature. +The feature-test macro should have the following form to ensure it is +unique: [code]#SYCL_EXT__#. +For example, the Acme vendor might define a feature-test macro named +[code]#SYCL_EXT_ACME_FANCYFEATURE#. 
This allows applications to protect code using the extension with [code]##ifdef#, so that the code is skipped when compiled with an implementation that doesn't support the feature. -Since the interface to an extension might change from one release to another, -vendors are also encouraged to predefine the macro's value to the version of -the extension. Vendors should use a numerical value that monotonically -increases for each revision of the extension API. +Since the interface to an extension might change from one release to +another, vendors are also encouraged to predefine the macro's value to the +version of the extension. +Vendors should use a numerical value that monotonically increases for each +revision of the extension API. -Of course, an extension may also predefine other macros. In order to ensure -that these macro names do not collide with other extensions or future versions -of the <>, the name should start with the prefix -[code]#SYCL_EXT_# or [code]#SYCL_IMPLEMENTATION_#. +Of course, an extension may also predefine other macros. +In order to ensure that these macro names do not collide with other +extensions or future versions of the <>, the name should start +with the prefix [code]#SYCL_EXT_# or +[code]#SYCL_IMPLEMENTATION_#. === Attribute namespace -An extension may define new {cpp} attributes. The attribute namespace -[code]#sycl::# is reserved for the <>, so vendors should choose a -different namespace for any attributes they add. +An extension may define new {cpp} attributes. +The attribute namespace [code]#sycl::# is reserved for the <>, so +vendors should choose a different namespace for any attributes they add. === Include file paths An extension may define new [code]##include# files under the [code]#"sycl"# -path. The path prefix [code]#"sycl/ext/"# is reserved for this -purpose. For example, the Acme vendor could add a header file +path. +The path prefix [code]#"sycl/ext/"# is reserved for this +purpose. 
+For example, the Acme vendor could add a header file [code]#"sycl/ext/acme/fancy.h"# and be guaranteed that it would not conflict with other extensions or with future versions of the <>. === Optional kernel features An extension may also add new optional kernel features -- features which are -supported on some devices but not on others. Vendors are encouraged to follow -the same mechanism outlined in <>. Therefore, -an extended optional kernel feature should have a matching extension to the -[code]#sycl::aspect# enumerated type. +supported on some devices but not on others. +Vendors are encouraged to follow the same mechanism outlined in +<>. +Therefore, an extended optional kernel feature should have a matching +extension to the [code]#sycl::aspect# enumerated type. === Adding a backend -An extension may also add a new backend. If it does, the naming of the -backend APIs follows the normal guidelines for extensions and also follows -the naming pattern for backends that are defined in the <>. To -illustrate: +An extension may also add a new backend. +If it does, the naming of the backend APIs follows the normal guidelines for +extensions and also follows the naming pattern for backends that are defined +in the <>. +To illustrate: -* The extension should add a new value to the [code]#sycl::backend# enumeration - type using a naming scheme like [code]#ext__#. For - example, if the Acme vendor adds a backend named "foo", it would add an - enumerated value named [code]#sycl::backend::ext_acme_foo#. +* The extension should add a new value to the [code]#sycl::backend# + enumeration type using a naming scheme like + [code]#ext__#. + For example, if the Acme vendor adds a backend named "foo", it would add + an enumerated value named [code]#sycl::backend::ext_acme_foo#. * The extension should define the backend's interop API in a namespace named - [code]#sycl::ext::::#. For our hypothetical Acme - example, this would be a namespace named [code]#sycl::ext::acme::foo#. 
- -* If the backend interop API is available through a separate header file, that - header should be named - [code]#"sycl/ext//backend/.hpp"#. For our - hypothetical Acme example this would be + [code]#sycl::ext::::#. + For our hypothetical Acme example, this would be a namespace named + [code]#sycl::ext::acme::foo#. + +* If the backend interop API is available through a separate header file, + that header should be named + [code]#"sycl/ext//backend/.hpp"#. + For our hypothetical Acme example this would be [code]#"sycl/ext/acme/backend/foo.hpp"#. -* The extension should predefine a macro for the backend when it is "active". +* The extension should predefine a macro for the backend when it is + "active". The name of this macro should be - [code]#SYCL_EXT__BACKEND_#. For our hypothetical - Acme example this would be [code]#SYCL_EXT_ACME_BACKEND_FOO#. + [code]#SYCL_EXT__BACKEND_#. + For our hypothetical Acme example this would be + [code]#SYCL_EXT_ACME_BACKEND_FOO#. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end extensions %%%%%%%%%%%%%%%%%%%%%%%%%%%% diff --git a/adoc/chapters/feature_sets.adoc b/adoc/chapters/feature_sets.adoc index 94acb198..cf7198a5 100644 --- a/adoc/chapters/feature_sets.adoc +++ b/adoc/chapters/feature_sets.adoc @@ -6,45 +6,48 @@ As of SYCL 2020 there are now two distinct feature sets which a SYCL implementation can conform to, in order to better fit the requirements of -different domains, such as embedded, mobile, and safety critical, which may have -limitations because of the toolchains used. +different domains, such as embedded, mobile, and safety critical, which may +have limitations because of the toolchains used. -A SYCL implementation can choose to conform to either the full feature set or -the reduced feature set. +A SYCL implementation can choose to conform to either the full feature set +or the reduced feature set. [[sec:feature-sets.full]] == Full feature set -The full feature set includes all features specified in the <> with -no exceptions. 
+The full feature set includes all features specified in the <> +with no exceptions. [[sec:feature-sets.reduced]] == Reduced feature set The reduced feature set makes certain features optional or restricted to -specific forms. The following list defines all the differences between the -reduced feature set and the full feature set. - - . *Un-named SYCL kernel functions:* <> - which are defined using a lambda expression and therefore have no standard - name are required to be provided a name via the kernel name template parameter - of kernel invocation functions such as [code]#parallel_for#. This overrides - the <> rules for <> naming as specified in - <>. +specific forms. +The following list defines all the differences between the reduced feature +set and the full feature set. + + . *Un-named SYCL kernel functions:* <> which are defined using a lambda expression and therefore + have no standard name are required to be provided a name via the kernel + name template parameter of kernel invocation functions such as + [code]#parallel_for#. + This overrides the <> rules for <> + naming as specified in <>. . *Address space mode:* The <> mode used in the reduced feature set is not required to be - <>, regardless of SYCL - backend in use. + <>, regardless of + SYCL backend in use. Instead the <> mode may always be used. . *Declarations:* In addition to the requirements specified in <>, the reduced feature set does not require - support for odr-use inside <> of variables - declared [code]#const# or [code]#constexpr# with static storage duration. + support for odr-use inside <> of + variables declared [code]#const# or [code]#constexpr# with static + storage duration. [[sec:feature-sets.compatibility]] @@ -52,23 +55,25 @@ reduced feature set and the full feature set. In order to avoid introducing any kind of divergence the reduced and full feature sets are defined such that the full feature set is a subsumption of -the reduced feature set. 
This means that any applications which are -developed for the reduced feature set will be compatible with both a SYCL -reduced implementation and a SYCL full implementation. +the reduced feature set. +This means that any applications which are developed for the reduced feature +set will be compatible with both a SYCL reduced implementation and a SYCL +full implementation. [[sec:feature-sets.conformance]] == Conformance One of the reasons for having this be defined in the specification is that -hardware vendors which wish to support SYCL on their platform(s) want to be able -to demonstrate their support for it by passing conformance. However, if passing -conformance means adopting features which they do not believe to be necessary at -an additional development effort then this may deter them. - -Each feature set has its own route for passing conformance allowing adopters of -SYCL to specify the feature set they wish to test conformance against. The -conformance test suite would then alter or disable the tests within the test -suite according to how the feature sets are differentiated above. +hardware vendors which wish to support SYCL on their platform(s) want to be +able to demonstrate their support for it by passing conformance. +However, if passing conformance means adopting features which they do not +believe to be necessary at an additional development effort then this may +deter them. + +Each feature set has its own route for passing conformance allowing adopters +of SYCL to specify the feature set they wish to test conformance against. +The conformance test suite would then alter or disable the tests within the +test suite according to how the feature sets are differentiated above. 
// %%%%%%%%%%%%%%%%%%%%%%%%%%%% end feature_sets %%%%%%%%%%%%%%%%%%%%%%%%%%%% diff --git a/adoc/chapters/glossary.adoc b/adoc/chapters/glossary.adoc index 3fae1e0f..331cef9e 100644 --- a/adoc/chapters/glossary.adoc +++ b/adoc/chapters/glossary.adoc @@ -16,37 +16,40 @@ [[accessor]]accessor:: An accessor is a class which allows a <> to access data managed by a <> or <> class or allows a <> - to access local memory on a <>. Accessors are also used to express - the dependencies among the different <>. + to access local memory on a <>. + Accessors are also used to express the dependencies among the different + <>. For the full description please refer to <> [[application-scope]]application scope:: The application scope starts with the construction first <> class object and finishes with the destruction of the - last one. Application refers to the {cpp} <> and not - the <>. + last one. + Application refers to the {cpp} <> and not the + <>. [[aspect]]aspect:: A characteristic of a <> which determines whether it supports - some optional feature. Aspects are always boolean, so a <> - either has or does not have an aspect. + some optional feature. + Aspects are always boolean, so a <> either has or does not have + an aspect. [[async-error]]asynchronous error:: A SYCL asynchronous error is an error occurring after the host API call invoking the error causing action has returned, such that the error - cannot be thrown as a typical {cpp} exception from a host API call. Such - errors are typically generated from device kernel invocations which are - executed when SYCL task graph dependencies are satisfied, which occur - asynchronously from host code execution. For the full description and - associated asynchronous error handling mechanisms, please refer to - <>. + cannot be thrown as a typical {cpp} exception from a host API call. 
+ Such errors are typically generated from device kernel invocations which + are executed when SYCL task graph dependencies are satisfied, which + occur asynchronously from host code execution. + For the full description and associated asynchronous error handling + mechanisms, please refer to <>. [[async-handler]]async_handler:: An asynchronous error handler object is a function class instance providing necessary code for handling all the asynchronous errors triggered from the execution of command groups on a queue, within a - context or an associated event. For the full description please refer to - <>. + context or an associated event. + For the full description please refer to <>. [[barrier]]barrier:: A barrier is either a <>, or a kernel execution @@ -55,27 +58,29 @@ [[blocking-accessor]]blocking accessor:: A blocking accessor is an <> which provides immediate access - and continues to provide access until it is destroyed. For the full - description please refer to <> + and continues to provide access until it is destroyed. + For the full description please refer to <> [[buffer]]buffer:: + -- The buffer class manages data for the SYCL {cpp} host application and the -SYCL device kernels. The buffer class may acquire ownership of some host -pointers passed to its constructors according to the constructor kind. +SYCL device kernels. +The buffer class may acquire ownership of some host pointers passed to its +constructors according to the constructor kind. The buffer class, together with the accessor class, is responsible for tracking memory transfers and guaranteeing data consistency among the -different kernels. The <> manages the memory allocations -on both the host and the <> within the lifetime of the buffer -object. For the full description please refer to <>. +different kernels. +The <> manages the memory allocations on both the host and the +<> within the lifetime of the buffer object. +For the full description please refer to <>. 
-- [[bundle-state]]bundle state:: A SYCL bundle state represents the state of a <> and - therefore its capabilities in the SYCL programming API. Possible states - are <>, <> or <>. + therefore its capabilities in the SYCL programming API. + Possible states are <>, <> or <>. [[command]]command:: A request to execute work that is submitted to a <> such as the @@ -84,61 +89,63 @@ object. For the full description please refer to <>. [[command-group]]command group:: In SYCL, the operations required to process data on a <> are - represented using a <>. Each - <> is given a unique <> + represented using a <>. + Each <> is given a unique <> object to perform all the necessary work required to correctly process - data on a <> using a kernel. In this way, the group of - commands for transferring and processing data is enqueued as a command - group on a <> for execution. A command group is submitted - atomically to a SYCL queue. + data on a <> using a kernel. + In this way, the group of commands for transferring and processing data + is enqueued as a command group on a <> for execution. + A command group is submitted atomically to a SYCL queue. [[command-group-function-object]]command group function object:: - A type which is callable with [code]#operator()# that takes a - reference to a <>, that defines a <> which - can be submitted by a <>. The function object can be a named - type, lambda function or [code]#std::function#. + A type which is callable with [code]#operator()# that takes a reference + to a <>, that defines a <> which can be + submitted by a <>. + The function object can be a named type, lambda function or + [code]#std::function#. [[handler]]command group handler:: The command group handler class provides the interface for the commands - that can be executed inside the <>. It is - provided as a scoped object to all of the data access requests within - the command group scope. For the full description please refer to - <>. + that can be executed inside the <>. 
+ It is provided as a scoped object to all of the data access requests + within the command group scope. + For the full description please refer to <>. [[command-group-scope]]command group scope:: The command group scope is the function scope defined by the - <>. The command group <> - object lifetime is restricted to the command group scope. For more - details see <>. + <>. + The command group <> object lifetime is restricted to the + command group scope. + For more details see <>. [[queue-barrier]]command queue barrier:: The SYCL API provides two variants for functions that force - synchronization on a SYCL command queue. The - [code]#sycl::queue::wait()# and - [code]#sycl::queue::wait_and_throw()# functions force the SYCL - command queue to wait for the execution of the - <> before it is able to continue - executing. + synchronization on a SYCL command queue. + The [code]#sycl::queue::wait()# and + [code]#sycl::queue::wait_and_throw()# functions force the SYCL command + queue to wait for the execution of the <> + before it is able to continue executing. [[constant-memory]]constant memory:: - A region of memory that remains constant during the execution of - a kernel. The <> allocates and initializes memory - objects placed into constant memory. + A region of memory that remains constant during the execution of a + kernel. + The <> allocates and initializes memory objects placed + into constant memory. [[context]]context:: - A <> represents the runtime data structures and state - required by a <> API to interact with a group of <> - associated with a <>. The context is defined as the - [code]#sycl::context# class, for further details please see - <>. + A <> represents the runtime data structures and state required + by a <> API to interact with a group of <> + associated with a <>. + The context is defined as the [code]#sycl::context# class, for further + details please see <>. 
[[control-flow]]control flow:: When all <> in a <> are executing the same sequence of statements, they are said to be executing under _converged_ - control flow. Control flow _diverges_ when different work-items in a - group execute a different sequence of statements, typically as a result - of evaluating conditions differently (e.g. in selection statements or - loops). + control flow. + Control flow _diverges_ when different work-items in a group execute a + different sequence of statements, typically as a result of evaluating + conditions differently (e.g. in selection statements or loops). [[core-spec]]core SYCL specification:: The text of the SYCL language specification (this document), excluding @@ -146,44 +153,46 @@ object. For the full description please refer to <>. extensions. [[descendent-device]]descendent device:: - The descendent devices of device _D_ include all of the sub-devices of _D_, - all of the sub-devices of those devices, etc. + The descendent devices of device _D_ include all of the sub-devices of + _D_, all of the sub-devices of those devices, etc. [[device]]device:: A SYCL device is an abstraction of a piece of hardware that can execute <>. [[device-compiler]]device compiler:: - A SYCL device compiler is a compiler that produces <> - binaries from a valid <>. For the full description - please refer to <>. + A SYCL device compiler is a compiler that produces <> binaries + from a valid <>. + For the full description please refer to <>. [[device-copyable]]device copyable:: - Data that is shared between the host and the devices must generally - have a type that abides by the restrictions listed in + Data that is shared between the host and the devices must generally have + a type that abides by the restrictions listed in <> for a device copyable type. [[device-function]]device function:: - A device function is any function in a <> - that can be run on a <>. This includes - <> and, recursively, functions - they call. 
+ A device function is any function in a <> that can be + run on a <>. + This includes <> and, + recursively, functions they call. [[device-image]]device image:: - A device image is a representation of one or more <> in an - implementation-defined format. A device image could be a compiled version - of the kernels in an intermediate language representation which needs to be - translated at runtime into a form that can be invoked on a <>, it - could be a compiled version of the kernels in a native code format that is - ready to be invoked without further translation, or it could be a source - code representation which needs to be compiled before it can be invoked. + A device image is a representation of one or more <> in + an implementation-defined format. + A device image could be a compiled version of the kernels in an + intermediate language representation which needs to be translated at + runtime into a form that can be invoked on a <>, it could be a + compiled version of the kernels in a native code format that is ready to + be invoked without further translation, or it could be a source code + representation which needs to be compiled before it can be invoked. Other representations are possible too. [[device-selector]]device selector:: - A way to select a device used in various places. This is a callable - object taking a <> reference and returning an integer rank. - One of the device with the highest non-negative value is selected. See - <> for more details. + A way to select a device used in various places. + This is a callable object taking a <> reference and returning an + integer rank. + One of the device with the highest non-negative value is selected. + See <> for more details. [[event]]event:: A SYCL object that represents the status of an operation that is being @@ -199,41 +208,45 @@ object. For the full description please refer to <>. 
[[global-id]]global id:: As in OpenCL, a global ID is used to uniquely identify a <> - and is derived from the number of global <> specified - when executing a kernel. A global ID is a one, two or three-dimensional - value that starts at 0 per dimension. + and is derived from the number of global <> + specified when executing a kernel. + A global ID is a one, two or three-dimensional value that starts at 0 + per dimension. [[global-memory]]global memory:: - Global memory is a memory region accessible to all <> - executing on a <>. + Global memory is a memory region accessible to all + <> executing on a <>. [[group]]group:: A group of work-items within the index space of a SYCL kernel execution, such as a <> or <>. [[group-barrier]]group barrier:: - A synchronization function within a group of <>. All the - <> of a group must execute the barrier construct before any - <> continues execution beyond the barrier. Additionally all work-items - in the group execute a release <> prior to synchronizing at the - barrier, all work-items in the group execute an acquire <> after - synchronizing at the barrier, and there is an implicit synchronization between - these acquire and release fences as if through an atomic operation on an - atomic object internal to the barrier implementation. + A synchronization function within a group of <>. + All the <> of a group must execute the barrier + construct before any <> continues execution beyond the + barrier. + Additionally all work-items in the group execute a release <> + prior to synchronizing at the barrier, all work-items in the group + execute an acquire <> after synchronizing at the barrier, and + there is an implicit synchronization between these acquire and release + fences as if through an atomic operation on an atomic object internal to + the barrier implementation. [[h-item]]h-item:: - A unique identifier representing a single <> within the - index space of a SYCL kernel hierarchical execution. 
Can be one, two or - three dimensional. In the SYCL interface a <> is represented - by the [code]#h_item# class (see <>). + A unique identifier representing a single <> within the index + space of a SYCL kernel hierarchical execution. + Can be one, two or three dimensional. + In the SYCL interface a <> is represented by the [code]#h_item# + class (see <>). [[host]]host:: - Host is the system that executes the {cpp} application including the SYCL - API. + Host is the system that executes the {cpp} application including the + SYCL API. [[host-pointer]]host pointer:: - A pointer to memory on the host. Cannot be accessed directly from a - <>. + A pointer to memory on the host. + Cannot be accessed directly from a <>. [[host-task]]host task:: A <> which invokes a native {cpp} callable, scheduled @@ -244,30 +257,32 @@ object. For the full description please refer to <>. to schedule a native {cpp} function. [[id]]id:: - It is a unique identifier of an item in an index space. It can be one, - two or three dimensional index space, since the SYCL kernel execution - model is an <>. It is one of the index space classes. For - the full description please refer to <>. + It is a unique identifier of an item in an index space. + It can be one, two or three dimensional index space, since the SYCL + kernel execution model is an <>. + It is one of the index space classes. + For the full description please refer to <>. [[image]]image:: Images in SYCL, like buffers, are abstractions of multidimensional - structured arrays. Image can refer to [code]#unsampled_image# and - [code]#sampled_image#. For the full description please refer to - <>. + structured arrays. + Image can refer to [code]#unsampled_image# and [code]#sampled_image#. + For the full description please refer to <>. [[implementation-defined]]implementation-defined:: Behavior that is explicitly allowed to vary between conforming - implementations of SYCL. 
A SYCL implementer is required to document the - implementation-defined behavior. + implementations of SYCL. + A SYCL implementer is required to document the implementation-defined + behavior. [[index-space-classes]]index space classes:: - Like in OpenCL, the kernel execution model defines an - <> index space. + Like in OpenCL, the kernel execution model defines an <> index + space. The <> class that defines an <> is the - [code]#sycl::nd_range#, which takes as input the sizes of global - and local work-items, represented using the [code]#sycl::range# - class. The kernel library classes for indexing in the defined - <> are the following classes: + [code]#sycl::nd_range#, which takes as input the sizes of global and + local work-items, represented using the [code]#sycl::range# class. + The kernel library classes for indexing in the defined <> are + the following classes: + * [code]#sycl::id# : The basic index class representing an <>; * [code]#sycl::item# : The <> index class that contains the @@ -279,28 +294,32 @@ object. For the full description please refer to <>. [[input]]input:: A state which a <> can be in, representing - <> as a source or intermediate representation + <> as a source or + intermediate representation [[item]]item:: An item id is an interface used to retrieve the <>, - <> and <>. For further details see - <>. + <> and <>. + For further details see <>. [[kernel]]kernel:: - A kernel represents a <> that has been compiled for a - device, including all of the <> it calls. - A kernel is implicitly created when a <> is submitted - to a device via a <>. However, a kernel can - also be created manually by pre-compiling a <> (see - <>). + A kernel represents a <> that has been compiled + for a device, including all of the <> + it calls. + A kernel is implicitly created when a <> is + submitted to a device via a <>. + However, a kernel can also be created manually by pre-compiling a + <> (see <>). 
[[kernel-bundle]]kernel bundle:: - A kernel bundle is a collection of <> that are - associated with the same <> and with a set of <>. + A kernel bundle is a collection of <> that + are associated with the same <> and with a set of + <>. Kernel bundles have one of three states: <>, <> or - <>. Kernel bundles in the executable state are ready to be - invoked on a device, whereas bundles in the other states need to be - translated into the executable state before they can be invoked. + <>. + Kernel bundles in the executable state are ready to be invoked on a + device, whereas bundles in the other states need to be translated into + the executable state before they can be invoked. [[kernel-handler]]kernel handler:: A representation of a <> being invoked that is @@ -310,23 +329,24 @@ object. For the full description please refer to <>. [[kernel-invocation-command]]kernel invocation command:: A type of command that can be used inside a <> in order - to schedule a <>, includes - [code]#single_task#, all variants of [code]#parallel_for# and - [code]#parallel_for_workgroup#. + to schedule a <>, includes [code]#single_task#, + all variants of [code]#parallel_for# and [code]#parallel_for_workgroup#. [[kernel-name]]kernel name:: A kernel name is a class type that is used to assign a name to the kernel function, used to link the host system with the kernel object - output by the device compiler. For details on naming kernels please see - <>. + output by the device compiler. + For details on naming kernels please see <>. [[kernel-scope]]kernel scope:: The function scope of the [code]#operator()# on a - <>. Note that any function or member function called from - the kernel is also compiled in kernel scope. The kernel scope allows {cpp} - language extensions as well as restrictions to reflect the capabilities - of devices. The extensions and restrictions are defined in the - SYCL device compiler specification. + <>. 
+ Note that any function or member function called from the kernel is also + compiled in kernel scope. + The kernel scope allows {cpp} language extensions as well as + restrictions to reflect the capabilities of devices. + The extensions and restrictions are defined in the SYCL device compiler + specification. [[local-id]]local id:: A unique identifier of a <> among other work-items of a @@ -338,88 +358,94 @@ object. For the full description please refer to <>. [[native-backend-object]]native backend object:: An opaque object defined by a specific backend that represents a - high-level SYCL object on said backend. There is no guarantee of having - native backend objects for all SYCL types. + high-level SYCL object on said backend. + There is no guarantee of having native backend objects for all SYCL + types. [[native-specialization-constant]]native-specialization constant:: - A <> in a device image whose value can be used by - an online compiler as an immediate value during the compilation. + A <> in a device image whose value can be used + by an online compiler as an immediate value during the compilation. [[nd-item]]nd-item:: A unique identifier representing a single <> and - <> within the index space of a SYCL kernel execution. Can - be one, two or three dimensional. In the SYCL interface an <> - is represented by the [code]#nd_item# class (see - <>). + <> within the index space of a SYCL kernel execution. + Can be one, two or three dimensional. + In the SYCL interface an <> is represented by the + [code]#nd_item# class (see <>). [[nd-range]]nd-range:: A representation of the index space of a SYCL kernel execution, the - distribution of <> within into <>. + distribution of <> within into + <>. Contains a <> specifying the number of global <>, a <> specifying the number of local - <> and a <> specifying the global offset. Can be - one, two or three dimensional. 
The minimum size of <> - within the <> is 0 per dimension; where any dimension is set to zero, - the index space in all dimensions will be zero. - In the SYCL interface an - <> is represented by the [code]#nd_range# class (see - <>). + <> and a <> specifying the global offset. + Can be one, two or three dimensional. + The minimum size of <> within the <> is 0 per + dimension; where any dimension is set to zero, the index space in all + dimensions will be zero. + In the SYCL interface an <> is represented by the + [code]#nd_range# class (see <>). [[mem-fence]]mem-fence:: - A memory fence provides control over re-ordering of memory load - and store operations when coupled with an atomic operation that - synchronizes two fences with each other (or when the fences are part of - a <> in which case there is implicit synchronization - as if an atomic operation has synchronized the fences). The - [code]#sycl::atomic_fence# function acts as a fence across all - work-items and devices specified by a [code]#memory_scope# - argument. + A memory fence provides control over re-ordering of memory load and + store operations when coupled with an atomic operation that synchronizes + two fences with each other (or when the fences are part of a + <> in which case there is implicit synchronization as if + an atomic operation has synchronized the fences). + The [code]#sycl::atomic_fence# function acts as a fence across all + work-items and devices specified by a [code]#memory_scope# argument. [[object]]object:: A state which a <> can be in, representing - <> as a non-executable object. + <> as a non-executable + object. [[platform]]platform:: A collection of <> managed by a single <>. [[private-memory]]private memory:: - A region of memory private to a <>. Variables defined in one - work-item's private memory are not visible to another work-item. 
- The [code]#sycl::private_memory# class provides - access to the work-item's private memory for the hierarchical API as it - is described at <>. + A region of memory private to a <>. + Variables defined in one work-item's private memory are not visible to + another work-item. + The [code]#sycl::private_memory# class provides access to the + work-item's private memory for the hierarchical API as it is described + at <>. [[queue]]queue:: A SYCL command queue is an object that holds command groups to be - executed on a SYCL <>. SYCL provides a heterogeneous platform - integration using device queue, which is the minimum requirement for a - SYCL application to run on a SYCL <>. For the full description - please refer to <>. + executed on a SYCL <>. + SYCL provides a heterogeneous platform integration using device queue, + which is the minimum requirement for a SYCL application to run on a SYCL + <>. + For the full description please refer to <>. [[range]]range:: A representation of a number of <> or <> within the index space of a SYCL kernel - execution. Can be one, two or three dimensional. In the SYCL interface a - <> is represented by the [code]#range# class - (see <>). + execution. + Can be one, two or three dimensional. + In the SYCL interface a <> is represented by the [code]#range# + class (see <>). [[ranged-accessor]]ranged accessor:: A ranged accessor is a host or buffer <> that was constructed - with a non-zero offset into the data buffer or with an access range smaller - than the range of the data buffer, or both. Please refer to - <> for more info. + with a non-zero offset into the data buffer or with an access range + smaller than the range of the data buffer, or both. + Please refer to <> for more info. [[reduction]]reduction:: An operation that produces a single value by combining multiple values - in an unspecified order using a binary operator. 
If the operator is - non-associative or non-commutative, the behavior of a reduction may be - non-deterministic. + in an unspecified order using a binary operator. + If the operator is non-associative or non-commutative, the behavior of a + reduction may be non-deterministic. [[root-device]]root device:: - A device that is not a sub-device. The function - [code]#device::get_devices()# returns a vector of all the root devices. + A device that is not a sub-device. + The function [code]#device::get_devices()# returns a vector of all the + root devices. [[rule-of-five]]rule of five:: For a given class, if at least one of the copy constructor, move @@ -435,15 +461,15 @@ object. For the full description please refer to <>. [[smcp]]SMCP:: The single-source multiple compiler-passes (SMCP) - technique allows a single-source file to be parsed by multiple - compilers for building native programs per compilation target. For - example, a standard {cpp} CPU compiler for targeting <> will - parse the <> to create the {cpp} <> - which offloads parts of the computation to other - <>. A SYCL device compiler will parse the same - source file and target only SYCL kernels. For the full description - please refer to <>. See <> for another - approach. + technique allows a single-source file to be parsed by multiple compilers + for building native programs per compilation target. + For example, a standard {cpp} CPU compiler for targeting <> will + parse the <> to create the {cpp} <> which + offloads parts of the computation to other <>. + A SYCL device compiler will parse the same source file and target only + SYCL kernels. + For the full description please refer to <>. + See <> for another approach. [[specialization-constant]]specialization constant:: A constant variable where the value is not known until compilation of @@ -456,15 +482,15 @@ object. For the full description please refer to <>. <> for retrieving the value during invocation. 
[[sscp]]SSCP:: - The single-source single compiler-pass (SSCP) technique - allows a single-source file to be parsed only once by a single - compiler. For example, the SYCL compiler will parse the - <> once. Then, from this single intermediate - representation, for each kind of device architecture a compilation - flow will generate the binary for each kernel and another - compilation flow will generate the <> code of the {cpp} - <>. For the full description please refer to - <>. See <> for another approach. + The single-source single compiler-pass (SSCP) technique allows a + single-source file to be parsed only once by a single compiler. + For example, the SYCL compiler will parse the <> once. + Then, from this single intermediate representation, for each kind of + device architecture a compilation flow will generate the binary for each + kernel and another compilation flow will generate the <> code of + the {cpp} <>. + For the full description please refer to <>. + See <> for another approach. [[string-kernel-name]]string kernel name:: The name of a <> in string form, this can be the @@ -472,10 +498,10 @@ object. For the full description please refer to <>. <>. [[sub-group]]sub-group:: - The SYCL sub-group ([code]#sycl::sub_group# class) is a - representation of a collection of related work-items within a - <>. For further details for the [code]#sycl::sub_group# class - see <>. + The SYCL sub-group ([code]#sycl::sub_group# class) is a representation + of a collection of related work-items within a <>. + For further details for the [code]#sycl::sub_group# class see + <>. [[sub-group-barrier]]sub-group barrier:: A <> for all <> in a <>. @@ -484,40 +510,43 @@ object. For the full description please refer to <>. A <> for all <> in a <>. [[sycl-application]]SYCL application:: - A SYCL application is a {cpp} application which uses the SYCL programming - model in order to execute <> on <>. 
+ A SYCL application is a {cpp} application which uses the SYCL + programming model in order to execute <> on + <>. [[backend]]SYCL backend:: An implementation of the SYCL programming model using an heterogeneous - programming API. A SYCL backend exposes one or multiple SYCL - <>. For example, the OpenCL backend, via the ICD loader, - can expose multiple OpenCL <>. + programming API. + A SYCL backend exposes one or multiple SYCL <>. + For example, the OpenCL backend, via the ICD loader, can expose multiple + OpenCL <>. [[backend-api]]SYCL backend API:: The exposed API for writing SYCL code against a given <>. [[sycl-library]]SYCL {cpp} template library:: - The template library is a set of {cpp} templated classes which provide the - programming interface to the SYCL developer. + The template library is a set of {cpp} templated classes which provide + the programming interface to the SYCL developer. [[sycl-file]]SYCL file:: A SYCL {cpp} source file that contains SYCL API calls. [[sycl-kernel-function]]SYCL kernel function:: - A type which is callable with [code]#operator()# that takes an - <>, <>, <> or <>, and an optional - [code]#kernel_handler# as its last parameter. This type can be passed to - kernel enqueue member functions of the <>. A - <> defines an entry point to a <>. The - function object can be a named <> type or lambda + A type which is callable with [code]#operator()# that takes an <>, + <>, <> or <>, and an optional + [code]#kernel_handler# as its last parameter. + This type can be passed to kernel enqueue member functions of the + <>. + A <> defines an entry point to a <>. + The function object can be a named <> type or lambda function. [[sycl-runtime]]SYCL runtime:: - A SYCL runtime is an implementation of the SYCL API specification. The - SYCL runtime manages the different <>, + A SYCL runtime is an implementation of the SYCL API specification. 
+ The SYCL runtime manages the different <>, <>, <> as well as memory - handling of data between host and <> <> - to enable semantically correct execution of SYCL programs. + handling of data between host and <> <> to + enable semantically correct execution of SYCL programs. [[type-kernel-name]]type kernel name:: The name of a <> in type form, this can be either @@ -528,7 +557,8 @@ object. For the full description please refer to <>. + -- Unified Shared Memory (USM) provides a pointer-based alternative to the -<> programming model. USM enables: +<> programming model. +USM enables: * easier integration into existing code bases by representing allocations as pointers rather than buffers, with full support for pointer @@ -542,35 +572,40 @@ See <> -- [[work-group]]work-group:: - The SYCL work-group ([code]#sycl::group# class) is a representation - of a collection of related <> that execute on a single - compute unit. The <> in the group execute the same - kernel-instance and <>. - For further details for the [code]#sycl::group# - class see <>. + The SYCL work-group ([code]#sycl::group# class) is a representation of a + collection of related <> that execute on a single + compute unit. + The <> in the group execute the same + kernel-instance and <>. + For further details for the [code]#sycl::group# class see + <>. [[work-group-barrier]]work-group barrier:: - A <> for all <> in a <>. + A <> for all <> in a + <>. [[work-group-mem-fence]]work-group mem-fence:: A <> for all <> in a <>. [[work-group-id]]work-group id:: - As in OpenCL, SYCL kernels execute in <>. The group ID - is the ID of the <> that a <> is executing - within. A group ID is an one, two or three dimensional value that starts - at 0 per dimension. + As in OpenCL, SYCL kernels execute in <>. + The group ID is the ID of the <> that a <> is + executing within. + A group ID is an one, two or three dimensional value that starts at 0 + per dimension. 
[[work-group-range]]work-group range:: A group range is the size of the <> for every dimension. [[work-item]]work-item:: The SYCL work-item is a representation of a <> among a - collection of parallel executions of a kernel invoked on a <> - by a <>. A <> is executed by one or more processing - elements as part of a <> executing on a compute unit. A - <> is distinguished from other <> by its - <> or the combination of its <> and its + collection of parallel executions of a kernel invoked on a <> by + a <>. + A <> is executed by one or more processing elements + as part of a <> executing on a compute unit. + A <> is distinguished from other <> by + its <> or the combination of its <> and its <> within a <>. :work-items: <> diff --git a/adoc/chapters/host_backend.adoc b/adoc/chapters/host_backend.adoc index 1c514546..4df03db9 100644 --- a/adoc/chapters/host_backend.adoc +++ b/adoc/chapters/host_backend.adoc @@ -22,8 +22,8 @@ = Host backend specification This chapter describes how SYCL is mapped on the <>. -The <> exposes the host where the SYCL application is executing -as a platform to dispatch SYCL kernels. +The <> exposes the host where the SYCL application is +executing as a platform to dispatch SYCL kernels. The <> exposes at least one <>. @@ -33,31 +33,35 @@ The <> exposes at least one <>. The SYCL host device implements all functionality required to execute the SYCL kernels directly on the host, without relying on a third party API. -It has full SYCL capabilities and reports them through the SYCL information retrieval -interface. At least one SYCL host device must be exposed in the SYCL host -backend in all SYCL implementations, and it must always be available. -Any {cpp} application debugger, if available on the system, -can be used for debugging SYCL kernels executing on a SYCL host device. +It has full SYCL capabilities and reports them through the SYCL information +retrieval interface. 
+At least one SYCL host device must be exposed in the SYCL host backend in +all SYCL implementations, and it must always be available. +Any {cpp} application debugger, if available on the system, can be used for +debugging SYCL kernels executing on a SYCL host device. // From Architecture, Section 3.3 -When a SYCL implementation executes kernels on the host device, -it is free to use whatever parallel execution facilities available on the -host, as long as it executes within the semantics of the kernel execution model -defined by the SYCL kernel execution model. +When a SYCL implementation executes kernels on the host device, it is free +to use whatever parallel execution facilities available on the host, as long +as it executes within the semantics of the kernel execution model defined by +the SYCL kernel execution model. -Kernel math library functions on the host must conform to OpenCL math precision -requirements. The SYCL host device needs to be queried for the capabilities it -provides. This ensures consistency when executing any SYCL general application. +Kernel math library functions on the host must conform to OpenCL math +precision requirements. +The SYCL host device needs to be queried for the capabilities it provides. +This ensures consistency when executing any SYCL general application. The <> must report as supporting images and therefore support the minimum image formats. -The range of image formats supported by the host device is implementation-defined, -but must match the minimum requirements of the OpenCL specification. +The range of image formats supported by the host device is +implementation-defined, but must match the minimum requirements of the +OpenCL specification. -SYCL implementors can provide extensions on the host-device to match any other -backend-specific extension. This allows developers to rely on the host device -to execute their programs when said backend is not available. 
+SYCL implementors can provide extensions on the host-device to match any +other backend-specific extension. +This allows developers to rely on the host device to execute their programs +when said backend is not available. === SYCL memory model on the host @@ -78,18 +82,18 @@ All SYCL device memories are available on devices from the host backend. == Interoperability with the host application -The host backend must ensure all functionality of the SYCL generic programming -model is always available to developers. +The host backend must ensure all functionality of the SYCL generic +programming model is always available to developers. However, since there is no heterogeneous API behind the host backend (it directly targets the host platform), there are no native types for SYCL objects to map to in the SYCL application. Inside SYCL kernels, the host backend must ensure interoperability with -existing host code, so that existing host libraries can be used inside -SYCL kernels executing on the host. +existing host code, so that existing host libraries can be used inside SYCL +kernels executing on the host. In particular, when retrieving a raw pointer from a multi pointer object, -the pointer returned must be usable by any library accessible by the -SYCL application. +the pointer returned must be usable by any library accessible by the SYCL +application. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end host_backend %%%%%%%%%%%%%%%%%%%%%%%%%%%% diff --git a/adoc/chapters/information_descriptors.adoc b/adoc/chapters/information_descriptors.adoc index a24d967f..c36f75ce 100644 --- a/adoc/chapters/information_descriptors.adoc +++ b/adoc/chapters/information_descriptors.adoc @@ -4,8 +4,8 @@ [[sec:information-descriptors]] = Information descriptors -This appendix contains the definitions of all the SYCL information descriptors -introduced in <>. +This appendix contains the definitions of all the SYCL information +descriptors introduced in <>. 
[[appendix.platform.descriptors]] == Platform information descriptors @@ -54,8 +54,8 @@ include::{header_dir}/queueInfo.h[lines=4..-1] [[appendix.kernel.descriptors]] == Kernel information descriptors -The following interface includes all the information descriptors -that apply to kernels as described in <>. +The following interface includes all the information descriptors that apply +to kernels as described in <>. [source,,linenums] ---- include::{header_dir}/kernelInfo.h[lines=4..-1] @@ -65,9 +65,9 @@ include::{header_dir}/kernelInfo.h[lines=4..-1] [[appendix.event.descriptors]] == Event information descriptors -The following interface includes all the information descriptors -for the [code]#event# class as described in <> -and <>. +The following interface includes all the information descriptors for the +[code]#event# class as described in <> and +<>. [source,,linenums] ---- include::{header_dir}/eventInfo.h[lines=4..-1] diff --git a/adoc/chapters/introduction.adoc b/adoc/chapters/introduction.adoc index 35a36d41..4d2e0d8f 100644 --- a/adoc/chapters/introduction.adoc +++ b/adoc/chapters/introduction.adoc @@ -3,88 +3,95 @@ [[introduction]] = Introduction -SYCL (pronounced "`sickle`") is a royalty-free, cross-platform -abstraction {cpp} programming model for heterogeneous computing. SYCL -builds on the underlying concepts, portability and efficiency of -parallel API or standards like OpenCL while adding much of the ease of -use and flexibility of single-source {cpp}. +SYCL (pronounced "`sickle`") is a royalty-free, cross-platform abstraction +{cpp} programming model for heterogeneous computing. +SYCL builds on the underlying concepts, portability and efficiency of +parallel API or standards like OpenCL while adding much of the ease of use +and flexibility of single-source {cpp}. Developers using SYCL are able to write standard modern {cpp} code, with many of the techniques they are accustomed to, such as inheritance and -templates. 
At the same time, developers have access to the full range -of capabilities of the underlying implementation (such as OpenCL) both -through the features of the SYCL libraries and, where necessary, -through interoperation with code written directly using the underneath -implementation, via their APIs. +templates. +At the same time, developers have access to the full range of capabilities +of the underlying implementation (such as OpenCL) both through the features +of the SYCL libraries and, where necessary, through interoperation with code +written directly using the underneath implementation, via their APIs. To reduce programming effort and increase the flexibility with which -developers can write code, SYCL extends the concepts found in -standards like OpenCL model in a few ways beyond the general use of {cpp} -features: +developers can write code, SYCL extends the concepts found in standards like +OpenCL model in a few ways beyond the general use of {cpp} features: * execution of parallel kernels on a heterogeneous device is made - simultaneously convenient and flexible. Common parallel patterns are - prioritized with simple syntax, which through a series {cpp} types allow - the programmer to express additional requirements, such as synchronization, - if needed; + simultaneously convenient and flexible. + Common parallel patterns are prioritized with simple syntax, which + through a series {cpp} types allow the programmer to express additional + requirements, such as synchronization, if needed; * when using buffers and accessors, data access in SYCL is separated from - data storage. By relying on the {cpp}-style resource acquisition is - initialization (RAII) idiom to capture data dependencies between device - code blocks, the runtime library can track data movement and provide - correct behavior without the complexity of manually managing event - dependencies between kernel instances and without the programmer having to - explicitly move data. 
This approach enables the data-parallel task-graphs - that might be already part of the execution model to be built up easily - and safely by SYCL programmers; + data storage. + By relying on the {cpp}-style resource acquisition is initialization + (RAII) idiom to capture data dependencies between device code blocks, + the runtime library can track data movement and provide correct behavior + without the complexity of manually managing event dependencies between + kernel instances and without the programmer having to explicitly move + data. + This approach enables the data-parallel task-graphs that might be + already part of the execution model to be built up easily and safely by + SYCL programmers; * Unified Shared Memory (<>) provides a mechanism for explicit data - allocation and movement. This approach enables the use of pointer-based - algorithms and data structures on heterogeneous devices, and allows for - increased re-use of code across host and device; - * the hierarchical parallelism syntax offers a way of expressing - data parallelism similar to the OpenCL device or OpenMP target - device execution model in an easy-to-understand modern {cpp} form. It - more cleanly layers parallel loops and synchronization points to + allocation and movement. + This approach enables the use of pointer-based algorithms and data + structures on heterogeneous devices, and allows for increased re-use of + code across host and device; + * the hierarchical parallelism syntax offers a way of expressing data + parallelism similar to the OpenCL device or OpenMP target device + execution model in an easy-to-understand modern {cpp} form. + It more cleanly layers parallel loops and synchronization points to avoid fragmentation of code and to more efficiently map to CPU-style architectures. SYCL retains the execution model, runtime feature set and device -capabilities inspired by the OpenCL standard. 
This standard imposes -some limitations on the full range of {cpp} features that SYCL is able -to support. This ensures portability of device code across as wide a -range of devices as possible. As a result, while the code can be -written in standard {cpp} syntax with interoperability with standard {cpp} -programs, the entire set of {cpp} features is not available in SYCL -device code. In particular, SYCL device code, as defined by this -specification, does not support virtual function calls, function -pointers in general, exceptions, runtime type information or the full -set of {cpp} libraries that may depend on these features or on features -of a particular host compiler. Nevertheless, these basic restrictions -can be relieved by some specific Khronos or vendor extensions. +capabilities inspired by the OpenCL standard. +This standard imposes some limitations on the full range of {cpp} features +that SYCL is able to support. +This ensures portability of device code across as wide a range of devices as +possible. +As a result, while the code can be written in standard {cpp} syntax with +interoperability with standard {cpp} programs, the entire set of {cpp} +features is not available in SYCL device code. +In particular, SYCL device code, as defined by this specification, does not +support virtual function calls, function pointers in general, exceptions, +runtime type information or the full set of {cpp} libraries that may depend +on these features or on features of a particular host compiler. +Nevertheless, these basic restrictions can be relieved by some specific +Khronos or vendor extensions. SYCL implements an <> design which offers the power of source -integration while allowing toolchains to remain flexible. The <> -design supports embedding of code intended to be compiled for a device, -for example a GPU, inline with host code. This embedding of code offers three -primary benefits: +integration while allowing toolchains to remain flexible. 
+The <> design supports embedding of code intended to be compiled for a +device, for example a GPU, inline with host code. +This embedding of code offers three primary benefits: Simplicity:: For novice programmers using frameworks like OpenCL, the separation of host and device source code in OpenCL can become complicated to deal with, particularly when similar kernel code is used for multiple - different operations on different data types. A single compiler flow and - integrated tool chain combined with libraries that perform a lot of - simple tasks simplifies initial OpenCL programs to a minimum complexity. - This reduces the learning curve for programmers new to heterogeneous programming and allows - them to concentrate on parallelization techniques rather than syntax. + different operations on different data types. + A single compiler flow and integrated tool chain combined with libraries + that perform a lot of simple tasks simplifies initial OpenCL programs to + a minimum complexity. + This reduces the learning curve for programmers new to heterogeneous + programming and allows them to concentrate on parallelization techniques + rather than syntax. Reuse:: - {cpp}'s type system allows for complex interactions between different code - units and supports efficient abstract interface design and reuse of - library code. For example, a [keyword]#transform# or [keyword]#map# - operation applied to an array of data may allow specialization on both - the operation applied to each element of the array and on the type of - the data. The <> design of SYCL enables this interaction to - bridge the host code/device code boundary such that the device code to - be specialized on both of these factors directly from the host code. + {cpp}'s type system allows for complex interactions between different + code units and supports efficient abstract interface design and reuse of + library code. 
+ For example, a [keyword]#transform# or [keyword]#map# operation applied + to an array of data may allow specialization on both the operation + applied to each element of the array and on the type of the data. + The <> design of SYCL enables this interaction to bridge the host + code/device code boundary such that the device code to be specialized on + both of these factors directly from the host code. Efficiency:: Tight integration with the type system and reuse of library code enables a compiler to perform inlining of code and to produce efficient @@ -92,56 +99,60 @@ Efficiency:: having to generate kernel source strings dynamically. The use of {cpp} features such as generic programming, templated code, -functional programming and inheritance on top of existing -heterogeneous execution model opens a wide scope for innovation in -software design for heterogeneous systems. Clean integration of device -and host code within a single {cpp} type system enables the development -of modern, templated generic and adaptable libraries that build -simple, yet efficient, interfaces to offer more developers access to -heterogeneous computing capabilities and devices. SYCL is intended to -serve as a foundation for innovation in programming models for -heterogeneous systems, that builds on open and widely implemented +functional programming and inheritance on top of existing heterogeneous +execution model opens a wide scope for innovation in software design for +heterogeneous systems. +Clean integration of device and host code within a single {cpp} type system +enables the development of modern, templated generic and adaptable libraries +that build simple, yet efficient, interfaces to offer more developers access +to heterogeneous computing capabilities and devices. +SYCL is intended to serve as a foundation for innovation in programming +models for heterogeneous systems, that builds on open and widely implemented standard foundation like OpenCL or Vulkan. 
-SYCL is designed to be as close to standard {cpp} as possible. In -practice, this means that as long as no dependence is created on -SYCL's integration with the underlying implementation, a -standard {cpp} compiler can compile SYCL programs and they will run -correctly on a host CPU. Any use of specialized low-level features can -be masked using the C preprocessor in the same way that -compiler-specific intrinsics may be hidden to ensure portability -between different host compilers. +SYCL is designed to be as close to standard {cpp} as possible. +In practice, this means that as long as no dependence is created on SYCL's +integration with the underlying implementation, a standard {cpp} compiler +can compile SYCL programs and they will run correctly on a host CPU. +Any use of specialized low-level features can be masked using the C +preprocessor in the same way that compiler-specific intrinsics may be hidden +to ensure portability between different host compilers. SYCL is designed to allow a compilation flow where the source file is passed -through multiple different compilers, including a standard {cpp} host compiler of -the developer's choice, and where the resulting application combines the results -of these compilation passes. This is distinct from a single-source flow that -might use language extensions that preclude the use of a standard host compiler. -The SYCL standard does not preclude the use of a single compiler flow, but is -designed to not require it. SYCL can also be implemented purely as a library, -in which case no special compiler support is required at all. +through multiple different compilers, including a standard {cpp} host +compiler of the developer's choice, and where the resulting application +combines the results of these compilation passes. +This is distinct from a single-source flow that might use language +extensions that preclude the use of a standard host compiler. 
+The SYCL standard does not preclude the use of a single compiler flow, but +is designed to not require it. +SYCL can also be implemented purely as a library, in which case no special +compiler support is required at all. -The advantages of this design are two-fold. First, it offers better integration -with existing tool chains. An application that already builds using a chosen -compiler can continue to do so when SYCL code is added. Using the SYCL tools on -a source file within a project will both compile for a device and let -the same source file be compiled using the same host compiler that the rest of -the project is compiled with. Linking and library relationships are unaffected. -This design simplifies porting of pre-existing applications to SYCL. Second, the -design allows the optimal compiler to be chosen for each device where different -vendors may provide optimized tool-chains. +The advantages of this design are two-fold. +First, it offers better integration with existing tool chains. +An application that already builds using a chosen compiler can continue to +do so when SYCL code is added. +Using the SYCL tools on a source file within a project will both compile for +a device and let the same source file be compiled using the same host +compiler that the rest of the project is compiled with. +Linking and library relationships are unaffected. +This design simplifies porting of pre-existing applications to SYCL. +Second, the design allows the optimal compiler to be chosen for each device +where different vendors may provide optimized tool-chains. -To summarize, SYCL enables computational kernels to be written inside -{cpp} source files as normal {cpp} code, leading to the concept of -"`single-source`" programming. This means that software developers can -develop and use generic algorithms and data structures using standard -{cpp} template techniques, while still supporting multi-platform, -multi-device heterogeneous execution. 
Access to the low level APIs of -an underlying implementation (such as OpenCL) is also supported. -The specification has been designed to enable implementation -across as wide a variety of platforms as possible as well as ease of -integration with other platform-specific technologies, thereby letting -both users and implementers build on top of SYCL as an open platform -for system-wide heterogeneous processing innovation. +To summarize, SYCL enables computational kernels to be written inside {cpp} +source files as normal {cpp} code, leading to the concept of +"`single-source`" programming. +This means that software developers can develop and use generic algorithms +and data structures using standard {cpp} template techniques, while still +supporting multi-platform, multi-device heterogeneous execution. +Access to the low level APIs of an underlying implementation (such as +OpenCL) is also supported. +The specification has been designed to enable implementation across as wide +a variety of platforms as possible as well as ease of integration with other +platform-specific technologies, thereby letting both users and implementers +build on top of SYCL as an open platform for system-wide heterogeneous +processing innovation. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end introduction %%%%%%%%%%%%%%%%%%%%%%%%%%%% diff --git a/adoc/chapters/opencl_backend.adoc b/adoc/chapters/opencl_backend.adoc index 94ce3b96..3be22881 100644 --- a/adoc/chapters/opencl_backend.adoc +++ b/adoc/chapters/opencl_backend.adoc @@ -4,20 +4,20 @@ [[chapter:opencl-backend]] = OpenCL backend specification -This chapter describes how the SYCL general programming model is mapped on top -of OpenCL, and how the SYCL generic interoperability interface must be -implemented by vendors providing SYCL for OpenCL implementations to ensure SYCL -applications written for the OpenCL backend are interoperable. 
+This chapter describes how the SYCL general programming model is mapped on +top of OpenCL, and how the SYCL generic interoperability interface must be +implemented by vendors providing SYCL for OpenCL implementations to ensure +SYCL applications written for the OpenCL backend are interoperable. [[sec:opencl:native-interop-application]] == SYCL application interoperability native backend objects -For each <> class which supports <> interoperability, -specializations of [code]#backend_traits::input_type# -and [code]#backend_traits::return_type# must be defined as the -type of <> interoperability <> -associated with [code]#SyclType# for the <>. +For each <> class which supports <> +interoperability, specializations of [code]#backend_traits::input_type# and +[code]#backend_traits::return_type# must be defined as the type of +<> interoperability <> associated +with [code]#SyclType# for the <>. The types of the native backend objects for <> interoperability are described in <>. @@ -25,13 +25,14 @@ interoperability are described in <>. [[sec:opencl:native-interop-kernel]] == Kernel function interoperability native backend objects -For each <> class which supports kernel function interoperability, -a specialization of [code]#backend_traits::return_type# must be defined as the type of kernel -function interoperability <> associated with [code]#SyclType# -for the <>. +For each <> class which supports kernel function +interoperability, a specialization of [code]#backend_traits::return_type# +must be defined as the type of kernel function interoperability +<> associated with [code]#SyclType# for the +<>. -The types of the native backend objects for kernel function interoperability are -described in <>. +The types of the native backend objects for kernel function interoperability +are described in <>. 
[[table.opencl.kernelfunctioninterop.nativeobjects]] .Types of native backend objects kernel function interoperability @@ -65,13 +66,14 @@ include::{header_dir}/openclBackend/samplerImagePair.h[lines=4..-1] On destruction of the last copy of an instance of a SYCL class which is specified to have reference semantics as described in -<> that was constructed using one of the <> -interoperability [code]#make_*# functions specified in -<> additional lifetime related operations may -be performed which are required for the underlying <>. +<> that was constructed using one of the +<> interoperability [code]#make_*# functions specified in +<> additional lifetime related operations +may be performed which are required for the underlying +<>. -The additional behavior performed by the OpenCL <> for each SYCL class -is described in <>. +The additional behavior performed by the OpenCL <> for each SYCL +class is described in <>. [[table.opencl.interop.destructors]] .Destructor behavior of interop constructed objects with reference semantics @@ -94,61 +96,62 @@ is described in <>. // From 3.8 SYCL for OpenCL Framework == SYCL for OpenCL framework -The SYCL framework allows applications to -use a host and one or more OpenCL devices as a single heterogeneous parallel -computer system. The framework contains the following components: +The SYCL framework allows applications to use a host and one or more OpenCL +devices as a single heterogeneous parallel computer system. +The framework contains the following components: * <>: The template library provides a set of {cpp} templates - and classes which provide the programming model to the user. It enables - the creation of runtime classes such as SYCL queues, buffers and images, - as well as access to some underlying OpenCL runtime object, such as - contexts, platforms, devices and program objects. 
- * <>: The <> interfaces with the - underlying OpenCL implementations and handles scheduling of commands in - queues, moving of data between host and devices, manages contexts, - programs, kernel compilation and memory management. + and classes which provide the programming model to the user. + It enables the creation of runtime classes such as SYCL queues, buffers + and images, as well as access to some underlying OpenCL runtime object, + such as contexts, platforms, devices and program objects. + * <>: The <> interfaces with the underlying + OpenCL implementations and handles scheduling of commands in queues, + moving of data between host and devices, manages contexts, programs, + kernel compilation and memory management. * [keyword]#OpenCL Implementation(s)#: The SYCL system assumes the existence of one or more OpenCL implementations available on the host machine. - * SYCL <>: The SYCL <> compile - SYCL {cpp} kernels into a format which can be executed on an OpenCL device - at runtime. There may be more than one SYCL device compiler in a SYCL - implementation. The format of the compiled SYCL kernels is not defined. + * SYCL <>: The SYCL + <> compile SYCL {cpp} kernels into a + format which can be executed on an OpenCL device at runtime. + There may be more than one SYCL device compiler in a SYCL + implementation. + The format of the compiled SYCL kernels is not defined. A SYCL device compiler may, or may not, also compile the host parts of the program. -The OpenCL backend is enabled using the [code]#sycl::backend::opencl# -value of [code]#enum class backend#. That means that when the OpenCL -backend is active, the value of +The OpenCL backend is enabled using the [code]#sycl::backend::opencl# value +of [code]#enum class backend#. +That means that when the OpenCL backend is active, the value of [code]#sycl::is_backend_active::value# will be [code]#true#. 
== Mapping of SYCL programming model on top of OpenCL -The SYCL programming model was originally designed as a high-level model -for the OpenCL API, hence the mapping of SYCL on the OpenCL API is -mostly straightforward. +The SYCL programming model was originally designed as a high-level model for +the OpenCL API, hence the mapping of SYCL on the OpenCL API is mostly +straightforward. -When the OpenCL backend is active on a SYCL application, all visible -OpenCL platforms are exported as SYCL platforms. +When the OpenCL backend is active on a SYCL application, all visible OpenCL +platforms are exported as SYCL platforms. // From Architecture, Section 3.3 -When a SYCL implementation executes kernels on an OpenCL -device, it achieves this by enqueuing OpenCL *commands* to -execute computations on the processing elements within a device. The -processing elements within an OpenCL compute unit may execute a single -stream of instructions as ALUs within a SIMD unit (which execute in -lockstep with a single stream of instructions), as independent SPMD -units (where each PE maintains its own program counter) or as some -combination of the two. +When a SYCL implementation executes kernels on an OpenCL device, it achieves +this by enqueuing OpenCL *commands* to execute computations on the +processing elements within a device. +The processing elements within an OpenCL compute unit may execute a single +stream of instructions as ALUs within a SIMD unit (which execute in lockstep +with a single stream of instructions), as independent SPMD units (where each +PE maintains its own program counter) or as some combination of the two. === Backend specific information descriptors -Some of the SYCL information descriptors are backend-defined. For the OpenCL -backend these information descriptors map directly to OpenCL properties as -described in the table below: +Some of the SYCL information descriptors are backend-defined. 
+For the OpenCL backend these information descriptors map directly to OpenCL +properties as described in the table below: [[table.opencl.info]] .Mapping of SYCL information descriptors to OpenCL properties @@ -164,15 +167,18 @@ described in the table below: The memory model for SYCL devices running on OpenCL platforms follows the memory model of the OpenCL version they conform to. -In addition to <> , <> and <> memory, -the OpenCL backend permits the use of <> space in SYCL: +In addition to <> , <> and <> +memory, the OpenCL backend permits the use of <> space in +SYCL: - * <> is a region of memory that remains constant - during the execution of a kernel. A pointer to the generic address space cannot - represent an address to this memory region. + * <> is a region of memory that remains + constant during the execution of a kernel. + A pointer to the generic address space cannot represent an address to + this memory region. -Work-items executing in a kernel have access to four distinct memory regions, -with the mapping between SYCL and OpenCL described in <>. +Work-items executing in a kernel have access to four distinct memory +regions, with the mapping between SYCL and OpenCL described in +<>. [[table.opencl.memory]] .Mapping of SYCL memory regions into OpenCL memory regions @@ -187,48 +193,60 @@ with the mapping between SYCL and OpenCL described in <>. === OpenCL interface for buffer command accessors -The enumerator [code]#target::constant_buffer# is deprecated, but will remain a -part of the OpenCL backend as an extension. This enables SYCL kernel functions -to access the contents of a buffer through the OpenCL device’s constant memory. +The enumerator [code]#target::constant_buffer# is deprecated, but will +remain a part of the OpenCL backend as an extension. +This enables SYCL kernel functions to access the contents of a buffer +through the OpenCL device’s constant memory. 
// From 3.4.1.1 OpenCL resources managed by SYCL Application === OpenCL resources managed by SYCL application In OpenCL, a developer must create a <> to be able to execute -commands on a device. Creating a context involves choosing a <> -and a list of <>. In SYCL, contexts, platforms and devices all -exist, but the user can choose whether to specify them or have the SYCL -implementation create them automatically. The minimum required object for -submitting work to devices in SYCL is the <>, which contains -references to a platform, device and context internally. +commands on a device. +Creating a context involves choosing a <> and a list of +<>. +In SYCL, contexts, platforms and devices all exist, but the user can choose +whether to specify them or have the SYCL implementation create them +automatically. +The minimum required object for submitting work to devices in SYCL is the +<>, which contains references to a platform, device and context +internally. The resources managed by SYCL are: - . <>: all features of OpenCL are implemented by platforms. A - platform can be viewed as a given hardware vendor's runtime and the - devices accessible through it. Some devices will only be accessible to - one vendor's runtime and hence multiple platforms may be present. SYCL - manages the different platforms for the user. In SYCL, a platform - resource is accessible through a [code]#sycl::platform# object. - . <>: any OpenCL resource that is acquired by the user is - attached to a context. A context contains a collection of devices that - the host can use and manages memory objects that can be shared between - the devices. Data movement between devices within a context may be - efficient and hidden by the underlying OpenCL runtime while data - movement between contexts may involve the host. A given context can only - wrap devices owned by a single platform. In SYCL, a context resource is - accessible through a [code]#sycl::context# object. + . 
<>: all features of OpenCL are implemented by + platforms. + A platform can be viewed as a given hardware vendor's runtime and the + devices accessible through it. + Some devices will only be accessible to one vendor's runtime and hence + multiple platforms may be present. + SYCL manages the different platforms for the user. + In SYCL, a platform resource is accessible through a + [code]#sycl::platform# object. + . <>: any OpenCL resource that is acquired by the user + is attached to a context. + A context contains a collection of devices that the host can use and + manages memory objects that can be shared between the devices. + Data movement between devices within a context may be efficient and + hidden by the underlying OpenCL runtime while data movement between + contexts may involve the host. + A given context can only wrap devices owned by a single platform. + In SYCL, a context resource is accessible through a + [code]#sycl::context# object. . <>: platforms provide one or more devices for executing - kernels. In SYCL, a device is accessible through a - [code]#sycl::device# object. - . <>: OpenCL objects that store implementation - data for the SYCL kernels. These objects are only required for advanced use - in SYCL and are encapsulated in the [code]#sycl::kernel_bundle# class. - . <>: SYCL kernels execute in command queues. The user must - create a queue, which references an associated context, platform and - device. The context, platform and device may be chosen automatically, or - specified by the user. In SYCL, command queues are accessible through - [code]#sycl::queue# objects. + kernels. + In SYCL, a device is accessible through a [code]#sycl::device# object. + . <>: OpenCL objects that store + implementation data for the SYCL kernels. + These objects are only required for advanced use in SYCL and are + encapsulated in the [code]#sycl::kernel_bundle# class. + . <>: SYCL kernels execute in command queues. 
+ The user must create a queue, which references an associated context, + platform and device. + The context, platform and device may be chosen automatically, or + specified by the user. + In SYCL, command queues are accessible through [code]#sycl::queue# + objects. // Removed from OpenCL Spec document // In OpenCL, queues can operate using in-order execution or out-of-order @@ -247,60 +265,62 @@ The resources managed by SYCL are: // a program that mixes standard OpenCL C kernels and OpenCL API code with // SYCL code and expect fully compatible interoperability. -The OpenCL backend for SYCL ensures maximum compatibility between SYCL -and OpenCL kernels and API. This includes supporting devices with -different capabilities and support for different versions of the -OpenCL C language, in addition to supporting SYCL kernels written in {cpp}. +The OpenCL backend for SYCL ensures maximum compatibility between SYCL and +OpenCL kernels and API. +This includes supporting devices with different capabilities and support for +different versions of the OpenCL C language, in addition to supporting SYCL +kernels written in {cpp}. // Original from 3.6.11, Interfacing with OpenCL // https://cvs.khronos.org/bugzilla/show_bug.cgi?id=10426 <> classes which encapsulate an OpenCL opaque type such as -SYCL [code]#context# or SYCL [code]#queue# must provide an -interoperability constructor taking an instance of the OpenCL opaque type. +SYCL [code]#context# or SYCL [code]#queue# must provide an interoperability +constructor taking an instance of the OpenCL opaque type. When the OpenCL object supports reference counting, these constructors must retain that instance to increase the reference count of the OpenCL resource. -Likewise, the destructor for the <> classes which encapsulate a -reference counted OpenCL opaque type must release that instance to decrease the -reference count of the OpenCL resource. 
Since the OpenCL [code]#platform_id# -is not reference counted, the encapsulating SYCL [code]#platform# class neither -retains nor releases this OpenCL resource. +Likewise, the destructor for the <> classes which encapsulate +a reference counted OpenCL opaque type must release that instance to +decrease the reference count of the OpenCL resource. +Since the OpenCL [code]#platform_id# is not reference counted, the +encapsulating SYCL [code]#platform# class neither retains nor releases this +OpenCL resource. Note that an instance of a <> class which encapsulates an OpenCL opaque type can encapsulate any number of instances of the OpenCL type, unless it was constructed via the interoperability constructor, in which case it can encapsulate only a single instance of the OpenCL type. -The lifetime of a <> class that encapsulates an OpenCL -opaque type and the instance of that opaque type retrieved via the +The lifetime of a <> class that encapsulates an OpenCL opaque +type and the instance of that opaque type retrieved via the [code]#get_native()# free function are not tied in either direction given -correct usage of OpenCL reference counting. For example if a user were to -retrieve a [code]#cl_command_queue# instance from a SYCL -[code]#queue# instance and then immediately destroy the SYCL -[code]#queue# instance, the [code]#cl_command_queue# instance is -still valid. Or if a user were to construct a SYCL [code]#queue# -instance from a [code]#cl_command_queue# instance and then immediately -release the [code]#cl_command_queue# instance, the SYCL -[code]#queue# instance is still valid. +correct usage of OpenCL reference counting. +For example if a user were to retrieve a [code]#cl_command_queue# instance +from a SYCL [code]#queue# instance and then immediately destroy the SYCL +[code]#queue# instance, the [code]#cl_command_queue# instance is still +valid. 
+Or if a user were to construct a SYCL [code]#queue# instance from a +[code]#cl_command_queue# instance and then immediately release the +[code]#cl_command_queue# instance, the SYCL [code]#queue# instance is still +valid. Note that a <> class that encapsulates an OpenCL opaque type is not responsible for any incorrect use of OpenCL reference counting -outside of the <>. For example if a user were to retrieve a -[code]#cl_command_queue# instance from a SYCL [code]#queue# -instance and then release the [code]#cl_command_queue# instance more -than once without any prior retain then the SYCL [code]#queue# instance -that the [code]#cl_command_queue# instance was retrieved from is now -undefined. - -Note that an instance of the SYCL [code]#buffer# or SYCL -[code]#image# class templates constructed via the interoperability -constructor is free to copy from the [code]#cl_mem# into another memory -allocation within the <> to achieve normal SYCL semantics, -for as long as the SYCL [code]#buffer# or SYCL [code]#image# -instance is alive. - -<> relates SYCL objects -to their OpenCL native type in the SYCL application. +outside of the <>. +For example if a user were to retrieve a [code]#cl_command_queue# instance +from a SYCL [code]#queue# instance and then release the +[code]#cl_command_queue# instance more than once without any prior retain +then the SYCL [code]#queue# instance that the [code]#cl_command_queue# +instance was retrieved from is now undefined. + +Note that an instance of the SYCL [code]#buffer# or SYCL [code]#image# class +templates constructed via the interoperability constructor is free to copy +from the [code]#cl_mem# into another memory allocation within the +<> to achieve normal SYCL semantics, for as long as the SYCL +[code]#buffer# or SYCL [code]#image# instance is alive. + +<> relates SYCL objects to their OpenCL native type in +the SYCL application. 
[[table.opencl.interop]] .List of native types per SYCL object in the OpenCL backend @@ -406,7 +426,8 @@ unsampled_image The interoperability interface will return a list of active images in the SYCL runtime. |==== -Inside the SYCL kernel, the SYCL API offers interoperability with OpenCL device types. +Inside the SYCL kernel, the SYCL API offers interoperability with OpenCL +device types. <> describes the mapping of kernel types. [[table.opencl.kerneltypes]] @@ -434,27 +455,27 @@ multi_ptr::get_decorated() // From 3.7 memory object -When a buffer or image is allocated on more than -one OpenCL device, if these devices are on separate contexts then multiple -[code]#cl_mem# objects may be allocated for the memory object, depending on -whether the object has actively been used on these devices yet or not. +When a buffer or image is allocated on more than one OpenCL device, if these +devices are on separate contexts then multiple [code]#cl_mem# objects may be +allocated for the memory object, depending on whether the object has +actively been used on these devices yet or not. // From 3.10 Language restrictions in kernels The OpenCL C function qualifier [code]#+__kernel+# and the access -qualifiers: [code]#+__read_only+#, [code]#+__write_only+# and [code]#+__read_write+# -are not exposed in SYCL via keywords, but are instead encapsulated in -SYCL's parameter passing system inside accessors. Users wishing to -achieve the OpenCL equivalent of these qualifiers in SYCL should -instead use SYCL accessors with equivalent semantics. +qualifiers: [code]#+__read_only+#, [code]#+__write_only+# and +[code]#+__read_write+# are not exposed in SYCL via keywords, but are instead +encapsulated in SYCL's parameter passing system inside accessors. +Users wishing to achieve the OpenCL equivalent of these qualifiers in SYCL +should instead use SYCL accessors with equivalent semantics. 
// From 3.10.1 SYCL Linker -Any OpenCL C function included in a pre-built OpenCL library can be -defined as an [code]#extern "C"# function and the OpenCL program -has to be linked against any SYCL program that contains kernels using -the external function. In this case, the data types used have to comply with -the interoperability aliases defined in <>. +Any OpenCL C function included in a pre-built OpenCL library can be defined +as an [code]#extern "C"# function and the OpenCL program has to be linked +against any SYCL program that contains kernels using the external function. +In this case, the data types used have to comply with the interoperability +aliases defined in <>. == Programming interface @@ -465,8 +486,8 @@ The following section describes the OpenCL-specific API. The OpenCL backend provides the following specializations of the [code]#make_{sycl_class}# template functions which are defined in -<>. These functions are in the -[code]#sycl# namespace. +<>. +These functions are in the [code]#sycl# namespace. [width="100%",options="header",separator="@",cols="40%,60%"] |==== @@ -626,8 +647,9 @@ Throws an [code]#exception# with the [code]#errc::invalid# error code === Extension query Platforms and devices with an OpenCL backend may support extensions. -For convenience, the extensions supported by a platform or device can be queried -through the following functions provided in the [code]#sycl::opencl# namespace. +For convenience, the extensions supported by a platform or device can be +queried through the following functions provided in the [code]#sycl::opencl# +namespace. [width="100%",options="header",separator="@",cols="35%,65%"] |==== @@ -657,9 +679,10 @@ bool has_extension(const sycl::device& syclDevice, const std::string& extension) === Reference counting -Most OpenCL objects are reference counted. The SYCL general programming model -doesn't require that native objects are reference counted. 
However, for -convenience, the following function is provided in the +Most OpenCL objects are reference counted. +The SYCL general programming model doesn't require that native objects are +reference counted. +However, for convenience, the following function is provided in the [code]#sycl::opencl# namespace. [width="100%",options="header",separator="@",cols="35%,65%"] @@ -677,9 +700,10 @@ template cl_uint get_reference_count(openCLT obj) === Errors and limitations If there is an OpenCL error associated with an exception triggered, then the -OpenCL error code can be obtained by the free function [code]#cl_int sycl::opencl::get_error_code(sycl::exception&)#. In the case where there is -no OpenCL error associated with the exception triggered, the OpenCL error -code will be [code]#CL_SUCCESS#. +OpenCL error code can be obtained by the free function [code]#cl_int +sycl::opencl::get_error_code(sycl::exception&)#. +In the case where there is no OpenCL error associated with the exception +triggered, the OpenCL error code will be [code]#CL_SUCCESS#. // TODO: Errors and limitations @@ -703,18 +727,19 @@ code will be [code]#CL_SUCCESS#. [[sec:opencl:interop-kernel-bundle]] === Interoperability with kernel bundles -In <> any kernel function that is enqueued over an nd-range -is represented by a [code]#cl_kernel# and must be compiled and linked via a -[code]#cl_program# using [code]#clBuildProgram#, +In <> any kernel function that is enqueued over an +nd-range is represented by a [code]#cl_kernel# and must be compiled and +linked via a [code]#cl_program# using [code]#clBuildProgram#, [code]#clCompileProgram# and [code]#clLinkProgram#. -For OpenCL <> this detail is abstracted away by <> and -a [code]#kernel_bundle# object containing all <> -is retrieved by calling the free function [code]#get_kernel_bundle#. 
+For OpenCL <> this detail is abstracted away by <> and a [code]#kernel_bundle# object containing all +<> is retrieved by calling the +free function [code]#get_kernel_bundle#. The OpenCL <> specification provides additional free functions -which provide convenience functions for constructing kernel bundles -from OpenCL specific objects. +which provide convenience functions for constructing kernel bundles from +OpenCL specific objects. [source,,linenums] ---- @@ -728,24 +753,27 @@ kernel_bundle create_bundle(const context& ctxt, const std::vector& devs, const std::vector& clPrograms) ---- - . _Preconditions:_ The <> specified by [code]#ctxt# - must be associated with the OpenCL <>. + . _Preconditions:_ The <> specified by [code]#ctxt# must be + associated with the OpenCL <>. All devices in [code]#devs# must be associated with [code]#ctxt#. - All OpenCL programs in [code]#clPrograms# must be associated with [code]#ctxt#. + All OpenCL programs in [code]#clPrograms# must be associated with + [code]#ctxt#. + -- -_Effects:_ Constructs a <> in the specified [code]#bundle_state# -from the provided list of OpenCL programs and associated with the -<> specified by [code]#syclContext# by invoking the necessary OpenCL APIs. -Follows the same rules as calling [code]#make_kernel_bundle# on a single OpenCL program, -except that the rules apply to all OpenCL programs in [code]#clPrograms#. -Multiple programs will be linked together into a single one -if required by the requested [code]#State#. -The constructed [code]#kernel_bundle# will retain all provided OpenCL programs -and will also release them on destruction. - -_Throws:_ An [code]#exception# with the [code]#errc::build# error code if any error is produced -by invoking the OpenCL APIs. +_Effects:_ Constructs a <> in the specified +[code]#bundle_state# from the provided list of OpenCL programs and +associated with the <> specified by [code]#syclContext# by invoking +the necessary OpenCL APIs. 
+Follows the same rules as calling [code]#make_kernel_bundle# on a single +OpenCL program, except that the rules apply to all OpenCL programs in +[code]#clPrograms#. +Multiple programs will be linked together into a single one if required by +the requested [code]#State#. +The constructed [code]#kernel_bundle# will retain all provided OpenCL +programs and will also release them on destruction. + +_Throws:_ An [code]#exception# with the [code]#errc::build# error code if +any error is produced by invoking the OpenCL APIs. -- [source,,linenums] @@ -754,69 +782,73 @@ kernel_bundle create_bundle(const context& ctxt, const std::vector& devs, const std::vector& clKernels) ---- - . _Preconditions:_ The <> specified by [code]#ctxt# - must be associated with the OpenCL <>. + . _Preconditions:_ The <> specified by [code]#ctxt# must be + associated with the OpenCL <>. All devices in [code]#devs# must be associated with [code]#ctxt#. - All OpenCL kernels in [code]#clKernels# must be associated with [code]#ctxt#. + All OpenCL kernels in [code]#clKernels# must be associated with + [code]#ctxt#. + -- -_Effects:_ Constructs an executable <> -from the provided list of OpenCL kernels and associated with the -<> specified by [code]#syclContext# by invoking the necessary OpenCL APIs. -[code]#cl_kernel# objects might be associated with different [code]#cl_program# objects, -the kernel bundle will encapsulate all of them. - -_Throws:_ An [code]#exception# with the [code]#errc::build# error code if any error is produced -by invoking the OpenCL APIs. +_Effects:_ Constructs an executable <> from the provided list +of OpenCL kernels and associated with the <> specified by +[code]#syclContext# by invoking the necessary OpenCL APIs. +[code]#cl_kernel# objects might be associated with different +[code]#cl_program# objects, the kernel bundle will encapsulate all of them. 
+ +_Throws:_ An [code]#exception# with the [code]#errc::build# error code if +any error is produced by invoking the OpenCL APIs. -- === Interoperability with kernels -A [code]#kernel_bundle# object contains one or multiple OpenCL programs -and one or multiple OpenCL kernels. -Calling [code]#kernel_bundle::get_kernel# returns a [code]#kernel# object +A [code]#kernel_bundle# object contains one or multiple OpenCL programs and + one or multiple OpenCL kernels. + Calling [code]#kernel_bundle::get_kernel# returns a [code]#kernel# object which can be invoked by any of -<> such as [code]#parallel_for# which take -a [code]#kernel# but not <>. +<> such as +[code]#parallel_for# which take a [code]#kernel# but not +<>. Calling [code]#make_kernel# must trigger a call to [code]#clRetainKernel# -and the resulting [code]#kernel# object must call -[code]#clReleaseKernel# on destruction. +and the resulting [code]#kernel# object must call [code]#clReleaseKernel# on +destruction. -It is also possible to construct a <> from previously created OpenCL -[code]#cl_kernel# objects by calling the free function [code]#create_bundle# -as described in <>. +It is also possible to construct a <> from previously created +OpenCL [code]#cl_kernel# objects by calling the free function +[code]#create_bundle# as described in <>. -The kernel arguments for the OpenCL C kernel kernel can either be set prior to -creating the [code]#kernel# object or by calling [code]#set_arg# or [code]#set_args# -member functions of the [code]#handler# class. +The kernel arguments for the OpenCL C kernel can either be set prior +to creating the [code]#kernel# object or by calling [code]#set_arg# or +[code]#set_args# member functions of the [code]#handler# class. If kernel arguments are set prior to creating the [code]#kernel# object the -<> is not responsible for managing the data of these arguments. +<> is not responsible for managing the data of these +arguments.
[[sec:opencl:kernel-conventions-sycl]] === OpenCL kernel conventions and SYCL -OpenCL and SYCL use opposite conventions for the unit stride dimension. SYCL -aligns with {cpp} conventions, which is important to understand from a -performance perspective when porting code to SYCL. The unit stride -dimension, at least for data, is implicit in the linearization equations in -SYCL (<>) and OpenCL. SYCL aligns with -{cpp} array subscript ordering [code]#arr[a][b][c]#, in that range -constructor dimension ordering used to launch a kernel (e.g. -[code]#range<3> R{a,b,c}#) and range and ID queries within a kernel, -are ordered in the same way as the {cpp} multi-dimensional subscript operators +OpenCL and SYCL use opposite conventions for the unit stride dimension. +SYCL aligns with {cpp} conventions, which is important to understand from a +performance perspective when porting code to SYCL. +The unit stride dimension, at least for data, is implicit in the +linearization equations in SYCL (<>) and +OpenCL. +SYCL aligns with {cpp} array subscript ordering [code]#arr[a][b][c]#, in +that range constructor dimension ordering used to launch a kernel (e.g. +[code]#range<3> R{a,b,c}#) and range and ID queries within a kernel, are +ordered in the same way as the {cpp} multi-dimensional subscript operators (unit stride on the right). -When specifying a [code]#range# as the global or local size -in a [code]#parallel_for# that invokes an OpenCL interop kernel (through -[code]#cl_kernel# interop), -the highest dimension of the range in SYCL will map to the -lowest dimension within the OpenCL kernel. That statement applies to both -an underlying enqueue operation such as [code]#clEnqueueNDRangeKernel# -in OpenCL, and also ID and size queries within the OpenCL kernel. 
+When specifying a [code]#range# as the global or local size in a +[code]#parallel_for# that invokes an OpenCL interop kernel (through +[code]#cl_kernel# interop), the highest dimension of the range in SYCL will +map to the lowest dimension within the OpenCL kernel. +That statement applies to both an underlying enqueue operation such as +[code]#clEnqueueNDRangeKernel# in OpenCL, and also ID and size queries +within the OpenCL kernel. For example, a 3D global range specified in SYCL as: [source] @@ -847,17 +879,19 @@ of: size_t cl_interop_range[2] = { r1, r0 }; ---- -The mapping of highest dimension in SYCL to lowest dimension in OpenCL applies to all -operations where a multi-dimensional construct must be mapped, such as when mapping SYCL -explicit memory operations to OpenCL APIs like [code]#clEnqueueCopyBufferRect#. +The mapping of highest dimension in SYCL to lowest dimension in OpenCL +applies to all operations where a multi-dimensional construct must be +mapped, such as when mapping SYCL explicit memory operations to OpenCL APIs +like [code]#clEnqueueCopyBufferRect#. Work-item and work-group ID and range queries have the same reversed -convention for unit stride dimension between SYCL and OpenCL. For example, -with three, two, or one dimensional SYCL global ranges, OpenCL and SYCL -kernel code queries relate to the range as shown in -<>. The "SYCL kernel query" column -applies for SYCL-defined kernels, and the "OpenCL kernel query" column -applies for kernels defined through OpenCL interop. +convention for unit stride dimension between SYCL and OpenCL. +For example, with three, two, or one dimensional SYCL global ranges, OpenCL +and SYCL kernel code queries relate to the range as shown in +<>. +The "SYCL kernel query" column applies for SYCL-defined kernels, and the +"OpenCL kernel query" column applies for kernels defined through OpenCL +interop. // Jon: Need to code-format most of these cells and use gray backgrounds on // column-spanning sub-titles. 
@@ -915,12 +949,13 @@ applies for kernels defined through OpenCL interop. === Data types -The OpenCL C language standard <> defines its own built-in -scalar data types, and these have additional requirements in terms of size and -signedness on top of what is guaranteed by ISO {cpp}. For the purpose of -interoperability and portability, SYCL defines a set of aliases to {cpp} types -within the [code]#sycl::opencl# namespace using the [code]#cl_# -prefix. These aliases are described in <>. +The OpenCL C language standard <> defines its own +built-in scalar data types, and these have additional requirements in terms +of size and signedness on top of what is guaranteed by ISO {cpp}. +For the purpose of interoperability and portability, SYCL defines a set of +aliases to {cpp} types within the [code]#sycl::opencl# namespace using the +[code]#cl_# prefix. +These aliases are described in <>. [[table.types.aliases]] @@ -1025,57 +1060,57 @@ cl_half == Preprocessor directives and macros - * [code]#SYCL_BACKEND_OPENCL# substitutes to [code]#1# if the OpenCL <> - is active while building the SYCL application. + * [code]#SYCL_BACKEND_OPENCL# substitutes to [code]#1# if the OpenCL + <> is active while building the SYCL application. === Offline linking with OpenCL C libraries -SYCL supports linking <> with OpenCL C libraries -during offline compilation or during online compilation by the -<> within a SYCL application. +SYCL supports linking <> with +OpenCL C libraries during offline compilation or during online compilation +by the <> within a SYCL application. -Linking with OpenCL C kernel functions offline is an optional feature -and is unspecified. Linking with OpenCL C kernel functions online is -performed by using the SYCL [code]#kernel_bundle# class to compile and -link an OpenCL C source; using the [code]#compile_with_source# or -[code]#build_with_source# member functions. +Linking with OpenCL C kernel functions offline is an optional feature and is +unspecified. 
+Linking with OpenCL C kernel functions online is performed by using the SYCL +[code]#kernel_bundle# class to compile and link an OpenCL C source; using +the [code]#compile_with_source# or [code]#build_with_source# member +functions. OpenCL C functions that are linked with, using either offline or online -compilation, must be declared as [code]#extern "C"# function -declarations. The function parameters of these function declarations must be -defined as the OpenCL C interoperability aliases; [code]#pointer# of -the [code]#multi_ptr# class template, [code]#vector_t# of the -[code]#vec# class template and scalar data type aliases described in -<>. +compilation, must be declared as [code]#extern "C"# function declarations. +The function parameters of these function declarations must be defined as +the OpenCL C interoperability aliases; [code]#pointer# of the +[code]#multi_ptr# class template, [code]#vector_t# of the [code]#vec# class +template and scalar data type aliases described in <>. // \include{opencl_extensions} // %%%%%%%%%%%%%%%%%%%%%%%%%%%% begin opencl_extensions %%%%%%%%%%%%%%%%%%%%%%%%%%%% == SYCL support of non-core OpenCL features -In addition to the OpenCL core features, SYCL also provides support for OpenCL -extensions which provide features in OpenCL via khr extensions. +In addition to the OpenCL core features, SYCL also provides support for +OpenCL extensions which provide features in OpenCL via khr extensions. -Some extensions are natively supported within the SYCL interface, however some -can only be used via the OpenCL interoperability interface. The SYCL interface -required for native extensions must be available. However if the respective -extension is not supported by the executing SYCL [code]#device#, the -<> must throw an [code]#exception# with the -[code]#errc::feature_not_supported# or [code]#errc::kernel_not_supported# error -codes. 
+Some extensions are natively supported within the SYCL interface, however +some can only be used via the OpenCL interoperability interface. +The SYCL interface required for native extensions must be available. +However if the respective extension is not supported by the executing SYCL +[code]#device#, the <> must throw an [code]#exception# with +the [code]#errc::feature_not_supported# or +[code]#errc::kernel_not_supported# error codes. -The OpenCL backend exposes some khr extensions to SYCL applications through the -[code]#sycl::aspect# enumerated type. Therefore, applications can query -for the existence of these khr extensions by calling the [code]#device::has()# -or [code]#platform::has()# member functions. +The OpenCL backend exposes some khr extensions to SYCL applications through +the [code]#sycl::aspect# enumerated type. +Therefore, applications can query for the existence of these khr extensions +by calling the [code]#device::has()# or [code]#platform::has()# member +functions. All OpenCL extensions are available through the OpenCL interoperability interface, but some can also be used through core SYCL APIs. <> shows which these are. -<> also shows the mapping from each OpenCL -extension name to its associated SYCL device [code]#aspect# when one is -available. +<> also shows the mapping from each OpenCL extension +name to its associated SYCL device [code]#aspect# when one is available. [[table.extensionsupport]] @@ -1100,72 +1135,76 @@ available. === Half precision floating-point The half scalar data type: [code]#half# and the half vector data types: -[code]#half1#, [code]#half2#, [code]#half3#, -[code]#half4#, [code]#half8# and [code]#half16# must be -available at compile-time. However a kernel using these types is only -supported on devices that have [code]#aspect::fp16#, as described in -<>. +[code]#half1#, [code]#half2#, [code]#half3#, [code]#half4#, [code]#half8# +and [code]#half16# must be available at compile-time. 
+However a kernel using these types is only supported on devices that have +[code]#aspect::fp16#, as described in <>. The conversion rules for half precision types follow the same rules as in -the OpenCL 1.2 extensions specification <>. +the OpenCL 1.2 extensions specification <>. The math functions for half precision types follow the same rules as in the -OpenCL 1.2 extensions specification <>. The allowed error in ULP(Unit in the Last Place) is -less than 8192, corresponding to <>. +OpenCL 1.2 extensions specification <>. +The allowed error in ULP(Unit in the Last Place) is less than 8192, +corresponding to <>. === Writing to 3D image memory objects -The [code]#unsampled_image_accessor# class -in SYCL supports member functions for writing -3D image memory objects, but this functionality is only allowed on a device -if the extension [code]#cl_khr_3d_image_writes# is -supported on that <>. +The [code]#unsampled_image_accessor# class in SYCL supports member functions +for writing 3D image memory objects, but this functionality is only allowed +on a device if the extension [code]#cl_khr_3d_image_writes# is supported on +that <>. // TODO: Should opencl::aspect::3d_image_writes be promoted to a core SYCL aspect? === Interoperability with OpenGL -Interoperability between SYCL and OpenGL is not directly provided by the SYCL interface, -however can be achieved via the SYCL OpenCL interoperability interface. +Interoperability between SYCL and OpenGL is not directly provided by the +SYCL interface, however can be achieved via the SYCL OpenCL interoperability +interface. == Correspondence of some OpenCL features to SYCL This section describes the correspondence between some OpenCL features and -features in the <> that provide similar functionality. All content -in this section is non-normative. +features in the <> that provide similar functionality. +All content in this section is non-normative. 
=== Work-item functions -The OpenCL 1.2 specification document <> -defines work-item functions that tell various information about the currently -executing work-item in an OpenCL kernel. SYCL provides equivalent -functionality through the item and group classes that are defined in -<>, <> and <>. +The OpenCL 1.2 specification document <> defines work-item functions that tell various +information about the currently executing work-item in an OpenCL kernel. +SYCL provides equivalent functionality through the item and group classes +that are defined in <>, <> and +<>. === Vector data load and store functions The functionality from the OpenCL functions as defined in the OpenCL 1.2 -specification document <> is available in SYCL through -the [code]#vec# class in <>. +specification document <> is available in SYCL through the [code]#vec# class in +<>. === Synchronization functions -In SYCL the OpenCL [keyword]#synchronization functions# are available through -the [code]#nd_item# class (<>), as they are applied to -work-items for local or global address spaces. Please -see <>. +In SYCL the OpenCL [keyword]#synchronization functions# are available +through the [code]#nd_item# class (<>), as they are applied to +work-items for local or global address spaces. +Please see <>. === [code]#printf# function The functionality of the [code]#printf# function is covered by the -[code]#stream# class (<>), which has the -capability to print to standard output all of the SYCL classes and primitives, -and covers the capabilities defined in the OpenCL 1.2 specification -document <>. +[code]#stream# class (<>), which has the capability to print +to standard output all of the SYCL classes and primitives, and covers the +capabilities defined in the OpenCL 1.2 specification document <>. 
// %%%%%%%%%%%%%%%%%%%%%%%%%%%% end opencl_extensions %%%%%%%%%%%%%%%%%%%%%%%%%%%% diff --git a/adoc/chapters/programming_interface.adoc b/adoc/chapters/programming_interface.adoc index b2caa55d..da88165b 100644 --- a/adoc/chapters/programming_interface.adoc +++ b/adoc/chapters/programming_interface.adoc @@ -2,56 +2,60 @@ = SYCL programming interface The SYCL programming interface provides a common abstracted feature set to -one or more <> APIs. This section describes the {cpp} library -interface to the <> which executes across those <>. +one or more <> APIs. +This section describes the {cpp} library interface to the <> +which executes across those <>. The entirety of the SYCL interface defined in this section is required to be -available for any <>, with the exception of the interoperability -interface, which is described in general terms in this document, not -pertaining to any particular <>. +available for any <>, with the exception of the +interoperability interface, which is described in general terms in this +document, not pertaining to any particular <>. -SYCL guarantees that all the member functions and special member functions of -the SYCL classes described are thread safe. +SYCL guarantees that all the member functions and special member functions +of the SYCL classes described are thread safe. The underlying types for all enumerations defined in this specification are -implementation-defined. In addition, all enumerators within an enumeration -have some implementation-defined unique value unless the specification -specifically indicates a values for the enumerator. +implementation-defined. +In addition, all enumerators within an enumeration have some +implementation-defined unique value unless the specification specifically +indicates a value for the enumerator. [[sec:backends]] == Backends -The <> that can be supported by a SYCL implementation are identified -using the [code]#enum class backend#.
+The <> that can be supported by a SYCL +implementation are identified using the [code]#enum class backend#. [source,,linenums] ---- include::{header_dir}/backends.h[lines=4..-1] ---- -The [code]#enum class backend# is implementation-defined and must be populated -with a unique identifier for each <> that the SYCL implementation can -support. Note that the <> listed in the [code]#enum -class backend# are not guaranteed to be available in a given installation. +The [code]#enum class backend# is implementation-defined and must be +populated with a unique identifier for each <> that the SYCL +implementation can support. +Note that the <> listed in the [code]#enum class +backend# are not guaranteed to be available in a given installation. -Each named <> enumerated in the [code]#enum class backend# -must be associated with a <> specification. -Many sections of this specification -will refer to the associated <> specification. +Each named <> enumerated in the [code]#enum class backend# must be +associated with a <> specification. +Many sections of this specification will refer to the associated <> +specification. [[sec:backend-macros]] === Backend macros As the identifiers defined in [code]#enum class backend# are -implementation-defined, and the associated backends not guaranteed to be available, -a SYCL implementation must also define a preprocessor macro for each of -these identifiers. If the <> is defined by the Khronos SYCL group, the -name of the macro has the form [code]#SYCL_BACKEND_#, where -_backend_name_ is the associated identifier from [code]#backend# in -all upper-case. See <> for the name of the macro -if the vendor defines the <> outside of the Khronos SYCL group. +implementation-defined, and the associated backends not guaranteed to be +available, a SYCL implementation must also define a preprocessor macro for +each of these identifiers. 
+If the <> is defined by the Khronos SYCL group, the name of the +macro has the form [code]#SYCL_BACKEND_#, where _backend_name_ +is the associated identifier from [code]#backend# in all upper-case. +See <> for the name of the macro if the vendor defines +the <> outside of the Khronos SYCL group. If a backend listed in the [code]#enum class backend# is not available, the associated macro must be left undefined. @@ -60,57 +64,60 @@ associated macro must be left undefined. == Generic vs non-generic SYCL The SYCL programming API is split into two categories; generic SYCL and -non-generic SYCL. Almost everything in the SYCL programming API is considered -generic SYCL. However any usage of the [code]#enum class backend# is -considered non-generic SYCL and should only be used for <> specialized -code paths, as the identifiers defined in [code]#backend# are +non-generic SYCL. +Almost everything in the SYCL programming API is considered generic SYCL. +However any usage of the [code]#enum class backend# is considered +non-generic SYCL and should only be used for <> specialized code +paths, as the identifiers defined in [code]#backend# are implementation-defined. In any non-generic SYCL application code where the [code]#backend# enum class is used, the expression must be guarded with a preprocessor -[code]#{hash}ifdef# guard using the associated preprocessor macro to ensure that -the SYCL application will compile even if the SYCL implementation does not -support that <> being specialized for. +[code]#{hash}ifdef# guard using the associated preprocessor macro to ensure +that the SYCL application will compile even if the SYCL implementation does +not support that <> being specialized for. [[sec:headers-and-namespaces]] == Header files and namespaces -SYCL provides one standard header file: [code]##, which needs to -be included in every translation unit that uses the SYCL programming API. 
+SYCL provides one standard header file: [code]##, which needs +to be included in every translation unit that uses the SYCL programming API. All SYCL classes, constants, types and functions defined by this specification should exist within the [code]#::sycl# namespace. -For compatibility with SYCL 1.2.1, SYCL provides another standard -header file: [code]##, which can be included in -place of [code]##. In that case, all SYCL classes, constants, -types and functions defined by this specification should exist within the -[code]#::cl::sycl# {cpp} namespace. +For compatibility with SYCL 1.2.1, SYCL provides another standard header +file: [code]##, which can be included in place of +[code]##. +In that case, all SYCL classes, constants, types and functions defined by +this specification should exist within the [code]#::cl::sycl# {cpp} +namespace. For consistency, the programming API will only refer to the [code]## header and the [code]#::sycl# namespace, but this should be considered synonymous with the SYCL 1.2.1 header and namespace. -Include paths starting with [code]#"sycl/ext/"# and [code]#"sycl/backend/"# are -reserved for extensions to SYCL and for backend interop headers respectively. -Other include paths starting with [code]#"sycl/"# and the [code]#sycl::detail# -namespace are reserved for implementation details. +Include paths starting with [code]#"sycl/ext/"# and [code]#"sycl/backend/"# +are reserved for extensions to SYCL and for backend interop headers +respectively. +Other include paths starting with [code]#"sycl/"# and the +[code]#sycl::detail# namespace are reserved for implementation details. 
-When a <> is defined by the Khronos SYCL group, functionality -for that <> is available via the header +When a <> is defined by the Khronos SYCL group, functionality for +that <> is available via the header [code]#"sycl/backend/.hpp"#, and all <>-specific -functionality is made available in the namespace [code]#sycl::# -where [code]## is the name of the <> as defined in the -<> specification. +functionality is made available in the namespace +[code]#sycl::# where [code]## is the name of the +<> as defined in the <> specification. -<> defines the allowable header files and -namespaces for any extensions that a vendor may provide, including any -<> that the vendor may define outside of the Khronos SYCL group. +<> defines the allowable header files and namespaces for +any extensions that a vendor may provide, including any <> that the +vendor may define outside of the Khronos SYCL group. -Unless otherwise specified, the behavior of a SYCL program is undefined -if it adds any entity to namespace [code]#sycl# or to a -namespace within namespace [code]#sycl#. +Unless otherwise specified, the behavior of a SYCL program is undefined if +it adds any entity to namespace [code]#sycl# or to a namespace within +namespace [code]#sycl#. == Class availability @@ -119,60 +126,29 @@ In SYCL some <> classes are available to the SYCL application, some are available within a <> and some are available on both and can be passed as arguments to a <>. 
-Each of the following <> classes: -[code]#buffer#, -[code]#buffer_allocator#, -[code]#context#, -[code]#device#, -[code]#device_image#, -[code]#event#, -[code]#exception#, -[code]#handler#, -[code]#host_accessor#, -[code]#host_sampled_image_accessor#, -[code]#host_unsampled_image_accessor#, -[code]#id#, -[code]#image_allocator#, -[code]#kernel#, -[code]#kernel_id#, -[code]#marray#, -[code]#kernel_bundle#, -[code]#nd_range#, -[code]#platform#, -[code]#queue#, -[code]#range#, -[code]#sampled_image#, -[code]#image_sampler#, -[code]#stream#, -[code]#unsampled_image# and -[code]#vec# -must be available to the host application. - -Each of the following <> classes: -[code]#accessor#, -[code]#atomic_ref#, -[code]#device_event#, -[code]#group#, -[code]#h_item#, -[code]#id#, -[code]#item#, -[code]#local_accessor#, -[code]#marray#, -[code]#multi_ptr#, -[code]#nd_item#, -[code]#range#, -[code]#reducer#, -[code]#sampled_image_accessor#, -[code]#stream#, -[code]#sub_group#, -[code]#unsampled_image_accessor# and -[code]#vec# -must be available within a <>. +Each of the following <> classes: [code]#buffer#, +[code]#buffer_allocator#, [code]#context#, [code]#device#, +[code]#device_image#, [code]#event#, [code]#exception#, [code]#handler#, +[code]#host_accessor#, [code]#host_sampled_image_accessor#, +[code]#host_unsampled_image_accessor#, [code]#id#, [code]#image_allocator#, +[code]#kernel#, [code]#kernel_id#, [code]#marray#, [code]#kernel_bundle#, +[code]#nd_range#, [code]#platform#, [code]#queue#, [code]#range#, +[code]#sampled_image#, [code]#image_sampler#, [code]#stream#, +[code]#unsampled_image# and [code]#vec# must be available to the host +application. 
+ +Each of the following <> classes: [code]#accessor#, +[code]#atomic_ref#, [code]#device_event#, [code]#group#, [code]#h_item#, +[code]#id#, [code]#item#, [code]#local_accessor#, [code]#marray#, +[code]#multi_ptr#, [code]#nd_item#, [code]#range#, [code]#reducer#, +[code]#sampled_image_accessor#, [code]#stream#, [code]#sub_group#, +[code]#unsampled_image_accessor# and [code]#vec# must be available within a +<>. == Common interface -When a dimension template parameter is used in SYCL classes, it is -defaulted as 1 in most cases. +When a dimension template parameter is used in SYCL classes, it is defaulted +as 1 in most cases. [[sec:backend-interoperability]] @@ -180,56 +156,43 @@ defaulted as 1 in most cases. Many of the <> classes may be implemented such that they encapsulate an object unique to the <> that underpins the -functionality of that class. Where appropriate, these classes may provide an -interface for interoperating between the <> object and the -<> in order to support interoperability within an -application between SYCL and the associated <>. +functionality of that class. +Where appropriate, these classes may provide an interface for interoperating +between the <> object and the <> in +order to support interoperability within an application between SYCL and the +associated <>. There are three forms of interoperability with <> classes: interoperability on the <> with the <>, -interoperability within a <> with the equivalent kernel -language types of the <>, and interoperability within a <> -with the [code]#interop_handle#. +interoperability within a <> with the equivalent +kernel language types of the <>, and interoperability within a +<> with the [code]#interop_handle#. <> interoperability, <> interoperability and <> interoperability -are provided via different interfaces and may have different behavior for the -same SYCL object. +are provided via different interfaces and may have different behavior for +the same SYCL object. 
<> interoperability may be provided for -[code]#buffer#, -[code]#context#, -[code]#device#, -[code]#device_image#, -[code]#event#, -[code]#kernel#, -[code]#kernel_bundle#, -[code]#platform#, -[code]#queue#, -[code]#sampled_image#, and -[code]#unsampled_image#. - -<> interoperability may be provided for -[code]#accessor#, -[code]#device_event#, -[code]#local_accessor#, -[code]#sampled_image_accessor#, -[code]#stream# and -[code]#unsampled_image_accessor# -inside <> only and is not available outside of that scope. - -<> interoperability may be provided for -[code]#accessor#, -[code]#sampled_image_accessor#, -[code]#unsampled_image_accessor#, -[code]#queue#, -[code]#device#, -[code]#context# -inside the scope of a <> only, see <>. - -Support for <> interoperability is optional and therefore not required -to be provided by a SYCL implementation. A SYCL application using <> -interoperability is considered to be non-generic SYCL. +[code]#buffer#, [code]#context#, [code]#device#, [code]#device_image#, +[code]#event#, [code]#kernel#, [code]#kernel_bundle#, [code]#platform#, +[code]#queue#, [code]#sampled_image#, and [code]#unsampled_image#. + +<> interoperability may be +provided for [code]#accessor#, [code]#device_event#, [code]#local_accessor#, +[code]#sampled_image_accessor#, [code]#stream# and +[code]#unsampled_image_accessor# inside <> only and is not +available outside of that scope. + +<> interoperability may be provided for [code]#accessor#, +[code]#sampled_image_accessor#, [code]#unsampled_image_accessor#, +[code]#queue#, [code]#device#, [code]#context# inside the scope of a +<> only, see <>. + +Support for <> interoperability is optional and therefore not +required to be provided by a SYCL implementation. +A SYCL application using <> interoperability is considered to be +non-generic SYCL. Details on the interoperability for a given <> are available on the <> specification document for that <>. 
@@ -245,41 +208,41 @@ A series of type traits are provided for <> interoperability, defined in the [code]#backend_traits# class. A specialization of [code]#backend_traits# must be provided for each named -<> enumerated in the enum class [code]#backend# that is available at -compile time. +<> enumerated in the enum class [code]#backend# that is available +at compile time. * For each <> class [code]#T# which supports <> interoperability with the <>, a - specialization of [code]#input_type# must be defined as the type - of <> interoperability <> + specialization of [code]#input_type# must be defined as the type of + <> interoperability <> associated with [code]#T# for the <>, specified in the <> specification. - [code]#input_type# is used when constructing SYCL objects - from backend specific native objects. + [code]#input_type# is used when constructing SYCL objects from backend + specific native objects. See the relevant backend specification for details. * For each <> class [code]#T# which supports <> interoperability with the <>, a - specialization of [code]#return_type# must be defined as the type - of <> interoperability <> + specialization of [code]#return_type# must be defined as the type of + <> interoperability <> associated with [code]#T# for the <>, specified in the <> specification. - [code]#return_type# is used when retrieving - the backend specific native object from a SYCL object. + [code]#return_type# is used when retrieving the backend specific native + object from a SYCL object. See the relevant backend specification for details. - * For each <> class [code]#T# which supports kernel - function interoperability with the <>, a specialization of - [code]#return_type# within [code]#backend_traits# must be - defined as the type of the kernel function interoperability - <> associated with [code]#T# for the - <>, specified in the backend specification. 
+ * For each <> class [code]#T# which supports kernel function + interoperability with the <>, a specialization of + [code]#return_type# within [code]#backend_traits# must be defined as the + type of the kernel function interoperability <> + associated with [code]#T# for the <>, specified in the backend + specification. See the relevant backend specification for details. -The type alias [code]#backend_input_t# is provided -to enable less verbose access to the [code]#input_type# type -within [code]#backend_traits# for a specific SYCL object of type [code]#T#. -The type alias [code]#backend_return_t# is provided -to enable less verbose access to the [code]#return_type# type -within [code]#backend_traits# for a specific SYCL object of type [code]#T#. +The type alias [code]#backend_input_t# is provided to enable less verbose +access to the [code]#input_type# type within [code]#backend_traits# for a +specific SYCL object of type [code]#T#. +The type alias [code]#backend_return_t# is provided to enable less verbose +access to the [code]#return_type# type within [code]#backend_traits# for a +specific SYCL object of type [code]#T#. ==== Template function [code]#get_native# @@ -290,25 +253,23 @@ include::{header_dir}/interop/templateFunctionGetNative.h[lines=4..-1] For each <> class [code]#T# which supports <> interoperability, a specialization of -[code]#get_native# must be defined, which takes an instance of -[code]#T# and returns a <> interoperability -<> associated with [code]#syclObject# which -can be used for <> interoperability. The lifetime of the -object returned are backend-defined and specified in the backend -specification. - -For each <> class [code]#T# which supports kernel -function interoperability, a specialization of [code]#get_native# must -be defined, which takes an instance of [code]#T# and returns the kernel -function interoperability <> associated with -[code]#syclObject# which can be used for kernel function -interoperability. 
The availability and behavior of these template -functions is defined by the <> specification document. - -The [code]#get_native# function -must throw an [code]#exception# with the -[code]#errc::backend_mismatch# error code -if the backend of the SYCL object +[code]#get_native# must be defined, which takes an instance of [code]#T# and +returns a <> interoperability <> +associated with [code]#syclObject# which can be used for +<> interoperability. +The lifetime of the object returned are backend-defined and specified in the +backend specification. + +For each <> class [code]#T# which supports kernel function +interoperability, a specialization of [code]#get_native# must be defined, +which takes an instance of [code]#T# and returns the kernel function +interoperability <> associated with +[code]#syclObject# which can be used for kernel function interoperability. +The availability and behavior of these template functions is defined by the +<> specification document. + +The [code]#get_native# function must throw an [code]#exception# with the +[code]#errc::backend_mismatch# error code if the backend of the SYCL object doesn't match the target backend. [[sec:backend-interoperability-make]] @@ -321,102 +282,90 @@ include::{header_dir}/interop/templateFunctionMakeX.h[lines=4..-1] For each <> class [code]#T# which supports <> interoperability, a specialization of the appropriate -template function [code]#make_{sycl_class}# where -[code]#{sycl_class}# is the class name of [code]#T#, must be -defined, which takes a <> interoperability -<> and constructs and returns an instance of -[code]#T#. The availability and behavior of these template -functions is defined by the <> specification document. - -Overloads of the [code]#make_{sycl_class}# function -which take a SYCL <> object as an argument -must throw an [code]#exception# with the -[code]#errc::backend_mismatch# error code -if the backend of the provided SYCL context -doesn't match the target backend. 
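As a rough illustration of the trait, `get_native` and `make_{sycl_class}` requirements described above, the following standalone sketch models their shape without a SYCL implementation. Everything in the `mock` namespace is hypothetical: the `alpha` and `beta` enumerators, `native_queue_handle`, the `queue` stand-in and the `backend_mismatch` exception type are assumptions made for the example, and the real types are defined by each backend specification.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

namespace mock {

// Hypothetical backends; real enumerators come from the implementation.
enum class backend { alpha, beta };

// Hypothetical native handle that backend "alpha" uses for queues.
struct native_queue_handle { int id; };

// Stand-in for a runtime class that remembers which backend it belongs to.
class queue {
 public:
  queue(backend b, native_queue_handle h) : backend_{b}, handle_{h} {}
  backend get_backend() const { return backend_; }
  native_queue_handle handle() const { return handle_; }
 private:
  backend backend_;
  native_queue_handle handle_;
};

// Primary template left undefined: each available backend specializes it.
template <backend B> class backend_traits;

template <> class backend_traits<backend::alpha> {
 public:
  template <class T> using input_type = native_queue_handle;
  template <class T> using return_type = native_queue_handle;
};

// Less verbose aliases, in the spirit of backend_input_t and backend_return_t.
template <backend B, class T>
using backend_input_t = typename backend_traits<B>::template input_type<T>;
template <backend B, class T>
using backend_return_t = typename backend_traits<B>::template return_type<T>;

// Stand-in for an exception carrying the backend_mismatch error code.
struct backend_mismatch : std::runtime_error {
  backend_mismatch() : std::runtime_error("backend mismatch") {}
};

// Returns the native object, or throws if the object belongs to another backend.
template <backend B, class T>
backend_return_t<B, T> get_native(const T& obj) {
  if (obj.get_backend() != B) throw backend_mismatch{};
  return obj.handle();
}

// Counterpart of make_{sycl_class}: builds the object from a native handle.
template <backend B>
queue make_queue(const backend_input_t<B, queue>& h) { return queue{B, h}; }

static_assert(
    std::is_same<backend_return_t<backend::alpha, queue>,
                 native_queue_handle>::value,
    "the alias resolves to the type named by the backend's traits");

}  // namespace mock

// Usage: round-trip a native handle through the mock interop functions.
inline void backend_interop_example() {
  using namespace mock;
  queue q = make_queue<backend::alpha>(native_queue_handle{7});
  assert(get_native<backend::alpha>(q).id == 7);
}
```

The primary `backend_traits` template is deliberately left undefined in this sketch, so asking for a backend that is not available fails at compile time, while a mismatch between an object and the requested backend is only detected at run time.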
+template function [code]#make_{sycl_class}# where [code]#{sycl_class}# is +the class name of [code]#T#, must be defined, which takes a +<> interoperability <> and +constructs and returns an instance of [code]#T#. +The availability and behavior of these template functions is defined by the +<> specification document. + +Overloads of the [code]#make_{sycl_class}# function which take a SYCL +<> object as an argument must throw an [code]#exception# with the +[code]#errc::backend_mismatch# error code if the backend of the provided +SYCL context doesn't match the target backend. [[sec:reference-semantics]] === Common reference semantics -Each of the following <> classes: -[code]#accessor#, -[code]#buffer#, -[code]#context#, -[code]#device#, -[code]#device_image#, -[code]#event#, -[code]#host_accessor#, -[code]#host_sampled_image_accessor#, -[code]#host_unsampled_image_accessor#, -[code]#kernel#, -[code]#kernel_id#, -[code]#kernel_bundle#, -[code]#local_accessor#, -[code]#platform#, -[code]#queue#, -[code]#sampled_image#, -[code]#sampled_image_accessor#, -[code]#stream#, -[code]#unsampled_image# and -[code]#unsampled_image_accessor# -must obey the following statements, where [code]#T# is the runtime class type: - - * [code]#T# must be copy constructible and copy assignable on the - host application and within SYCL kernel functions in the case that - [code]#T# is a valid kernel argument. Any instance of - [code]#T# that is constructed as a copy of another instance, via - either the copy constructor or copy assignment operator, must behave - as-if it were the original instance and as-if any action performed on it - were also performed on the original instance and must represent the same - underlying <> as the original instance where - applicable. - * [code]#T# must be destructible on the host application and within - SYCL kernel functions in the case that [code]#T# is a valid kernel - argument. 
When any instance of [code]#T# is destroyed, including as - a result of the copy assignment operator, any behavior specific to - [code]#T# that is specified as performed on destruction is only - performed if this instance is the last remaining host copy, in - accordance with the above definition of a copy. - * [code]#T# must be move constructible and move assignable on the - host application and within SYCL kernel functions in the case that T is - a valid kernel argument. Any instance of T that is constructed as a move - of another instance, via either the move constructor or move assignment - operator, must replace the original instance rendering said instance - invalid and must represent the same underlying <> as - the original instance where applicable. +Each of the following <> classes: [code]#accessor#, +[code]#buffer#, [code]#context#, [code]#device#, [code]#device_image#, +[code]#event#, [code]#host_accessor#, [code]#host_sampled_image_accessor#, +[code]#host_unsampled_image_accessor#, [code]#kernel#, [code]#kernel_id#, +[code]#kernel_bundle#, [code]#local_accessor#, [code]#platform#, +[code]#queue#, [code]#sampled_image#, [code]#sampled_image_accessor#, +[code]#stream#, [code]#unsampled_image# and [code]#unsampled_image_accessor# +must obey the following statements, where [code]#T# is the runtime class +type: + + * [code]#T# must be copy constructible and copy assignable on the host + application and within SYCL kernel functions in the case that [code]#T# + is a valid kernel argument. + Any instance of [code]#T# that is constructed as a copy of another + instance, via either the copy constructor or copy assignment operator, + must behave as-if it were the original instance and as-if any action + performed on it were also performed on the original instance and must + represent the same underlying <> as the original + instance where applicable. 
+ * [code]#T# must be destructible on the host application and within SYCL + kernel functions in the case that [code]#T# is a valid kernel argument. + When any instance of [code]#T# is destroyed, including as a result of + the copy assignment operator, any behavior specific to [code]#T# that is + specified as performed on destruction is only performed if this instance + is the last remaining host copy, in accordance with the above definition + of a copy. + * [code]#T# must be move constructible and move assignable on the host + application and within SYCL kernel functions in the case that T is a + valid kernel argument. + Any instance of T that is constructed as a move of another instance, via + either the move constructor or move assignment operator, must replace + the original instance rendering said instance invalid and must represent + the same underlying <> as the original instance + where applicable. * [code]#T# must be equality comparable on the host application. - Equality between two instances of [code]#T# (i.e. [code]#a == b#) must be true if one instance is a copy of the other and non-equality - between two instances of [code]#T# (i.e. [code]#a != b#) must - be true if neither instance is a copy of the other, in accordance with - the above definition of a copy, unless either instance has become - invalidated by a move operation. By extension of the requirements above, - equality on [code]#T# must guarantee to be reflexive (i.e. [code]#a == a#), - symmetric (i.e. [code]#a == b# implies [code]#b == a# and [code]#a != b# - implies [code]#b != a#) and transitive (i.e. [code]#a == b && b == c# - implies [code]#c == a#). 
- * A specialization of [code]#std::hash# for [code]#T# must exist - on the host application that returns a unique value such that if two - instances of T are equal, in accordance with the above definition, then - their resulting hash values are also equal and subsequently if two hash - values are not equal, then their corresponding instances are also not - equal, in accordance with the above definition. - -Some <> classes will have additional behavior associated -with copy, movement, assignment or destruction semantics. If these are -specified they are in addition to those specified above unless stated -otherwise. - -Each of the runtime classes mentioned above must provide a common -interface of special member functions in order to fulfill the copy, -move, destruction requirements and hidden friend functions in order to -fulfill the equality requirements. - -A hidden friend function is a function first declared via a -[code]#friend# declaration with no additional out of class or namespace -scope declarations. Hidden friend functions are only visible to ADL -(Argument Dependent Lookup) and are hidden from qualified and unqualified -lookup. Hidden friend functions have the benefits of avoiding accidental -implicit conversions and faster compilation. + Equality between two instances of [code]#T# (i.e. [code]#a == b#) must + be true if one instance is a copy of the other and non-equality between + two instances of [code]#T# (i.e. [code]#a != b#) must be true if neither + instance is a copy of the other, in accordance with the above definition + of a copy, unless either instance has become invalidated by a move + operation. + By extension of the requirements above, equality on [code]#T# must + guarantee to be reflexive (i.e. [code]#a == a#), symmetric (i.e. + [code]#a == b# implies [code]#b == a# and [code]#a != b# implies + [code]#b != a#) and transitive (i.e. [code]#a == b && b == c# implies + [code]#c == a#). 
+ * A specialization of [code]#std::hash# for [code]#T# must exist on the + host application that returns a unique value such that if two instances + of T are equal, in accordance with the above definition, then their + resulting hash values are also equal and subsequently if two hash values + are not equal, then their corresponding instances are also not equal, in + accordance with the above definition. + +Some <> classes will have additional behavior associated with +copy, movement, assignment or destruction semantics. +If these are specified they are in addition to those specified above unless +stated otherwise. + +Each of the runtime classes mentioned above must provide a common interface +of special member functions in order to fulfill the copy, move, destruction +requirements and hidden friend functions in order to fulfill the equality +requirements. + +A hidden friend function is a function first declared via a [code]#friend# +declaration with no additional out of class or namespace scope declarations. +Hidden friend functions are only visible to ADL (Argument Dependent Lookup) +and are hidden from qualified and unqualified lookup. +Hidden friend functions have the benefits of avoiding accidental implicit +conversions and faster compilation. 
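The reference semantics and hidden friend functions described above can be sketched in plain {cpp}. The `handle` class below is a hypothetical stand-in, not a SYCL class: it assumes the shared state is held through a `std::shared_ptr`, which is only one possible way for an implementation to meet these requirements.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>

// Hypothetical stand-in for a runtime class with common reference semantics.
class handle {
 public:
  explicit handle(int value) : impl_{std::make_shared<int>(value)} {}

  // Copy and move operations are implicitly defaulted: copies share state.
  int get() const { return *impl_; }
  void set(int value) { *impl_ = value; }

  // Hidden friends: declared only inside the class, so they are found
  // through argument dependent lookup and not by ordinary lookup.
  friend bool operator==(const handle& lhs, const handle& rhs) {
    return lhs.impl_ == rhs.impl_;
  }
  friend bool operator!=(const handle& lhs, const handle& rhs) {
    return !(lhs == rhs);
  }

 private:
  friend struct std::hash<handle>;
  std::shared_ptr<int> impl_;
};

namespace std {
// Equal instances share the same state, so they hash to the same value.
template <> struct hash<handle> {
  size_t operator()(const handle& h) const noexcept {
    return hash<shared_ptr<int>>{}(h.impl_);
  }
};
}  // namespace std

// Usage: a copy behaves as if it were the original instance.
inline void reference_semantics_example() {
  handle a{1};
  handle b = a;  // b is a copy of a and shares its underlying state
  b.set(42);     // the change is visible through a as well
  assert(a.get() == 42);
  assert(a == b);
  assert(std::hash<handle>{}(a) == std::hash<handle>{}(b));

  handle c{42};  // same value, but not a copy of a
  assert(a != c);
}
```

Comparing the shared pointers rather than the stored values is what makes equality mean "is a copy of" here, which is the definition the list above relies on.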
These common special member functions and hidden friend functions are described in <> and @@ -510,43 +459,41 @@ bool operator!=(const T& lhs, const T& rhs) [[sec:byval-semantics]] === Common by-value semantics -Each of the following <> classes: [code]#id#, -[code]#range#, [code]#item#, [code]#nd_item#, -[code]#h_item#, [code]#group#, [code]#sub_group# and -[code]#nd_range# must follow the following statements, where -[code]#T# is the runtime class type: +Each of the following <> classes: [code]#id#, [code]#range#, +[code]#item#, [code]#nd_item#, [code]#h_item#, [code]#group#, +[code]#sub_group# and [code]#nd_range# must follow the following statements, +where [code]#T# is the runtime class type: - * [code]#T# must be default copy constructible and copy assignable on - the host application (in the case where T is available on the host) and + * [code]#T# must be default copy constructible and copy assignable on the + host application (in the case where T is available on the host) and within SYCL kernel functions. - * [code]#T# must be default destructible on the host application (in - the case where T is available on the host) and within SYCL kernel - functions. - * [code]#T# must be default move constructible and default move - assignable on the host application (in the case where T is available on - the host) and within SYCL kernel functions. - * [code]#T# must be equality comparable on the host application (in - the case where T is available on the host) and within SYCL kernel - functions. Equality between two instances of [code]#T# (i.e. - [code]#a == b#) must be true if the value of all members are equal - and non-equality between two instances of [code]#T# (i.e. - [code]#a != b#) must be true if the value of any members are not - equal, unless either instance has become invalidated by a move - operation. By extension of the requirements above, equality on - [code]#T# must guarantee to be reflexive (i.e. [code]#a == a#), - symmetric (i.e. 
[code]#a == b# implies [code]#b == a# and [code]#a != b# - implies [code]#b != a#) and transitive (i.e. [code]#a == b && b == c# - implies [code]#c == a#). - -Some <> classes will have additional behavior associated -with copy, movement, assignment or destruction semantics. If these are -specified they are in addition to those specified above unless stated -otherwise. - -Each of the runtime classes mentioned above must provide a common -interface of special member functions and member functions in order to -fulfill the copy, move, destruction and equality requirements, -following the <> and the <>. + * [code]#T# must be default destructible on the host application (in the + case where T is available on the host) and within SYCL kernel functions. + * [code]#T# must be default move constructible and default move assignable + on the host application (in the case where T is available on the host) + and within SYCL kernel functions. + * [code]#T# must be equality comparable on the host application (in the + case where T is available on the host) and within SYCL kernel functions. + Equality between two instances of [code]#T# (i.e. [code]#a == b#) must + be true if the value of all members are equal and non-equality between + two instances of [code]#T# (i.e. [code]#a != b#) must be true if the + value of any members are not equal, unless either instance has become + invalidated by a move operation. + By extension of the requirements above, equality on [code]#T# must + guarantee to be reflexive (i.e. [code]#a == a#), symmetric (i.e. + [code]#a == b# implies [code]#b == a# and [code]#a != b# implies + [code]#b != a#) and transitive (i.e. [code]#a == b && b == c# implies + [code]#c == a#). + +Some <> classes will have additional behavior associated with +copy, movement, assignment or destruction semantics. +If these are specified they are in addition to those specified above unless +stated otherwise. 
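For contrast with the reference semantics, by-value semantics can be sketched with a small aggregate. The `index2` type below is hypothetical and only stands in for classes such as `id` or `range`; the sketch assumes that equality is a member-wise comparison, as the list above requires.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical two-dimensional index type with by-value semantics.
struct index2 {
  std::size_t row;
  std::size_t col;

  // Equality compares the value of every member.
  friend bool operator==(const index2& lhs, const index2& rhs) {
    return lhs.row == rhs.row && lhs.col == rhs.col;
  }
  friend bool operator!=(const index2& lhs, const index2& rhs) {
    return !(lhs == rhs);
  }
};

// Usage: a copy is an independent value.
inline void by_value_example() {
  index2 a{1, 2};
  index2 b = a;  // independent copy
  b.col = 5;     // does not affect a
  assert(a.col == 2);
  assert(a != b);
  assert((a == index2{1, 2}));
}
```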
+ +Each of the runtime classes mentioned above must provide a common interface +of special member functions and member functions in order to fulfill the +copy, move, destruction and equality requirements, following the +<> and the <>. These common special member functions and hidden friend functions are described in <> and @@ -622,35 +569,25 @@ bool operator!=(const T& lhs, const T& rhs) === Properties -Each of the following <> classes: -[code]#accessor#, -[code]#buffer#, -[code]#host_accessor#, -[code]#host_sampled_image_accessor#, -[code]#host_unsampled_image_accessor#, -[code]#context#, -[code]#local_accessor#, -[code]#queue#, -[code]#sampled_image#, -[code]#sampled_image_accessor#, -[code]#stream#, -[code]#unsampled_image#, -[code]#unsampled_image_accessor# and -[code]#usm_allocator# -provide an optional parameter in each of -their constructors to provide a [code]#property_list# which -contains zero or more properties. Each of those properties augments -the semantics of the class with a particular feature. Each of those -classes must also provide [code]#has_property# and -[code]#get_property# member functions for querying for a -particular property. +Each of the following <> classes: [code]#accessor#, +[code]#buffer#, [code]#host_accessor#, [code]#host_sampled_image_accessor#, +[code]#host_unsampled_image_accessor#, [code]#context#, +[code]#local_accessor#, [code]#queue#, [code]#sampled_image#, +[code]#sampled_image_accessor#, [code]#stream#, [code]#unsampled_image#, +[code]#unsampled_image_accessor# and [code]#usm_allocator# provide an +optional parameter in each of their constructors to provide a +[code]#property_list# which contains zero or more properties. +Each of those properties augments the semantics of the class with a +particular feature. +Each of those classes must also provide [code]#has_property# and +[code]#get_property# member functions for querying for a particular +property. 
The listing below illustrates the usage of various buffer properties, described in <>. -The example illustrates how using properties does not affect the type -of the object, thus, does not prevent the usage of SYCL objects in -containers. +The example illustrates how using properties does not affect the type of the +object, thus, does not prevent the usage of SYCL objects in containers. [source,,linenums] ---- @@ -658,18 +595,19 @@ include::{code_dir}/propertyExample.cpp[lines=4..-1] ---- Each property is represented by a unique class and an instance of a property -is an instance of that type. Some properties can be default constructed -while others will require an argument on construction. A property may be -applicable to more than one class, however some properties may not be -compatible with each other. See the requirements for the properties of the -SYCL [code]#buffer# class, SYCL [code]#unsampled_image# class and -SYCL [code]#sampled_image# class in <> -and <> respectively. - -Properties can be passed to a <> class -via an instance of [code]#property_list#. -These properties get tied to the <> class instance -and copies of the object will contain the same properties. +is an instance of that type. +Some properties can be default constructed while others will require an +argument on construction. +A property may be applicable to more than one class, however some properties +may not be compatible with each other. +See the requirements for the properties of the SYCL [code]#buffer# class, +SYCL [code]#unsampled_image# class and SYCL [code]#sampled_image# class in +<> and <> respectively. + +Properties can be passed to a <> class via an instance of +[code]#property_list#. +These properties get tied to the <> class instance and copies +of the object will contain the same properties. 
A SYCL implementation or a <> may provide additional properties other than those defined here, provided they are defined in accordance with @@ -677,15 +615,14 @@ the requirements described in <>. ==== Properties interface -Each of the runtime classes mentioned above must provide a common -interface of member functions in order to fulfill the property -interface requirements. +Each of the runtime classes mentioned above must provide a common interface +of member functions in order to fulfill the property interface requirements. A synopsis of the common properties interface, the SYCL -[code]#property_list# class and the SYCL property classes is provided -below. The member functions of the common properties interface are listed in -<>. The constructors of the SYCL -[code]#property_list# class are listed in +[code]#property_list# class and the SYCL property classes is provided below. +The member functions of the common properties interface are listed in +<>. +The constructors of the SYCL [code]#property_list# class are listed in <>. [source,,linenums] @@ -801,41 +738,42 @@ Construct a SYCL [code]#property_list# with zero or more properties. [[sec:device-selection]] === Device selection -Since a system can have several SYCL-compatible devices attached, it -is useful to have a way to select a specific device or a set of -devices to construct a specific object such as a -[code]#device# (see <>) or a -[code]#queue# (see <>), or -perform some operations on a device subset. +Since a system can have several SYCL-compatible devices attached, it is +useful to have a way to select a specific device or a set of devices to +construct a specific object such as a [code]#device# (see +<>) or a [code]#queue# (see +<>), or perform some operations on a device +subset. 
-Device selection is done either by already having a specific instance -of a [code]#device# (see <>) or by -providing a <> which is a ranking function that will give -an integer ranking value to all the devices on the system. +Device selection is done either by already having a specific instance of a +[code]#device# (see <>) or by providing a +<> which is a ranking function that will give an integer +ranking value to all the devices on the system. [[sec:device-selector]] ==== Device selector -The interface for a <> is any object that meets the C++ named -requirement [code]#Callable#, taking a parameter of type [code]#const device &# -and returning a value that is implicitly convertible to [code]#int#. +The interface for a <> is any object that meets the C++ +named requirement [code]#Callable#, taking a parameter of type [code]#const +device &# and returning a value that is implicitly convertible to +[code]#int#. -At any point where the <> needs to select a SYCL [code]#device# -using a <>, the system queries all +At any point where the <> needs to select a SYCL +[code]#device# using a <>, the system queries all <> from all <> in the system, calls the <> on each device and selects the one -which returns the highest score. If the highest value is strictly negative no -device is selected. +which returns the highest score. +If the highest value is strictly negative no device is selected. In places where only one device has to be picked and the high score is obtained by more than one device, then one of the tied devices will be -returned, but which one is not defined and may depend on enumeration -order, for example, outside the control of the SYCL runtime. +returned, but which one is not defined and may depend on enumeration order, +for example, outside the control of the SYCL runtime. 
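The ranking rule described above can be sketched without a SYCL implementation. In the sketch below, `mock_device` and `select_device` are hypothetical names introduced for the example. It resolves ties by keeping the first device with the highest score, which is just one of the outcomes the text allows.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a device; not the SYCL device class.
struct mock_device {
  std::string name;
  bool is_gpu;
};

// Calls the selector on every device and keeps the highest score.
// Returns nullptr when the best score is strictly negative.
template <class Selector>
const mock_device* select_device(const std::vector<mock_device>& devices,
                                 Selector selector) {
  const mock_device* best = nullptr;
  int best_score = -1;
  for (const mock_device& d : devices) {
    const int score = selector(d);  // result converts implicitly to int
    if (score > best_score) {
      best_score = score;
      best = &d;
    }
  }
  return best;
}

// Usage: any callable taking a device and returning an int-like score works.
inline void device_selection_example() {
  const std::vector<mock_device> devices{{"cpu0", false}, {"gpu0", true}};

  auto prefer_gpu = [](const mock_device& d) { return d.is_gpu ? 10 : 1; };
  const mock_device* chosen = select_device(devices, prefer_gpu);
  assert(chosen != nullptr && chosen->name == "gpu0");

  auto reject_all = [](const mock_device&) { return -1; };
  assert(select_device(devices, reject_all) == nullptr);
}
```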
-Some predefined <> are provided by the system as -described on <> in a header file with -some definition similar to the following: +Some predefined <> are provided by the +system as described on <> in a header file with some +definition similar to the following: [[table.device.selectors]] @@ -944,7 +882,8 @@ and to <> for examples. include::{header_dir}/deviceSelector.h[lines=4..-1] ---- -Typical examples of default and user-provided <> could be: +Typical examples of default and user-provided <> could be: [source,,linenums] ---- @@ -1006,29 +945,30 @@ auto dev4 = device{aspect_selector( [NOTE] ==== -In SYCL 1.2.1 the predefined device selectors were actually types -that had to be instantiated to be used. Now they are just -instances. To simplify porting code using the old type -instantiations, a backward-compatible API is still provided, such as -[code]#sycl::default_selector#. The new predefined device -selectors have their new names appended with "_v" to avoid -conflicts, thus following the naming style used by traits in the {cpp} -standard library. There is no requirement for the implementation to -have for example [code]#sycl::gpu_selector_v# being an instance -of [code]#sycl::gpu_selector#. +In SYCL 1.2.1 the predefined device selectors were actually types that had +to be instantiated to be used. +Now they are just instances. +To simplify porting code using the old type instantiations, a +backward-compatible API is still provided, such as +[code]#sycl::default_selector#. +The new predefined device selectors have their new names appended with "_v" +to avoid conflicts, thus following the naming style used by traits in the +{cpp} standard library. +There is no requirement for the implementation to have for example +[code]#sycl::gpu_selector_v# being an instance of +[code]#sycl::gpu_selector#. ==== NOTE: Implementation note: the SYCL API might rely on SFINAE or {cpp20} -concepts to resolve some ambiguity in constructors with default -parameters. 
+concepts to resolve some ambiguity in constructors with default parameters. [[sec:platform-class]] === Platform class -The SYCL [code]#platform# class encapsulates a single SYCL platform on -which SYCL kernel functions may be executed. A SYCL platform must be -associated with a single <>. +The SYCL [code]#platform# class encapsulates a single SYCL platform on which +SYCL kernel functions may be executed. +A SYCL platform must be associated with a single <>. A SYCL [code]#platform# is also associated with one or more SYCL [code]#devices# associated with the same <>. @@ -1037,26 +977,26 @@ All member functions of the [code]#platform# class are synchronous and errors are handled by throwing synchronous SYCL exceptions. The execution environment for a SYCL application has a fixed number of -platforms which does not vary as the application executes. The application -can get a list of all these platforms via [code]#platform::get_platforms()#, -and the order of the platform objects is the same each time the application -calls that function. The [code]#platform# class also provides constructors, -but constructing a new [code]#platform# instance merely creates a new object -that is a copy of one of the objects returned by -[code]#platform::get_platforms()#. +platforms which does not vary as the application executes. +The application can get a list of all these platforms via +[code]#platform::get_platforms()#, and the order of the platform objects is +the same each time the application calls that function. +The [code]#platform# class also provides constructors, but constructing a +new [code]#platform# instance merely creates a new object that is a copy of +one of the objects returned by [code]#platform::get_platforms()#. -The SYCL [code]#platform# class provides the common reference semantics -(see <>). +The SYCL [code]#platform# class provides the common reference semantics (see +<>). ==== Platform interface -A synopsis of the SYCL [code]#platform# class is provided below. 
The -constructors, member functions and static member functions of the SYCL -[code]#platform# class are listed in -<>, <> and -<> respectively. The additional common -special member functions and common member functions are listed in -<> in +A synopsis of the SYCL [code]#platform# class is provided below. +The constructors, member functions and static member functions of the SYCL +[code]#platform# class are listed in <>, +<> and <> +respectively. +The additional common special member functions and common member functions +are listed in <> in <> and <> respectively. @@ -1185,12 +1125,12 @@ static std::vector get_platforms() A <> can be queried for information using the [code]#get_info# member function of the [code]#platform# class, specifying one of the info -parameters in [code]#info::platform#. The possible values for each info -parameter and any restrictions are defined in the specification of the -<> associated with the <>. All info parameters in -[code]#info::platform# are specified in <> and the -synopsis for [code]#info::platform# is described in -<>. +parameters in [code]#info::platform#. +The possible values for each info parameter and any restrictions are defined +in the specification of the <> associated with the <>. +All info parameters in [code]#info::platform# are specified in +<> and the synopsis for [code]#info::platform# is +described in <>. [[table.platform.info]] @@ -1245,37 +1185,37 @@ Returns the extensions supported by the <>. [[sec:interface.context.class]] === Context class -The <> class represents a SYCL <>. A <> -represents the runtime data structures and state required by a <> -API to interact with a group of devices associated with a platform. +The <> class represents a SYCL <>. +A <> represents the runtime data structures and state required by a +<> API to interact with a group of devices associated with a +platform. -The SYCL [code]#context# class provides the common reference semantics -(see <>). 
+The SYCL [code]#context# class provides the common reference semantics (see +<>). ==== Context interface -The constructors and member functions of the SYCL [code]#context# class -are listed in <> and -<>, respectively. The additional common special -member functions and common member functions are listed in -<> in +The constructors and member functions of the SYCL [code]#context# class are +listed in <> and <>, +respectively. +The additional common special member functions and common member functions +are listed in <> in <> and <>, respectively. -All member functions of the <> class are synchronous and errors -are handled by throwing synchronous SYCL exceptions. +All member functions of the <> class are synchronous and errors are +handled by throwing synchronous SYCL exceptions. -All constructors of the SYCL <> class will construct an -instance associated with a particular <>, determined by the -constructor parameters or, in the case of the default constructor, the -SYCL [code]#device# produced by the -[code]#default_selector_v#. +All constructors of the SYCL <> class will construct an instance +associated with a particular <>, determined by the constructor +parameters or, in the case of the default constructor, the SYCL +[code]#device# produced by the [code]#default_selector_v#. A SYCL [code]#context# can optionally be constructed with an -[code]#async_handler# parameter. In this case the -[code]#async_handler# is used to report asynchronous SYCL exceptions, -as described in <>. +[code]#async_handler# parameter. +In this case the [code]#async_handler# is used to report asynchronous SYCL +exceptions, as described in <>. Information about a SYCL <> may be queried through the [code]#get_info()# member function. @@ -1384,11 +1324,12 @@ std::vector get_devices() const A <> can be queried for information using the [code]#get_info# member function of the [code]#context# class, specifying one of the info -parameters in [code]#info::context#. 
The possible values for each info -parameter and any restrictions are defined in the specification of the -<> associated with the <>. All info parameters in -[code]#info::context# are specified in <> and the synopsis -for [code]#info::context# is described in <>. +parameters in [code]#info::context#. +The possible values for each info parameter and any restrictions are defined +in the specification of the <> associated with the <>. +All info parameters in [code]#info::context# are specified in +<> and the synopsis for [code]#info::context# is +described in <>. [[table.context.info]] @@ -1496,7 +1437,8 @@ info::context::atomic_fence_scope_capabilities [[sec:context-properties]] ==== Context properties -The [code]#property_list# constructor parameters are present for extensibility. +The [code]#property_list# constructor parameters are present for +extensibility. // \input{device_class} // %%%%%%%%%%%%%%%%%%%%%%%%%%%% begin device_class %%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -1504,41 +1446,41 @@ The [code]#property_list# constructor parameters are present for extensibility. [[sec:device-class]] === Device class -The SYCL [code]#device# class encapsulates a single SYCL device on -which <> can be executed. +The SYCL [code]#device# class encapsulates a single SYCL device on which +<> can be executed. -All member functions of the [code]#device# class are synchronous and -errors are handled by throwing synchronous SYCL exceptions. +All member functions of the [code]#device# class are synchronous and errors +are handled by throwing synchronous SYCL exceptions. The execution environment for a SYCL application has a fixed number of -<> which does not vary as the application executes. +<> which does not vary as the application +executes. 
The application can get a list of all these devices via -[code]#device::get_devices()#, and the order of the device objects is the same -each time the application calls that function (assuming the parameter to that -function is the same for each call). The [code]#device# class also provides -constructors, but constructing a new [code]#device# instance merely creates a -new object that is a copy of one of the objects returned by -[code]#device::get_devices()#. +[code]#device::get_devices()#, and the order of the device objects is the +same each time the application calls that function (assuming the parameter +to that function is the same for each call). +The [code]#device# class also provides constructors, but constructing a new +[code]#device# instance merely creates a new object that is a copy of one of +the objects returned by [code]#device::get_devices()#. A SYCL [code]#device# can be partitioned into multiple SYCL devices, by -calling the [code]#create_sub_devices()# member function template. The -resulting SYCL [code]#devices# are considered sub devices, and it is -valid to partition these sub devices further. The range of support for this -feature is <> and device specific and can be queried for through -[code]#get_info()#. +calling the [code]#create_sub_devices()# member function template. +The resulting SYCL [code]#devices# are considered sub devices, and it is +valid to partition these sub devices further. +The range of support for this feature is <> and device specific and +can be queried for through [code]#get_info()#. -The SYCL [code]#device# class provides the common reference semantics -(see <>). +The SYCL [code]#device# class provides the common reference semantics (see +<>). ==== Device interface -A synopsis of the SYCL [code]#device# class is provided below. The -constructors, member functions and static member functions of the SYCL -[code]#device# class are listed in -<>, <> and -<> respectively. 
The additional common special -member functions and common member functions are listed in -<> in +A synopsis of the SYCL [code]#device# class is provided below. +The constructors, member functions and static member functions of the SYCL +[code]#device# class are listed in <>, +<> and <> respectively. +The additional common special member functions and common member functions +are listed in <> in <> and <>, respectively. @@ -1781,11 +1723,12 @@ get_devices(info::device_type deviceType = info::device_type::all) A <> can be queried for information using the [code]#get_info# member function of the [code]#device# class, specifying one of the info -parameters in [code]#info::device#. The possible values for each info -parameter and any restriction are defined in the specification of the -<> associated with the <>. All info parameters in -[code]#info::device# are specified in <> and the synopsis -for [code]#info::device# is described in <>. +parameters in [code]#info::device#. +The possible values for each info parameter and any restriction are defined +in the specification of the <> associated with the <>. +All info parameters in [code]#info::device# are specified in +<> and the synopsis for [code]#info::device# is described +in <>. [[table.device.info]] @@ -2081,23 +2024,24 @@ info::device::half_fp_config * [code]#info::fp_config::inf_nan:# INF and quiet NaNs are supported. * [code]#info::fp_config::round_to_nearest:# round to nearest even rounding mode is supported. - * [code]#info::fp_config::round_to_zero:# round to zero rounding mode - is supported. - * [code]#info::fp_config::round_to_inf:# round to positive and - negative infinity rounding modes are supported. + * [code]#info::fp_config::round_to_zero:# round to zero rounding mode is + supported. + * [code]#info::fp_config::round_to_inf:# round to positive and negative + infinity rounding modes are supported. * [code]#info::fp_config::fma:# IEEE754-2008 fused multiply add is supported. 
- * [code]#info::fp_config::correctly_rounded_divide_sqrt:# divide and - sqrt are correctly rounded as defined by the IEEE754 specification. + * [code]#info::fp_config::correctly_rounded_divide_sqrt:# divide and sqrt + are correctly rounded as defined by the IEEE754 specification. This property is deprecated. - * [code]#info::fp_config::soft_float:# basic floating-point - operations (such as addition, subtraction, multiplication) are - implemented in software. + * [code]#info::fp_config::soft_float:# basic floating-point operations + (such as addition, subtraction, multiplication) are implemented in + software. If half precision is supported by this SYCL [code]#device# (i.e. the -[code]#device# has [code]#aspect::fp16# there is no minimum -floating-point capability. If half support is not supported the returned -[code]#std::vector# must be empty. +[code]#device# has [code]#aspect::fp16#) there is no minimum floating-point +capability. +If half precision is not supported the returned [code]#std::vector# must be +empty. -- a@ @@ -2116,22 +2060,22 @@ info::device::single_fp_config * [code]#info::fp_config::inf_nan:# INF and quiet NaNs are supported. * [code]#info::fp_config::round_to_nearest:# round to nearest even rounding mode is supported. - * [code]#info::fp_config::round_to_zero:# round to zero rounding mode - is supported. - * [code]#info::fp_config::round_to_inf:# round to positive and - negative infinity rounding modes are supported. + * [code]#info::fp_config::round_to_zero:# round to zero rounding mode is + supported. + * [code]#info::fp_config::round_to_inf:# round to positive and negative + infinity rounding modes are supported. * [code]#info::fp_config::fma:# IEEE754-2008 fused multiply add is supported. - * [code]#info::fp_config::correctly_rounded_divide_sqrt:# divide and - sqrt are correctly rounded as defined by the IEEE754 specification.
+ * [code]#info::fp_config::correctly_rounded_divide_sqrt:# divide and sqrt + are correctly rounded as defined by the IEEE754 specification. This property is deprecated. - * [code]#info::fp_config::soft_float:# basic floating-point - operations (such as addition, subtraction, multiplication) are - implemented in software. + * [code]#info::fp_config::soft_float:# basic floating-point operations + (such as addition, subtraction, multiplication) are implemented in + software. -If this SYCL [code]#device# is not of type -[code]#info::device_type::custom# then the minimum floating-point -capability must be: [code]#info::fp_config::round_to_nearest# and +If this SYCL [code]#device# is not of type [code]#info::device_type::custom# +then the minimum floating-point capability must be: +[code]#info::fp_config::round_to_nearest# and [code]#info::fp_config::inf_nan#. -- @@ -2151,27 +2095,26 @@ info::device::double_fp_config * [code]#info::fp_config::inf_nan:# INF and NaNs are supported. * [code]#info::fp_config::round_to_nearest:# round to nearest even rounding mode is supported. - * [code]#info::fp_config::round_to_zero:# round to zero rounding mode - is supported. - * [code]#info::fp_config::round_to_inf:# round to positive and - negative infinity rounding modes are supported. + * [code]#info::fp_config::round_to_zero:# round to zero rounding mode is + supported. + * [code]#info::fp_config::round_to_inf:# round to positive and negative + infinity rounding modes are supported. * [code]#info::fp_config::fma:# IEEE754-2008 fused multiply-add is supported. - * [code]#info::fp_config::soft_float:# basic floating-point - operations (such as addition, subtraction, multiplication) are - implemented in software. + * [code]#info::fp_config::soft_float:# basic floating-point operations + (such as addition, subtraction, multiplication) are implemented in + software. If double precision is supported by this SYCL [code]#device# (i.e. 
the -[code]#device# has [code]#aspect::fp64# and this SYCL -[code]#device# is not of type [code]#info::device_type::custom# -then the minimum floating-point capability must be: -[code]#info::fp_config::fma#, +[code]#device# has [code]#aspect::fp64#) and this SYCL [code]#device# is not +of type [code]#info::device_type::custom# then the minimum floating-point +capability must be: [code]#info::fp_config::fma#, [code]#info::fp_config::round_to_nearest#, [code]#info::fp_config::round_to_zero#, -[code]#info::fp_config::round_to_inf#, -[code]#info::fp_config::inf_nan# and -[code]#info::fp_config::denorm#. If double support is not supported the -returned [code]#std::vector# must be empty. +[code]#info::fp_config::round_to_inf#, [code]#info::fp_config::inf_nan# and +[code]#info::fp_config::denorm#. +If double precision is not supported the returned [code]#std::vector# must be +empty. -- a@ @@ -2526,10 +2469,10 @@ info::device::extensions @ [.code]#std::vector# a@ Deprecated, use [code]#info::device::aspects# instead. -- -Returns a [code]#std::vector# of extension names (the extension names -do not contain any spaces) supported by this SYCL [code]#device#. The -extension names returned can be vendor supported extension names and one or -more of the following Khronos approved extension names: +Returns a [code]#std::vector# of extension names (the extension names do not +contain any spaces) supported by this SYCL [code]#device#.
+The extension names returned can be vendor supported extension names and one +or more of the following Khronos approved extension names: * [code]#cl_khr_int64_base_atomics# * [code]#cl_khr_int64_extended_atomics# @@ -2557,8 +2500,8 @@ Khronos extension names must be returned by all device that support OpenCL C * [code]#cl_khr_local_int32_base_atomics# * [code]#cl_khr_local_int32_extended_atomics# * [code]#cl_khr_byte_addressable_store# - * [code]#cl_khr_fp64# (for backward compatibility if double precision - is supported) + * [code]#cl_khr_fp64# (for backward compatibility if double precision is + supported) Please refer to the OpenCL 1.2 Extension Specification for a detailed description of these extensions. @@ -2677,8 +2620,8 @@ info::device::partition_type_affinity_domain ==== Device aspects Every SYCL <> has an associated set of <> which -identify characteristics of the [code]#device#. Aspects are defined via -the [code]#enum class aspect# enumeration: +identify characteristics of the [code]#device#. +Aspects are defined via the [code]#enum class aspect# enumeration: [source,,linenums] ---- @@ -2687,11 +2630,13 @@ include::{header_dir}/deviceEnumClassAspect.h[lines=4..-1] SYCL applications can query the aspects for a [code]#device# via [code]#device::has()# in order to determine whether the [code]#device# -supports any optional features. <> lists the aspects that -are defined in the <> and tells which optional features correspond -to each. Backends and extensions may provide additional aspects and additional -optional device features. If so, the <> specification document or the -extension document describes them. +supports any optional features. +<> lists the aspects that are defined in the +<> and tells which optional features correspond to each. +Backends and extensions may provide additional aspects and additional +optional device features. +If so, the <> specification document or the extension document +describes them. 
[[table.device.aspect]] .Device aspects defined by the <> @@ -2747,10 +2692,11 @@ aspect::emulated [NOTE] ==== As an example, a vendor might support both a hardware FPGA device and a -software emulated FPGA, where the emulated FPGA has all the same features -as the hardware one but runs more slowly and can provide additional profiling -or diagnostic information. In such a case, an application's -<> can use [code]#aspect::emulated# to distinguish the two. +software emulated FPGA, where the emulated FPGA has all the same features as +the hardware one but runs more slowly and can provide additional profiling +or diagnostic information. +In such a case, an application's <> can use +[code]#aspect::emulated# to distinguish the two. ==== a@ @@ -2884,12 +2830,15 @@ aspect::usm_system_allocations |==== The implementation also provides two traits that the application can use to -query aspects at compilation time. The traits [code]#any_device_has# -and [code]#all_devices_have# are set according to the collection of +query aspects at compilation time. +The traits [code]#any_device_has# and +[code]#all_devices_have# are set according to the collection of devices _D_ that can possibly execute device code, as determined by the -compilation environment. The trait [code]#any_device_has# inherits -from [code]#std::true_type# only if at least one device in _D_ has the -specified aspect. The trait [code]#all_devices_have# inherits from +compilation environment. +The trait [code]#any_device_has# inherits from +[code]#std::true_type# only if at least one device in _D_ has the specified +aspect. +The trait [code]#all_devices_have# inherits from [code]#std::true_type# only if all devices in _D_ have the specified aspect. [source,,linenums] @@ -2897,9 +2846,10 @@ specified aspect. The trait [code]#all_devices_have# inherits from include::{header_dir}/aspectTraits.h[lines=4..-1] ---- -Applications can use these traits to reduce their code size. 
The following -example demonstrates one way to use these traits to avoid instantiating a -templated kernel for device features that are not supported by any device. +Applications can use these traits to reduce their code size. +The following example demonstrates one way to use these traits to avoid +instantiating a templated kernel for device features that are not supported +by any device. [source,,linenums] ---- @@ -2907,33 +2857,36 @@ include::{code_dir}/aspectTraitExample.cpp[lines=4..-1] ---- The kernel function [code]#MyKernel# is templated to use a different -algorithm depending on whether the device has the aspect [code]#aspect::fp16#, -and the call to [code]#dev.has()# chooses the kernel function instantiation -that matches the device's capabilities. However, the use of -[code]#any_device_has_v# and [code]#all_devices_have_v# entirely avoid -useless instantiations of the kernel function. For example, when the -compilation environment does not support any devices with [code]#aspect::fp16#, -[code]#any_device_has_v# is [code]#false#, and the kernel -function is never instantiated with support for the [code]#sycl::half# type. +algorithm depending on whether the device has the aspect +[code]#aspect::fp16#, and the call to [code]#dev.has()# chooses the kernel +function instantiation that matches the device's capabilities. +However, the use of [code]#any_device_has_v# and [code]#all_devices_have_v# +entirely avoid useless instantiations of the kernel function. +For example, when the compilation environment does not support any devices +with [code]#aspect::fp16#, [code]#any_device_has_v# is +[code]#false#, and the kernel function is never instantiated with support +for the [code]#sycl::half# type. [NOTE] ==== Like any trait, the definitions of [code]#any_device_has# and [code]#all_devices_have# are uniform across all parts of a SYCL application. 
If an implementation uses <>, all compiler passes define a particular -aspect's specialization of the traits the same way, regardless of whether that -compiler pass' device supports the aspect. Thus, [code]#any_device_has# and -[code]#all_devices_have# cannot be used to determine whether any particular -device supports an aspect. Instead, applications must use -[code]#device::has()# or [code]#platform::has()# for this. +aspect's specialization of the traits the same way, regardless of whether +that compiler pass' device supports the aspect. +Thus, [code]#any_device_has# and [code]#all_devices_have# cannot be used to +determine whether any particular device supports an aspect. +Instead, applications must use [code]#device::has()# or +[code]#platform::has()# for this. ==== [NOTE] ==== -An implementation could choose to provide command line options which affect the -set of devices that it supports. If so, those command line options would also -affect these traits. For example, if an implementation provides a command line -option that disables [code]#aspect::accelerator# devices, the trait +An implementation could choose to provide command line options which affect +the set of devices that it supports. +If so, those command line options would also affect these traits. +For example, if an implementation provides a command line option that +disables [code]#aspect::accelerator# devices, the trait [code]#any_device_has# would inherit from [code]#std::false_type# when that command line option was specified. ==== @@ -2941,10 +2894,11 @@ option that disables [code]#aspect::accelerator# devices, the trait [NOTE] ==== These traits only reflect the supported devices at the time the SYCL -application is compiled. It's possible that unsupported devices are still -visible to the application when it runs. However, if a device _D_ is not -supported when the application is compiled, the application will not be able -to submit kernels to that device _D_. +application is compiled. 
+It's possible that unsupported devices are still visible to the application +when it runs. +However, if a device _D_ is not supported when the application is compiled, +the application will not be able to submit kernels to that device _D_. ==== // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end device_class %%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -2959,59 +2913,65 @@ to submit kernels to that device _D_. The SYCL [code]#queue# class encapsulates a single SYCL queue which schedules kernels on a SYCL device. -A SYCL [code]#queue# can be used to submit <> to be -executed by the <> using the [code]#submit# member +A SYCL [code]#queue# can be used to submit <> +to be executed by the <> using the [code]#submit# member function. All member functions of the [code]#queue# class are synchronous and errors -are handled by throwing synchronous SYCL exceptions. The [code]#submit# -member function synchronously invokes the provided +are handled by throwing synchronous SYCL exceptions. +The [code]#submit# member function synchronously invokes the provided <> (as described in -<>) in the calling thread, thereby scheduling a -<> for asynchronous execution. Any error in the submission of a -<> is handled by throwing a synchronous SYCL exception. -Any errors from the <> after it has been submitted are handled by -passing <> at specific times to an -<>, as described in <>. - -A SYCL [code]#queue# can wait for all <> that it has -submitted by calling [code]#wait# or [code]#wait_and_throw#. - -The default constructor of the SYCL [code]#queue# class will -construct a queue based on the SYCL [code]#device# returned from -the [code]#default_selector_v# (see <>). -All other constructors construct a queue as determined by the -parameters provided. All constructors will implicitly construct a SYCL -[code]#platform#, [code]#device# and [code]#context# in order to -facilitate the construction of the queue. 
- -Each constructor takes as the last -parameter an optional SYCL [code]#property_list# to provide properties to -the SYCL [code]#queue#. +<>) in the calling thread, thereby scheduling +a <> for asynchronous execution. +Any error in the submission of a <> is handled by throwing a +synchronous SYCL exception. +Any errors from the <> after it has been submitted are +handled by passing <> at specific times to +an <>, as described in <>. + +A SYCL [code]#queue# can wait for all <> that +it has submitted by calling [code]#wait# or [code]#wait_and_throw#. + +The default constructor of the SYCL [code]#queue# class will construct a +queue based on the SYCL [code]#device# returned from the +[code]#default_selector_v# (see <>). +All other constructors construct a queue as determined by the parameters +provided. +All constructors will implicitly construct a SYCL [code]#platform#, +[code]#device# and [code]#context# in order to facilitate the construction +of the queue. + +Each constructor takes as the last parameter an optional SYCL +[code]#property_list# to provide properties to the SYCL [code]#queue#. A SYCL [code]#queue# may be destroyed even when there are uncompleted -<> that have been submitted to the queue. Doing so does not -block. Instead, any commands that have been submitted to the queue begin -execution when their requisites are satisfied, just as they would had the queue -not been destroyed. Any event objects for those commands are signaled in the -normal manner when the command completes. Resources associated with the queue -will be freed by the time the last command completes. +<> that have been submitted to the queue. +Doing so does not block. +Instead, any commands that have been submitted to the queue begin execution +when their requisites are satisfied, just as they would had the queue not +been destroyed. +Any event objects for those commands are signaled in the normal manner when +the command completes. 
+Resources associated with the queue will be freed by the time the last +command completes. -The SYCL [code]#queue# class provides the common reference semantics -(see <>). +The SYCL [code]#queue# class provides the common reference semantics (see +<>). ==== Queue interface -A synopsis of the SYCL [code]#queue# class is provided below. The -constructors and member functions of the SYCL [code]#queue# class are +A synopsis of the SYCL [code]#queue# class is provided below. +The constructors and member functions of the SYCL [code]#queue# class are listed in <> and <> -respectively. The additional common special member functions and common member -functions are listed in <> in +respectively. +The additional common special member functions and common member functions +are listed in <> in <> and <>, respectively. -Some queue member functions are shortcuts to member functions of the [code]#handler# class. +Some queue member functions are shortcuts to member functions of the +[code]#handler# class. These are listed in <>. // Interface for class: queue @@ -3292,30 +3252,32 @@ requested by the template parameter [code]#Param#. ==== Queue shortcut functions Queue shortcut functions are member functions of the [code]#queue# class -that implicitly create a command group with an implicit command group [code]#handler# -consisting of a single command, -a call to the member function of the handler object with the same signature -(e.g. [code]#queue::single_task# will call [code]#handler::single_task# with the same arguments), -and submit the command group. -The main signature difference comes from the return type: -member functions of the [code]#handler# return [code]#void#, -whereas corresponding queue shortcut functions return an [code]#event# object -that represents the submitted command group. -Queue shortcuts can additionally take a list of events to wait on, -as if passing the event list to [code]#handler::depends_on# for the implicit command group. 
+that implicitly create a command group with an implicit command group +[code]#handler# consisting of a single command, a call to the member +function of the handler object with the same signature (e.g. +[code]#queue::single_task# will call [code]#handler::single_task# with the +same arguments), and submit the command group. +The main signature difference comes from the return type: member functions +of the [code]#handler# return [code]#void#, whereas corresponding queue +shortcut functions return an [code]#event# object that represents the +submitted command group. +Queue shortcuts can additionally take a list of events to wait on, as if +passing the event list to [code]#handler::depends_on# for the implicit +command group. The full list of queue shortcuts is defined in <>. -The list of handler member functions is defined in <>. - -It is not allowed to capture accessors into the implicitly created command group. -If a queue shortcut function launches a kernel -(via [code]#single_task# or [code]#parallel_for#), -only USM pointers are allowed inside such kernels. -However, queue shortcuts that perform non-kernel operations -can be provided with a valid placeholder accessor as an argument. -In that case there is an additional step performed: -the implicit command group [code]#handler# calls [code]#handler::require# -on each accessor passed in as a function argument. +The list of handler member functions is defined in +<>. + +It is not allowed to capture accessors into the implicitly created command +group. +If a queue shortcut function launches a kernel (via [code]#single_task# or +[code]#parallel_for#), only USM pointers are allowed inside such kernels. +However, queue shortcuts that perform non-kernel operations can be provided +with a valid placeholder accessor as an argument. +In that case there is an additional step performed: the implicit command +group [code]#handler# calls [code]#handler::require# on each accessor passed +in as a function argument. 
An example of using queue shortcuts is shown below. @@ -3678,13 +3640,14 @@ a@ Equivalent to submitting a command-group containing ==== Queue information descriptors -A <> can be queried for information using the [code]#get_info# -member function of the [code]#queue# class, specifying one of the info -parameters in [code]#info::queue#. The possible values for each info parameter -and any restriction are defined in the specification of the <> -associated with the <>. All info parameters in [code]#info::queue# are -specified in <> and the synopsis for [code]#info::queue# is -described in <>. +A <> can be queried for information using the [code]#get_info# member +function of the [code]#queue# class, specifying one of the info parameters +in [code]#info::queue#. +The possible values for each info parameter and any restriction are defined +in the specification of the <> associated with the <>. +All info parameters in [code]#info::queue# are specified in +<> and the synopsis for [code]#info::queue# is described +in <>. [[table.queue.info]] .Queue information descriptors @@ -3715,9 +3678,8 @@ info::queue::device [[sec:queue-properties]] ==== Queue properties -The properties that can be provided when constructing the SYCL -[code]#queue# class are describe in -<>. +The properties that can be provided when constructing the SYCL [code]#queue# +class are described in <>. [[table.properties.queue]] @@ -3759,8 +3721,8 @@ property::queue::in_order |==== -The constructors of the [code]#queue# [code]#property# -classes are listed in <>. +The constructors of the [code]#queue# [code]#property# classes are listed in +<>.
[[table.constructors.properties.queue]] @@ -3791,22 +3753,21 @@ property::queue::in_order::in_order() Queue errors come in two forms: - * *Synchronous Errors* are those that we would expect to be - reported directly at the point of waiting on an event, and hence waiting - for a queue to complete, as well as any immediate errors reported by - enqueuing work onto a queue. Such errors are reported through {cpp} - exceptions. - * <> are those that are produced or detected after - associated host API calls have returned (so can't be thrown as - exceptions by the API call), and that are handled by an - <> through which the errors are reported. Handling of - asynchronous errors from a queue occurs at specific times, as described - by <>. + * *Synchronous Errors* are those that we would expect to be reported + directly at the point of waiting on an event, and hence waiting for a + queue to complete, as well as any immediate errors reported by enqueuing + work onto a queue. + Such errors are reported through {cpp} exceptions. + * <> are those that are produced or + detected after associated host API calls have returned (so can't be + thrown as exceptions by the API call), and that are handled by an + <> through which the errors are reported. + Handling of asynchronous errors from a queue occurs at specific times, + as described by <>. -Note that if there are <> to be processed when a queue -is destructed, the handler is called and -this might delay or block the destruction, according to the behavior -of the handler. +Note that if there are <> to be processed +when a queue is destructed, the handler is called and this might delay or +block the destruction, according to the behavior of the handler. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end queue_class %%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -3817,27 +3778,28 @@ of the handler. An <> in SYCL is an object that represents the status of an operation that is being executed by the SYCL runtime. 
-Typically in SYCL, data dependency and execution order is handled implicitly by -the SYCL runtime. However, in some circumstances developers want fine grain control -of the execution, or want to retrieve properties of a command that is running. +Typically in SYCL, data dependency and execution order is handled implicitly +by the SYCL runtime. +However, in some circumstances developers want fine grain control of the +execution, or want to retrieve properties of a command that is running. -Note that, although an event represents the status of a particular operation, -the dependencies of a certain event can be used to keep track of multiple steps -required to synchronize said operation. +Note that, although an event represents the status of a particular +operation, the dependencies of a certain event can be used to keep track of +multiple steps required to synchronize said operation. A SYCL event is returned by the submission of a <>. -The dependencies of the event returned via the submission of the command group -are the implementation-defined commands associated with the <> -execution. - -The SYCL [code]#event# class provides the common reference semantics -(see <>). - -The constructors and member functions of the SYCL [code]#event# class -are listed in <> and -<>, respectively. The additional common special -member functions and common member functions are listed in -<> and +The dependencies of the event returned via the submission of the command +group are the implementation-defined commands associated with the +<> execution. + +The SYCL [code]#event# class provides the common reference semantics (see +<>). + +The constructors and member functions of the SYCL [code]#event# class are +listed in <> and <>, +respectively. +The additional common special member functions and common member functions +are listed in <> and <>, respectively. 
// Interface for class: event.h @@ -3995,11 +3957,12 @@ template typename Param::return_type get_profiling_info() const An <> can be queried for information using the [code]#get_info# member function of the [code]#event# class, specifying one of the info -parameters in [code]#info::event#. The possible values for each info parameter -and any restrictions are defined in the specification of the <> -associated with the <>. All info parameters in [code]#info::event# are -specified in <> and the synopsis for [code]#info::event# is -described in <>. +parameters in [code]#info::event#. +The possible values for each info parameter and any restrictions are defined +in the specification of the <> associated with the <>. +All info parameters in [code]#info::event# are specified in +<> and the synopsis for [code]#info::event# is described +in <>. [[table.event.info]] .Event class information descriptors @@ -4056,18 +4019,21 @@ info::event_command_status::complete An <> can be queried for profiling information using the [code]#get_profiling_info# member function of the [code]#event# class, specifying one of the profiling info parameters enumerated in -[code]#info::event_profiling#. The possible values for each info parameter and -any restrictions are defined in the specification of the <> -associated with the <>. All info parameters in -[code]#info::event_profiling# are specified in <> -and the synopsis for [code]#info::event_profiling# is described in +[code]#info::event_profiling#. +The possible values for each info parameter and any restrictions are defined +in the specification of the <> associated with the <>. +All info parameters in [code]#info::event_profiling# are specified in +<> and the synopsis for +[code]#info::event_profiling# is described in <>. -Each profiling descriptor returns a 64-bit timestamp that represents the number -of nanoseconds that have elapsed since some implementation-defined timebase. 
+Each profiling descriptor returns a 64-bit timestamp that represents the +number of nanoseconds that have elapsed since some implementation-defined +timebase. All events that share the same backend are guaranteed to share the same -timebase, therefore the difference between two timestamps from the same backend -yields the number of nanoseconds that have elapsed between those events. +timebase, therefore the difference between two timestamps from the same +backend yields the number of nanoseconds that have elapsed between those +events. [[table.event.profilinginfo]] .Profiling information descriptors for the SYCL [code]#event# class @@ -4126,37 +4092,37 @@ info::event_profiling::command_end [[sec:data.access.and.storage]] == Data access and storage in SYCL -In SYCL, when using <> and <>, -data storage and access are handled by separate classes. -<> and <> handle -storage and ownership of the data, whereas <> handle access to -the data. +In SYCL, when using <> and <>, data storage +and access are handled by separate classes. +<> and <> handle storage and ownership of the +data, whereas <> handle access to the data. Buffers and images in SYCL can be bound to more than one device or context, including across different <>. -They also handle ownership of the -data, while allowing exception handling for blocking -and non-blocking data transfers. Accessors manage data transfers between the host -and all of the devices in the system, as well as tracking of data dependencies. - -Zero-sized buffers and accessors are permitted, but attempting to access data -within them produces undefined behavior, similar to dereferencing a null -pointer in {cpp}. 
Note that zero-sized accessors can be created in several -ways: by creating an accessor from a zero-sized buffer, by creating an accessor -with a zero-sized buffer sub-range, or by creating an accessor with its default +They also handle ownership of the data, while allowing exception handling +for blocking and non-blocking data transfers. +Accessors manage data transfers between the host and all of the devices in +the system, as well as tracking of data dependencies. + +Zero-sized buffers and accessors are permitted, but attempting to access +data within them produces undefined behavior, similar to dereferencing a +null pointer in {cpp}. +Note that zero-sized accessors can be created in several ways: by creating +an accessor from a zero-sized buffer, by creating an accessor with a +zero-sized buffer sub-range, or by creating an accessor with its default constructor. -When using <> allocations, data storage is managed by USM allocation functions, and -data access is via pointers. See <> for greater detail. +When using <> allocations, data storage is managed by USM allocation +functions, and data access is via pointers. +See <> for greater detail. === Host allocation -A <> may need to allocate temporary objects on the host -to handle some operations (such as copying data from one context to -another). +A <> may need to allocate temporary objects on the host to +handle some operations (such as copying data from one context to another). Allocation on the host is managed using an allocator object, following the standard {cpp} allocator class definition. -The default allocator for memory objects is implementation-defined, -but the user can supply their own allocator class. +The default allocator for memory objects is implementation-defined, but the +user can supply their own allocator class. [source,,linenums] ---- @@ -4165,16 +4131,18 @@ but the user can supply their own allocator class. 
} ---- -When an allocator returns a [code]#nullptr#, the runtime cannot allocate data on the -host. Note that in this case the runtime will raise an error if it requires -host memory but it is not available (e.g when moving data across <> +When an allocator returns a [code]#nullptr#, the runtime cannot allocate +data on the host. +Note that in this case the runtime will raise an error if it requires host +memory but it is not available (e.g when moving data across <> contexts). In some cases, the implementation may retain a copy of the allocator object -even after the buffer is destroyed. For example, this can happen when the -buffer object is destroyed before commands using accessors to the buffer have -completed. Therefore, the application must be prepared for calls to the -allocator even after the buffer is destroyed. +even after the buffer is destroyed. +For example, this can happen when the buffer object is destroyed before +commands using accessors to the buffer have completed. +Therefore, the application must be prepared for calls to the allocator even +after the buffer is destroyed. [NOTE] ==== @@ -4193,18 +4161,18 @@ allocator). [[subsec:default.allocators]] ==== Default allocators -A default allocator is always defined by the implementation. For allocations -greater than size zero, it is guaranteed to return non-[code]#nullptr# and -new memory positions every call. +A default allocator is always defined by the implementation. +For allocations greater than size zero, it is guaranteed to return +non-[code]#nullptr# and new memory positions every call. The default allocator for const buffers will remove the const-ness of the -type (therefore, the default allocator for a buffer of type [code]#const int# -will be an [code]#Allocator)#. -This implies that host <> will not synchronize with the pointer given -by the user in the buffer/image constructor, but will use the memory -returned by the [code]#Allocator# itself for that purpose. 
-The user can implement an allocator that returns the same address as the -one passed in the buffer constructor, but it is the responsibility of the -user to handle the potential race conditions. +type (therefore, the default allocator for a buffer of type [code]#const +int# will be an [code]#Allocator)#. +This implies that host <> will not synchronize with the +pointer given by the user in the buffer/image constructor, but will use the +memory returned by the [code]#Allocator# itself for that purpose. +The user can implement an allocator that returns the same address as the one +passed in the buffer constructor, but it is the responsibility of the user +to handle the potential race conditions. [[table.default.allocators]] @@ -4240,52 +4208,56 @@ See <> for details on manual host-device synchronization. === Buffers The [code]#buffer# class defines a shared array of one, two or three -dimensions that can be used by the SYCL <> and has to be accessed using -<> classes. Buffers are templated on both the type of their data, -and the number of dimensions that the data is stored and accessed through. - -A [code]#buffer# does not map to only one underlying backend -object, and all <> memory objects may be temporary for use -within a command group on a specific device. - -The underlying data type of a buffer [code]#T# must be <> as -defined in <>. Some overloads of the [code]#buffer# -constructor initialize the buffer contents by copying objects from host memory -while other overloads construct the buffer without copying objects from the -host. For the overloads that do not copy host objects, the initial state of -the objects in the buffer depends on whether [code]#T# is an implicit-lifetime -type (as defined in the {cpp} core language). If [code]#T# is an -implicit-lifetime type, objects of that type are implicitly created in the -buffer with indeterminate values. 
For other types, these constructor overloads -merely allocate uninitialized memory, and the application is responsible for -constructing objects by calling placement-new and for destroying them later -by manually calling the object's destructor. - -For the overloads that do copy objects from host memory, the [code]#hostData# -pointer must point to at least _N_ bytes of memory where _N_ is -[code]#sizeof(T) * bufferRange.size()#. If _N_ is zero, [code]#hostData# is -permitted to be a null pointer. +dimensions that can be used by the SYCL <> and has to be accessed +using <> classes. +Buffers are templated on both the type of their data, and the number of +dimensions that the data is stored and accessed through. + +A [code]#buffer# does not map to only one underlying backend object, and all +<> memory objects may be temporary for use within a command group +on a specific device. + +The underlying data type of a buffer [code]#T# must be <> +as defined in <>. +Some overloads of the [code]#buffer# constructor initialize the buffer +contents by copying objects from host memory while other overloads construct +the buffer without copying objects from the host. +For the overloads that do not copy host objects, the initial state of the +objects in the buffer depends on whether [code]#T# is an implicit-lifetime +type (as defined in the {cpp} core language). +If [code]#T# is an implicit-lifetime type, objects of that type are +implicitly created in the buffer with indeterminate values. +For other types, these constructor overloads merely allocate uninitialized +memory, and the application is responsible for constructing objects by +calling placement-new and for destroying them later by manually calling the +object's destructor. + +For the overloads that do copy objects from host memory, the +[code]#hostData# pointer must point to at least _N_ bytes of memory where +_N_ is [code]#sizeof(T) * bufferRange.size()#. 
+If _N_ is zero, [code]#hostData# is permitted to be a null pointer. A SYCL [code]#buffer# can construct an instance of a SYCL [code]#buffer# -that reinterprets the original SYCL [code]#buffer# with a different -type, dimensionality and range using the member function -[code]#reinterpret#. The reinterpreted SYCL [code]#buffer# that is -constructed must behave as though it were a copy of the SYCL [code]#buffer# -that constructed it (see <>) with the exception -that the type, dimensionality and range of the reinterpreted SYCL -[code]#buffer# must reflect the type, dimensionality and range specified -when calling the [code]#reinterpret# member function. By extension of this, -the class member types [code]#value_type#, [code]#reference# and -[code]#const_reference#, and the member functions [code]#get_range()# -and [code]#size()# of the reinterpreted SYCL [code]#buffer# must -reflect the new type, dimensionality and range. The data that the original SYCL -[code]#buffer# and the reinterpreted SYCL [code]#buffer# manage -remains unaffected, though the representation of the data when accessed through -the reinterpreted SYCL [code]#buffer# may alter to reflect the new type, -dimensionality and range. It is important to note that a reinterpreted SYCL -[code]#buffer# is a copy of the original SYCL [code]#buffer# only, -and not a new SYCL [code]#buffer#. Constructing more than one SYCL -[code]#buffer# managing the same host pointer is still undefined behavior. +that reinterprets the original SYCL [code]#buffer# with a different type, +dimensionality and range using the member function [code]#reinterpret#. +The reinterpreted SYCL [code]#buffer# that is constructed must behave as +though it were a copy of the SYCL [code]#buffer# that constructed it (see +<>) with the exception that the type, +dimensionality and range of the reinterpreted SYCL [code]#buffer# must +reflect the type, dimensionality and range specified when calling the +[code]#reinterpret# member function. 
+By extension of this, the class member types [code]#value_type#, +[code]#reference# and [code]#const_reference#, and the member functions +[code]#get_range()# and [code]#size()# of the reinterpreted SYCL +[code]#buffer# must reflect the new type, dimensionality and range. +The data that the original SYCL [code]#buffer# and the reinterpreted SYCL +[code]#buffer# manage remains unaffected, though the representation of the +data when accessed through the reinterpreted SYCL [code]#buffer# may alter +to reflect the new type, dimensionality and range. +It is important to note that a reinterpreted SYCL [code]#buffer# is a copy +of the original SYCL [code]#buffer# only, and not a new SYCL [code]#buffer#. +Constructing more than one SYCL [code]#buffer# managing the same host +pointer is still undefined behavior. The SYCL [code]#buffer# class template provides the common reference semantics (see <>). @@ -4295,21 +4267,20 @@ semantics (see <>). The constructors and member functions of the SYCL [code]#buffer# class template are listed in <> and -<>, respectively. The additional common special -member functions and common member functions are listed in -<> and +<>, respectively. +The additional common special member functions and common member functions +are listed in <> and <>, respectively. Each constructor takes as the last parameter an optional SYCL -[code]#property_list# to provide properties to the SYCL -[code]#buffer#. +[code]#property_list# to provide properties to the SYCL [code]#buffer#. The SYCL [code]#buffer# class template takes a template parameter -[code]#AllocatorT# for specifying an allocator which is used by -the <> when allocating temporary memory on the -host. If no template argument is provided, then the default allocator -for the SYCL [code]#buffer# class [code]#buffer_allocator# -will be used (see <>). +[code]#AllocatorT# for specifying an allocator which is used by the +<> when allocating temporary memory on the host. 
+If no template argument is provided, then the default allocator for the SYCL +[code]#buffer# class [code]#buffer_allocator# will be used (see +<>). // Interface for class: buffer @@ -4914,8 +4885,7 @@ reinterpret() const ==== Buffer properties The properties that can be provided when constructing the SYCL -[code]#buffer# class are describe in -<>. +[code]#buffer# class are describe in <>. [[table.properties.buffer]] @@ -4961,9 +4931,8 @@ property::buffer::context_bound |==== -The constructors and special member functions of the buffer property -classes are listed in -<> and +The constructors and special member functions of the buffer property classes +are listed in <> and <> respectively. @@ -5025,122 +4994,131 @@ context property::buffer::context_bound::get_context() const [[sec:buf-sync-rules]] ==== Buffer synchronization rules -Buffers are reference-counted. When a buffer value is constructed -from another buffer, the two values reference the same buffer and a -reference count is incremented. When a buffer value is destroyed, -the reference count is decremented. Only when there are no more -buffer values that reference a specific buffer is the actual -buffer destroyed and the buffer destruction behavior defined +Buffers are reference-counted. +When a buffer value is constructed from another buffer, the two values +reference the same buffer and a reference count is incremented. +When a buffer value is destroyed, the reference count is decremented. +Only when there are no more buffer values that reference a specific buffer +is the actual buffer destroyed and the buffer destruction behavior defined below is followed. -If any error occurs on buffer destruction, it is reported -via the associated queue's asynchronous error handling mechanism. +If any error occurs on buffer destruction, it is reported via the associated +queue's asynchronous error handling mechanism. 
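As a rough illustration of the reference-counting rule above, the following plain {cpp} sketch models a buffer value as a handle around a [code]#std::shared_ptr#.
The names [code]#storage_model# and [code]#buffer_model# are hypothetical and are not part of the SYCL API; an actual implementation would additionally perform the synchronization and write-back described in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the storage behind a buffer (not SYCL API).
struct storage_model {
  storage_model(std::size_t n, bool& destroyed_flag)
      : data(n), destroyed(destroyed_flag) {}
  // Runs only when the last handle referencing this storage is destroyed.
  ~storage_model() { destroyed = true; }
  std::vector<int> data;
  bool& destroyed;
};

// Hypothetical model of a buffer value: copying it copies a reference only.
class buffer_model {
 public:
  buffer_model(std::size_t n, bool& destroyed_flag)
      : impl_(std::make_shared<storage_model>(n, destroyed_flag)) {}

  // Number of buffer values currently referencing the same storage.
  long reference_count() const { return impl_.use_count(); }

  int& operator[](std::size_t i) { return impl_->data[i]; }

 private:
  std::shared_ptr<storage_model> impl_;
};

// Returns true if the storage survives the destruction of the first handle
// and is destroyed only together with the last one.
bool storage_destroyed_only_with_last_handle() {
  bool destroyed = false;
  bool alive_after_first = false;
  {
    auto first = std::make_unique<buffer_model>(4, destroyed);
    buffer_model second = *first;  // copy: both values share one storage
    (*first)[0] = 42;              // write through the first handle
    first.reset();                 // destroy one buffer value
    alive_after_first = !destroyed && second[0] == 42 &&
                        second.reference_count() == 1;
  }  // the last buffer value is destroyed here
  return alive_after_first && destroyed;
}
```

In this model the write made through the first handle is visible through the second, and the storage is released only when no handle remains, which mirrors the rule that the actual buffer is destroyed only when no buffer values reference it.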
-The basic rule for the blocking behavior of a buffer destructor is -that it blocks if there is some data to write back because a -write accessor on it has been created, or if the buffer was constructed -with attached host memory and is still in use. +The basic rule for the blocking behavior of a buffer destructor is that it +blocks if there is some data to write back because a write accessor on it +has been created, or if the buffer was constructed with attached host memory +and is still in use. More precisely: . A buffer can be constructed from a [code]#range# (and without a - [code]#hostData# pointer). The memory management for this type of buffer - is entirely handled by the SYCL system. The destructor for this type of - buffer does not need to block, even if work on the buffer has not - completed. Instead, the SYCL system frees any storage required for the - buffer asynchronously when it is no longer in use in queues. The initial - contents of the buffer are unspecified. - . A buffer can be constructed from a [code]#hostData# pointer. The buffer - will use this host memory for its full lifetime, but the contents of this - host memory are unspecified for the lifetime of the buffer. If the host - memory is modified on the host or if it is used to construct another - buffer or image during the lifetime of this buffer, then the results are - undefined. The initial contents of the buffer will be the contents of the - host memory at the time of construction. + [code]#hostData# pointer). + The memory management for this type of buffer is entirely handled by the + SYCL system. + The destructor for this type of buffer does not need to block, even if + work on the buffer has not completed. + Instead, the SYCL system frees any storage required for the buffer + asynchronously when it is no longer in use in queues. + The initial contents of the buffer are unspecified. + . A buffer can be constructed from a [code]#hostData# pointer. 
+ The buffer will use this host memory for its full lifetime, but the + contents of this host memory are unspecified for the lifetime of the + buffer. + If the host memory is modified on the host or if it is used to construct + another buffer or image during the lifetime of this buffer, then the + results are undefined. + The initial contents of the buffer will be the contents of the host + memory at the time of construction. + -- -When the buffer is destroyed, the destructor will block until all -work in queues on the buffer have completed, then copy the contents -of the buffer back to the host memory (if required) and then -return. +When the buffer is destroyed, the destructor will block until all work in +queues on the buffer have completed, then copy the contents of the buffer +back to the host memory (if required) and then return. .. If the type of the host data is [code]#const#, then the buffer is read-only; only read accessors are allowed on the buffer and no-copy-back to host memory is performed (although the host memory must - still be kept available for use by SYCL). When using the default buffer - allocator, the const-ness of the type will be removed in order to allow - host allocation of memory, which will allow temporary host copies of the - data by the <>, for example for speeding up host - accesses. + still be kept available for use by SYCL). + When using the default buffer allocator, the const-ness of the type will + be removed in order to allow host allocation of memory, which will allow + temporary host copies of the data by the <>, for example + for speeding up host accesses. + -When the buffer is destroyed, the destructor will block until all work -in queues on the buffer have completed and then return, as there is no -copy of data back to host. - .. If the type of the host data is not [code]#const# but the pointer - to host data is [code]#const#, then the read-only restriction - applies only on host and not on device accesses. 
+When the buffer is destroyed, the destructor will block until all work in +queues on the buffer have completed and then return, as there is no copy of +data back to host. + .. If the type of the host data is not [code]#const# but the pointer to + host data is [code]#const#, then the read-only restriction applies only + on host and not on device accesses. + -When the buffer is destroyed, the destructor will block until all work -in queues on the buffer have completed. +When the buffer is destroyed, the destructor will block until all work in +queues on the buffer have completed. -- - . A buffer can be constructed using a [code]#shared_ptr# to host - data. This pointer is shared between the SYCL application and the - runtime. In order to allow synchronization between the application and - the runtime a [code]#mutex# is used which will be locked by the - runtime whenever the data is in use, and unlocked when it is no longer - needed. + . A buffer can be constructed using a [code]#shared_ptr# to host data. + This pointer is shared between the SYCL application and the runtime. + In order to allow synchronization between the application and the + runtime a [code]#mutex# is used which will be locked by the runtime + whenever the data is in use, and unlocked when it is no longer needed. + -- The [code]#shared_ptr# reference counting is used in order to prevent -destroying the buffer host data prematurely. If the [code]#shared_ptr# -is deleted from the user application before buffer destruction, the buffer -can continue securely because the pointer hasn't been destroyed yet. It will -not copy data back to the host before destruction, however, as the +destroying the buffer host data prematurely. +If the [code]#shared_ptr# is deleted from the user application before buffer +destruction, the buffer can continue securely because the pointer hasn't +been destroyed yet. 
+It will not copy data back to the host before destruction, however, as the application side has already deleted its copy. -Note that since there is an implicit conversion of a -[code]#std::unique_ptr# to a [code]#std::shared_ptr#, a -[code]#std::unique_ptr# can also be used to pass the ownership to the -<>. +Note that since there is an implicit conversion of a [code]#std::unique_ptr# +to a [code]#std::shared_ptr#, a [code]#std::unique_ptr# can also be used to +pass the ownership to the <>. -- - . A buffer can be constructed from a pair of iterator values. In this - case, the buffer construction will copy the data from the data range - defined by the iterator pair. The destructor will not copy back any data - and does not need to block. + . A buffer can be constructed from a pair of iterator values. + In this case, the buffer construction will copy the data from the data + range defined by the iterator pair. + The destructor will not copy back any data and does not need to block. . A buffer can be constructed from a container on which - [code]#std::data(container)# and [code]#std::size(container)# - are well-formed. The initial contents of the buffer will - be the contents of the container at the time of construction. + [code]#std::data(container)# and [code]#std::size(container)# are + well-formed. + The initial contents of the buffer will be the contents of the container + at the time of construction. + -- -The buffer may use the memory within the container for its full -lifetime, and the contents of this memory are unspecified for the -lifetime of the buffer. If the container memory is modified by the host -during the lifetime of this buffer, then the results are undefined. +The buffer may use the memory within the container for its full lifetime, +and the contents of this memory are unspecified for the lifetime of the +buffer. +If the container memory is modified by the host during the lifetime of this +buffer, then the results are undefined. 
When the buffer is destroyed, the destructor will block until all work in -queues on the buffer have completed. If the return type of -[code]#std::data(container)# is not [code]#const# then the destructor will also -copy the contents of the buffer to the container (if required). +queues on the buffer have completed. +If the return type of [code]#std::data(container)# is not [code]#const# then +the destructor will also copy the contents of the buffer to the container +(if required). -- -If [code]#set_final_data()# is used to change where to write the -data back to, then the destructor of the buffer will block if a -write accessor on it has been created. +If [code]#set_final_data()# is used to change where to write the data back +to, then the destructor of the buffer will block if a write accessor on it +has been created. -A sub-buffer object can be created which is a sub-range reference to a -base buffer. This sub-buffer can be used to create accessors to the -base buffer, which have access to the range specified at time -of construction of the sub-buffer. Sub-buffers cannot be created from -sub-buffers, but only from a base buffer which is not already a sub-buffer. +A sub-buffer object can be created which is a sub-range reference to a base +buffer. +This sub-buffer can be used to create accessors to the base buffer, which +have access to the range specified at time of construction of the +sub-buffer. +Sub-buffers cannot be created from sub-buffers, but only from a base buffer +which is not already a sub-buffer. Sub-buffers must be constructed from a contiguous region of memory in a -buffer. This requirement is potentially non-intuitive when working with -buffers that have dimensionality larger than one, but maps to -one-dimensional <> native allocations without performance cost due -to index mapping computation. For example: +buffer. 
+This requirement is potentially non-intuitive when working with buffers that +have dimensionality larger than one, but maps to one-dimensional <> +native allocations without performance cost due to index mapping +computation. +For example: [source,,linenums] ---- @@ -5151,30 +5129,31 @@ include::{code_dir}/subbuffer.cpp[lines=4..-1] [[subsec:images]] === Images -The classes [code]#unsampled_image# -(<>) and [code]#sampled_image# -(<>) define shared image data of one, -two or three dimensions, that can be used by kernels in queues and have to be -accessed using the image <> classes. +The classes [code]#unsampled_image# (<>) +and [code]#sampled_image# (<>) define +shared image data of one, two or three dimensions, that can be used by +kernels in queues and have to be accessed using the image <> +classes. The constructors and member functions of the SYCL [code]#unsampled_image# and [code]#sampled_image# class templates are listed in <>, <>, <> and <>, -respectively. The additional common special member functions and common member -functions are listed in <> and +respectively. +The additional common special member functions and common member functions +are listed in <> and <>, respectively. -Where relevant, it is the responsibility of the user to ensure that the format -of the data matches the format described by [code]#image_format#. +Where relevant, it is the responsibility of the user to ensure that the +format of the data matches the format described by [code]#image_format#. The allocator template parameter of the SYCL [code]#unsampled_image# and [code]#sampled_image# classes can be any allocator type including a custom allocator, however it must allocate in units of [code]#std::byte#. 
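A minimal sketch of what a custom allocator that allocates in units of [code]#std::byte# could look like is given below, written against the standard {cpp} allocator requirements only.
The name [code]#counting_byte_allocator#, its counter and the helper [code]#round_trip# are assumptions made for illustration; they do not appear in the SYCL interface, and the sketch does not show how an implementation would invoke the allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical allocator whose unit of allocation is std::byte.
struct counting_byte_allocator {
  using value_type = std::byte;

  // Allocates storage for n objects of type std::byte, that is, n bytes.
  std::byte* allocate(std::size_t n) {
    bytes_requested += n;
    return static_cast<std::byte*>(::operator new(n));
  }

  void deallocate(std::byte* p, std::size_t) noexcept { ::operator delete(p); }

  // All instances are interchangeable, so they always compare equal.
  friend bool operator==(const counting_byte_allocator&,
                         const counting_byte_allocator&) {
    return true;
  }
  friend bool operator!=(const counting_byte_allocator&,
                         const counting_byte_allocator&) {
    return false;
  }

  // Illustrative bookkeeping: total number of bytes requested so far.
  static inline std::size_t bytes_requested = 0;
};

// Allocates n bytes through std::allocator_traits, writes and reads one
// byte, and releases the memory again. Returns true on success.
bool round_trip(std::size_t n) {
  using traits = std::allocator_traits<counting_byte_allocator>;
  counting_byte_allocator alloc;
  std::byte* p = traits::allocate(alloc, n);
  if (p == nullptr || n == 0) return false;
  p[0] = std::byte{0x2a};
  const bool ok = std::to_integer<int>(p[0]) == 42;
  traits::deallocate(alloc, p, n);
  return ok;
}
```

Because the allocation unit is one byte, the argument passed to [code]#allocate# is directly the number of bytes requested, independent of the element type of the image.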
-For any image that is constructed with the range latexmath:[(r1,r2,r3)] with an element -type size in bytes of _s_, the image row pitch and image slice pitch should be -calculated as follows: +For any image that is constructed with the range latexmath:[(r1,r2,r3)] with +an element type size in bytes of _s_, the image row pitch and image slice +pitch should be calculated as follows: [[image-row-pitch]] [latexmath] @@ -5188,9 +5167,8 @@ r1 \cdot s r1 \cdot r2 \cdot s ++++ -The SYCL [code]#unsampled_image# and [code]#sampled_image# class -templates provide the common reference semantics -(see <>). +The SYCL [code]#unsampled_image# and [code]#sampled_image# class templates +provide the common reference semantics (see <>). ==== Unsampled image interface @@ -5204,10 +5182,10 @@ Each constructor additionally takes as the last parameter an optional SYCL The SYCL [code]#unsampled_image# class template takes a template parameter [code]#AllocatorT# for specifying an allocator which is used by the -<> when allocating temporary memory on the host. If no template -argument is provided, the default allocator for the SYCL -[code]#unsampled_image# class [code]#image_allocator# is used -(see <>). +<> when allocating temporary memory on the host. +If no template argument is provided, the default allocator for the SYCL +[code]#unsampled_image# class [code]#image_allocator# is used (see +<>). // Interface for class: unsampled image [source,,linenums] @@ -5703,10 +5681,9 @@ have any effect. ==== Sampled image interface -Each constructor of the [code]#sampled_image# class requires a -pointer to the host data the image will sample, an -[code]#image_format# to describe the data layout and an -[code]#image_sampler# (<>) to describe +Each constructor of the [code]#sampled_image# class requires a pointer to +the host data the image will sample, an [code]#image_format# to describe the +data layout and an [code]#image_sampler# (<>) to describe how to sample the image data. 
Each constructor additionally takes as the last parameter an optional SYCL @@ -5929,8 +5906,8 @@ host_sampled_image_accessor get_host_access() ==== Image properties The properties that can be provided when constructing the SYCL -[code]#unsampled_image# and [code]#sampled_image# classes are -describe in <>. +[code]#unsampled_image# and [code]#sampled_image# classes are describe in +<>. // Interface for image properties [source,,linenums] @@ -6039,53 +6016,58 @@ context property::image::context_bound::get_context() const The rules are similar to those described in <>. -For the lifetime of the image object, the associated host memory must -be left available to the <> and the contents of the associated -host memory is unspecified until the image object is destroyed. If an -image object value is copied, then only a reference to the underlying -image object is copied. The underlying image object is reference-counted. -Only after all image value references to the underlying image object -have been destroyed is the actual image object itself destroyed. - -If an image object is constructed with associated host memory, then -its destructor blocks until all operations in all SYCL queues on -that image object have completed. Any modifications to the image data -will be copied back, if necessary, to the associated host memory. +For the lifetime of the image object, the associated host memory must be +left available to the <> and the contents of the associated +host memory is unspecified until the image object is destroyed. +If an image object value is copied, then only a reference to the underlying +image object is copied. +The underlying image object is reference-counted. +Only after all image value references to the underlying image object have +been destroyed is the actual image object itself destroyed. + +If an image object is constructed with associated host memory, then its +destructor blocks until all operations in all SYCL queues on that image +object have completed. 
+Any modifications to the image data will be copied back, if necessary, to +the associated host memory. Any errors occurring during destruction are reported to any associated -context's asynchronous error handler. If an image object is constructed -with a storage object, then the storage object defines what -synchronization or copying behavior occurs on image object destruction. +context's asynchronous error handler. +If an image object is constructed with a storage object, then the storage +object defines what synchronization or copying behavior occurs on image +object destruction. [[sec:sharing-host-memory-with-dm]] === Sharing host memory with the SYCL data management classes -In order to allow the <> to do memory management and allow -for data dependencies, there are two classes defined, buffer and image. The -default behavior for them is that a "`raw`" pointer is given during the -construction of the data management class, with full ownership to use it until -the destruction of the SYCL object. +In order to allow the <> to do memory management and allow for +data dependencies, there are two classes defined, buffer and image. +The default behavior for them is that a "`raw`" pointer is given during the +construction of the data management class, with full ownership to use it +until the destruction of the SYCL object. -In this section we go in greater detail on sharing or explicitly not -sharing host memory with the SYCL data classes, and we will use the buffer -class as an example. The same rules will apply to images as well. +In this section we go in greater detail on sharing or explicitly not sharing +host memory with the SYCL data classes, and we will use the buffer class as +an example. +The same rules will apply to images as well. 
==== Default behavior -When using a SYCL buffer, the ownership of the pointer passed to the constructor -of the class is, by default, passed to <>, and that pointer cannot be used -on the host side until the buffer or image is destroyed. -A SYCL application can access the contents of the memory managed by a SYCL buffer -by using a [code]#host_accessor# as defined in <>. +When using a SYCL buffer, the ownership of the pointer passed to the +constructor of the class is, by default, passed to <>, and +that pointer cannot be used on the host side until the buffer or image is +destroyed. +A SYCL application can access the contents of the memory managed by a SYCL +buffer by using a [code]#host_accessor# as defined in <>. However, there is no guarantee that the host accessor synchronizes with the original host address used in its constructor. -The pointer passed in is the one used to copy data back to the host, if needed, -before buffer destruction. The memory pointed by <> -will not be de-allocated by the runtime, -and the data is copied back from the device if there is -a need for it. +The pointer passed in is the one used to copy data back to the host, if +needed, before buffer destruction. +The memory pointed by <> will not be de-allocated by the +runtime, and the data is copied back from the device if there is a need for +it. ==== SYCL ownership of the host memory @@ -6094,10 +6076,10 @@ In the case where there is host memory to be used for initialization of data but there is no intention of using that host memory after the buffer is destroyed, then the buffer can take full ownership of that host memory. -When a buffer owns the <> there is no copy back, by -default. In this situation, the SYCL application may pass a unique -pointer to the host data, which will be then used by the runtime -internally to initialize the data in the device. +When a buffer owns the <> there is no copy back, by default. 
+In this situation, the SYCL application may pass a unique pointer to the +host data, which will be then used by the runtime internally to initialize +the data in the device. For example, the following could be used: @@ -6111,10 +6093,9 @@ For example, the following could be used: } ---- -However, optionally the [code]#buffer::set_final_data()# can be -set to a [code]#std::weak_ptr# to enable copying data -back, to another host memory address that is going to be valid after -buffer construction. +However, optionally the [code]#buffer::set_final_data()# can be set to a +[code]#std::weak_ptr# to enable copying data back, to another host memory +address that is going to be valid after buffer construction. [source,,linenums] ---- @@ -6133,26 +6114,28 @@ buffer construction. When an instance of [code]#std::shared_ptr# is passed to the buffer constructor, then the buffer object and the developer's application share -the memory region. If the shared pointer is still used on the application's -side then the data will be copied back from the buffer or image and will be -available to the application after the buffer or image is destroyed. +the memory region. +If the shared pointer is still used on the application's side then the data +will be copied back from the buffer or image and will be available to the +application after the buffer or image is destroyed. If the [code]#shared_ptr# is not empty, the contents of the referenced -memory are used to initialize the buffer. If the [code]#shared_ptr# is -empty, then the buffer is created with uninitialized memory. +memory are used to initialize the buffer. +If the [code]#shared_ptr# is empty, then the buffer is created with +uninitialized memory. When the buffer is destroyed and the data have potentially been updated, if the number of copies of the shared pointer outside the runtime is 0, there -is no user-side shared pointer to read the data. 
Therefore the data is not -copied out, and the buffer destructor does not need to wait for the data -processes to be finished, as the outcome is not needed on the application's -side. +is no user-side shared pointer to read the data. +Therefore the data is not copied out, and the buffer destructor does not +need to wait for the data processes to be finished, as the outcome is not +needed on the application's side. -This behavior can be overridden using the [code]#set_final_data()# -member function of the buffer class, which will by any means force the buffer +This behavior can be overridden using the [code]#set_final_data()# member +function of the buffer class, which will by any means force the buffer destructor to wait until the data is copied to wherever the -[code]#set_final_data()# member function has put the data (or not wait nor copy -if set final data is [code]#nullptr)#. +[code]#set_final_data()# member function has put the data (or not wait nor +copy if set final data is [code]#nullptr)#. [source,,linenums] ---- @@ -6183,11 +6166,11 @@ if set final data is [code]#nullptr)#. [[subsec:mutex]] === Synchronization primitives -When the user wants to use the [code]#buffer# simultaneously in -the <> and their own code (e.g. a multi-threaded -mechanism) and wants to use manual synchronization without using a -[code]#host_accessor#, a [code]#std::mutex# can be passed to the -[code]#buffer# constructor via the right [code]#property#. +When the user wants to use the [code]#buffer# simultaneously in the +<> and their own code (e.g. a multi-threaded mechanism) and +wants to use manual synchronization without using a [code]#host_accessor#, a +[code]#std::mutex# can be passed to the [code]#buffer# constructor via the +right [code]#property#. The runtime promises to lock the mutex whenever the data is in use and unlock it when it no longer needs it. @@ -6209,8 +6192,8 @@ unlock it when it no longer needs it. 
---- When the runtime releases the mutex the user is guaranteed that the data was -copied back on the shared pointer --- unless the final data destination has been -changed using the member function [code]#set_final_data()#. +copied back on the shared pointer --- unless the final data destination has +been changed using the member function [code]#set_final_data()#. [[subsec:accessors]] @@ -6221,25 +6204,26 @@ changed using the member function [code]#set_final_data()#. <> provide three different capabilities: they provide access to the data managed by a <> or <>, they provide access -to local memory on a <>, and they define the *requirements* to memory -objects which determine the scheduling of <> (see +to local memory on a <>, and they define the *requirements* to +memory objects which determine the scheduling of <> (see <>). -A memory object requirement is created when an accessor is constructed, unless -the accessor is a placeholder in which case the requirement is created when -the accessor is bound to a <> by calling [code]#handler::require()#. +A memory object requirement is created when an accessor is constructed, +unless the accessor is a placeholder in which case the requirement is +created when the accessor is bound to a <> by calling +[code]#handler::require()#. There are several different {cpp} classes that implement accessors: -* The [code]#accessor# class provides access to data in a [code]#buffer# from - within a <>. +* The [code]#accessor# class provides access to data in a [code]#buffer# + from within a <>. -* The [code]#host_accessor# class provides access to data in a [code]#buffer# - from host code that is outside of a <>. These accessors are - typically used in <>. +* The [code]#host_accessor# class provides access to data in a + [code]#buffer# from host code that is outside of a <>. + These accessors are typically used in <>. -* The [code]#local_accessor# class provides access to device local memory from - within a <>. 
+* The [code]#local_accessor# class provides access to device local memory + from within a <>. * The [code]#unsampled_image_accessor# and [code]#sampled_image_accessor# classes provide access to data in an [code]#unsampled_image# and @@ -6248,31 +6232,36 @@ There are several different {cpp} classes that implement accessors: * The [code]#host_unsampled_image_accessor# and [code]#host_sampled_image_accessor# classes provide access to data in an [code]#unsampled_image# and [code]#sampled_image# from host code that is - outside of a <>. These accessors are typically used in - <>. + outside of a <>. + These accessors are typically used in <>. Accessor objects must always be constructed in host code, either in -<> or in <>. Whether the constructor -blocks waiting for data to synchronize depends on the type of accessor. Those -accessors which provide access to data within a <> do not block. -Instead, these accessors define a requirement which influences the scheduling -of the <>. Those accessors which provide access to data from host -code do block until the data is available on the host. +<> or in <>. +Whether the constructor blocks waiting for data to synchronize depends on +the type of accessor. +Those accessors which provide access to data within a <> do not +block. +Instead, these accessors define a requirement which influences the +scheduling of the <>. +Those accessors which provide access to data from host code do block until +the data is available on the host. For those accessors which provide access to data within a <>, the member functions which access data should only be called from within the -<>. Programs which call these member functions from outside of the -<> are ill formed. The sections below describe exactly which member -functions fall into this category. +<>. +Programs which call these member functions from outside of the <> +are ill formed. +The sections below describe exactly which member functions fall into this +category. 
==== Data type -All accessors have a [code]#DataT# template parameter which specifies the type -of each element that the accessor accesses. For [code]#accessor# and -[code]#host_accessor#, this type must either match the type of each element in -the underlying [code]#buffer#, or it must be a [code]#const# qualified version -of that type. +All accessors have a [code]#DataT# template parameter which specifies the +type of each element that the accessor accesses. +For [code]#accessor# and [code]#host_accessor#, this type must either match +the type of each element in the underlying [code]#buffer#, or it must be a +[code]#const# qualified version of that type. For the image accessors ([code]#unsampled_image_accessor#, [code]#sampled_image_accessor#, [code]#host_unsampled_image_accessor#, and @@ -6290,15 +6279,17 @@ For [code]#local_accessor# see <> for the allowable ==== Access modes Most accessors have an [code]#AccessMode# template parameter which specifies -whether the accessor can read or write the underlying data. This information -is used by the runtime when defining the requirements for the associated -<>, and it tells the runtime whether data needs to be transferred to -or from a device before data can be accessed through the accessor. - -The [code]#access_mode# enumeration, shown in <>, -describes the potential modes of an accessor. However, not all accessor -classes support all modes, so see the description of each class for more -details. +whether the accessor can read or write the underlying data. +This information is used by the runtime when defining the requirements for +the associated <>, and it tells the runtime whether data needs to +be transferred to or from a device before data can be accessed through the +accessor. + +The [code]#access_mode# enumeration, shown in +<>, describes the potential modes of an +accessor. +However, not all accessor classes support all modes, so see the description +of each class for more details. 
[source,,linenums] ---- @@ -6337,10 +6328,11 @@ access_mode::read_write ==== Deduction tags Some accessor constructors take a [code]#TagT# parameter, which is used to -deduce template arguments for the constructor's class. Each of the access -modes in <> has an associated tag, but there are -additional tags which set other template parameters in addition to the access -mode. The synopsis below shows the namespace scope variables that the +deduce template arguments for the constructor's class. +Each of the access modes in <> has an associated +tag, but there are additional tags which set other template parameters in +addition to the access mode. +The synopsis below shows the namespace scope variables that the implementation provides as possible values for the [code]#TagT# parameter. [source,,linenums] @@ -6356,9 +6348,10 @@ section that pertains to each of the accessor types. ==== Properties All accessor constructors accept a [code]#property_list# parameter, which -affects the semantics of the accessor. <> shows -the set of all possible accessor properties and tells which properties are -allowed when constructing each accessor class. +affects the semantics of the accessor. +<> shows the set of all possible accessor +properties and tells which properties are allowed when constructing each +accessor class. [source,,linenums] ---- @@ -6410,13 +6403,14 @@ this range is preserved. [NOTE] ==== As stated above, the [code]#property::no_init# property requires the -application to construct an object for each accessor element when the element's -type is not an implicit-lifetime type (except in the case when the -corresponding buffer element did not previously contain an object). The reason -for this requirement is to avoid the possibility of overwriting a valid object -with indeterminate bytes, for example, when a <> using the accessor -completes. 
This means that the implementation can unconditionally copy memory -from the device back to the host when the <> completes, regardless of +application to construct an object for each accessor element when the +element's type is not an implicit-lifetime type (except in the case when the +corresponding buffer element did not previously contain an object). +The reason for this requirement is to avoid the possibility of overwriting a +valid object with indeterminate bytes, for example, when a <> using +the accessor completes. +This means that the implementation can unconditionally copy memory from the +device back to the host when the <> completes, regardless of whether the [code]#DataT# type is an implicit-lifetime type. ==== @@ -6440,136 +6434,148 @@ property::no_init::no_init() ==== Read only accessors -Accessors which have an [code]#AccessMode# template parameter can be declared -as read-only by specifying [code]#access_mode::read# for the template -parameter. A read-only accessor provides read-only access to the underlying -data and provides a "read" requirement for the memory object when it is -constructed. +Accessors which have an [code]#AccessMode# template parameter can be +declared as read-only by specifying [code]#access_mode::read# for the +template parameter. +A read-only accessor provides read-only access to the underlying data and +provides a "read" requirement for the memory object when it is constructed. The [code]#DataT# template parameter for a read-only accessor can optionally be [code]#const# qualified, and the semantics of the accessor are unchanged. For example, an accessor declared with [code]#const DataT# and -[code]#access_mode::read# has the same semantics as an accessor declared with -[code]#DataT# and [code]#access_mode::read#. +[code]#access_mode::read# has the same semantics as an accessor declared +with [code]#DataT# and [code]#access_mode::read#. 
As detailed in the sections below, some accessor types have a default value for [code]#AccessMode#, which depends on whether the [code]#DataT# parameter -is [code]#const# qualified. This provides a convenient way to declare a -read-only accessor without explicitly specifying the access mode. +is [code]#const# qualified. +This provides a convenient way to declare a read-only accessor without +explicitly specifying the access mode. A [code]#const# qualified [code]#DataT# is only allowed for a read-only -accessor. Programs which specify a [code]#const# qualified [code]#DataT# and -any access mode other than [code]#access_mode::read# are ill formed, and the +accessor. +Programs which specify a [code]#const# qualified [code]#DataT# and any +access mode other than [code]#access_mode::read# are ill formed, and the implementation must issue a diagnostic in this case. Each accessor class also provides implicit conversions between the two forms -of read-only accessors. This makes it possible, for example, to assign an -accessor whose type has [code]#const DataT# and [code]#access_mode::read# to an -accessor whose type has [code]#DataT# and [code]#access_mode::read#, so long as -the other template parameters are the same. There is also an implicit -conversion from a read-write accessor to either of the forms of a read-only -accessor. These implicit conversions are described in detail for each accessor -class in the sections that follow. +of read-only accessors. +This makes it possible, for example, to assign an accessor whose type has +[code]#const DataT# and [code]#access_mode::read# to an accessor whose type +has [code]#DataT# and [code]#access_mode::read#, so long as the other +template parameters are the same. +There is also an implicit conversion from a read-write accessor to either of +the forms of a read-only accessor. +These implicit conversions are described in detail for each accessor class +in the sections that follow. 
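The read-only conversion rules in this hunk can be modelled in plain C++17 without a SYCL implementation. The sketch below is only an illustration under stated assumptions: `toy_accessor` and `toy_mode` are hypothetical stand-ins invented for this note, not the SYCL `accessor` class or the `access_mode` enumeration, and the real classes have further template parameters that must also match.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins, not the SYCL classes: a tiny model of the
// read-only conversion rules described in the text.
enum class toy_mode { read, write, read_write };

template <typename DataT, toy_mode Mode>
class toy_accessor {
  using value_type = std::remove_const_t<DataT>;

  // A const-qualified DataT is only allowed together with read mode.
  static_assert(!std::is_const_v<DataT> || Mode == toy_mode::read,
                "const DataT requires read mode");

 public:
  // Read-only accessors hand out const references, others mutable ones.
  using reference = std::conditional_t<Mode == toy_mode::read,
                                       const value_type&, value_type&>;

  explicit toy_accessor(value_type* data) : data_(data) {}

  // Implicit conversion into a read-only accessor from (a) the other
  // read-only form or (b) the read-write form with the same element type.
  template <typename OtherT, toy_mode OtherMode,
            typename = std::enable_if_t<
                Mode == toy_mode::read &&
                (OtherMode == toy_mode::read ||
                 OtherMode == toy_mode::read_write) &&
                std::is_same_v<std::remove_const_t<OtherT>, value_type>>>
  toy_accessor(const toy_accessor<OtherT, OtherMode>& other)
      : data_(other.raw()) {}

  reference operator[](std::size_t i) const { return data_[i]; }
  value_type* raw() const { return data_; }

 private:
  value_type* data_;
};
```

In this model, a `toy_accessor<int, toy_mode::read_write>` converts implicitly to either `toy_accessor<const int, toy_mode::read>` or `toy_accessor<int, toy_mode::read>`, and the two read-only forms convert to each other. The reverse direction, from read-only to read-write, does not compile.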
==== Accessing elements of an accessor Accessors of type [code]#accessor#, [code]#host_accessor#, and -[code]#local_accessor# can have zero, one, two, or three Dimensions. A zero -dimension accessor provides access to a single scalar element via an implicit -conversion operator to the underlying type of that element and via an overloaded -copy/move assignment operators from the underlying type of the element. +[code]#local_accessor# can have zero, one, two, or three Dimensions. +A zero dimension accessor provides access to a single scalar element via an +implicit conversion operator to the underlying type of that element and via +an overloaded copy/move assignment operators from the underlying type of the +element. One, two, or three dimensional specializations of these accessors provide -access to the elements they contain in two ways. The first way is through a -subscript operator that takes an instance of an [code]#id# class which has the -same dimensionality as the accessor. The second way is by passing a single -[code]#size_t# value to multiple consecutive subscript operators as specified -in <>. +access to the elements they contain in two ways. +The first way is through a subscript operator that takes an instance of an +[code]#id# class which has the same dimensionality as the accessor. +The second way is by passing a single [code]#size_t# value to multiple +consecutive subscript operators as specified in <>. In all these cases, the reference to the contained element is of type [code]#const DataT&# for read-only accessors and of type [code]#DataT&# for other accessors. -Accessors of all types have a range that defines the set of indices that may be -used to access elements. For buffer accessors, this is the range of the -underlying buffer, unless it is a <> in which case the range -comes from the accessor's constructor. For image accessors, this is the range -of the underlying image. Local accessors specify the range when the accessor -is constructed. 
Any attempt to access an element via an index that is outside -of this range produces undefined behavior. +Accessors of all types have a range that defines the set of indices that may +be used to access elements. +For buffer accessors, this is the range of the underlying buffer, unless it +is a <> in which case the range comes from the accessor's +constructor. +For image accessors, this is the range of the underlying image. +Local accessors specify the range when the accessor is constructed. +Any attempt to access an element via an index that is outside of this range +produces undefined behavior. ==== Container interface Accessors of type [code]#accessor#, [code]#host_accessor#, and [code]#local_accessor# meet the {cpp} requirement of -[code]#ReversibleContainer#. The exception to this is that only -[code]#local_accessor# owns the underlying data, meaning that its destructor -destroys elements and frees the memory. The [code]#accessor# and -[code]#host_accessor# types don't destroy any elements or free the memory on -destruction. The iterator for the container interface meets the {cpp} -requirement of [code]#LegacyRandomAccessIterator# and the underlying -pointers/references correspond to the address space specified by the accessor -type. For multidimensional accessors the iterator linearizes the data -according to <>. +[code]#ReversibleContainer#. +The exception to this is that only [code]#local_accessor# owns the +underlying data, meaning that its destructor destroys elements and frees the +memory. +The [code]#accessor# and [code]#host_accessor# types don't destroy any +elements or free the memory on destruction. +The iterator for the container interface meets the {cpp} requirement of +[code]#LegacyRandomAccessIterator# and the underlying pointers/references +correspond to the address space specified by the accessor type. +For multidimensional accessors the iterator linearizes the data according to +<>. 
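The linearization that the iterator applies to multidimensional accessors can be pictured with a small standalone sketch. The cross-reference to the linearization rule is elided in this diff, so the code assumes the usual row-major rule, in which the last dimension varies fastest. `linearize` is a hypothetical helper written for this note and is not part of the SYCL API.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, not part of SYCL: row-major linearization of a
// three-dimensional index, assuming the last dimension varies fastest.
constexpr std::size_t linearize(const std::array<std::size_t, 3>& id,
                                const std::array<std::size_t, 3>& range) {
  return (id[0] * range[1] + id[1]) * range[2] + id[2];
}

// Stepping the last index moves one element in the linear order.
static_assert(linearize({0, 0, 1}, {2, 3, 4}) == 1, "last dimension is fastest");
// Stepping the middle index skips a full row of range[2] elements.
static_assert(linearize({0, 1, 0}, {2, 3, 4}) == 4, "middle step skips a row");
// The final index of a 2x3x4 range maps to the last of 24 positions.
static_assert(linearize({1, 2, 3}, {2, 3, 4}) == 23, "last element");
```

Under this assumption, incrementing an iterator over a multidimensional accessor corresponds to visiting indices in the order in which `linearize` increases.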
[[sec:accessors.ranged]] ==== Ranged accessors -Accessors of type [code]#accessor# and [code]#host_accessor# can be constructed -from a sub-range of a [code]#buffer# by providing a range and offset to the -constructor. This limits the elements that can be accessed to the specified -sub-range, which allows the implementation to perform certain optimizations such -as reducing the amount of memory that needs to be copied to or from a device. +Accessors of type [code]#accessor# and [code]#host_accessor# can be +constructed from a sub-range of a [code]#buffer# by providing a range and +offset to the constructor. +This limits the elements that can be accessed to the specified sub-range, +which allows the implementation to perform certain optimizations such as +reducing the amount of memory that needs to be copied to or from a device. If the ranged accessor is multi-dimensional, the sub-range is allowed to describe a region of memory in the underlying buffer that is not contiguous -in the linear address space. It is also legal to construct several ranged -accessors for the same underlying buffer, either overlapping or -non-overlapping. - -A ranged accessor still creates a requisite for the entire underlying buffer, -even for the portions not within the range. For example, if one command writes -through a ranged accessor to one region of a buffer and a second command reads -through a ranged accessor from a non-overlapping region of the same buffer, the -second command must still be scheduled after the first because the requisites -for the two commands are on the entire buffer, not on the sub-ranges of the -ranged accessors. +in the linear address space. +It is also legal to construct several ranged accessors for the same +underlying buffer, either overlapping or non-overlapping. + +A ranged accessor still creates a requisite for the entire underlying +buffer, even for the portions not within the range. 
+For example, if one command writes through a ranged accessor to one region +of a buffer and a second command reads through a ranged accessor from a +non-overlapping region of the same buffer, the second command must still be +scheduled after the first because the requisites for the two commands are on +the entire buffer, not on the sub-ranges of the ranged accessors. Most of the accessor member functions which provide a reference to the underlying buffer elements are affected by a ranged accessor's offset and -range. For example, calling [code]#operator[](0)# on a one-dimensional ranged +range. +For example, calling [code]#operator[](0)# on a one-dimensional ranged accessor returns a reference to the element at the position specified by the accessor's offset, which is not necessarily the first element in the buffer. -In addition, the accessor's iterator functions iterate only over the elements -that are within the sub-range. +In addition, the accessor's iterator functions iterate only over the +elements that are within the sub-range. The only exceptions are the [code]#get_pointer# and [code]#get_multi_ptr# member functions, which return a pointer to the beginning of the underlying -buffer regardless of the accessor's offset. Applications using these functions -must take care to manually add the offset before dereferencing the pointer -because accessing an element that is outside of the accessor's range results -in undefined behavior. +buffer regardless of the accessor's offset. +Applications using these functions must take care to manually add the offset +before dereferencing the pointer because accessing an element that is +outside of the accessor's range results in undefined behavior. [NOTE] ==== There is no change in behavior for ranged accessors with a range of zero. -It still creates a requisite for the entire underlying buffer, and -an attempt to access an element produces undefined behaviour. 
+It still creates a requisite for the entire underlying buffer, and an +attempt to access an element produces undefined behaviour. ==== ==== Buffer accessor for commands The [code]#accessor# class provides access to data in a [code]#buffer# from -within a <> or from within a <>. When used in -a <>, it accesses the contents of the buffer via the -device's <>. These two forms of the accessor are distinguished -by the [code]#AccessTarget# template parameter as shown in -<>. Both forms support the -following values for the [code]#AccessMode# template parameter: -[code]#access_mode::read#, [code]#access_mode::write# and +within a <> or from within a <>. +When used in a <>, it accesses the contents of the +buffer via the device's <>. +These two forms of the accessor are distinguished by the +[code]#AccessTarget# template parameter as shown in +<>. +Both forms support the following values for the [code]#AccessMode# template +parameter: [code]#access_mode::read#, [code]#access_mode::write# and [code]#access_mode::read_write#. [[table.accessors.command.buffer.capabilities]] @@ -6584,35 +6590,37 @@ following values for the [code]#AccessMode# template parameter: |==== Programs which specify the access target as [code]#target::device# and then -capture the [code]#accessor# in a <> can only use the accessor for -interoperability through the [code]#interop_handle#, any other uses result in -undefined behavior. +capture the [code]#accessor# in a <> can only use the accessor +for interoperability through the [code]#interop_handle#, any other uses +result in undefined behavior. -Programs which specify the access target as [code]#target::host_task# and then -use the [code]#accessor# from a <> result in undefined -behavior. +Programs which specify the access target as [code]#target::host_task# and +then use the [code]#accessor# from a <> result in +undefined behavior. 
-The dimensionality of the accessor must match the underlying buffer, however, -there is a special case if the buffer is one-dimensional. In this case, the -accessor may either be one-dimensional or it may be zero-dimensional. A -zero-dimensional accessor has access to just the first element of the buffer, -whereas a one-dimensional accessor has access to the entire buffer. - -Certain [code]#accessor# constructors create a "placeholder" accessor. Such -an accessor is bound to a [code]#buffer# and its semantics such as access -target and access mode are defined. However, a placeholder accessor is not -yet bound to a <>. Before such an accessor can be used in a -<>, it must be bound by calling [code]#handler::require()#. Passing a -placeholder accessor as an argument to a <> without first being bound -to a <> with [code]#handler::require()# will result in undefined -behavior. +The dimensionality of the accessor must match the underlying buffer, +however, there is a special case if the buffer is one-dimensional. +In this case, the accessor may either be one-dimensional or it may be +zero-dimensional. +A zero-dimensional accessor has access to just the first element of the +buffer, whereas a one-dimensional accessor has access to the entire buffer. + +Certain [code]#accessor# constructors create a "placeholder" accessor. +Such an accessor is bound to a [code]#buffer# and its semantics such as +access target and access mode are defined. +However, a placeholder accessor is not yet bound to a <>. +Before such an accessor can be used in a <>, it must be bound by +calling [code]#handler::require()#. +Passing a placeholder accessor as an argument to a <> without first +being bound to a <> with [code]#handler::require()# will +result in undefined behavior. 
[NOTE] ==== -Implementations are encouraged to throw either a synchronous or an asynchronous -exception when a placeholder accessor, that has not been bound to the -corresponding <> with [code]#handler::require()#, is either -passed as an argument to or is used inside a <>. +Implementations are encouraged to throw either a synchronous or an +asynchronous exception when a placeholder accessor, that has not been bound +to the corresponding <> with [code]#handler::require()#, is +either passed as an argument to or is used inside a <>. ==== @@ -6620,23 +6628,26 @@ passed as an argument to or is used inside a <>. A synopsis of the [code]#accessor# class is provided below, showing the interface when it is specialized with [code]#target::device# or -[code]#target::host_task#. Since some of the class types and member functions -have the same name and meaning as other accessors, the common types and -functions are described in <>. The member types -are listed in <> and -<>. The constructors are listed in -<>, and the member functions are -listed in <> and +[code]#target::host_task#. +Since some of the class types and member functions have the same name and +meaning as other accessors, the common types and functions are described in +<>. +The member types are listed in <> and +<>. +The constructors are listed in +<>, and the member functions +are listed in <> and <>. -The additional common special member functions and common member functions are -listed in <> in +The additional common special member functions and common member functions +are listed in <> in <> and -<>, respectively. For valid implicit -conversions between accessor types refer to -<>. Additionally, accessors of the -same type must be equality comparable both in the host application and also in -<>. +<>, respectively. +For valid implicit conversions between accessor types refer to +<>. +Additionally, accessors of the same type must be equality comparable both in +the host application and also in <>. 
[source,,linenums] ---- @@ -6678,10 +6689,11 @@ accessor() -- * [code]#(empty() == true)# * All size queries return [code]#0#. - * The return values of [code]#get_pointer()# and [code]#get_multi_ptr()# are - unspecified. - * A default constructed accessor can be passed to a <>, - but attempting to access data elements from it produces undefined behavior. + * The return values of [code]#get_pointer()# and [code]#get_multi_ptr()# + are unspecified. + * A default constructed accessor can be passed to a + <>, but attempting to access data elements from it + produces undefined behavior. -- a@ @@ -7037,9 +7049,10 @@ This function may only be called from within a <>. [[sec:accessor.command.buffer.tags]] ===== Deduction tags for buffer command accessors -Some [code]#accessor# constructors take a [code]#TagT# parameter, which is used -to deduce template arguments. The permissible values for this parameter are -listed in <> along with the access mode and +Some [code]#accessor# constructors take a [code]#TagT# parameter, which is +used to deduce template arguments. +The permissible values for this parameter are listed in +<> along with the access mode and accessor target that they imply. [[table.accessors.command.buffer.tags]] @@ -7072,10 +7085,10 @@ accessor target that they imply. ===== Read only buffer command accessors and implicit conversions <> shows the specializations of -[code]#accessor# with [code]#target::device# or -[code]#target::host_task# that are read-only accessors. There is an implicit -conversion between any of these specializations, provided that all other -template parameters are the same. +[code]#accessor# with [code]#target::device# or [code]#target::host_task# +that are read-only accessors. +There is an implicit conversion between any of these specializations, +provided that all other template parameters are the same. 
[[table.accessors.command.buffer.read-only]] .Specializations of [code]#accessor# that are read-only @@ -7086,10 +7099,11 @@ template parameters are the same. | const-qualified | [code]#access_mode::read# |==== -There is also an implicit conversion from the read-write specialization shown -in <> to any of the read-only -specializations shown in <>, provided -that all other template parameters are the same. +There is also an implicit conversion from the read-write specialization +shown in <> to any of the +read-only specializations shown in +<>, provided that all other +template parameters are the same. [[table.accessors.command.buffer.read-write]] .Specializations of [code]#accessor# that are read-write @@ -7102,25 +7116,27 @@ that all other template parameters are the same. ===== Deprecated features of the [code]#accessor# class -All of the features defined in this section are deprecated and will likely be -removed from a future version of the specification. +All of the features defined in this section are deprecated and will likely +be removed from a future version of the specification. ====== Aliased names -The enumerated value [code]#target::global_buffer# is an alias for [code]#target:::device#. +The enumerated value [code]#target::global_buffer# is an alias for +[code]#target:::device#. It has the same type and value as its alias. -The enumerated type [code]#access::target# is an alias for [code]#target#, and -the enumerated type [code]#access::mode# is an alias for [code]#access_mode#. +The enumerated type [code]#access::target# is an alias for [code]#target#, +and the enumerated type [code]#access::mode# is an alias for +[code]#access_mode#. ====== Discard access modes An [code]#accessor# instance specialized with access mode -[code]#access_mode::discard_write# has the same behavior as an [code]#accessor# -instance of mode [code]#access_mode::write# that is constructed with the -property [code]#property::no_init#. 
+[code]#access_mode::discard_write# has the same behavior as an +[code]#accessor# instance of mode [code]#access_mode::write# that is +constructed with the property [code]#property::no_init#. An [code]#accessor# instance specialized with access mode [code]#access_mode::discard_read_write# has the same behavior as an @@ -7130,18 +7146,18 @@ constructed with the property [code]#property::no_init#. ====== Placeholder template parameter -The [code]#accessor# template parameter [code]#IsPlaceholder# is allowed to be -specified, but it has no bearing on whether the [code]#accessor# instance is a -placeholder. This is determined solely by the constructor used to create the -instance. +The [code]#accessor# template parameter [code]#IsPlaceholder# is allowed to +be specified, but it has no bearing on whether the [code]#accessor# instance +is a placeholder. +This is determined solely by the constructor used to create the instance. The associated type [code]#access::placeholder# is also deprecated. ====== Additional member functions for [code]#target::device# specialization -Specializations of the [code]#accessor# class with [code]#target::device# have -the additional member functions described in +Specializations of the [code]#accessor# class with [code]#target::device# +have the additional member functions described in <>. [[table.accessors.deprecated.command.buffer.members]] @@ -7168,41 +7184,44 @@ size_t get_count() const ====== Accessor specialization with [code]#target::constant_buffer# The [code]#accessor# class may be specialized with target -[code]#target::constant_buffer#, which results in an accessor that can be used -within a <> to access the contents of a buffer through -the device's <>. - -As with other [code]#accessor# specializations, the dimensionality must match -the underlying buffer, however there is a special case if the buffer is -one-dimensional. In this case, the accessor may either be one-dimensional or -it may be zero-dimensional. 
A zero-dimensional accessor has access to just the -first element of the buffer, whereas a one-dimensional accessor has access to the -entire buffer. - -This specialization of [code]#accessor# is available only for the access mode -[code]#access_mode::read#. - -This accessor type can be constructed as a "placeholder" accessor. As with -other [code]#accessor# specializations that are placeholders, -[code]#handler::require()# must be called before passing a placeholder accessor -to a <>. Passing a placeholder accessor as an argument to a -<> without first being bound to a <> with -[code]#handler::require()# will result in undefined behavior. +[code]#target::constant_buffer#, which results in an accessor that can be +used within a <> to access the contents of a buffer +through the device's <>. + +As with other [code]#accessor# specializations, the dimensionality must +match the underlying buffer, however there is a special case if the buffer +is one-dimensional. +In this case, the accessor may either be one-dimensional or it may be +zero-dimensional. +A zero-dimensional accessor has access to just the first element of the +buffer, whereas a one-dimensional accessor has access to the entire buffer. + +This specialization of [code]#accessor# is available only for the access +mode [code]#access_mode::read#. + +This accessor type can be constructed as a "placeholder" accessor. +As with other [code]#accessor# specializations that are placeholders, +[code]#handler::require()# must be called before passing a placeholder +accessor to a <>. +Passing a placeholder accessor as an argument to a <> without first +being bound to a <> with [code]#handler::require()# will +result in undefined behavior. A synopsis for this specialization of [code]#accessor# is provided below. Since some of the class types and member functions have the same name and meaning as other accessors, the common types and functions are described in -<>. The member types are listed in -<>. 
The constructors are listed in -<>, and the member functions -are listed in <> and +<>. +The member types are listed in <>. +The constructors are listed in +<>, and the member +functions are listed in <> and <>. -The additional common special member functions and common member functions are -listed in <> in +The additional common special member functions and common member functions +are listed in <> in <> and -<>, respectively. Additionally, -accessors of the same type must be equality comparable. +<>, respectively. +Additionally, accessors of the same type must be equality comparable. [source,,linenums] ---- @@ -7388,17 +7407,18 @@ This function may only be called from within a <>. The [code]#accessor# class may be specialized with target [code]#target::host_buffer#, which results in a host accessor similar to -[code]#host_accessor#. This specialization provides access to data in a -[code]#buffer# from host code that is outside of a <>, and -constructors of this specialization block until the requested data is available -on the host. - -As with other [code]#accessor# specializations, the dimensionality must match -the underlying buffer, however there is a special case if the buffer is -one-dimensional. In this case, the accessor may either be one-dimensional or -it may be zero-dimensional. A zero-dimensional accessor has access to just the -first element of the buffer, whereas a one-dimensional accessor has access to the -entire buffer. +[code]#host_accessor#. +This specialization provides access to data in a [code]#buffer# from host +code that is outside of a <>, and constructors of this +specialization block until the requested data is available on the host. + +As with other [code]#accessor# specializations, the dimensionality must +match the underlying buffer, however there is a special case if the buffer +is one-dimensional. +In this case, the accessor may either be one-dimensional or it may be +zero-dimensional. 
+A zero-dimensional accessor has access to just the first element of the +buffer, whereas a one-dimensional accessor has access to the entire buffer. This specialization of [code]#accessor# is available for all access modes except for [code]#access_mode::atomic#. @@ -7406,17 +7426,18 @@ except for [code]#access_mode::atomic#. A synopsis for this specialization of [code]#accessor# is provided below. Since some of the class types and member functions have the same name and meaning as other accessors, the common types and functions are described in -<>. The member types are listed in -<>. The constructors are listed in -<>, and the member functions are -listed in <> and +<>. +The member types are listed in <>. +The constructors are listed in +<>, and the member functions +are listed in <> and <>. -The additional common special member functions and common member functions are -listed in <> in +The additional common special member functions and common member functions +are listed in <> in <> and -<>, respectively. Additionally, -accessors of the same type must be equality comparable. +<>, respectively. +Additionally, accessors of the same type must be equality comparable. [source,,linenums] ---- @@ -7542,17 +7563,18 @@ This specialization of [code]#accessor# is only available for access modes A synopsis for this specialization of [code]#accessor# is provided below. Since some of the class types and member functions have the same name and meaning as other accessors, the common types and functions are described in -<>. The member types are listed in -<>. The constructors are listed in +<>. +The member types are listed in <>. +The constructors are listed in <>, and the member functions are listed in <> and <>. -The additional common special member functions and common member functions are -listed in <> in +The additional common special member functions and common member functions +are listed in <> in <> and -<>, respectively. 
Additionally, -accessors of the same type must be equality comparable. +<>, respectively. +Additionally, accessors of the same type must be equality comparable. [source,,linenums] ---- @@ -7661,10 +7683,11 @@ This function may only be called from within a <>. Specializations of the [code]#accessor# class with [code]#target::constant_buffer#, [code]#target::host_buffer# and -[code]#target::local# have many member types and member functions with the same -name and meaning. <> describes these -common types and <> describes the -common member functions. +[code]#target::local# have many member types and member functions with the +same name and meaning. +<> describes these common types and +<> describes the common member +functions. [[table.accessors.deprecated.common.types]] @@ -7837,8 +7860,8 @@ When [code]#AccessTarget# is [code]#target::local# or The [code]#accessor# class may be specialized with target [code]#target::device# and access mode [code]#access_mode::atomic#. -This specialization provides additional member functions beyond those that are -provided for other [code]#target::device# specializations as described +This specialization provides additional member functions beyond those that +are provided for other [code]#target::device# specializations as described in <>. @@ -7898,15 +7921,16 @@ the accessor's offset to [code]#index#. ==== Buffer accessor for host code The [code]#host_accessor# class provides access to data in a [code]#buffer# -from host code that is outside of a <> (i.e. do not use this class to -access a buffer inside a host task). +from host code that is outside of a <> (i.e. do not use this class +to access a buffer inside a host task). As with [code]#accessor#, the dimensionality of [code]#host_accessor# must -match the underlying buffer, however, there is a special case if the buffer is -one-dimensional. In this case, the accessor may either be one-dimensional or -it may be zero-dimensional. 
A zero-dimensional accessor has access to just the -first element of the buffer, whereas a one-dimensional accessor has access to the -entire buffer. +match the underlying buffer, however, there is a special case if the buffer +is one-dimensional. +In this case, the accessor may either be one-dimensional or it may be +zero-dimensional. +A zero-dimensional accessor has access to just the first element of the +buffer, whereas a one-dimensional accessor has access to the entire buffer. The [code]#host_accessor# class supports the following access modes: [code]#access_mode::read#, [code]#access_mode::write# and @@ -7915,22 +7939,22 @@ The [code]#host_accessor# class supports the following access modes: ===== Interface for buffer host accessors -A synopsis of the [code]#host_accessor# class is provided below. Since some of -the class types and member functions have the same name and meaning as other -accessors, the common types and functions are described in -<>. The member types are listed in -<>. +A synopsis of the [code]#host_accessor# class is provided below. +Since some of the class types and member functions have the same name and +meaning as other accessors, the common types and functions are described in +<>. +The member types are listed in <>. The constructors are listed in <>, -and the member functions are listed in <> and -<>. +and the member functions are listed in <> +and <>. -The additional common special member functions and common member functions are -listed in <> in +The additional common special member functions and common member functions +are listed in <> in <> and -<>, respectively. For valid implicit -conversions between accessor types refer to -<>. Additionally, accessors of the same -type must be equality comparable. +<>, respectively. +For valid implicit conversions between accessor types refer to +<>. +Additionally, accessors of the same type must be equality comparable. 
[source,,linenums] ---- @@ -8137,10 +8161,11 @@ Assignment to the single element that is accessed by this accessor. [[sec:accessor.host.buffer.tags]] ===== Deduction tags for buffer host accessors -Some [code]#host_accessor# constructors take a [code]#TagT# parameter, which is -used to deduce template arguments. The permissible values for this parameter -are listed in <> along with the access mode -that they imply. +Some [code]#host_accessor# constructors take a [code]#TagT# parameter, which +is used to deduce template arguments. +The permissible values for this parameter are listed in +<> along with the access mode that they +imply. [[table.accessors.host.buffer.tags]] .Enumeration of tags available for [code]#host_accessor# construction @@ -8160,9 +8185,9 @@ that they imply. ===== Read only buffer host accessors and implicit conversions <> shows the specializations of -[code]#host_accessor# that are read-only accessors. There is an implicit -conversion between any of these specializations, provided that all other -template parameters are the same. +[code]#host_accessor# that are read-only accessors. +There is an implicit conversion between any of these specializations, +provided that all other template parameters are the same. [[table.accessors.host.buffer.read-only]] .Specializations of [code]#host_accessor# that are read-only @@ -8173,9 +8198,10 @@ template parameters are the same. | const-qualified | [code]#access_mode::read# |==== -There is also an implicit conversion from the read-write [code]#host_accessor# -type shown in <> to any of the read-only -accessors in <>, provided that all other +There is also an implicit conversion from the read-write +[code]#host_accessor# type shown in +<> to any of the read-only accessors +in <>, provided that all other template parameters are the same. [[table.accessors.host.buffer.read-write]] @@ -8191,42 +8217,46 @@ template parameters are the same. 
==== Local accessor The [code]#local_accessor# class allocates device local memory and provides -access to this memory from within a <>. The -<> that is allocated is shared between all -<> of a <>. If multiple work-groups execute -simultaneously in an implementation, each work-group receives its own -independent copy of the allocated local memory. +access to this memory from within a <>. +The <> that is allocated is shared between all +<> of a <>. +If multiple work-groups execute simultaneously in an implementation, each +work-group receives its own independent copy of the allocated local memory. The underlying [code]#DataT# type can be any {cpp} type that the device -supports. If [code]#DataT# is an implicit-lifetime type (as defined in the -{cpp} core language), the local accessor implicitly creates objects of that -type with indeterminate values. For other types, the local accessor merely -allocates uninitialized memory, and the application is responsible for -constructing objects in that memory (e.g. by calling placement-new). +supports. +If [code]#DataT# is an implicit-lifetime type (as defined in the {cpp} core +language), the local accessor implicitly creates objects of that type with +indeterminate values. +For other types, the local accessor merely allocates uninitialized memory, +and the application is responsible for constructing objects in that memory +(e.g. by calling placement-new). -A local accessor must not be used in a <> that is invoked -via [code]#single_task# or via the simple form of [code]#parallel_for# that -takes a [code]#range# parameter. In these cases submitting the kernel to -a queue must throw a synchronous [code]#exception# with the -[code]#errc::kernel_argument# error code. +A local accessor must not be used in a <> that is +invoked via [code]#single_task# or via the simple form of +[code]#parallel_for# that takes a [code]#range# parameter. 
+In these cases submitting the kernel to a queue must throw a synchronous +[code]#exception# with the [code]#errc::kernel_argument# error code. ===== Interface for local accessors -A synopsis of the [code]#local_accessor# class is provided below. Since some -of the class types and member functions have the same name and meaning as other -accessors, the common types and functions are described in -<>. The member types are listed in -<> and <>. -The constructors are listed in <>, -and the member functions are listed in <> and +A synopsis of the [code]#local_accessor# class is provided below. +Since some of the class types and member functions have the same name and +meaning as other accessors, the common types and functions are described in +<>. +The member types are listed in <> and +<>. +The constructors are listed in <>, and +the member functions are listed in <> and <>. -The additional common special member functions and common member functions are -listed in <> in +The additional common special member functions and common member functions +are listed in <> in <> and -<>, respectively. For valid implicit -conversions between accessor types refer to <>. +<>, respectively. +For valid implicit conversions between accessor types refer to +<>. Additionally, accessors of the same type must be equality comparable. [source,,linenums] @@ -8266,8 +8296,8 @@ local_accessor() -- * [code]#(empty() == true)# * All size queries return [code]#0#. - * The return values of [code]#get_pointer()# and [code]#get_multi_ptr()# are - unspecified. + * The return values of [code]#get_pointer()# and [code]#get_multi_ptr()# + are unspecified. * A default constructed local accessor can be passed to a <>, but attempting to access data elements from it produces undefined behavior. @@ -8371,19 +8401,22 @@ This function may only be called from within a <>. 
[[sec:accessor.local.conversions]] ===== Read only local accessors and implicit conversions -Since [code]#local_accessor# has no template parameter for the access mode, the -only specialization for a read-only local accessor is by providing a -[code]#const# qualified [code]#DataT# parameter. Specializations with a -non-[code]#const# qualified [code]#DataT# parameter are read-write. There is -an implicit conversion from the read-write specialization to the read-only -specialization, provided that all other template parameters are the same. +Since [code]#local_accessor# has no template parameter for the access mode, +the only specialization for a read-only local accessor is by providing a +[code]#const# qualified [code]#DataT# parameter. +Specializations with a non-[code]#const# qualified [code]#DataT# parameter +are read-write. +There is an implicit conversion from the read-write specialization to the +read-only specialization, provided that all other template parameters are +the same. [[sec:accessor.common.members]] ==== Common members for buffer and local accessors -The [code]#accessor#, [code]#host_accessor#, and [code]#local_accessor# classes -have many member types and member functions with the same name and meaning. +The [code]#accessor#, [code]#host_accessor#, and [code]#local_accessor# +classes have many member types and member functions with the same name and +meaning. <> describes these common types and <> describes the common member functions. @@ -8738,49 +8771,53 @@ called from within a <>. There are two classes which implement accessors for unsampled images, [code]#unsampled_image_accessor# and [code]#host_unsampled_image_accessor#. The former provides access from within a <> or from -within a <>. The latter provides access from host code that is -outside of a <>. - -The dimensionality of an unsampled image accessor must match the dimensionality -of the underlying image to which it provides access. 
Both unsampled image -accessor classes support the [code]#access_mode::read# and -[code]#access_mode::write# access modes. In addition, the -[code]#host_unsampled_image_accessor# class supports +within a <>. +The latter provides access from host code that is outside of a +<>. + +The dimensionality of an unsampled image accessor must match the +dimensionality of the underlying image to which it provides access. +Both unsampled image accessor classes support the [code]#access_mode::read# +and [code]#access_mode::write# access modes. +In addition, the [code]#host_unsampled_image_accessor# class supports [code]#access_mode::read_write#. The [code]#AccessTarget# template parameter dictates how the [code]#unsampled_image_accessor# can be used: [code]#image_target::device# means the accessor can be used in a <> while [code]#image_target::host_task# means the accessor can be used in a -<>. Programs which specify this template parameter as -[code]#image_target::device# and then use the [code]#unsampled_image_accessor# -from a <> are ill formed. Likewise, programs which specify this -template parameter as [code]#image_target::host_task# and then use the +<>. +Programs which specify this template parameter as +[code]#image_target::device# and then use the +[code]#unsampled_image_accessor# from a <> are ill formed. +Likewise, programs which specify this template parameter as +[code]#image_target::host_task# and then use the [code]#unsampled_image_accessor# from a <> are ill formed. ===== Interface for unsampled image accessors -A synopsis of the two unsampled image accessor classes is provided below. Both -classes have member types with the same name, which are described in -<>. The constructors for the two -classes are described in <> and -<>. Both classes also have -member functions with the same name, which are described in -<>. 
- -The additional common special member functions and common member functions are -listed in <> in +A synopsis of the two unsampled image accessor classes is provided below. +Both classes have member types with the same name, which are described in +<>. +The constructors for the two classes are described in +<> and +<>. +Both classes also have member functions with the same name, which are +described in <>. + +The additional common special member functions and common member functions +are listed in <> in <> and -<>, respectively. For valid implicit -conversions between unsampled accessor types refer to +<>, respectively. +For valid implicit conversions between unsampled accessor types refer to <>. -Two [code]#unsampled_image_accessor# objects of the same type must be equality -comparable in both the host code and in SYCL kernel functions. Two -[code]#host_unsampled_image_accessor# objects of the same type must be equality -comparable in the host code. +Two [code]#unsampled_image_accessor# objects of the same type must be +equality comparable in both the host code and in SYCL kernel functions. +Two [code]#host_unsampled_image_accessor# objects of the same type must be +equality comparable in the host code. [source,,linenums] ---- @@ -8918,11 +8955,12 @@ within a <>. [[sec:accessor.unsampled.image.conversions]] ===== Read only unsampled image accessors and implicit conversions -All specializations of unsampled image accessors with [code]#access_mode::read# -are read-only regardless of whether [code]#DataT# is [code]#const# qualified. +All specializations of unsampled image accessors with +[code]#access_mode::read# are read-only regardless of whether [code]#DataT# +is [code]#const# qualified. There is an implicit conversion between the [code]#const# qualified and -non-[code]#const# qualified specializations, provided that all other template -parameters are the same. 
+non-[code]#const# qualified specializations, provided that all other +template parameters are the same. ==== Sampled image accessors @@ -8930,46 +8968,51 @@ parameters are the same. There are two classes which implement accessors for sampled images, [code]#sampled_image_accessor# and [code]#host_sampled_image_accessor#. The former provides access from within a <> or from -within a <>. The latter provides access from host code that is -outside of a <>. +within a <>. +The latter provides access from host code that is outside of a +<>. The dimensionality of a sampled image accessor must match the dimensionality -of the underlying image to which it provides access. Sampled image accessors -are always read-only. +of the underlying image to which it provides access. +Sampled image accessors are always read-only. The [code]#AccessTarget# template parameter dictates how the -[code]#sampled_image_accessor# can be used: [code]#image_target::device# means -the accessor can be used in a <> while +[code]#sampled_image_accessor# can be used: [code]#image_target::device# +means the accessor can be used in a <> while [code]#image_target::host_task# means the accessor can be used in a -<>. Programs which specify this template parameter as +<>. +Programs which specify this template parameter as [code]#image_target::device# and then use the [code]#sampled_image_accessor# -from a <> are ill formed. Likewise, programs which specify this -template parameter as [code]#image_target::host_task# and then use the -[code]#sampled_image_accessor# from a <> are ill formed. +from a <> are ill formed. +Likewise, programs which specify this template parameter as +[code]#image_target::host_task# and then use the +[code]#sampled_image_accessor# from a <> are ill +formed. ===== Interface for sampled image accessors -A synopsis of the two sampled image accessor classes is provided below. Both -classes have member types with the same name, which are described in -<>. 
The constructors for the two -classes are described in <> and -<>. Both classes also have -member functions with the same name, which are described in -<>. - -The additional common special member functions and common member functions are -listed in <> in +A synopsis of the two sampled image accessor classes is provided below. +Both classes have member types with the same name, which are described in +<>. +The constructors for the two classes are described in +<> and +<>. +Both classes also have member functions with the same name, which are +described in <>. + +The additional common special member functions and common member functions +are listed in <> in <> and -<>, respectively. For valid implicit -conversions between sampled accessor types refer to +<>, respectively. +For valid implicit conversions between sampled accessor types refer to <>. Two [code]#sampled_image_accessor# objects of the same type must be equality -comparable in both the host code and in SYCL kernel functions. Two -[code]#host_sampled_image_accessor# objects of the same type must be equality -comparable in the host code. +comparable in both the host code and in SYCL kernel functions. +Two [code]#host_sampled_image_accessor# objects of the same type must be +equality comparable in the host code. [source,,linenums] ---- @@ -9085,9 +9128,10 @@ within a <>. ===== Read only sampled image accessors and implicit conversions All specializations of sampled image accessors are read-only regardless of -whether [code]#DataT# is [code]#const# qualified. There is an implicit -conversion between the [code]#const# qualified and non-[code]#const# qualified -specializations, provided that all other template parameters are the same. +whether [code]#DataT# is [code]#const# qualified. +There is an implicit conversion between the [code]#const# qualified and +non-[code]#const# qualified specializations, provided that all other +template parameters are the same. 
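The conversion rule above (specializations that differ only in the [code]#const# qualification of [code]#DataT# convert implicitly in both directions, because both are read-only) can be sketched in standard {cpp}.
This is a toy model only: [code]#ro_view#, [code]#element_type# and [code]#first_texel# are hypothetical names invented for this illustration and are not part of the SYCL API.

```cpp
#include <cassert>
#include <type_traits>

// Toy read-only view. Hypothetical stand-in, NOT a SYCL accessor class.
template <typename DataT>
class ro_view {
public:
  // Element type with any const qualification stripped (hypothetical name).
  using element_type = typename std::remove_const<DataT>::type;

  explicit ro_view(const element_type* data) : data_(data) {}

  // Implicit conversion from the specialization that differs only in the
  // const qualification of DataT. Any other element type is rejected.
  template <typename OtherT,
            typename = typename std::enable_if<
                !std::is_same<OtherT, DataT>::value &&
                std::is_same<typename std::remove_const<OtherT>::type,
                             element_type>::value>::type>
  ro_view(const ro_view<OtherT>& other) : data_(other.data()) {}

  const element_type* data() const { return data_; }

  // Access is read-only regardless of whether DataT is const qualified.
  const element_type& operator[](int i) const { return data_[i]; }

private:
  const element_type* data_;
};

// Both directions convert implicitly; unrelated element types do not.
static_assert(std::is_convertible<ro_view<float>, ro_view<const float>>::value,
              "non-const to const must convert");
static_assert(std::is_convertible<ro_view<const float>, ro_view<float>>::value,
              "const to non-const must convert");
static_assert(!std::is_convertible<ro_view<float>, ro_view<int>>::value,
              "other template parameters must match");

// Small driver: converts a view back and forth, then reads one element.
inline float first_texel() {
  const float pixels[2] = {0.5f, 0.25f};
  ro_view<float> plain(pixels);
  ro_view<const float> as_const = plain;  // implicit conversion
  ro_view<float> back = as_const;         // implicit, other direction
  return back[0];
}
```

In this sketch the conversion is safe in both directions only because the view never hands out a mutable reference, which mirrors the reason the rule is stated for accessors that are always read-only.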
// %%%%%%%%%%%%%%%%%%%%%%%%%%%% end accessors %%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -9096,23 +9140,26 @@ specializations, provided that all other template parameters are the same. === Address space classes In SYCL, there are five different address spaces: global, local, constant, -private and generic. In a SYCL generic implementation, types are not -affected by the address spaces. However, there are situations where users -need to explicitly carry address spaces in the type. For example: - - * For performance tuning and genericness. Even if the platform supports - the representation of the generic address space, this may come at some - performance sacrifice. In order to help the target compiler, it can be - useful to track specifically which address space a pointer is - addressing. - * When linking SYCL kernels with <>-specific functions. In this - case, it might be necessary to specify the address space for any pointer - parameters. +private and generic. +In a SYCL generic implementation, types are not affected by the address +spaces. +However, there are situations where users need to explicitly carry address +spaces in the type. +For example: + + * For performance tuning and genericness. + Even if the platform supports the representation of the generic address + space, this may come at some performance sacrifice. + In order to help the target compiler, it can be useful to track + specifically which address space a pointer is addressing. + * When linking SYCL kernels with <>-specific functions. + In this case, it might be necessary to specify the address space for any + pointer parameters. Direct declaration of pointers with address spaces is discouraged as the -definition is implementation-defined. Users must rely on the -[code]#multi_ptr# class to handle address space boundaries and -interoperability. +definition is implementation-defined. +Users must rely on the [code]#multi_ptr# class to handle address space +boundaries and interoperability. 
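To make the idea of carrying an address space in the type concrete, the following is a minimal sketch in standard {cpp}.
It is a toy model and not the [code]#multi_ptr# class: the names [code]#space#, [code]#tagged_ptr#, [code]#accumulator#, [code]#space_name# and [code]#run_demo# are hypothetical and exist only for this illustration.
Real SYCL code should rely on [code]#multi_ptr#, as stated above.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for an address-space enumeration (not the SYCL one).
enum class space { global_space, local_space, private_space };

// Toy pointer wrapper whose type records which address space it points into.
template <typename T, space S>
class tagged_ptr {
public:
  static constexpr space address_space = S;
  explicit tagged_ptr(T* p) : ptr_(p) {}
  T& operator*() const { return *ptr_; }
  T* get() const { return ptr_; }

private:
  T* ptr_;
};

// A user class templated on the address space of the pointer it wraps.
template <typename T, space S>
struct accumulator {
  tagged_ptr<T, S> target;
  void add(T value) const { *target += value; }
};

// The address space is a compile-time property, so code can branch on it
// without storing any run-time tag next to the pointer.
template <typename T, space S>
const char* space_name(tagged_ptr<T, S>) {
  switch (S) {
    case space::global_space: return "global";
    case space::local_space: return "local";
    case space::private_space: return "private";
  }
  return "unknown";
}

// Pointers into different address spaces are distinct, non-convertible types.
static_assert(!std::is_convertible<tagged_ptr<int, space::local_space>,
                                   tagged_ptr<int, space::global_space>>::value,
              "address spaces must not mix implicitly");

// Small driver: wraps an int in a "local" pointer and adds to it.
inline int run_demo() {
  int storage = 40;
  accumulator<int, space::local_space> acc{
      tagged_ptr<int, space::local_space>(&storage)};
  acc.add(2);
  return storage;
}
```

Because the address space is part of the type in this sketch, mixing pointers from different spaces is rejected at compile time, which is the kind of tracking the first bullet above describes as helpful to the target compiler.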
[[sec:multiptr]] @@ -9121,43 +9168,49 @@ interoperability. The multi-pointer class is the common interface for the explicit pointer classes, defined in <>. -There are situations where a user may want to make their type address space dependent. -This allows performing generic programming that depends on the address space associated -with their data. An example might be wrapping a pointer inside a class, where -a user may need to template the class according to the address space of the -pointer the class is initialized with. In this case, the [code]#multi_ptr# -class enables users to do this in a portable and stable way. +There are situations where a user may want to make their type address space +dependent. +This allows performing generic programming that depends on the address space +associated with their data. +An example might be wrapping a pointer inside a class, where a user may need +to template the class according to the address space of the pointer the +class is initialized with. +In this case, the [code]#multi_ptr# class enables users to do this in a +portable and stable way. The [code]#multi_ptr# class exposes 3 flavors of the same interface. -If the value of [code]#access::decorated# is [code]#access::decorated::no#, -the interface exposes pointers and references type that are not decorated by an address space. -If the value of [code]#access::decorated# is [code]#access::decorated::yes#, -the interface exposes pointers and references type that are decorated by an address space. -The decoration is implementation dependent and relies on device compiler extensions. +If the value of [code]#access::decorated# is [code]#access::decorated::no#, +the interface exposes pointers and references type that are not decorated by +an address space. +If the value of [code]#access::decorated# is [code]#access::decorated::yes#, +the interface exposes pointers and references type that are decorated by an +address space. 
+The decoration is implementation dependent and relies on device compiler +extensions. The decorated type may be distinct from the non-decorated one. -For interoperability with the <>, users should rely on types exposed -by the decorated version. -If the value of [code]#access::decorated# is [code]#access::decorated::legacy#, -the 1.2.1 interface is exposed. +For interoperability with the <>, users should rely on types +exposed by the decorated version. +If the value of [code]#access::decorated# is +[code]#access::decorated::legacy#, the 1.2.1 interface is exposed. This interface is deprecated. The template traits [code]#remove_decoration# and type alias -[code]#remove_decoration_t# retrieve the non-decorated pointer or -reference from a decorated one. Using this template trait with a -non-decorated type is safe and returns the same type. +[code]#remove_decoration_t# retrieve the non-decorated pointer or reference +from a decorated one. +Using this template trait with a non-decorated type is safe and returns the +same type. -It is possible to use the [code]#void# type for the [code]#multi_ptr# -class, but in that case some functionality is disabled. +It is possible to use the [code]#void# type for the [code]#multi_ptr# class, +but in that case some functionality is disabled. [code]#multi_ptr# does not provide the [code]#reference# or -[code]#const_reference# types, the access operators -([code]#operator*()#, [code]#+operator->()+#), the arithmetic -operators or [code]#prefetch# member function. -Conversions from [code]#multi_ptr# to [code]#multi_ptr# of the -same address space are allowed, and will occur implicitly. -Conversions from [code]#multi_ptr# to any other -[code]#multi_ptr# type of the same address space -are allowed, but must be explicit. +[code]#const_reference# types, the access operators ([code]#operator*()#, +[code]#+operator->()+#), the arithmetic operators or [code]#prefetch# member +function. 
+Conversions from [code]#multi_ptr# to [code]#multi_ptr# of the same +address space are allowed, and will occur implicitly. +Conversions from [code]#multi_ptr# to any other [code]#multi_ptr# type +of the same address space are allowed, but must be explicit. The same rules apply to [code]#multi_ptr#. An overview of the interface provided for the [code]#multi_ptr# class @@ -9840,8 +9893,8 @@ include::{header_dir}/multipointerlegacy.h[lines=4..-1] SYCL provides aliases to the [code]#multi_ptr# class template (see <>) for each specialization of [code]#access::address_space#. -A synopsis of the SYCL [code]#multi_ptr# class template -aliases is provided below. +A synopsis of the SYCL [code]#multi_ptr# class template aliases is provided +below. // Interface of the explicit pointer classes [source,,linenums] @@ -9849,18 +9902,17 @@ aliases is provided below. include::{header_dir}/pointer.h[lines=4..-1] ---- -Note that using [code]#global_ptr#, [code]#local_ptr#, -[code]#constant_ptr# or [code]#private_ptr# -without specifying the decoration is deprecated. +Note that using [code]#global_ptr#, [code]#local_ptr#, [code]#constant_ptr# +or [code]#private_ptr# without specifying the decoration is deprecated. The default argument is provided for compatibility with 1.2.1. [[subsec:samplers]] === Image samplers -The SYCL [code]#image_sampler# struct contains a configuration for sampling a -[code]#sampled_image#. The members of this struct are defined by the following -tables. +The SYCL [code]#image_sampler# struct contains a configuration for sampling +a [code]#sampled_image#. +The members of this struct are defined by the following tables. // Interface of the sampler class [source,,linenums] @@ -9971,12 +10023,14 @@ unnormalized [[sec:usm]] == Unified shared memory (USM) -This section describes properties and routines for pointer-based -memory management interfaces in SYCL. These routines augment, rather -than replace, the buffer-based interfaces in SYCL. 
+This section describes properties and routines for pointer-based memory +management interfaces in SYCL. +These routines augment, rather than replace, the buffer-based interfaces in +SYCL. -Unified Shared Memory (<>) provides a pointer-based alternative to -the buffer programming model. USM enables: +Unified Shared Memory (<>) provides a pointer-based alternative to the +buffer programming model. +USM enables: * Easier integration into existing code bases by representing allocations as pointers rather than buffers, with full support for pointer @@ -9986,18 +10040,17 @@ the buffer programming model. USM enables: * A simpler programming model, by automatically migrating some allocations between SYCL devices and the host. -To show the differences with the example from <>, the -following source code example shows how shared memory can be used -between host and device: +To show the differences with the example from <>, the following +source code example shows how shared memory can be used between host and +device: [source,,linenums] ---- include::{code_dir}/usm_shared.cpp[lines=4..-1] ---- -By comparison, the following source code example uses less capable -device memory, which requires an explicit copy between the device and the -host: +By comparison, the following source code example uses less capable device +memory, which requires an explicit copy between the device and the host: [source,,linenums] ---- include::{code_dir}/usm_device.cpp[lines=4..-1] @@ -10007,21 +10060,22 @@ include::{code_dir}/usm_device.cpp[lines=4..-1] === Unified addressing Unified Addressing guarantees that all devices will use a unified address -space. Pointer values in the unified address space will always refer to the -same location in memory. The unified address space encompasses the host and -one or more devices. Note that this does not require addresses in the -unified address space to be accessible on all devices, just that pointer -values will be consistent. +space. 
+Pointer values in the unified address space will always refer to the same +location in memory. +The unified address space encompasses the host and one or more devices. +Note that this does not require addresses in the unified address space to be +accessible on all devices, just that pointer values will be consistent. === Kinds of unified shared memory -<> is a capability that, when available, provides the ability -to create allocations that are visible to both host and device(s). -USM builds upon Unified Addressing to define a shared address space -where pointer values in this space always refer to the same location -in memory. USM defines three types of memory allocations -described in <>. +<> is a capability that, when available, provides the ability to create +allocations that are visible to both host and device(s). +USM builds upon Unified Addressing to define a shared address space where +pointer values in this space always refer to the same location in memory. +USM defines three types of memory allocations described in +<>. [[table.USM.allocation]] .Type of USM allocations @@ -10037,8 +10091,8 @@ described in <>. device |==== -The following [code]#enum# is used to refer to the different types of allocations -inside of a SYCL program: +The following [code]#enum# is used to refer to the different types of +allocations inside of a SYCL program: [source,,linenums] ---- @@ -10057,10 +10111,10 @@ enum class alloc : /* unspecified */ { ---- USM is an optional feature which may not be supported by all devices, and -devices that support USM may not support all types of USM allocation. A SYCL -application can use the [code]#device::has()# function to determine the -level of USM support for a device. See <> in -<> for more details. +devices that support USM may not support all types of USM allocation. +A SYCL application can use the [code]#device::has()# function to determine +the level of USM support for a device. +See <> in <> for more details. 
The characteristics of USM allocations are summarized in <>. @@ -10082,37 +10136,44 @@ The characteristics of USM allocations are summarized in | Another [code]#device# | Optional | Another [code]#device#| Optional |==== -Each USM allocation has an associated SYCL <>, and any access to that -memory must use the same context. Specifically, any <> -that dereferences a pointer to a USM allocation must be submitted to a -<> that was constructed with the same context that was used to allocate -that memory. The explicit memory operation <> that take USM -pointers have a similar restriction. (See <> for -details.) Violations of these requirements result in undefined behavior. +Each USM allocation has an associated SYCL <>, and any access to +that memory must use the same context. +Specifically, any <> that dereferences a pointer to a +USM allocation must be submitted to a <> that was constructed with +the same context that was used to allocate that memory. +The explicit memory operation <> that take USM pointers +have a similar restriction. +(See <> for details.) Violations of these +requirements result in undefined behavior. [NOTE] ==== There are no similar restrictions for dereferencing a USM pointer in a -<>. This is legal regardless of which <> the host task was -submitted to so long as the USM pointer is accessible on the host. +<>. +This is legal regardless of which <> the host task was submitted to +so long as the USM pointer is accessible on the host. ==== Each type of USM allocation has different rules for where that memory is -accessible. Attempting to dereference a USM pointer on the host or on a device -in violation of these rules results in undefined behavior. Passing a USM -pointer to one of the explicit memory functions where the pointer is not -accessible to the device generally results in undefined behavior. See -<> for the exact rules. +accessible. 
+Attempting to dereference a USM pointer on the host or on a device in +violation of these rules results in undefined behavior. +Passing a USM pointer to one of the explicit memory functions where the +pointer is not accessible to the device generally results in undefined +behavior. +See <> for the exact rules. Device allocations are used for explicitly managing device memory. -Programmers directly allocate device memory and explicitly copy data -between host memory and a device allocation. Device allocations are obtained -through SYCL device USM allocation routines instead of system allocation -routines like [code]#std::malloc# or {cpp} [code]#new#. Device -allocations are not accessible on the host, but the pointer values remain -consistent on account of Unified Addressing. The size of device allocations -will be limited by the amount of memory in a device. Support for device -allocations on a specific device can be queried through +Programmers directly allocate device memory and explicitly copy data between +host memory and a device allocation. +Device allocations are obtained through SYCL device USM allocation routines +instead of system allocation routines like [code]#std::malloc# or {cpp} +[code]#new#. +Device allocations are not accessible on the host, but the pointer values +remain consistent on account of Unified Addressing. +The size of device allocations will be limited by the amount of memory in a +device. +Support for device allocations on a specific device can be queried through [code]#aspect::usm_device_allocations#. Device allocations must be explicitly copied between the host and a device. @@ -10121,88 +10182,97 @@ The member functions to copy and initialize data are found in functions may be used on device allocations if a device supports [code]#aspect::usm_device_allocations#. -Host allocations allow devices to directly read and write host memory -inside of a kernel. 
This can be useful for several reasons, such as when the -overhead of moving a small amount of data is not worth paying over the cost of a -remote access or when the size of a data set exceeds the size of a device's memory. -Host allocations must also be obtained using SYCL routines instead -of system allocation routines. While a device may remotely read and -write a host allocation, the allocation does not migrate to the device - -it remains in host memory. Users should take care to properly synchronize -access to host allocations between host execution and kernels. The total -size of host allocations will be limited by the amount of pinnable-memory -on the host on most systems. Support for host allocations on a specific -device can be queried through [code]#aspect::usm_host_allocations#. -Support for atomic modification of host allocations -on a specific device can be queried through -[code]#aspect::usm_atomic_host_allocations#. - -Shared allocations implicitly share data between the host -and devices. Data may move to where it is being used without the programmer -explicitly informing the runtime. It is up to the runtime and backends -to make sure that a shared allocation is available where it is used. +Host allocations allow devices to directly read and write host memory inside +of a kernel. +This can be useful for several reasons, such as when the overhead of moving +a small amount of data is not worth paying over the cost of a remote access +or when the size of a data set exceeds the size of a device's memory. +Host allocations must also be obtained using SYCL routines instead of system +allocation routines. +While a device may remotely read and write a host allocation, the allocation +does not migrate to the device - +it remains in host memory. +Users should take care to properly synchronize access to host allocations +between host execution and kernels. 
+The total size of host allocations will be limited by the amount of +pinnable-memory on the host on most systems. +Support for host allocations on a specific device can be queried through +[code]#aspect::usm_host_allocations#. +Support for atomic modification of host allocations on a specific device can +be queried through [code]#aspect::usm_atomic_host_allocations#. + +Shared allocations implicitly share data between the host and devices. +Data may move to where it is being used without the programmer explicitly +informing the runtime. +It is up to the runtime and backends to make sure that a shared allocation +is available where it is used. Shared allocations must also be obtained using SYCL allocation routines -instead of the system allocator. The maximum size of a shared allocation -on a specific device, and the total size of all shared allocations in a -context, are implementation-defined. -Support for shared allocations on a -specific device can be queried through [code]#aspect::usm_shared_allocations#. - -Not all devices may support concurrent access of a shared allocation -with the host. If a device does not support this, -host execution and device code must take turns accessing the allocation, so -the host must not access a shared allocation while a kernel is executing. -Host access to a shared allocation which is also accessed -by an executing kernel on a device that does not support -concurrent access results in undefined behavior. If a device does -support concurrent access, both the host and and the device may atomically -modify the same data inside an allocation. Allocations, or pieces of allocations, -are now free to migrate to different devices in the same context -that also support this capability. Additionally, many devices that support -concurrent access may support a working set of shared allocations -larger than device memory. +instead of the system allocator. 
+The maximum size of a shared allocation on a specific device, and the total +size of all shared allocations in a context, are implementation-defined. +Support for shared allocations on a specific device can be queried through +[code]#aspect::usm_shared_allocations#. + +Not all devices may support concurrent access of a shared allocation with +the host. +If a device does not support this, host execution and device code must take +turns accessing the allocation, so the host must not access a shared +allocation while a kernel is executing. +Host access to a shared allocation which is also accessed by an executing +kernel on a device that does not support concurrent access results in +undefined behavior. +If a device does support concurrent access, both the host and the device +may atomically modify the same data inside an allocation. +Allocations, or pieces of allocations, are now free to migrate to different +devices in the same context that also support this capability. +Additionally, many devices that support concurrent access may support a +working set of shared allocations larger than device memory. Users may query whether a device supports concurrent access with atomic modification of shared allocations through the aspect [code]#aspect::usm_atomic_shared_allocations#. See <> in <> for more details. -Performance hints for shared allocations may be specified by the user -by enqueueing [code]#prefetch# operations on a device. These operations -inform the SYCL runtime that the specified shared allocation is -likely to be accessed on the device in the future, and that it is free -to migrate the allocation to the device. +Performance hints for shared allocations may be specified by the user by +enqueueing [code]#prefetch# operations on a device. +These operations inform the SYCL runtime that the specified shared +allocation is likely to be accessed on the device in the future, and that it +is free to migrate the allocation to the device.
More about [code]#prefetch# is found in <> and -<>. If a device supports concurrent access to -shared allocations, then [code]#prefetch# operations may be overlapped -with kernel execution. - -Additionally, users may use the [code]#mem_advise# member function to annotate -shared allocations with [code]#advice#. Valid [code]#advice# is defined by the -device and its associated backend. See <> and -<> for more information. - -In the most capable systems, users do not need to use SYCL USM allocation functions -to create shared allocations. The system allocator ([code]#malloc#/[code]#new#) may -instead be used. Likewise, [code]#std::free# and -[code]#delete# are used instead of [code]#sycl::free#. Note that -host and device allocations are unaffected by this -change and must still be allocated using their respective USM functions in -order to guarantee their behavior. Users may query the device to determine -if system allocations are supported for use on the device, through -[code]#aspect::usm_system_allocations#. +<>. +If a device supports concurrent access to shared allocations, then +[code]#prefetch# operations may be overlapped with kernel execution. + +Additionally, users may use the [code]#mem_advise# member function to +annotate shared allocations with [code]#advice#. +Valid [code]#advice# is defined by the device and its associated backend. +See <> and <> for more +information. + +In the most capable systems, users do not need to use SYCL USM allocation +functions to create shared allocations. +The system allocator ([code]#malloc#/[code]#new#) may instead be used. +Likewise, [code]#std::free# and [code]#delete# are used instead of +[code]#sycl::free#. +Note that host and device allocations are unaffected by this change and must +still be allocated using their respective USM functions in order to +guarantee their behavior. 
+Users may query the device to determine if system allocations are supported +for use on the device, through [code]#aspect::usm_system_allocations#. === USM allocations -USM provides several allocation functions. These functions accept a -[code]#property_list# parameter, which is provided for future extensibility. +USM provides several allocation functions. +These functions accept a [code]#property_list# parameter, which is provided +for future extensibility. The <> does not yet define any USM allocation properties. -Some of the allocation functions take an explicit alignment parameter. Like -[code]#std::aligned_alloc#, these functions return [code]#nullptr# if the -alignment is not supported by the implementation. Some of the allocation -functions are templated on the allocated type [code]#T# and some are not. The -following table specifies the alignment guarantees for each category. +Some of the allocation functions take an explicit alignment parameter. +Like [code]#std::aligned_alloc#, these functions return [code]#nullptr# if +the alignment is not supported by the implementation. +Some of the allocation functions are templated on the allocated type +[code]#T# and some are not. +The following table specifies the alignment guarantees for each category. [[table.usm.alignment]] .Alignment guarantees of USM allocation functions @@ -10232,11 +10302,11 @@ a@ Pointer is suitably aligned for an object of type [code]#T# or it is aligned ==== {cpp} allocator interface -SYCL defines an allocator class named [code]#usm_allocator# that satisfies the -{cpp} named requirement [code]#Allocator#. The [code]#AllocKind# template -parameter can be either [code]#usm::alloc::host# or [code]#usm::alloc::shared#, -causing the allocator to make either host USM allocations or shared USM -allocations. +SYCL defines an allocator class named [code]#usm_allocator# that satisfies +the {cpp} named requirement [code]#Allocator#. 
+The [code]#AllocKind# template parameter can be either +[code]#usm::alloc::host# or [code]#usm::alloc::shared#, causing the +allocator to make either host USM allocations or shared USM allocations. [NOTE] ==== @@ -10246,14 +10316,14 @@ host. ==== The [code]#usm_allocator# class has a template argument [code]#Alignment#, -which specifies the minimum alignment for memory that it allocates. This -alignment is used even if the allocator is rebound to a different type. Memory -allocated by this allocator is suitably aligned for objects of its underlying -[code]#value_type# or at the alignment specified by [code]#Alignment#, -whichever is greater. +which specifies the minimum alignment for memory that it allocates. +This alignment is used even if the allocator is rebound to a different type. +Memory allocated by this allocator is suitably aligned for objects of its +underlying [code]#value_type# or at the alignment specified by +[code]#Alignment#, whichever is greater. -A synopsis of the [code]#usm_allocator# class is provided below. The -constructors are listed in <>. +A synopsis of the [code]#usm_allocator# class is provided below. +The constructors are listed in <>. [source,,linenums] ---- @@ -10347,17 +10417,20 @@ a@ Simplified constructor form where [code]#syclQueue# provides the ==== Device allocation functions -The functions in <> allocate device USM. On success, -these functions return a pointer to the newly allocated memory, which must -eventually be deallocated with [code]#sycl::free# in order to avoid a memory -leak. If there are not enough resources to allocate the requested memory, -these functions return [code]#nullptr#. +The functions in <> allocate device USM. +On success, these functions return a pointer to the newly allocated memory, +which must eventually be deallocated with [code]#sycl::free# in order to +avoid a memory leak. +If there are not enough resources to allocate the requested memory, these +functions return [code]#nullptr#. 
When the allocation size is zero bytes ([code]#numBytes# or [code]#count# is zero), these functions behave in a manor consistent with {cpp} -[code]#std::malloc#. The value returned is unspecified in this case, and the -returned pointer may not be used to access storage. If this pointer is not -null, it must be passed to [code]#sycl::free# to avoid a memory leak. +[code]#std::malloc#. +The value returned is unspecified in this case, and the returned pointer may +not be used to access storage. +If this pointer is not null, it must be passed to [code]#sycl::free# to +avoid a memory leak. [[table.usm.device.allocs]] .Device USM Allocation Functions @@ -10479,11 +10552,12 @@ a@ Simplified form where [code]#syclQueue# provides the [code]#device# and ==== Host allocation functions -The functions in <> allocate host USM. On success, -these functions return a pointer to the newly allocated memory, which must -eventually be deallocated with [code]#sycl::free# in order to avoid a memory -leak. If there are not enough resources to allocate the requested memory, -these functions return [code]#nullptr#. +The functions in <> allocate host USM. +On success, these functions return a pointer to the newly allocated memory, +which must eventually be deallocated with [code]#sycl::free# in order to +avoid a memory leak. +If there are not enough resources to allocate the requested memory, these +functions return [code]#nullptr#. [[table.usm.host.allocs]] .Host USM Allocation Functions @@ -10580,11 +10654,12 @@ a@ Simplified form where [code]#syclQueue# provides the [code]#context#. ==== Shared allocation functions -The functions in <> allocate shared USM. On success, -these functions return a pointer to the newly allocated memory, which must -eventually be deallocated with [code]#sycl::free# in order to avoid a memory -leak. If there are not enough resources to allocate the requested memory, -these functions return [code]#nullptr#. +The functions in <> allocate shared USM. 
+On success, these functions return a pointer to the newly allocated memory, +which must eventually be deallocated with [code]#sycl::free# in order to +avoid a memory leak. +If there are not enough resources to allocate the requested memory, these +functions return [code]#nullptr#. [[table.usm.shared.allocs]] .Shared USM Allocation Functions @@ -10706,21 +10781,23 @@ a@ Simplified form where [code]#syclQueue# provides the [code]#device# and ==== Parameterized allocation functions -The functions in <> take a [code]#kind# parameter that -specifies the type of USM to allocate. When [code]#kind# is -[code]#usm::alloc::device#, then the allocation device must have -[code]#aspect::usm_device_allocations#. When [code]#kind# is -[code]#usm::alloc::host#, at least one device in the allocation context must -have [code]#aspect::usm_host_allocations#. When [code]#kind# is -[code]#usm::alloc::shared#, the allocation device must have -[code]#aspect::usm_shared_allocations#. If these requirements are -violated, the allocation function throws a synchronous [code]#exception# with -the [code]#errc::feature_not_supported# error code. +The functions in <> take a [code]#kind# parameter +that specifies the type of USM to allocate. +When [code]#kind# is [code]#usm::alloc::device#, then the allocation device +must have [code]#aspect::usm_device_allocations#. +When [code]#kind# is [code]#usm::alloc::host#, at least one device in the +allocation context must have [code]#aspect::usm_host_allocations#. +When [code]#kind# is [code]#usm::alloc::shared#, the allocation device must +have [code]#aspect::usm_shared_allocations#. +If these requirements are violated, the allocation function throws a +synchronous [code]#exception# with the [code]#errc::feature_not_supported# +error code. On success, these functions return a pointer to the newly allocated memory, -which must eventually be deallocated with [code]#sycl::free# in order to avoid -a memory leak. 
If there are not enough resources to allocate the requested -memory, these functions return [code]#nullptr#. +which must eventually be deallocated with [code]#sycl::free# in order to +avoid a memory leak. +If there are not enough resources to allocate the requested memory, these +functions return [code]#nullptr#. [[table.usm.param.allocs]] .Parameterized USM Allocation Functions @@ -10864,11 +10941,11 @@ a@ Alternate form where [code]#syclQueue# provides the [code]#context#. === Unified shared memory pointer queries -Since USM pointers look like raw {cpp} pointers, users cannot deduce what kind of -USM allocation a given pointer may be from examining its type. However, two -functions are defined that let users query the type of a USM allocation and, if -applicable, the [code]#device# on which it was allocated. These query functions -are only supported on the host. +Since USM pointers look like raw {cpp} pointers, users cannot deduce what +kind of USM allocation a given pointer may be from examining its type. +However, two functions are defined that let users query the type of a USM +allocation and, if applicable, the [code]#device# on which it was allocated. +These query functions are only supported on the host. [[table.usm.ptr.query]] .USM Pointer Query Functions @@ -10911,18 +10988,18 @@ USM allocation from [code]#syclContext#. === Ranges and index space identifiers The data parallelism of the SYCL kernel execution model requires -instantiation of a parallel execution over a -range of iteration space coordinates. To achieve this, SYCL exposes types -to define the range of execution and to identify a given execution -instance's point in the iteration space. +instantiation of a parallel execution over a range of iteration space +coordinates. +To achieve this, SYCL exposes types to define the range of execution and to +identify a given execution instance's point in the iteration space. 
-The following types are defined: [code]#range#, -[code]#nd_range#, [code]#id#, [code]#item#, [code]#h_item#, -[code]#nd_item# and [code]#group#. +The following types are defined: [code]#range#, [code]#nd_range#, +[code]#id#, [code]#item#, [code]#h_item#, [code]#nd_item# and [code]#group#. -When constructing multi-dimensional ids or ranges from integers, the elements -are written such that the right-most element varies fastest in a linearization -of the multi-dimensional space (see <>). +When constructing multi-dimensional ids or ranges from integers, the +elements are written such that the right-most element varies fastest in a +linearization of the multi-dimensional space (see +<>). [[table.id.summary]] @@ -10999,22 +11076,20 @@ group [[range-class]] ==== [code]#range# class -[code]#range# -is a 1D, 2D or 3D vector that defines -the iteration domain of either a single work-group in a parallel -dispatch, or the overall Dimensions of the dispatch. It can be -constructed from integers. +[code]#range# is a 1D, 2D or 3D vector that defines the +iteration domain of either a single work-group in a parallel dispatch, or +the overall Dimensions of the dispatch. +It can be constructed from integers. -The SYCL [code]#range# class template provides the common by-value -semantics (see <>). +The SYCL [code]#range# class template provides the common by-value semantics +(see <>). -A synopsis of the SYCL [code]#range# class is provided below. The -constructors, member functions and non-member functions of the SYCL -[code]#range# class are listed in -<>, <> and -<> respectively. The additional common -special member functions and common member functions are listed in -<> in +A synopsis of the SYCL [code]#range# class is provided below. +The constructors, member functions and non-member functions of the SYCL +[code]#range# class are listed in <>, +<> and <> respectively. 
+The additional common special member functions and common member functions +are listed in <> in <> and <> respectively. @@ -11249,23 +11324,22 @@ Then return the initial copy of the [code]#range#. include::{header_dir}/ndRange.h[lines=4..-1] ---- -[code]#nd_range# -defines the iteration domain of both -the work-groups and the overall dispatch. To define this the -[code]#nd_range# comprises two ranges: the whole range over which -the kernel is to be executed, and the range of each work -group. +[code]#nd_range# defines the iteration domain of both the +work-groups and the overall dispatch. +To define this the [code]#nd_range# comprises two ranges: the whole range +over which the kernel is to be executed, and the range of each work group. The SYCL [code]#nd_range# class template provides the common by-value semantics (see <>). -A synopsis of the SYCL [code]#nd_range# class is provided below. The -constructors and member functions of the SYCL [code]#nd_range# class -are listed in <> and -<> respectively. The additional common special -member functions and common member functions are listed in -<> in <> -and <> respectively. +A synopsis of the SYCL [code]#nd_range# class is provided below. +The constructors and member functions of the SYCL [code]#nd_range# class are +listed in <> and <> +respectively. +The additional common special member functions and common member functions +are listed in <> in +<> and +<> respectively. [[table.constructors.ndrange]] @@ -11335,20 +11409,20 @@ id get_offset() const ==== [code]#id# class [code]#id# is a vector of Dimensions that is used to -represent an <> into a global or local -[code]#range#. It can be used as an index in an accessor of the -same rank. The subscript operator ([code]#operator[](n)#) returns the component +represent an <> into a global or local [code]#range#. +It can be used as an index in an accessor of the same rank. +The subscript operator ([code]#operator[](n)#) returns the component [code]#n# as a [code]#size_t#. 
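The convention stated earlier for ids and ranges, that the right-most element varies fastest in a linearization, can be illustrated with a small sketch in plain {cpp}. This is not SYCL API: the helper names [code]#linearize# and [code]#delinearize# are hypothetical, and [code]#std::array# stands in for a three-dimensional [code]#id# and [code]#range# purely for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of SYCL): map a 3D index inside a 3D extent
// to a single linear index, with the right-most component varying fastest.
inline std::size_t linearize(const std::array<std::size_t, 3>& idx,
                             const std::array<std::size_t, 3>& extent) {
  // Every component must lie inside the corresponding extent.
  for (std::size_t d = 0; d < 3; ++d) {
    assert(idx[d] < extent[d]);
  }
  return (idx[0] * extent[1] + idx[1]) * extent[2] + idx[2];
}

// Hypothetical inverse: recover the 3D index from a linear index.
inline std::array<std::size_t, 3> delinearize(
    std::size_t linear, const std::array<std::size_t, 3>& extent) {
  assert(linear < extent[0] * extent[1] * extent[2]);
  std::array<std::size_t, 3> idx{};
  idx[2] = linear % extent[2];  // right-most component changes on every step
  linear /= extent[2];
  idx[1] = linear % extent[1];
  idx[0] = linear / extent[1];  // left-most component changes slowest
  return idx;
}
```

Under these assumptions, stepping the right-most component by one moves the linear index by one, while stepping the left-most component moves it by the product of the two remaining extents.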
The SYCL [code]#id# class template provides the common by-value semantics (see <>). -A synopsis of the SYCL [code]#id# class is provided below. The -constructors, member functions and non-member functions of the SYCL +A synopsis of the SYCL [code]#id# class is provided below. +The constructors, member functions and non-member functions of the SYCL [code]#id# class are listed in <>, -<> and <> respectively. The -additional common special member functions and common member functions are -listed in <> in +<> and <> respectively. +The additional common special member functions and common member functions +are listed in <> in <> and <> respectively. @@ -11581,23 +11655,25 @@ Then return the initial copy of the [code]#id#. [[subsec:item.class]] ==== [code]#item# class -<> identifies an instance of the function object -executing at each point in a [code]#range#. It is passed to a -[code]#parallel_for# call or returned by member functions of [code]#h_item#. -It encapsulates enough information to identify the work-item's range -of possible values and its ID in that range. It can optionally carry the offset of the -range if provided to the [code]#parallel_for#; note this is deprecated in SYCL 2020. -Instances of the [code]#item# class are -not user-constructible and are passed by the runtime to each instance -of the function object. +<> identifies an instance of the function object executing at each +point in a [code]#range#. +It is passed to a [code]#parallel_for# call or returned by member functions +of [code]#h_item#. +It encapsulates enough information to identify the work-item's range of +possible values and its ID in that range. +It can optionally carry the offset of the range if provided to the +[code]#parallel_for#; note this is deprecated in SYCL 2020. +Instances of the [code]#item# class are not user-constructible and are +passed by the runtime to each instance of the function object. 
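The pairing of an ID with its range, described above, can be sketched in standard {cpp} as follows. The [code]#Item2# type is a hypothetical model and not the SYCL [code]#item# class. The linearization shown assumes a row-major layout in which the last dimension varies fastest.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical model of a 2-dimensional item: a position together with the
// range it was drawn from. NOT the SYCL item class.
struct Item2 {
  std::array<std::size_t, 2> id;
  std::array<std::size_t, 2> range;

  // Component of the ID in the given dimension.
  std::size_t get_id(int dim) const {
    return id[static_cast<std::size_t>(dim)];
  }

  // Extent of the range in the given dimension.
  std::size_t get_range(int dim) const {
    return range[static_cast<std::size_t>(dim)];
  }

  // Row-major linearization (assumption: last dimension varies fastest).
  std::size_t get_linear_id() const { return id[0] * range[1] + id[1]; }
};
```

Because the model carries both the ID and the range, a linear position can be derived without any extra arguments, which is the information an [code]#id# alone would not provide.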
The SYCL [code]#item# class template provides the common by-value semantics (see <>). -A synopsis of the SYCL [code]#item# class is provided below. The member -functions of the SYCL [code]#item# class are listed in -<>. The additional common special member functions -and common member functions are listed in <> in +A synopsis of the SYCL [code]#item# class is provided below. +The member functions of the SYCL [code]#item# class are listed in +<>. +The additional common special member functions and common member functions +are listed in <> in <> and <> respectively. @@ -11701,23 +11777,26 @@ size_t get_linear_id() const [[nditem-class]] ==== [code]#nd_item# class -[code]#nd_item# identifies an instance of the function object -executing at each point in an [code]#nd_range# passed to a -[code]#parallel_for# call. It encapsulates enough -information to identify the <>'s local and global <>, the -<> and also provides access to the [code]#group# and -[code]#sub_group# classes. Instances of the [code]#nd_item# class are not user-constructible and are passed by the runtime to -each instance of the function object. +[code]#nd_item# identifies an instance of the function +object executing at each point in an [code]#nd_range# passed +to a [code]#parallel_for# call. +It encapsulates enough information to identify the <>'s local and +global <>, the <> and also provides access to the +[code]#group# and [code]#sub_group# classes. +Instances of the [code]#nd_item# class are not +user-constructible and are passed by the runtime to each instance of the +function object. The SYCL [code]#nd_item# class template provides the common by-value semantics (see <>). -A synopsis of the SYCL [code]#nd_item# class is provided below. The -member functions of the SYCL [code]#nd_item# class are listed in -<>. The additional common special member -functions and common member functions are listed in -<> in <> -and <> respectively. +A synopsis of the SYCL [code]#nd_item# class is provided below. 
+The member functions of the SYCL [code]#nd_item# class are listed in +<>. +The additional common special member functions and common member functions +are listed in <> in +<> and +<> respectively. % interface for nd_item class [source,,linenums] @@ -12019,25 +12098,27 @@ template void wait_for(EventTN... events) const [code]#group::parallel_for_work_item# function object executing at each point in a local [code]#range# passed to a [code]#parallel_for_work_item# call or to the corresponding -[code]#parallel_for_work_group# call if no [code]#range# is passed -to the [code]#parallel_for_work_item# call. It encapsulates enough -information to identify the <>'s local and global <> -according to the information given to [code]#parallel_for_work_group# -(physical ids) as well as the <>'s logical local <> -in the logical local range. All returned <> objects are -offset-less. Instances of the [code]#h_item# class are -not user-constructible and are passed by the runtime to each instance of the +[code]#parallel_for_work_group# call if no [code]#range# is passed to the +[code]#parallel_for_work_item# call. +It encapsulates enough information to identify the <>'s local and +global <> according to the information given to +[code]#parallel_for_work_group# (physical ids) as well as the +<>'s logical local <> in the logical local range. +All returned <> objects are offset-less. +Instances of the [code]#h_item# class are not +user-constructible and are passed by the runtime to each instance of the function object. The SYCL [code]#h_item# class template provides the common by-value semantics (see <>). -A synopsis of the SYCL [code]#h_item# class is provided below. The -member functions of the SYCL [code]#h_item# class are listed in -<>. The additional common special member -functions and common member functions are listed in -<> in <> -and <> respectively. +A synopsis of the SYCL [code]#h_item# class is provided below. 
+The member functions of the SYCL [code]#h_item# class are listed in +<>. +The additional common special member functions and common member functions +are listed in <> in +<> and +<> respectively. [source,,linenums] ---- @@ -12209,26 +12290,28 @@ size_t get_physical_local_id(int dimension) const [[group-class]] ==== [code]#group# class -The [code]#group# encapsulates all functionality -required to represent a particular <> within a -parallel execution. It is not user-constructible. +The [code]#group# encapsulates all functionality required to +represent a particular <> within a parallel execution. +It is not user-constructible. -The local range stored in the group class is provided either by -the programmer, when it is passed as an optional parameter to -[code]#parallel_for_work_group#, or by the runtime system when it -selects the optimal work-group size. This allows the developer to -always know how many work-items are in each executing work-group, even through -the abstracted iteration range of the [code]#parallel_for_work_item# loops. +The local range stored in the group class is provided either by the +programmer, when it is passed as an optional parameter to +[code]#parallel_for_work_group#, or by the runtime system when it selects +the optimal work-group size. +This allows the developer to always know how many work-items are in each +executing work-group, even through the abstracted iteration range of the +[code]#parallel_for_work_item# loops. -The SYCL [code]#group# class template provides the common by-value -semantics (see <>). +The SYCL [code]#group# class template provides the common by-value semantics +(see <>). -A synopsis of the SYCL [code]#group# class is provided below. The -member functions of the SYCL [code]#group# class are listed in -<>. The additional common special member -functions and common member functions are listed in -<> in <> -and <> respectively. +A synopsis of the SYCL [code]#group# class is provided below. 
+The member functions of the SYCL [code]#group# class are listed in +<>. +The additional common special member functions and common member functions +are listed in <> in +<> and +<> respectively. // Interface for class: group [source,,linenums] @@ -12540,19 +12623,20 @@ template void wait_for(EventTN... events) const [[sub-group-class]] ==== [code]#sub_group# class -The [code]#sub_group# class encapsulates all functionality -required to represent a particular <> within a -parallel execution. It is not user-constructible. +The [code]#sub_group# class encapsulates all functionality required to +represent a particular <> within a parallel execution. +It is not user-constructible. -The SYCL [code]#sub_group# class provides the common by-value -semantics (see <>). +The SYCL [code]#sub_group# class provides the common by-value semantics (see +<>). -A synopsis of the SYCL [code]#sub_group# class is provided below. The -member functions of the SYCL [code]#sub_group# class are listed in -<>. The additional common special member -functions and common member functions are listed in -<> in <> -and <> respectively. +A synopsis of the SYCL [code]#sub_group# class is provided below. +The member functions of the SYCL [code]#sub_group# class are listed in +<>. +The additional common special member functions and common member functions +are listed in <> in +<> and +<> respectively. // Interface for class: subgroup [source,,linenums] @@ -12666,12 +12750,12 @@ local id of 0. All functionality related to <> is captured by the [code]#reducer# class and the [code]#reduction# function. -The example below demonstrates how to write a <> -kernel that performs two reductions simultaneously on the same input values, -computing both the sum of all values in a buffer and the maximum value in the -buffer. For each reduction variable passed to [code]#parallel_for#, a -reference to a [code]#reducer# object is passed as a parameter to the kernel -function in the same order. 
+The example below demonstrates how to write a <> kernel that +performs two reductions simultaneously on the same input values, computing +both the sum of all values in a buffer and the maximum value in the buffer. +For each reduction variable passed to [code]#parallel_for#, a reference to a +[code]#reducer# object is passed as a parameter to the kernel function in +the same order. [source,,linenums] ---- @@ -12679,25 +12763,29 @@ include::{code_dir}/reduction.cpp[lines=4..-1] ---- Reductions are supported for all trivially copyable types (as defined by the -{cpp} core language). If the reduction operator is non-associative or -non-commutative, the behavior of a reduction may be non-deterministic. If -multiple reductions reference the same reduction variable, or a reduction -variable is accessed directly during the lifetime of a reduction (e.g. via an -[code]#accessor# or USM pointer), the behavior is undefined. - -Some of the overloads for the [code]#reduction# function take an identity value -and some do not. An implementation is required to compute a correct reduction -even when the application does not specify an identity value. However, the -implementation may be more efficient when the identity value is either provided -by the application or is known by the implementation. For reductions using -standard binary operators and fundamental types (e.g. [code]#plus# and -arithmetic types), an implementation can determine the correct identity value -automatically in order to avoid performance penalties. - -If an implementation can identify an identity value for a given combination of -accumulator type and function object type, the value is defined as a member of -the [code]#known_identity# trait class. Whether this member value exists can -be tested using the [code]#has_known_identity# trait class. +{cpp} core language). +If the reduction operator is non-associative or non-commutative, the +behavior of a reduction may be non-deterministic. 
+If multiple reductions reference the same reduction variable, or a reduction +variable is accessed directly during the lifetime of a reduction (e.g. via +an [code]#accessor# or USM pointer), the behavior is undefined. + +Some of the overloads for the [code]#reduction# function take an identity +value and some do not. +An implementation is required to compute a correct reduction even when the +application does not specify an identity value. +However, the implementation may be more efficient when the identity value is +either provided by the application or is known by the implementation. +For reductions using standard binary operators and fundamental types (e.g. +[code]#plus# and arithmetic types), an implementation can determine the +correct identity value automatically in order to avoid performance +penalties. + +If an implementation can identify an identity value for a given combination +of accumulator type and function object type, the value is defined as a +member of the [code]#known_identity# trait class. +Whether this member value exists can be tested using the +[code]#has_known_identity# trait class. [source,,linenums] ---- @@ -12885,54 +12973,55 @@ a@ |==== The reduction interface is limited to reduction variables whose size can be -determined at compile-time. As such, [code]#buffer# and USM pointer arguments -are interpreted by the reduction interface as describing a single variable. +determined at compile-time. +As such, [code]#buffer# and USM pointer arguments are interpreted by the +reduction interface as describing a single variable. A reduction operation associated with a [code]#span# represents an array -reduction. An array reduction of size _N_ is functionally equivalent to -specifying _N_ independent scalar reductions. The combination operations -performed by an array reduction are limited to the extent of a USM allocation -described by a [code]#span#, and access to elements outside of these regions -results in undefined behavior. +reduction. 
+An array reduction of size _N_ is functionally equivalent to specifying _N_ +independent scalar reductions. +The combination operations performed by an array reduction are limited to +the extent of a USM allocation described by a [code]#span#, and access to +elements outside of these regions results in undefined behavior. [NOTE] ==== -Since a [code]#span# is one-dimensional, there is currently no way to describe -an array reduction with more than one dimension. This is expected to change in -a future version of the SYCL specification, but depends on the introduction of -a multi-dimensional [code]#span#. +Since a [code]#span# is one-dimensional, there is currently no way to +describe an array reduction with more than one dimension. +This is expected to change in a future version of the SYCL specification, +but depends on the introduction of a multi-dimensional [code]#span#. ==== [[reduction-interface]] ==== [code]#reduction# interface -The [code]#reduction# interface is used to attach <> semantics -to a variable, by specifying: the reduction variable, the -reduction operator and an optional identity value associated with the operator. +The [code]#reduction# interface is used to attach <> semantics to +a variable, by specifying: the reduction variable, the reduction operator +and an optional identity value associated with the operator. The overloads of the interface are described in <>. The return value of the [code]#reduction# interface is an implementation-defined object of unspecified type, which is interpreted by -[code]#parallel_for# to construct an appropriate [code]#reducer# -type as detailed in <>. +[code]#parallel_for# to construct an appropriate [code]#reducer# type as +detailed in <>. -An implementation may use an unspecified number of temporary variables inside -of any [code]#reducer# objects it creates. If an identity value is supplied to -a reduction, an implementation will use that value to initialize any such -temporary variables. 
+An implementation may use an unspecified number of temporary variables +inside of any [code]#reducer# objects it creates. +If an identity value is supplied to a reduction, an implementation will use +that value to initialize any such temporary variables. [NOTE] ==== -Since the number of temporary variables is unspecified, supplying an identity -value different to the identity value associated with the reduction operator -may lead to unexpected results. +Since the number of temporary variables is unspecified, supplying an +identity value different to the identity value associated with the reduction +operator may lead to unexpected results. ==== -The initial value of the reduction variable is included -in the reduction operation, unless the [code]#property::reduction::initialize_to_identity# +The initial value of the reduction variable is included in the reduction +operation, unless the [code]#property::reduction::initialize_to_identity# property was specified when the [code]#reduction# interface was invoked. -The reduction variable -is updated so as to contain the result of the reduction when the kernel finishes -execution. +The reduction variable is updated so as to contain the result of the +reduction when the kernel finishes execution. [source,,linenums] ---- @@ -13033,8 +13122,8 @@ reduction(span vars, const T& identity, [[sec:reduction-properties]] ==== Reduction properties -The properties that can be provided when using the [code]#reduction# interface -are described in <>. +The properties that can be provided when using the [code]#reduction# +interface are described in <>. [[table.properties.reduction]] @@ -13081,35 +13170,39 @@ property::reduction::initialize_to_identity::initialize_to_identity() ==== [code]#reducer# class The [code]#reducer# class defines the interface between a work-item and a -reduction variable during the execution of a SYCL kernel, restricting access to -the underlying reduction variable. 
The intermediate values of a reduction -variable cannot be inspected during kernel execution, and the variable cannot -be updated using anything other than the reduction's specified combination -operation. The combination order of different reducers is unspecified, as are -when and how the value of each reducer is combined with the original reduction +reduction variable during the execution of a SYCL kernel, restricting access +to the underlying reduction variable. +The intermediate values of a reduction variable cannot be inspected during +kernel execution, and the variable cannot be updated using anything other +than the reduction's specified combination operation. +The combination order of different reducers is unspecified, as are when and +how the value of each reducer is combined with the original reduction variable. To enable compile-time specialization of reduction algorithms, the -implementation of the [code]#reducer# class is unspecified, -except for the functions and operators defined in <> -and <>. As such, developers should not specify the -template arguments of a [code]#reducer# directly, and should instead employ -generic programming techniques that allow kernel functions to accept a -reference to a variable of any [code]#reducer# type. Kernels written as -lambdas should employ [code]#auto&# or [code]#+auto&...+#, and kernels written as -function objects should employ template parameters or template parameter packs. - -An implementation must guarantee that it is safe for multiple work-items in a -kernel to call the combine function of a [code]#reducer# concurrently. An -implementation is free to re-use reducer variables (e.g. across work-groups -scheduled to the same compute unit) if it can guarantee that it is safe to do -so. - -The type aliases and constant static members of the [code]#reducer# class are -listed in <> and its member functions are listed in -<>. 
Additional shorthand operators may be made -available for certain combinations of reduction variable type and combination -operation, as described in <>. +implementation of the [code]#reducer# class is unspecified, except for the +functions and operators defined in <> and +<>. +As such, developers should not specify the template arguments of a +[code]#reducer# directly, and should instead employ generic programming +techniques that allow kernel functions to accept a reference to a variable +of any [code]#reducer# type. +Kernels written as lambdas should employ [code]#auto&# or +[code]#+auto&...+#, and kernels written as function objects should employ +template parameters or template parameter packs. + +An implementation must guarantee that it is safe for multiple work-items in +a kernel to call the combine function of a [code]#reducer# concurrently. +An implementation is free to re-use reducer variables (e.g. across +work-groups scheduled to the same compute unit) if it can guarantee that it +is safe to do so. + +The type aliases and constant static members of the [code]#reducer# class +are listed in <> and its member functions are listed in +<>. +Additional shorthand operators may be made available for certain +combinations of reduction variable type and combination operation, as +described in <>. [source,,linenums] ---- @@ -13256,70 +13349,79 @@ reducer& operator++(reducer& accum) [[sec:command.group.scope]] === Command group scope -A <>, as defined in <>, -may execute a single <> such as invoking a kernel, copying memory, -or executing a host task. It is legal for a <> to -statically contain more than one call to a <> function, but any -single execution of the <> may execute no more -than one <>. If an application fails to do this, the function that -submits the <> (i.e., [code]#queue::submit#) -must throw a synchronous [code]#exception# with the [code]#errc::invalid# error -code. 
The statements that call <> together with -the statements that define the requirements for a kernel form the -<>. The command group -function object takes as a parameter an instance of the <> class which -encapsulates all the member functions executed in the command group scope. -The member functions and objects defined in this scope will define the requirements for the -kernel execution or explicit memory operation, and will be used by the <> -to evaluate if the operation is ready for execution. +A <>, as defined in <>, may execute +a single <> such as invoking a kernel, copying memory, or executing +a host task. +It is legal for a <> to statically contain more than +one call to a <> function, but any single execution of the +<> may execute no more than one <>. +If an application fails to do this, the function that submits the +<> (i.e., [code]#queue::submit#) must throw a +synchronous [code]#exception# with the [code]#errc::invalid# error code. +The statements that call <> together with the statements +that define the requirements for a kernel form the +<>. +The command group function object takes as a parameter an instance of the +<> class which encapsulates all the member functions executed in +the command group scope. +The member functions and objects defined in this scope will define the +requirements for the kernel execution or explicit memory operation, and will +be used by the <> to evaluate if the operation is ready for +execution. Host code within a <> (typically setting up -requirements) is executed once, before the command group submit call returns. -This abstraction of the kernel -execution unifies the data with its processing, and consequently allows more -abstraction and flexibility in the parallel programming models that can be -implemented on top of SYCL. - -The <> and the [code]#handler# class -serve as an interface for the encapsulation of <>. -A <> is defined as a function object. 
All the device data accesses are -defined inside this group and any transfers are managed by the <>. The -rules for the data transfers regarding device and -host data accesses are better described in <>, -where buffers (<>) and accessor (<>) classes -are described. The overall memory model of the SYCL application is described in +requirements) is executed once, before the command group submit call +returns. +This abstraction of the kernel execution unifies the data with its +processing, and consequently allows more abstraction and flexibility in the +parallel programming models that can be implemented on top of SYCL. + +The <> and the [code]#handler# class serve as +an interface for the encapsulation of <>. +A <> is defined as a function object. +All the device data accesses are defined inside this group and any transfers +are managed by the <>. +The rules for the data transfers regarding device and host data accesses are +better described in <>, where buffers +(<>) and accessor (<>) classes are +described. +The overall memory model of the SYCL application is described in <>. -It is possible for a <> to fail to enqueue to a queue, -or for it to fail to execute correctly. A user can therefore supply a secondary -queue when submitting a command group to the primary queue. If the <> -fails to enqueue or execute a command group on a primary queue, it can attempt -to run the command group on the secondary queue. The circumstances in which it -is, or is not, possible for a <> to fall-back from primary to -secondary queue are unspecified in the specification. Even if a command group -is run on the secondary queue, the requirement that host code within the command group -is executed exactly once remains, regardless of whether the fallback queue is used for -execution. +It is possible for a <> to fail to enqueue to +a queue, or for it to fail to execute correctly. +A user can therefore supply a secondary queue when submitting a command +group to the primary queue. 
+If the <> fails to enqueue or execute a command group on a +primary queue, it can attempt to run the command group on the secondary +queue. +The circumstances in which it is, or is not, possible for a <> +to fall-back from primary to secondary queue are unspecified in the +specification. +Even if a command group is run on the secondary queue, the requirement that +host code within the command group is executed exactly once remains, +regardless of whether the fallback queue is used for execution. -The command group [code]#handler# class provides the interface -for all of the member functions that are able to be executed inside the command group +The command group [code]#handler# class provides the interface for all of +the member functions that are able to be executed inside the command group scope, and it is also provided as a scoped object to all of the data access -requests. The <> class provides the interface -in which every command in the command group scope will be submitted to a queue. +requests. +The <> class provides the interface in which every command in the +command group scope will be submitted to a queue. [[sec:handlerClass]] === Command group [code]#handler# class -A <> object can only be constructed by the SYCL -runtime. All of the accessors defined in <> take as a -parameter an instance of the <>, and all the -kernel invocation functions are member functions of this class. +A <> object can only be constructed by the SYCL runtime. +All of the accessors defined in <> take as a parameter +an instance of the <>, and all the kernel invocation functions are +member functions of this class. The constructors of the SYCL [code]#handler# class are described in <>. -It is disallowed for an instance of the SYCL [code]#handler# class to -be moved or copied. +It is disallowed for an instance of the SYCL [code]#handler# class to be +moved or copied. 
// Interface for class: handler [source,,linenums] @@ -13347,22 +13449,24 @@ handler(___unspecified___) ==== SYCL functions for adding requirements When an accessor is created from a <>, a *requirement* is -implicitly added to the <> for the accessor's data. However, -this does not happen when creating a [keyword]#placeholder# accessor. In order -to create a *requirement* for a [keyword]#placeholder# accessor, code -must call the [code]#handler::require()# member function. - -Note that the default constructed [code]#accessor# is not a placeholder, so it -may be passed to a <> without calling -[code]#handler::require()#. However, this accessor also has no underlying -memory object, so such an accessor does not create any *requirement* for the -command group, and attempting to access data elements from it produces -undefined behavior. +implicitly added to the <> for the accessor's data. +However, this does not happen when creating a [keyword]#placeholder# +accessor. +In order to create a *requirement* for a [keyword]#placeholder# accessor, +code must call the [code]#handler::require()# member function. + +Note that the default constructed [code]#accessor# is not a placeholder, so +it may be passed to a <> without calling +[code]#handler::require()#. +However, this accessor also has no underlying memory object, so such an +accessor does not create any *requirement* for the command group, and +attempting to access data elements from it produces undefined behavior. SYCL events may also be used to create requirements for a <>. Such requirements state that the actions represented by the events must -complete before the <> may execute. Such requirements -are added when code calls the [code]#handler::depends_on()# member function. +complete before the <> may execute. +Such requirements are added when code calls the +[code]#handler::depends_on()# member function. 
[[table.members.handler.requirements]] .Member functions of the [code]#handler# class @@ -13412,13 +13516,13 @@ by each event in [code]#depEvents# must complete before executing this [keyword]#data-parallel# <>, <> in <>, or [keyword]#hierarchical parallelism#. -Each function takes an optional kernel name template parameter. The user -may optionally provide a <>, otherwise an implementation-defined name -will be generated for the kernel. +Each function takes an optional kernel name template parameter. +The user may optionally provide a <>, otherwise an +implementation-defined name will be generated for the kernel. -All the functions for invoking kernels are member functions of the command group -[code]#handler# class (<>), which -is used to encapsulate all the member functions provided in a command group scope. +All the functions for invoking kernels are member functions of the command +group [code]#handler# class (<>), which is used to +encapsulate all the member functions provided in a command group scope. <> lists all the members of the [code]#handler# class related to the kernel invocation. @@ -13664,16 +13768,16 @@ associated with the secondary queue (if specified). ===== [code]#single_task# invoke SYCL provides a simple interface to enqueue a kernel that will be -sequentially executed on a device. Only one instance of the -kernel will be executed. This interface is useful as a primitive for more -complicated parallel algorithms, as it can easily create a chain of -sequential tasks on a SYCL device with each of them managing its -own data transfers. +sequentially executed on a device. +Only one instance of the kernel will be executed. +This interface is useful as a primitive for more complicated parallel +algorithms, as it can easily create a chain of sequential tasks on a SYCL +device with each of them managing its own data transfers. This function can only be called inside a command group using the [code]#handler# object created by the runtime. 
-Any accessors that are used in a kernel should be defined inside the -same command group. +Any accessors that are used in a kernel should be defined inside the same +command group. Local accessors are disallowed for single task invocations. @@ -13685,10 +13789,9 @@ include::{code_dir}/singletask.cpp[lines=4..-1] For single tasks, the kernel member function takes no parameters, as there is no need for <> in a unary index space. -A [code]#kernel_handler# can optionally be passed as a parameter -to the <> that is invoked by -[code]#single_task# for the purpose explained -in <>. +A [code]#kernel_handler# can optionally be passed as a parameter to the +<> that is invoked by [code]#single_task# for the +purpose explained in <>. [source,,linenums] ---- @@ -13698,46 +13801,47 @@ include::{code_dir}/singleTaskWithKernelHandler.cpp[lines=4..-1] ===== [code]#parallel_for# invoke -The [code]#parallel_for# member function of the SYCL -[code]#handler# class provides an interface to define and invoke a SYCL -kernel function in a command group, to execute in parallel execution over a -3 dimensional index space. There are three overloads of the -[code]#parallel_for# member function which provide variations of this -interface, each with a different level of complexity and providing a -different set of features. +The [code]#parallel_for# member function of the SYCL [code]#handler# class +provides an interface to define and invoke a SYCL kernel function in a +command group, to execute in parallel execution over a 3 dimensional index +space. +There are three overloads of the [code]#parallel_for# member function which +provide variations of this interface, each with a different level of +complexity and providing a different set of features. For the simplest case, users need only provide the global range (the total -number of work-items in the index space) via a SYCL [code]#range# -parameter. 
In this case the function object that represents the SYCL kernel -function must take one of: -1) a single SYCL [code]#item# parameter, 2) a single generic parameter -([code]#template# parameter or [code]#auto#) that will be treated as -an [code]#item# parameter, 3) any other type -implicitly converted from SYCL [code]#item#, representing the currently -executing work-item within the range specified by the [code]#range# -parameter. +number of work-items in the index space) via a SYCL [code]#range# parameter. +In this case the function object that represents the SYCL kernel function +must take one of: 1) a single SYCL [code]#item# parameter, 2) a single +generic parameter ([code]#template# parameter or [code]#auto#) that will be +treated as an [code]#item# parameter, 3) any other type implicitly converted +from SYCL [code]#item#, representing the currently executing work-item +within the range specified by the [code]#range# parameter. [NOTE] ==== -Case 3) above allows the kernel function to take an argument of type [code]#id# -because [code]#item# is implicitly convertible to [code]#id#. It also allows -a 1-D kernel function to take an integral argument (e.g. [code]#int# or -[code]#size_t#) because a 1-D [code]#item# is implicitly convertible to these -types. Finally, it allows the kernel function to take a user-defined argument -type that can be constructed from [code]#item#, enabling users to layer their -own abstractions on top of SYCL. +Case 3) above allows the kernel function to take an argument of type +[code]#id# because [code]#item# is implicitly convertible to [code]#id#. +It also allows a 1-D kernel function to take an integral argument (e.g. +[code]#int# or [code]#size_t#) because a 1-D [code]#item# is implicitly +convertible to these types. +Finally, it allows the kernel function to take a user-defined argument type +that can be constructed from [code]#item#, enabling users to layer their own +abstractions on top of SYCL. 
==== The execution of the kernel function is the same whether the parameter to -the SYCL kernel function is a SYCL [code]#id# or a SYCL -[code]#item#. What differs is the functionality that is available to -the SYCL kernel function via the respective interfaces. +the SYCL kernel function is a SYCL [code]#id# or a SYCL [code]#item#. +What differs is the functionality that is available to the SYCL kernel +function via the respective interfaces. Below is an example of invoking a SYCL kernel function with -[code]#parallel_for# using a lambda function, and passing a SYCL -[code]#id# parameter. In this case, only the global id is available. +[code]#parallel_for# using a lambda function, and passing a SYCL [code]#id# +parameter. +In this case, only the global id is available. This variant of [code]#parallel_for# is designed for when it is not -necessary to query the global range of the index space being executed across. +necessary to query the global range of the index space being executed +across. [source,,linenums] ---- @@ -13745,11 +13849,11 @@ include::{code_dir}/basicparallelfor.cpp[lines=4..-1] ---- Below is an example of invoking a SYCL kernel function with -[code]#parallel_for# using a lambda function and passing a SYCL -[code]#item# parameter. In this case, both the global id and global -range are queryable. This variant of [code]#parallel_for# is designed -for when it is necessary to query the global range of the index space -being executed across. +[code]#parallel_for# using a lambda function and passing a SYCL [code]#item# +parameter. +In this case, both the global id and global range are queryable. +This variant of [code]#parallel_for# is designed for when it is necessary to +query the global range of the index space being executed across. 
[source,,linenums] ---- @@ -13757,12 +13861,13 @@ include::{code_dir}/basicParallelForItem.cpp[lines=4..-1] ---- Below is an example of invoking a SYCL kernel function with -[code]#parallel_for# using a lambda function and passing -[code]#auto# parameter, treated as [code]#item#. In this case, both -the global id and global range are queryable. The same effect can be -achieved using class with templatized [code]#operator()#. This variant -of [code]#parallel_for# is designed for when it is necessary to query -the global range within which the global id will vary. +[code]#parallel_for# using a lambda function and passing [code]#auto# +parameter, treated as [code]#item#. +In this case, both the global id and global range are queryable. +The same effect can be achieved using class with templatized +[code]#operator()#. +This variant of [code]#parallel_for# is designed for when it is necessary to +query the global range within which the global id will vary. [source,,linenums] ---- @@ -13771,10 +13876,13 @@ include::{code_dir}/basicParallelForGeneric.cpp[lines=4..-1] Below is an example of invoking a SYCL kernel function with [code]#parallel_for# using a lambda function and passing an integral type -parameter. This example is only valid when calling [code]#parallel_for# with -[code]#range<1>#. In this case only the global id is available. This variant of -[code]#parallel_for# is designed for when it is not necessary to query -the global range of the index space being executed across. +parameter. +This example is only valid when calling [code]#parallel_for# with +[code]#range<1>#. +In this case only the global id is available. +This variant of [code]#parallel_for# is designed for when it is not +necessary to query the global range of the index space being executed +across. 
[source,,linenums] ---- @@ -13782,8 +13890,8 @@ include::{code_dir}/basicParallelForIntegral.cpp[lines=4..-1] ---- The [code]#parallel_for# overload without an offset can be called with -either a number or a [code]#braced-init-list# with 1-3 elements. In that -case the following calls are equivalent: +either a number or a [code]#braced-init-list# with 1-3 elements. +In that case the following calls are equivalent: * [code]#parallel_for(N, some_kernel)# has same effect as [code]#parallel_for(range<1>(N), some_kernel)# @@ -13791,11 +13899,11 @@ case the following calls are equivalent: [code]#parallel_for(range<1>(N), some_kernel)# * [code]#parallel_for({N1, N2}, some_kernel)# has same effect as [code]#parallel_for(range<2>(N1, N2), some_kernel)# - * [code]#parallel_for({N1, N2, N3}, some_kernel)# has same effect - as [code]#parallel_for(range<3>(N1, N2, N3), some_kernel)# + * [code]#parallel_for({N1, N2, N3}, some_kernel)# has same effect as + [code]#parallel_for(range<3>(N1, N2, N3), some_kernel)# -Below is an example of invoking [code]#parallel_for# with a number -instead of an explicit [code]#range# object. +Below is an example of invoking [code]#parallel_for# with a number instead +of an explicit [code]#range# object. [source,,linenums] ---- @@ -13807,32 +13915,34 @@ For SYCL kernel functions invoked via the above described overload of the accessors or to use a <>. The following two examples show how a kernel function object can be launched -over a 3D grid, with 3 elements in each dimension. In the first case -work-item ids range from 0 to 2 inclusive, and in the second case -work-item ids run from 1 to 3. +over a 3D grid, with 3 elements in each dimension. +In the first case work-item ids range from 0 to 2 inclusive, and in the +second case work-item ids run from 1 to 3. [source,,linenums] ---- include::{code_dir}/parallelfor.cpp[lines=4..-1] ---- -The last case of a [code]#parallel_for# invocation enables low-level functionality -of work-items and work-groups. 
This becomes valuable when an execution -requires groups of work-items that communicate and synchronize. These are -exposed in SYCL through [code]#+parallel_for (nd_range,...)+# and the -[code]#nd_item# class. In this case, the developer needs to define the -[code]#nd_range# that the kernel will execute on in order to have fine -grained control of the enqueuing of the kernel. This variation of -parallel_for expects an [code]#nd_range#, specifying both local and -global ranges, defining the global number of work-items and the number in -each cooperating work-group. The function object that represents the SYCL -kernel function must take one of: -1) a single SYCL [code]#nd_item# parameter, 2) a single generic parameter -([code]#template# parameter or [code]#auto#) that will be treated as -an [code]#nd_item# parameter, 3) any other type converted -from SYCL [code]#nd_item#, representing the currently executing work-item -within the range specified by the [code]#nd_range# parameter. The -[code]#nd_item# parameter makes all information about the work-item and +The last case of a [code]#parallel_for# invocation enables low-level +functionality of work-items and work-groups. +This becomes valuable when an execution requires groups of work-items that +communicate and synchronize. +These are exposed in SYCL through [code]#+parallel_for (nd_range,...)+# and +the [code]#nd_item# class. +In this case, the developer needs to define the [code]#nd_range# that the +kernel will execute on in order to have fine grained control of the +enqueuing of the kernel. +This variation of parallel_for expects an [code]#nd_range#, specifying both +local and global ranges, defining the global number of work-items and the +number in each cooperating work-group. 
+The function object that represents the SYCL kernel function must take one +of: 1) a single SYCL [code]#nd_item# parameter, 2) a single generic +parameter ([code]#template# parameter or [code]#auto#) that will be treated +as an [code]#nd_item# parameter, 3) any other type converted from SYCL +[code]#nd_item#, representing the currently executing work-item within the +range specified by the [code]#nd_range# parameter. +The [code]#nd_item# parameter makes all information about the work-item and its position in the range available, and provides access to functions enabling the use of a <> to synchronize between the <> in the <>. @@ -13840,36 +13950,42 @@ enabling the use of a <> to synchronize between the [NOTE] ==== Case 3) above includes user-defined types that can be constructed from -[code]#nd_item#, enabling users to layer their own abstractions on top of SYCL. +[code]#nd_item#, enabling users to layer their own abstractions on top of +SYCL. ==== -The following example shows how sixty-four work-items may be launched -in a three-dimensional grid with four in each dimension, and divided -into eight work-groups. Each group of work-items synchronizes with a -<>. +The following example shows how sixty-four work-items may be launched in a +three-dimensional grid with four in each dimension, and divided into eight +work-groups. +Each group of work-items synchronizes with a <>. [source,,linenums] ---- include::{code_dir}/parallelforbarrier.cpp[lines=4..-1] ---- -In all of these cases the underlying <> will be created -and the kernel defined as a function object will be created and enqueued -as part of the command group scope. +In all of these cases the underlying <> will be created and the +kernel defined as a function object will be created and enqueued as part of +the command group scope. 
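The split of an [code]#nd_range# into work-groups described above can be worked through in plain {cpp}.
The sketch below is a model only and does not use the SYCL API: [code]#Index3#, [code]#WorkItemCoords#, [code]#decompose# and [code]#countGroups# are hypothetical names, and the 2x2x2 work-group size is an assumption that is merely consistent with sixty-four work-items forming eight work-groups.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Plain C++ model of an nd_range decomposition; none of these names are SYCL API.
using Index3 = std::array<std::size_t, 3>;

struct WorkItemCoords {
  Index3 global;  // position in the whole index space
  Index3 group;   // work-group the work-item belongs to
  Index3 local;   // position inside that work-group
};

// Enumerate every work-item of a 3-D global range split into work-groups
// of size localRange. Each global extent must be a multiple of the
// corresponding local extent.
std::vector<WorkItemCoords> decompose(const Index3& globalRange,
                                      const Index3& localRange) {
  for (std::size_t d = 0; d < 3; ++d) {
    assert(localRange[d] > 0 && globalRange[d] % localRange[d] == 0);
  }
  std::vector<WorkItemCoords> items;
  for (std::size_t i = 0; i < globalRange[0]; ++i) {
    for (std::size_t j = 0; j < globalRange[1]; ++j) {
      for (std::size_t k = 0; k < globalRange[2]; ++k) {
        WorkItemCoords c{};
        c.global = Index3{i, j, k};
        for (std::size_t d = 0; d < 3; ++d) {
          c.group[d] = c.global[d] / localRange[d];
          c.local[d] = c.global[d] % localRange[d];
        }
        items.push_back(c);
      }
    }
  }
  return items;
}

// Count the distinct work-groups that appear in a decomposition.
std::size_t countGroups(const std::vector<WorkItemCoords>& items) {
  std::set<Index3> groups;
  for (const auto& c : items) {
    groups.insert(c.group);
  }
  return groups.size();
}
```

Under these assumptions, [code]#decompose({4, 4, 4}, {2, 2, 2})# yields sixty-four entries spread over eight distinct group ids, and the work-item with global id (3, 3, 3) has group id (1, 1, 1) and local id (1, 1, 1).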
Some forms of [code]#parallel_for# accept an offset parameter of type -[code]#id#, where the number of dimensions of the [code]#id# is the same -as the number of dimensions of the [code]#range# that determines the iteration space. -These forms of [code]#parallel_for# execute the same number of iterations as the form -with no offset. The difference is that the [code]#id# or [code]#item# parameter passed -to the kernel function has the value of [code]#offset# implicitly added. +[code]#id#, where the number of dimensions of the [code]#id# is +the same as the number of dimensions of the [code]#range# that determines +the iteration space. +These forms of [code]#parallel_for# execute the same number of iterations as +the form with no offset. +The difference is that the [code]#id# or [code]#item# parameter passed to +the kernel function has the value of [code]#offset# implicitly added. This offset parameter is deprecated in SYCL 2020. -An offset can also be passed to the forms of [code]#parallel_for# that accept an -[code]#nd_range# via the third parameter to the [code]#nd_range# constructor. These -forms of [code]#parallel_for# also execute the same number of iterations as if no offset -was specified. The difference is that the [code]#nd_item# parameter passed to the kernel -function has the value of the offset implicitly added to the constituent <>. +An offset can also be passed to the forms of [code]#parallel_for# that +accept an [code]#nd_range# via the third parameter to the [code]#nd_range# +constructor. +These forms of [code]#parallel_for# also execute the same number of +iterations as if no offset was specified. +The difference is that the [code]#nd_item# parameter passed to the kernel +function has the value of the offset implicitly added to the constituent +<>. This offset parameter is deprecated in SYCL 2020. 
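The effect of the offset on the ids seen by the kernel function can be illustrated with a small plain {cpp} model.
This is a sketch of the arithmetic only, not SYCL code: [code]#Index3#, [code]#observedIds# and [code]#offsetExample# are hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C++ model of the offset semantics; none of these names are SYCL API.
using Index3 = std::array<std::size_t, 3>;

// List the ids a kernel function would observe for a given range and offset.
// The number of iterations depends only on the range; the offset is added
// to each component of every id.
std::vector<Index3> observedIds(const Index3& range,
                                const Index3& offset = Index3{0, 0, 0}) {
  std::vector<Index3> ids;
  ids.reserve(range[0] * range[1] * range[2]);
  for (std::size_t i = 0; i < range[0]; ++i) {
    for (std::size_t j = 0; j < range[1]; ++j) {
      for (std::size_t k = 0; k < range[2]; ++k) {
        ids.push_back(Index3{i + offset[0], j + offset[1], k + offset[2]});
      }
    }
  }
  return ids;
}

// Usage: a 3x3x3 range with and without an offset of (1, 1, 1).
void offsetExample() {
  const auto plain = observedIds({3, 3, 3});
  const auto shifted = observedIds({3, 3, 3}, {1, 1, 1});
  assert(plain.size() == shifted.size());           // same iteration count
  assert(plain.front() == (Index3{0, 0, 0}));       // ids run from 0 ...
  assert(plain.back() == (Index3{2, 2, 2}));        // ... to 2 inclusive
  assert(shifted.front() == (Index3{1, 1, 1}));     // ids run from 1 ...
  assert(shifted.back() == (Index3{3, 3, 3}));      // ... to 3 inclusive
}
```

Since the offset parameter is deprecated, the same shift can presumably be obtained by adding a constant to the id inside the kernel function, which is what the model does explicitly.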
@@ -13886,37 +14002,38 @@ include::{code_dir}/parallelForWithKernelHandler.cpp[lines=4..-1] ===== Parallel for hierarchical invoke The hierarchical parallel kernel execution interface provides the same -functionality as is available from the <> interface, but -exposed differently. To execute the same sixty-four work-items in -eight work-groups that we saw in a previous example, we execute an -outer [code]#parallel_for_work_group# call to create the -groups. The member function -[code]#handler::parallel_for_work_group# is parameterized by the -number of work-groups, such that the size of each group is chosen by -the runtime, or by the number of work-groups and number of work-items -for users who need more control. - -The body of the outer [code]#parallel_for_work_group# call -consists of a lambda function or function object. The body of this -function object contains code that is executed only once for the -entire work-group. If the code has no side-effects and the compiler -heuristic suggests that it is more efficient to do so, this code will be -executed for each work-item. +functionality as is available from the <> interface, but exposed +differently. +To execute the same sixty-four work-items in eight work-groups that we saw +in a previous example, we execute an outer [code]#parallel_for_work_group# +call to create the groups. +The member function [code]#handler::parallel_for_work_group# is +parameterized by the number of work-groups, such that the size of each group +is chosen by the runtime, or by the number of work-groups and number of +work-items for users who need more control. + +The body of the outer [code]#parallel_for_work_group# call consists of a +lambda function or function object. +The body of this function object contains code that is executed only once +for the entire work-group. +If the code has no side-effects and the compiler heuristic suggests that it +is more efficient to do so, this code will be executed for each work-item. 
Within this region any variable declared will have the semantics of <>, shared between all <> in the -<>. If the -device compiler can prove that an array of such variables is accessed only by -a single work-item throughout the lifetime of the work-group, for +<>. +If the device compiler can prove that an array of such variables is accessed +only by a single work-item throughout the lifetime of the work-group, for example if access is derived from the id of the work-item with no -transformation, then it can allocate the data in private memory or -registers instead. +transformation, then it can allocate the data in private memory or registers +instead. -To guarantee use of private per-work-item memory, the -[code]#private_memory# class can be used to wrap the data. +To guarantee use of private per-work-item memory, the [code]#private_memory# +class can be used to wrap the data. This class simply constructs private data for a given group across the -entire group. The id of the current work-item is passed to any access -to grab the correct data. +entire group. +The id of the current work-item is passed to any access to grab the correct +data. The [code]#private_memory# class has the following interface: @@ -13957,52 +14074,53 @@ T& operator()(const h_item& id) |==== -<> is allocated per underlying <>, not per -iteration of the [code]#parallel_for_work_item# loop. The number -of instances of a private memory object is only under direct control -if a work-group size is passed to the -[code]#parallel_for_work_group# call. If the underlying -work-group size is chosen by the runtime, the number of private memory -instances is opaque to the program. Explicit private memory -declarations should therefore be used with care and with a full -understanding of which instances of a -[code]#parallel_for_work_item# loop will share the same -underlying variable. +<> is allocated per underlying <>, +not per iteration of the [code]#parallel_for_work_item# loop. 
+The number of instances of a private memory object is only under direct +control if a work-group size is passed to the +[code]#parallel_for_work_group# call. +If the underlying work-group size is chosen by the runtime, the number of +private memory instances is opaque to the program. +Explicit private memory declarations should therefore be used with care and +with a full understanding of which instances of a +[code]#parallel_for_work_item# loop will share the same underlying variable. Also within the lambda body can be a sequence of calls to -[code]#parallel_for_work_item#. At the edges of these inner parallel -executions the work-group synchronizes. As a result the pair of -[code]#parallel_for_work_item# calls in the code below is equivalent to -the parallel execution with a <> in the earlier -example. +[code]#parallel_for_work_item#. +At the edges of these inner parallel executions the work-group synchronizes. +As a result the pair of [code]#parallel_for_work_item# calls in the code +below is equivalent to the parallel execution with a <> +in the earlier example. [source,,linenums] ---- include::{code_dir}/parallelforworkgroup.cpp[lines=4..-1] ---- -It is valid to use more flexible dimensions of the work-item loops. In -the following example we issue 8 work-groups but let the runtime -choose their size, by not passing a work-group size to the -[code]#parallel_for_work_group# call. The -[code]#parallel_for_work_item# loops may also vary in size, with -their execution ranges unrelated to the dimensions of the work-group, -and the compiler generating an appropriate iteration space to fill the -gap. In this case, the [code]#h_item# provides access to local ids and -ranges that reflect both kernel and [code]#parallel_for_work_item# invocation ranges. +It is valid to use more flexible dimensions of the work-item loops. 
+In the following example we issue 8 work-groups but let the runtime choose +their size, by not passing a work-group size to the +[code]#parallel_for_work_group# call. +The [code]#parallel_for_work_item# loops may also vary in size, with their +execution ranges unrelated to the dimensions of the work-group, and the +compiler generating an appropriate iteration space to fill the gap. +In this case, the [code]#h_item# provides access to local ids and ranges +that reflect both kernel and [code]#parallel_for_work_item# invocation +ranges. [source,,linenums] ---- include::{code_dir}/parallelforworkgroup2.cpp[lines=4..-1] ---- -This interface offers a more intuitive way for tiling parallel -programming paradigms. In summary, the hierarchical model allows a -developer to distinguish the execution at work-group level and at -work-item level using the [code]#parallel_for_work_group# and the nested -[code]#parallel_for_work_item# functions. It also provides this visibility -to the compiler without the need for difficult loop fission such that -host execution may be more efficient. +This interface offers a more intuitive way for tiling parallel programming +paradigms. +In summary, the hierarchical model allows a developer to distinguish the +execution at work-group level and at work-item level using the +[code]#parallel_for_work_group# and the nested +[code]#parallel_for_work_item# functions. +It also provides this visibility to the compiler without the need for +difficult loop fission such that host execution may be more efficient. 
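The remark about host execution can be made concrete with a host-side emulation in plain {cpp}.
The sketch below is not how any particular implementation works and does not use the SYCL API: [code]#emulateWorkGroups# and [code]#reverseWithinGroups# are hypothetical names.
It assumes that work-groups run one after another and that each inner work-item region is an ordinary loop, so the synchronization at the edge of a region is simply the end of that loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side emulation; these names are not SYCL API.
// groupBody(groupId, forEachItem) runs once per work-group. Each call it
// makes to forEachItem(f) runs f(localId) for every work-item of the group
// and returns only after all of them have finished.
template <typename GroupBody>
void emulateWorkGroups(std::size_t numGroups, std::size_t groupSize,
                       GroupBody groupBody) {
  for (std::size_t g = 0; g < numGroups; ++g) {
    auto forEachItem = [groupSize](auto&& itemBody) {
      for (std::size_t l = 0; l < groupSize; ++l) {
        itemBody(l);
      }
    };
    groupBody(g, forEachItem);
  }
}

// Example: reverse the elements inside each group using a scratch array
// that is shared by the work-items of one group.
std::vector<int> reverseWithinGroups(const std::vector<int>& in,
                                     std::size_t groupSize) {
  assert(groupSize > 0 && in.size() % groupSize == 0);
  std::vector<int> out(in.size());
  emulateWorkGroups(
      in.size() / groupSize, groupSize,
      [&](std::size_t g, auto forEachItem) {
        // Work-group scope: executed once per group.
        std::vector<int> scratch(groupSize);
        // First work-item region: every item stores its own element.
        forEachItem([&](std::size_t l) { scratch[l] = in[g * groupSize + l]; });
        // The first region has fully completed here, so every slot of
        // scratch is populated before the second region reads it.
        forEachItem([&](std::size_t l) {
          out[g * groupSize + l] = scratch[groupSize - 1 - l];
        });
      });
  return out;
}
```

In this model the two work-item regions are already separate loops in the source, so no loop fission is required to honor the synchronization point between them.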
A [code]#kernel_handler# can optionally be passed as a parameter to the <> that is invoked by any variant of @@ -14020,53 +14138,56 @@ include::{code_dir}/parallelForWorkGroupWithKernelHandler.cpp[lines=4..-1] [[subsec:explicitmemory]] ==== SYCL functions for explicit memory operations -In addition to <>, <> objects can also be used to -perform manual operations on host and device memory by using the +In addition to <>, <> objects can also be +used to perform manual operations on host and device memory by using the [keyword]#copy# API of the <>. Manual copy operations can be seen as specialized kernels executing on the device, except that typically this operations will be implemented using a -host API that exists as part of a backend (e.g, OpenCL enqueue copy operations). - -These explicit copy operations have a source and a destination. When an -accessor is the _source_ of the operation, the destination can be a host -pointer or another accessor. The _source_ accessor must have either -[code]#access_mode::read# or [code]#access_mode::read_write# access mode. When -an accessor is the _destination_ of the explicit copy operation, the source can -be a host pointer or another accessor. The _destination_ accessor must have -either [code]#access_mode::write#, [code]#access_mode::read_write#, -[code]#access_mode::discard_write# or [code]#access_mode::discard_read_write# -access mode. +host API that exists as part of a backend (e.g, OpenCL enqueue copy +operations). + +These explicit copy operations have a source and a destination. +When an accessor is the _source_ of the operation, the destination can be a +host pointer or another accessor. +The _source_ accessor must have either [code]#access_mode::read# or +[code]#access_mode::read_write# access mode. +When an accessor is the _destination_ of the explicit copy operation, the +source can be a host pointer or another accessor. 
+The _destination_ accessor must have either [code]#access_mode::write#, +[code]#access_mode::read_write#, [code]#access_mode::discard_write# or +[code]#access_mode::discard_read_write# access mode. When an accessor is used as a parameter to one of these explicit copy operations, the target must be either [code]#target::device# or [code]#target::constant_buffer#. -When accessors are both the source and the destination, -the operation is executed on objects controlled by the SYCL runtime. -The SYCL runtime is allowed to not perform an explicit in-copy operation -if a different path to update the data is available according to -the SYCL application memory model. - -The most recent copy of the memory object may reside on any context controlled -by the SYCL runtime, or on the host in a pointer controlled by the -SYCL runtime. The SYCL runtime will ensure that data is copied to the destination -once the <> has completed execution. - -Whenever a host pointer is used as either the source or the destination of these -explicit memory operations, it is the responsibility -of the user for that pointer to have at least as much memory allocated as -the accessor is giving access to, e.g: if an accessor accesses a range -of 10 elements of [code]#int# type, the host pointer must at least have -[code]#10 * sizeof(int)# bytes of memory allocated. +When accessors are both the source and the destination, the operation is +executed on objects controlled by the SYCL runtime. +The SYCL runtime is allowed to not perform an explicit in-copy operation if +a different path to update the data is available according to the SYCL +application memory model. + +The most recent copy of the memory object may reside on any context +controlled by the SYCL runtime, or on the host in a pointer controlled by +the SYCL runtime. +The SYCL runtime will ensure that data is copied to the destination once the +<> has completed execution. 
+ +Whenever a host pointer is used as either the source or the destination of +these explicit memory operations, it is the responsibility of the user for +that pointer to have at least as much memory allocated as the accessor is +giving access to, e.g: if an accessor accesses a range of 10 elements of +[code]#int# type, the host pointer must at least have [code]#10 * +sizeof(int)# bytes of memory allocated. A special case is the [code]#update_host# member function. -This member function only requires an accessor, and instructs the runtime to update -the internal copy of the data in the host, if any. This is particularly -useful when users use manual synchronization with host pointers, e.g. -via mutex objects on the [code]#buffer# constructors. +This member function only requires an accessor, and instructs the runtime to +update the internal copy of the data in the host, if any. +This is particularly useful when users use manual synchronization with host +pointers, e.g. via mutex objects on the [code]#buffer# constructors. -<> describes the interface for the -explicit copy operations. +<> describes the interface for the explicit copy +operations. [[table.members.handler.copy]] @@ -14244,10 +14365,10 @@ the default behavior. For more detail on USM, please see <>. |==== -The listing below illustrates how to use explicit copy -operations in SYCL. The example copies half of the contents of -a [code]#std::vector# into the device, leaving the rest of the -contents of the buffer on the device unchanged. +The listing below illustrates how to use explicit copy operations in SYCL. +The example copies half of the contents of a [code]#std::vector# into the +device, leaving the rest of the contents of the buffer on the device +unchanged. 
[source,,linenums] ---- @@ -14266,25 +14387,27 @@ include::{code_dir}/explicitcopy.cpp[lines=4..-1] include::{header_dir}/handler/useKernelBundle.h[lines=4..-1] ---- -_Effects:_ The <> associated with the [code]#handler# will use -<> of the [code]#kernel_bundle# [code]#execBundle# -in any of its <>. If the -[code]#kernel_bundle# contains multiple <> that are -compatible with the <> to which the kernel is submitted, then the -<> chosen is implementation-defined. - -If the <> attempts to invoke a kernel that is not contained by -a compatible device image in [code]#execBundle#, the -<> throws a synchronous [code]#exception# with the -[code]#errc::kernel_not_supported# error code. If the <> has a -secondary queue, then the [code]#execBundle# must contain a kernel that is -compatible with both the primary queue's device and the secondary queue's -device, otherwise the <> throws this exception. - -Since the handler method for setting specialization constants is incompatible -with the kernel bundle method, applications should not call this function if -[code]#handler::set_specialization_constant()# has been previously called for -this same <>. +_Effects:_ The <> associated with the [code]#handler# will +use <> of the [code]#kernel_bundle# +[code]#execBundle# in any of its <>. +If the [code]#kernel_bundle# contains multiple <> that are compatible with the <> to which the kernel is +submitted, then the <> chosen is implementation-defined. + +If the <> attempts to invoke a kernel that is not contained +by a compatible device image in [code]#execBundle#, the +<> throws a synchronous [code]#exception# with +the [code]#errc::kernel_not_supported# error code. +If the <> has a secondary queue, then the [code]#execBundle# +must contain a kernel that is compatible with both the primary queue's +device and the secondary queue's device, otherwise the +<> throws this exception. 
+ +Since the handler method for setting specialization constants is +incompatible with the kernel bundle method, applications should not call +this function if [code]#handler::set_specialization_constant()# has been +previously called for this same <>. _Throws:_ @@ -14301,71 +14424,79 @@ _Throws:_ === Specialization constants -Device code can make use of <> -which represent constants whose values can be set dynamically during execution -of the <>. The values of these constants are fixed when a -<> is invoked, and they do not change during the -execution of the kernel. However, the application is able to set a new value -for a specialization constant each time a kernel is invoked, so the values can -be tuned differently for each invocation. - -There are two methods for an application to use specialization constants, one -method requires creating a [code]#kernel_bundle# object and the other does not. -The syntax for both methods is mostly the same. Both methods declare -specialization constants in the same way, and kernels read their values in the -same way. The main difference is whether their values are set via +Device code can make use of <> which represent constants whose values can be set dynamically +during execution of the <>. +The values of these constants are fixed when a <> is +invoked, and they do not change during the execution of the kernel. +However, the application is able to set a new value for a specialization +constant each time a kernel is invoked, so the values can be tuned +differently for each invocation. + +There are two methods for an application to use specialization constants, +one method requires creating a [code]#kernel_bundle# object and the other +does not. +The syntax for both methods is mostly the same. +Both methods declare specialization constants in the same way, and kernels +read their values in the same way. 
+The main difference is whether their values are set via [code]#handler::set_specialization_constant()# or via -[code]#kernel_bundle::set_specialization_constant()#. These two methods are -incompatible with one another, so they may not both be used by the same -<>. +[code]#kernel_bundle::set_specialization_constant()#. +These two methods are incompatible with one another, so they may not both be +used by the same <>. [NOTE] ==== -Implementations that support online compilation of kernel bundles will likely -implement both methods of specialization constants using kernel bundles. +Implementations that support online compilation of kernel bundles will +likely implement both methods of specialization constants using kernel +bundles. Therefore, applications should expect that there is some overhead associated -with invoking a kernel with new values for its specialization constants. A -typical implementation records the values of specialization constants set via -[code]#handler::set_specialization_constant()# and remembers these values until -a kernel is invoked (e.g. via [code]#parallel_for()#). At this point, the -implementation determines the bundle that contains the invoked kernel. If -that bundle has already been compiled for the handler's device and compiled -with the correct values for the specialization constants, the kernel is -scheduled for invocation. Otherwise, the implementation compiles the -bundle before scheduling the kernel for invocation. Therefore, applications -that frequently change the values of specialization constants may see an -overhead associated with recompilation of the kernel's bundle. +with invoking a kernel with new values for its specialization constants. +A typical implementation records the values of specialization constants set +via [code]#handler::set_specialization_constant()# and remembers these +values until a kernel is invoked (e.g. via [code]#parallel_for()#). 
+At this point, the implementation determines the bundle that contains the +invoked kernel. +If that bundle has already been compiled for the handler's device and +compiled with the correct values for the specialization constants, the +kernel is scheduled for invocation. +Otherwise, the implementation compiles the bundle before scheduling the +kernel for invocation. +Therefore, applications that frequently change the values of specialization +constants may see an overhead associated with recompilation of the kernel's +bundle. ==== ==== Declaring a specialization constant -Specialization constants must be declared using the [code]#specialization_id# -class with the following restrictions: +Specialization constants must be declared using the +[code]#specialization_id# class with the following restrictions: * the template parameter [code]#T# must be a <> type; -* the [code]#specialization_id# variable must be declared as [code]#constexpr#; -* the [code]#specialization_id# variable must be declared in either namespace - scope or in class scope; -* if the [code]#specialization_id# variable is declared in class scope, it must - have public accessibility when referenced from namespace scope; +* the [code]#specialization_id# variable must be declared as + [code]#constexpr#; +* the [code]#specialization_id# variable must be declared in either + namespace scope or in class scope; +* if the [code]#specialization_id# variable is declared in class scope, it + must have public accessibility when referenced from namespace scope; * the [code]#specialization_id# variable may not be shadowed by another identifier [code]#X# which has the same name and is declared in an - [code]#inline# namespace, such that the [code]#specialization_id# variable is - no longer accessible after the declaration of [code]#X#; -* if the [code]#specialization_id# variable is declared in a namespace, none of - the enclosing namespace names [code]#N# may be shadowed by another identifier - [code]#X# which 
has the same name as [code]#N# and is declared in an - [code]#inline# namespace, such that [code]#N# is no longer accessible after - the declaration of [code]#X#. + [code]#inline# namespace, such that the [code]#specialization_id# variable + is no longer accessible after the declaration of [code]#X#; +* if the [code]#specialization_id# variable is declared in a namespace, none + of the enclosing namespace names [code]#N# may be shadowed by another + identifier [code]#X# which has the same name as [code]#N# and is declared + in an [code]#inline# namespace, such that [code]#N# is no longer + accessible after the declaration of [code]#X#. [NOTE] ==== The expectation is that some implementations may conceptually insert code at the end of a translation unit which references each `specialization_id` -variable that is declared in that translation unit. The restrictions listed -above make this possible by ensuring that these variables are accessible at the -end of the translation unit. +variable that is declared in that translation unit. +The restrictions listed above make this possible by ensuring that these +variables are accessible at the end of the translation unit. ==== The following example illustrates some of these restrictions: @@ -14389,8 +14520,8 @@ include::{header_dir}/expressingParallelism/classSpecializationId.h[lines=4..-1] template explicit constexpr specialization_id(Args&&... args); ---- -_Constraints:_ Available only when [code]#+std::is_constructible_v+# -evaluates to [code]#true#. +_Constraints:_ Available only when [code]#+std::is_constructible_v+# evaluates to [code]#true#. 
_Effects:_ Constructs a [code]#specialization_id# containing an instance of [code]#T# initialized with [code]#+args...+#, which represents the @@ -14417,15 +14548,16 @@ specialization_id& operator=(specialization_id&& rhs) = delete; // (4) If the application uses specialization constants without creating a [code]#kernel_bundle# object, it can set and get their values from <> by calling member functions of the [code]#handler# -class. These member functions have a template parameter [code]#SpecName# whose +class. +These member functions have a template parameter [code]#SpecName# whose value must be a reference to a variable of type [code]#specialization_id#, which defines the type and default value of the specialization constant. -When not using a kernel bundle, the value of a specialization constant that is -used in a kernel invoked from a <> is affected by calls to set -its value from that same <>, but it is not affected by calls -from other <> even if those calls are from -another invocation of the same <>. +When not using a kernel bundle, the value of a specialization constant that +is used in a kernel invoked from a <> is affected by calls to +set its value from that same <>, but it is not affected by +calls from other <> even if those calls are +from another invocation of the same <>. [source] ---- @@ -14435,18 +14567,19 @@ void set_specialization_constant( ---- _Effects:_ Sets the value of the specialization constant whose address is -[code]#SpecName# for this handler's <>. If the specialization -constant's value was previously set in this same <>, the value -is overwritten. +[code]#SpecName# for this handler's <>. +If the specialization constant's value was previously set in this same +<>, the value is overwritten. This function may be called even if the specialization constant [code]#SpecName# isn't used by the kernel that is invoked by this handler's -<>. Doing so has no effect on the invoked kernel. +<>. +Doing so has no effect on the invoked kernel. 
_Throws:_ - * An [code]#exception# with the [code]#errc::invalid# error code if - a kernel bundle has been bound to the [code]#handler# via + * An [code]#exception# with the [code]#errc::invalid# error code if a + kernel bundle has been bound to the [code]#handler# via [code]#use_kernel_bundle()#. [source] @@ -14457,25 +14590,27 @@ get_specialization_constant(); ---- _Returns:_ The value of the specialization constant whose address is -[code]#SpecName# for this handler's <>. If the value was -previously set in this handler's <>, that value is returned. +[code]#SpecName# for this handler's <>. +If the value was previously set in this handler's <>, that +value is returned. Otherwise, the specialization constant's default value is returned. _Throws:_ - * An [code]#exception# with the [code]#errc::invalid# error code if - a kernel bundle has been bound to the [code]#handler# via + * An [code]#exception# with the [code]#errc::invalid# error code if a + kernel bundle has been bound to the [code]#handler# via [code]#use_kernel_bundle()#. [[sec:spec-constants.device-code]] ==== Reading the value of a specialization constant from device code -In order to read the value of a specialization constant from device code, the -<> must be declared to take an object of type -[code]#kernel_handler# as its last parameter. The <> constructs -this object, which has a member function for reading the specialization -constant's value. A synopsis of this class is shown below. +In order to read the value of a specialization constant from device code, +the <> must be declared to take an object of type +[code]#kernel_handler# as its last parameter. +The <> constructs this object, which has a member function for +reading the specialization constant's value. +A synopsis of this class is shown below. [source,,linenums] ---- @@ -14493,20 +14628,22 @@ get_specialization_constant(); ---- _Returns:_ The value of the <> whose address is -[code]#SpecName#. 
For a kernel invoked from a <> that was not -bound to a kernel bundle, the value is the same as what would have been -returned if [code]#handler::get_specialization_constant()# was called -immediately before invoking the kernel. For a kernel invoked from a -<> that was bound to a kernel bundle, the value is the same as -what would be returned if [code]#kernel_bundle::get_specialization_constant()# -was called on the bound bundle. +[code]#SpecName#. +For a kernel invoked from a <> that was not bound to a kernel +bundle, the value is the same as what would have been returned if +[code]#handler::get_specialization_constant()# was called immediately before +invoking the kernel. +For a kernel invoked from a <> that was bound to a kernel +bundle, the value is the same as what would be returned if +[code]#kernel_bundle::get_specialization_constant()# was called on the bound +bundle. ==== Example usage The following example performs a convolution and uses -<> to set the values of the -coefficients. +<> to set the values of +the coefficients. [source,,linenums] ---- @@ -14522,14 +14659,15 @@ include::{code_dir}/usingSpecConstants.cpp[lines=4..-1] === Overview A <> is a native {cpp} callable which is scheduled by the -<>. A <> is submitted to a <> via a -<> by a <>. +<>. +A <> is submitted to a <> via a <> by a +<>. When a <> is submitted to a <> it is scheduled based on its data dependencies with other <> including -<> and asynchronous copies, resolving any -requisites created by <> attached to the <> as -defined in <>. +<> and asynchronous +copies, resolving any requisites created by <> attached +to the <> as defined in <>. Since a <> is invoked directly by the <> rather than being compiled as a <>, it does not have the same @@ -14544,39 +14682,41 @@ A <> can be enqueued on any <> and the callable will be invoked directly by the SYCL runtime, regardless of which <> the <> is associated with. 
-A <> is enqueued on a <> via the [code]#host_task# -member function of the [code]#handler# class. +A <> is enqueued on a <> via the [code]#host_task# member +function of the [code]#handler# class. The <> returned by the submission of the associated <> enters the completed state (corresponding to a status of [code]#info::event_command_status::complete#) once the invocation of the provided {cpp} callable has returned. -Any uncaught exception thrown during the execution of a <> will be -turned into an <> that can be handled as described in +Any uncaught exception thrown during the execution of a <> will +be turned into an <> that can be handled as described in <>. A <> can optionally be used to interoperate with the -<> associated with the <> executing the -<>, the <> that the <> is associated with, the -<> that the <> is associated with and the <> -that have been captured in the callable, via an optional -[code]#interop_handle# parameter. - -This allows <> to be used for two purposes: either as a -task which can perform arbitrary {cpp} code within the scheduling of the +<> associated with the +<> executing the <>, the <> that the <> is +associated with, the <> that the <> is associated with and +the <> that have been captured in the callable, via an +optional [code]#interop_handle# parameter. + +This allows <> to be used for two purposes: either as +a task which can perform arbitrary {cpp} code within the scheduling of the <> or as a task which can perform interoperability at a point within the scheduling of the <>. For the former use case, construct a buffer accessor with [code]#target::host_task# or an image accessor with -[code]#image_target::host_task#. This makes the buffer or image available -on the host during execution of the <>. +[code]#image_target::host_task#. +This makes the buffer or image available on the host during execution of the +<>. 
-For the latter case, construct a buffer accessor with -[code]#target::device# or [code]#target::constant_buffer#, or construct -an image accessor with [code]#image_target::device#. This makes the buffer or -image available on the device that is associated with the queue used to submit -the <>, so that it can be accessed via interoperability member -functions provided by the [code]#interop_handle# class. +For the latter case, construct a buffer accessor with [code]#target::device# +or [code]#target::constant_buffer#, or construct an image accessor with +[code]#image_target::device#. +This makes the buffer or image available on the device that is associated +with the queue used to submit the <>, so that it can be accessed +via interoperability member functions provided by the [code]#interop_handle# +class. Local <> cannot be used within a <>. @@ -14591,15 +14731,16 @@ include::{header_dir}/hostTask/hostTaskSynopsis.h[lines=4..-1] [[subsec:interfaces.hosttasks.interophandle]] === Class [code]#interop_handle# -The [code]#interop_handle# class is an abstraction over the <> -which is being used to invoke the <> and its associated -<> and <>. It also represents the state of the -<> dependency model at the point the <> is invoked. +The [code]#interop_handle# class is an abstraction over the <> which +is being used to invoke the <> and its associated <> and +<>. +It also represents the state of the <> dependency model at the +point the <> is invoked. The [code]#interop_handle# class provides access to the <> associated with the <>, <>, -<> and any <> or <> that are captured in -the callable being invoked in order to allow a <> to be used +<> and any <> or <> that are captured +in the callable being invoked in order to allow a <> to be used for interoperability purposes. 
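The optional [code]#interop_handle# parameter can be pictured with a small model in plain {cpp}.
The following is a minimal sketch, not the SYCL runtime's implementation: [code]#interop_handle_model# and [code]#invoke_host_task_model# are hypothetical stand-ins introduced only for illustration.
It assumes the selection rule given for [code]#handler::host_task#, namely that a handle is passed when the callable is invocable with one and that the callable is otherwise invoked as a nullary function.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for sycl::interop_handle. The real class cannot be
// constructed by user code; only the SYCL runtime creates it.
struct interop_handle_model {
  int backend_id;  // illustrative payload, not part of the real class
};

// Hypothetical model of how a host task callable might be dispatched.
// Returns a label naming the branch that was taken.
template <typename T>
std::string invoke_host_task_model(T&& hostTaskCallable) {
  if constexpr (std::is_invocable_v<T, interop_handle_model>) {
    // The callable accepts a handle: construct one and pass it in.
    interop_handle_model ih{1};
    std::forward<T>(hostTaskCallable)(ih);
    return "interop";
  } else {
    // The callable takes no arguments: invoke it as a nullary function.
    std::forward<T>(hostTaskCallable)();
    return "nullary";
  }
}
```

In this model a lambda declared with an [code]#interop_handle_model# parameter takes the first branch and a lambda with no parameters takes the second, so an application opts into interoperability purely through the signature of its callable.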
An [code]#interop_handle# cannot be constructed by user-code, only by the @@ -14631,8 +14772,9 @@ include::{header_dir}/hostTask/classInteropHandle/constructors.h[lines=4..-1] include::{header_dir}/hostTask/classInteropHandle/getbackend.h[lines=4..-1] ---- - . _Returns:_ Returns a [code]#backend# identifying the <> associated - with the <> associated with this [code]#interop_handle#. + . _Returns:_ Returns a [code]#backend# identifying the <> + associated with the <> associated with this + [code]#interop_handle#. [[subsec:interfaces.hosttask.interophandle.getnative]] ==== Template member functions [code]#get_native_*# @@ -14642,105 +14784,110 @@ include::{header_dir}/hostTask/classInteropHandle/getbackend.h[lines=4..-1] include::{header_dir}/hostTask/classInteropHandle/getnativeX.h[lines=4..-1] ---- - . _Constraints:_ Available only if the optional interoperability - function [code]#get_native# taking a [code]#buffer# is - available and if [code]#accTarget# is - [code]#target::device#. + . _Constraints:_ Available only if the optional interoperability function + [code]#get_native# taking a [code]#buffer# is available and if + [code]#accTarget# is [code]#target::device#. + -- _Returns:_ The <> associated with the underlying -<> of <> [code]#bufferAcc#. The <> -returned must be in a state where it represents the memory in its current state -within the <> dependency model and is capable of being used in a -way appropriate for the associated <>. It is undefined behavior to use -the <> outside of the scope of the <>. - -_Throws:_ An [code]#exception# with the [code]#errc::invalid# error code if the -<> [code]#bufferAcc# was not registered with the -<> which contained the <>. Must throw an [code]#exception# with the -[code]#errc::backend_mismatch# error code if [code]#Backend != get_backend()#. +<> of <> [code]#bufferAcc#. 
+The <> returned must be in a state where it +represents the memory in its current state within the <> +dependency model and is capable of being used in a way appropriate for the +associated <>. +It is undefined behavior to use the <> outside of the +scope of the <>. + +_Throws:_ An [code]#exception# with the [code]#errc::invalid# error code if +the <> [code]#bufferAcc# was not registered with the +<> which contained the <>. +Must throw an [code]#exception# with the [code]#errc::backend_mismatch# +error code if [code]#Backend != get_backend()#. -- - . _Constraints:_ Available only if the optional interoperability - function [code]#get_native# taking an [code]#unsampled_image# - is available. + . _Constraints:_ Available only if the optional interoperability function + [code]#get_native# taking an [code]#unsampled_image# is available. + -- _Returns:_ The <> associated with the underlying -[code]#unsampled_image# of <> [code]#imageAcc#. The -<> returned must be in a state where it represents the -memory in its current state within the <> dependency model and is -capable of being used in a way appropriate for the associated <>. It -is undefined behavior to use the <> outside of the scope -of the <>. - -_Throws:_ An [code]#exception# with the [code]#errc::invalid# error code if the -<> [code]#imageAcc# was not registered with the +[code]#unsampled_image# of <> [code]#imageAcc#. +The <> returned must be in a state where it +represents the memory in its current state within the <> +dependency model and is capable of being used in a way appropriate for the +associated <>. +It is undefined behavior to use the <> outside of the +scope of the <>. + +_Throws:_ An [code]#exception# with the [code]#errc::invalid# error code if +the <> [code]#imageAcc# was not registered with the <> which contained the <>. -- - . _Constraints:_ Available only if the optional interoperability - function [code]#get_native# taking a [code]#sampled_image# - is available. + .
_Constraints:_ Available only if the optional interoperability function + [code]#get_native# taking a [code]#sampled_image# is available. + -- _Returns:_ The <> associated with the underlying -[code]#sampled_image# of <> [code]#imageAcc#. The -<> returned must be in a state where it represents the -memory in its current state within the <> dependency model and is -capable of being used in a way appropriate for the associated <>. It -is undefined behavior to use the <> outside of the scope -of the <>. - -_Throws:_ An [code]#exception# with the [code]#errc::invalid# error code if the -<> [code]#imageAcc# was not registered with the -<> which contained the <>. Must throw an [code]#exception# with the -[code]#errc::backend_mismatch# error code if [code]#Backend != get_backend()#. +[code]#sampled_image# of <> [code]#imageAcc#. +The <> returned must be in a state where it +represents the memory in its current state within the <> +dependency model and is capable of being used in a way appropriate for the +associated <>. +It is undefined behavior to use the <> outside of the +scope of the <>. + +_Throws:_ An [code]#exception# with the [code]#errc::invalid# error code if +the <> [code]#imageAcc# was not registered with the +<> which contained the <>. +Must throw an [code]#exception# with the [code]#errc::backend_mismatch# +error code if [code]#Backend != get_backend()#. -- - . _Constraints:_ Available only if the optional interoperability - function [code]#get_native# taking a [code]#queue# is - available. + . _Constraints:_ Available only if the optional interoperability function + [code]#get_native# taking a [code]#queue# is available. + -- -_Returns:_ The <> associated with the <> that the -<> was submitted to. If the <> was submitted with a -secondary <> and the fall-back was triggered, the <> that is -associated with the [code]#interop_handle# must be the fall-back <>.
The -<> returned must be in a state where it is capable of -being used in a way appropriate for the associated <>. It is undefined -behavior to use the <> outside of the scope of the -<>. +_Returns:_ The <> associated with the <> that +the <> was submitted to. +If the <> was submitted with a secondary <> and the +fall-back was triggered, the <> that is associated with the +[code]#interop_handle# must be the fall-back <>. +The <> returned must be in a state where it is +capable of being used in a way appropriate for the associated <>. +It is undefined behavior to use the <> outside of the +scope of the <>. _Throws:_ Must throw an [code]#exception# with the -[code]#errc::backend_mismatch# error code if [code]#Backend != get_backend()#. +[code]#errc::backend_mismatch# error code if [code]#Backend != +get_backend()#. -- - . _Constraints:_ Available only if the optional interoperability - function [code]#get_native# taking a [code]#device# is - available. + . _Constraints:_ Available only if the optional interoperability function + [code]#get_native# taking a [code]#device# is available. + -- -_Returns:_ The <> associated with the <> that is -associated with the <> that the <> was submitted to. The -<> returned must be in a state where it is capable of -being used in a way appropriate for the associated <>. It is -undefined behavior to use the <> outside of the scope of -the <>. +_Returns:_ The <> associated with the <> that +is associated with the <> that the <> was submitted to. +The <> returned must be in a state where it is +capable of being used in a way appropriate for the associated <>. +It is undefined behavior to use the <> outside of the +scope of the <>. _Throws:_ Must throw an [code]#exception# with the -[code]#errc::backend_mismatch# error code if [code]#Backend != get_backend()#. +[code]#errc::backend_mismatch# error code if [code]#Backend != +get_backend()#. -- - . 
_Constraints:_ Available only if the optional interoperability - function [code]#get_native# taking a [code]#context# is - available. + . _Constraints:_ Available only if the optional interoperability function + [code]#get_native# taking a [code]#context# is available. + -- -_Returns:_ The <> associated with the <> that -is associated with the <> that the <> was submitted to. The -<> returned must be in a state where it is capable of -being used in a way appropriate for the associated <>. It is -undefined behavior to use the <> outside of the scope of -the <>. +_Returns:_ The <> associated with the <> +that is associated with the <> that the <> was submitted +to. +The <> returned must be in a state where it is +capable of being used in a way appropriate for the associated <>. +It is undefined behavior to use the <> outside of the +scope of the <>. _Throws:_ Must throw an [code]#exception# with the -[code]#errc::backend_mismatch# error code if [code]#Backend != get_backend()#. +[code]#errc::backend_mismatch# error code if [code]#Backend != +get_backend()#. -- @@ -14756,27 +14903,29 @@ include::{header_dir}/hostTask/classHandler/hostTask.h[lines=4..-1] ---- . _Effects:_ Enqueues an implementation-defined command to the - <> to invoke [code]#hostTaskCallable# exactly once. The - scheduling of the invocation of [code]#hostTaskCallable# in relation to - other <> enqueued to the <> must be in accordance - with the dependency model described in <>. + <> to invoke [code]#hostTaskCallable# exactly once. + The scheduling of the invocation of [code]#hostTaskCallable# in relation + to other <> enqueued to the <> must be + in accordance with the dependency model described in + <>. Initializes an [code]#interop_handle# object and passes it to [code]#hostTaskCallable# when it is invoked if [code]#std::is_invocable_v# evaluates to - [code]#true#, otherwise invokes [code]#hostTaskCallable# as a - nullary function. 
+ [code]#true#, otherwise invokes [code]#hostTaskCallable# as a nullary + function. [[sec:interfaces.bundles]] == Kernel bundles -Kernel bundles provide several features to a <>. For -implementations that support an online compiler, they provide fine grained -control over the online compilation of device code. For example, an -application can use a kernel bundle to compile its <> at a -specific time during the application's execution (such as during its -initialization), rather than relying on the implementation's default behavior -(which may not compile kernels until they are submitted). +Kernel bundles provide several features to a <>. +For implementations that support an online compiler, they provide fine +grained control over the online compilation of device code. +For example, an application can use a kernel bundle to compile its +<> at a specific time during the application's execution +(such as during its initialization), rather than relying on the +implementation's default behavior (which may not compile kernels until they +are submitted). Kernel bundles also provide a way for the application to set the values of specialization constants in many kernels before any of them are submitted to @@ -14786,44 +14935,50 @@ Kernel bundles provide a way for the application to introspect its kernels. For example, an application can use a bundle to query a kernel's work-group size when it is run on a specific device. -Finally, kernel bundles provide an extension point to interoperate with backend -and device specific features. Some examples of this include invocation of -device specific built-in kernels, online compilation of kernel code with vendor -specific options, or interoperation with kernels created with backend APIs. +Finally, kernel bundles provide an extension point to interoperate with +backend and device specific features. 
+Some examples of this include invocation of device specific built-in +kernels, online compilation of kernel code with vendor specific options, or +interoperation with kernels created with backend APIs. === Overview A kernel bundle is a high-level abstraction which represents a set of -<> that are associated with a <> and can be executed -on a number of <>, where each device is associated with that -same context. Depending on how a bundle is obtained, it could represent all of -the <> in the <>, +<> that are associated with a <> and can be +executed on a number of <>, where each device is associated +with that same context. +Depending on how a bundle is obtained, it could represent all of the +<> in the <>, or a certain subset of them. A kernel bundle is composed of one or more <>, -where each device image is an indivisible unit of compilation and/or linking. -When the <> compiles or links one of the kernels represented by -the device image, it must also compile or link any other kernels the device -image represents. Once a device image is compiled and linked, any of the other -kernels which that device image represents may be invoked without further -compilation or linking. - -Each <> a bundle represents must reside in at least one -of the bundle's device images. However, it is not necessary for each device -image to contain all of the kernel functions that the bundle represents. The -granularity in which kernel functions are grouped into device images is an -implementation detail. +where each device image is an indivisible unit of compilation and/or +linking. +When the <> compiles or links one of the kernels represented +by the device image, it must also compile or link any other kernels the +device image represents. +Once a device image is compiled and linked, any of the other kernels which +that device image represents may be invoked without further compilation or +linking. 
+ +Each <> a bundle represents must reside in at least +one of the bundle's device images. +However, it is not necessary for each device image to contain all of the +kernel functions that the bundle represents. +The granularity in which kernel functions are grouped into device images is +an implementation detail. [NOTE] ==== -To illustrate the intent of device images, a hypothetical implementation could -represent an application's kernel functions in both the SPIR-V format and also -in a native device code format. The implementation's ahead-of-time compiler -in this example produces device images with native code for certain devices and -also produces SPIR-V device images for use with other devices. Note that in -such an implementation, a particular kernel function could be represented in -more than one device image. +To illustrate the intent of device images, a hypothetical implementation +could represent an application's kernel functions in both the SPIR-V format +and also in a native device code format. +The implementation's ahead-of-time compiler in this example produces device +images with native code for certain devices and also produces SPIR-V device +images for use with other devices. +Note that in such an implementation, a particular kernel function could be +represented in more than one device image. An implementation could choose to have all kernel functions from all translation units grouped together in a single device image, to have each @@ -14831,49 +14986,55 @@ kernel function represented in its own device image, or to group kernel functions in some other way. ==== -Each device associated with a kernel bundle must have at least one compatible -device image, meaning that the implementation can either invoke the image's -kernel functions directly on the device or that the implementation can -translate the device image into a format that allows it to invoke the kernel -functions. 
+Each device associated with a kernel bundle must have at least one +compatible device image, meaning that the implementation can either invoke +the image's kernel functions directly on the device or that the +implementation can translate the device image into a format that allows it +to invoke the kernel functions. -An outcome of this definition is that each kernel function in a bundle must be -invocable on at least one of the devices associated with the bundle. However, -it is not necessary for every kernel function in the bundle to be invocable on -every associated device. +An outcome of this definition is that each kernel function in a bundle must +be invocable on at least one of the devices associated with the bundle. +However, it is not necessary for every kernel function in the bundle to be +invocable on every associated device. [NOTE] ==== -One common reason why a kernel function might not be invocable on every device -associated with a bundle is if the kernel uses optional device features. It's -possible that these features are available to only some devices in the bundle. - -The use of optional device features could affect how the implementation groups -kernels into device images, depending on how these features are represented. +One common reason why a kernel function might not be invocable on every +device associated with a bundle is if the kernel uses optional device +features. +It's possible that these features are available to only some devices in the +bundle. + +The use of optional device features could affect how the implementation +groups kernels into device images, depending on how these features are +represented. For example, consider an implementation where the optional feature is -represented in SPIR-V but translation of that SPIR-V into native code will fail -if the target device does not support the feature. 
In such an implementation, -kernels that use optional features should not be grouped into the same device -image as kernels that do not use these features. Since a device image is an -indivisible unit of compilation, doing so would cause a compilation failure if -a kernel K1 is invoked on a device D1 if K1 happened to reside in the same -device image as another kernel K2 that used a feature which is not supported on -device D1. - -See <> for more about optional device features. +represented in SPIR-V but translation of that SPIR-V into native code will +fail if the target device does not support the feature. +In such an implementation, kernels that use optional features should not be +grouped into the same device image as kernels that do not use these +features. +Since a device image is an indivisible unit of compilation, doing so would +cause a compilation failure if a kernel K1 is invoked on a device D1 if K1 +happened to reside in the same device image as another kernel K2 that used a +feature which is not supported on device D1. + +See <> for more about optional device +features. ==== A <> can obtain a kernel bundle by calling one of the -overloads of the [code]#get_kernel_bundle()# free function. Certain backends -may provide additional mechanisms for obtaining bundles with other -representations. If this is supported, the backend specification document will -describe the details. +overloads of the [code]#get_kernel_bundle()# free function. +Certain backends may provide additional mechanisms for obtaining bundles +with other representations. +If this is supported, the backend specification document will describe the +details. -Once a kernel bundle has been obtained there are a number of free functions for -performing compilation, linking and joining. Once a bundle is compiled and -linked, the application can invoke kernels from the bundle by calling -[code]#handler::use_kernel_bundle()# as described in -<>. 
+Once a kernel bundle has been obtained there are a number of free functions +for performing compilation, linking and joining. +Once a bundle is compiled and linked, the application can invoke kernels +from the bundle by calling [code]#handler::use_kernel_bundle()# as described +in <>. [[sec:interfaces.bundles.overview.synopsis]] @@ -14888,22 +15049,23 @@ include::{header_dir}/bundle/freeFunctions.h[lines=4..-1] === Fixed-function built-in kernels SYCL allows a <> to expose fixed functionality as non-programmable -built-in kernels. The availability and behavior of these built-in kernels are -backend specific and are not required to follow the SYCL execution and memory -models. However, the basic interface is common to all backends. +built-in kernels. +The availability and behavior of these built-in kernels are backend specific +and are not required to follow the SYCL execution and memory models. +However, the basic interface is common to all backends. [[sec:interfaces.bundles.bundlestate]] === Bundle states -A <> can be in one of three different -<> which are represented by an enum class called -[code]#bundle_state#. <> describes the semantics of -these three states. +A <> can be in one of three different <> which are represented by an enum class called [code]#bundle_state#. +<> describes the semantics of these three states. -The states form a progression. A bundle in [code]#bundle_state::input# can -be translated into [code]#bundle_state::object# by online compilation of the -bundle. A bundle in [code]#bundle_state::object# can be translated into +The states form a progression. +A bundle in [code]#bundle_state::input# can be translated into +[code]#bundle_state::object# by online compilation of the bundle. +A bundle in [code]#bundle_state::object# can be translated into [code]#bundle_state::executable# by online linking. [NOTE] @@ -14915,24 +15077,27 @@ specified. 
==== There is no requirement that an implementation must expose kernels in -[code]#bundle_state::input# or [code]#bundle_state::object#. In fact, an -implementation could expose some kernels in these states but not others. For -example, this behavior could be controlled by implementation specific options -to the ahead-of-time compiler. Kernels that are not exposed in these states -cannot be online compiled or online linked by the application. +[code]#bundle_state::input# or [code]#bundle_state::object#. +In fact, an implementation could expose some kernels in these states but not +others. +For example, this behavior could be controlled by implementation specific +options to the ahead-of-time compiler. +Kernels that are not exposed in these states cannot be online compiled or +online linked by the application. All kernels defined in the <>, however, must be exposed in -[code]#bundle_state::executable# because this is the only state that allows a -kernel to be invoked on a device. Device built-in kernels are also exposed -in [code]#bundle_state::executable#. +[code]#bundle_state::executable# because this is the only state that allows +a kernel to be invoked on a device. +Device built-in kernels are also exposed in +[code]#bundle_state::executable#. -If an application exposes a bundle in [code]#bundle_state::input# for a device -D, then the implementation must also provide an online compiler for device D. -Therefore, an application need not explicitly test for +If an application exposes a bundle in [code]#bundle_state::input# for a +device D, then the implementation must also provide an online compiler for +device D. Therefore, an application need not explicitly test for [code]#aspect::online_compiler# if it successfully obtains a bundle in -[code]#bundle_state::input# for that device. Likewise, an implementation must -provide an online linker for device D if it exposes a bundle in -[code]#bundle_state::object# for device D. 
+[code]#bundle_state::input# for that device. +Likewise, an implementation must provide an online linker for device D if it +exposes a bundle in [code]#bundle_state::object# for device D. [[table.bundles.states]] .Enumeration of possible bundle states @@ -14973,18 +15138,19 @@ bundle_state::executable === Kernel identifiers -Some of the functions related to kernel bundles take an input parameter of type -[code]#kernel_id# which identifies a kernel. A synopsis of the -[code]#kernel_id# class is shown below along with a description of its member -functions. Additionally, this class provides the common special member -functions and common member functions that are listed in -<> in <> and +Some of the functions related to kernel bundles take an input parameter of +type [code]#kernel_id# which identifies a kernel. +A synopsis of the [code]#kernel_id# class is shown below along with a +description of its member functions. +Additionally, this class provides the common special member functions and +common member functions that are listed in <> in +<> and <>, respectively. As with all SYCL objects that have the common reference semantics, kernel -identifiers are equality comparable. Two [code]#kernel_id# objects compare -equal if and only if they refer to the same application kernel or to the same -device built-in kernel. +identifiers are equality comparable. +Two [code]#kernel_id# objects compare equal if and only if they refer to the +same application kernel or to the same device built-in kernel. There is no public default constructor for this class. @@ -14999,25 +15165,27 @@ const char* get_name() const noexcept; ---- _Returns:_ An implementation-defined null-terminated string containing the -name of the kernel. There is no guarantee that this name is unique amongst -all the kernels, nor is there a guarantee that the name is stable from one -run of the application to another. The lifetime of the memory containing the -name is unspecified. +name of the kernel. 
+There is no guarantee that this name is unique amongst all the kernels, nor +is there a guarantee that the name is stable from one run of the application +to another. +The lifetime of the memory containing the name is unspecified. [NOTE] ==== In practice, the lifetime of the memory containing the name will typically extend until the application terminates, unless the kernel associated with -the name comes from a dynamic library. In this case, the lifetime of the -memory may end if the dynamic library is unloaded. +the name comes from a dynamic library. +In this case, the lifetime of the memory may end if the dynamic library is +unloaded. ==== === Obtaining a kernel identifier An application can obtain an identifier for a kernel that is defined in the -application by calling one of the following free functions, or it may obtain an -identifier for a device's built-in kernels by querying the device with +application by calling one of the following free functions, or it may obtain +an identifier for a device's built-in kernels by querying the device with [code]#info::device::built_in_kernel_ids#. [source] @@ -15026,13 +15194,15 @@ template kernel_id get_kernel_id(); ---- _Preconditions:_ The template parameter [code]#KernelName# must be the -<> of a kernel that is defined in the <>. +<> of a kernel that is defined in the +<>. Since lambda functions have no standard type name, kernels defined as lambda functions must specify a [code]#KernelName# in their <> in order to obtain their identifier via this -function. Applications which call [code]#get_kernel_id()# for a -[code]#KernelName# that is not defined are ill formed, and the implementation -must issue a diagnostic in this case. +function. +Applications which call [code]#get_kernel_id()# for a [code]#KernelName# +that is not defined are ill formed, and the implementation must issue a +diagnostic in this case. _Returns:_ The identifier of the kernel associated with [code]#KernelName#. 
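Reviewer sketch (not part of the patch): the type-keyed lookup that the hunk above describes can be pictured with a small stand-alone C++ model. It does not use the SYCL headers. `toy_kernel_id` and `toy_get_kernel_id` are hypothetical stand-ins, and keying on the address of a per-instantiation static is only an assumed way to obtain one identifier per kernel name, not a description of any real implementation. It illustrates why a forward-declared name type is enough to identify a kernel written as a lambda, and that identifiers obtained for the same name compare equal.

```cpp
#include <cassert>

// Hypothetical stand-in for the kernel identifier class; not the SYCL type.
struct toy_kernel_id {
  const void* key;
  friend bool operator==(const toy_kernel_id& a, const toy_kernel_id& b) {
    return a.key == b.key;
  }
  friend bool operator!=(const toy_kernel_id& a, const toy_kernel_id& b) {
    return !(a == b);
  }
};

// Each KernelName instantiation owns its own static tag, so the tag's
// address is distinct per kernel name within one run of the program.
template <typename KernelName>
toy_kernel_id toy_get_kernel_id() {
  static char tag = 0;
  return toy_kernel_id{&tag};
}

// A lambda has no nameable type, so a kernel written as a lambda would be
// given a forward-declared name type such as these (illustrative names).
class vector_add_kernel;
class scale_kernel;
```

As in the specification text, nothing here promises that an identifier is stable from one run of the application to another.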
@@ -15042,18 +15212,20 @@ std::vector get_kernel_ids(); ---- _Returns:_ A vector with the identifiers for all kernels defined in the -<>. This does not include identifiers for any device -built-in kernels. +<>. +This does not include identifiers for any device built-in kernels. === Obtaining a kernel bundle A <> can obtain a kernel bundle by calling one of the -overloads of the free function [code]#get_kernel_bundle()#. The implementation -may return a bundle that consists of device images that were created by the -ahead-of-time compiler, or it may call the online compiler or linker to create -the bundle's device images in the requested state. A bundle may also contain -device images that represent a device's built-in kernels. +overloads of the free function [code]#get_kernel_bundle()#. +The implementation may return a bundle that consists of device images that +were created by the ahead-of-time compiler, or it may call the online +compiler or linker to create the bundle's device images in the requested +state. +A bundle may also contain device images that represent a device's built-in +kernels. When [code]#get_kernel_bundle()# is used to obtain a kernel bundle in [code]#bundle_state::object# or [code]#bundle_state::executable#, any @@ -15067,34 +15239,35 @@ kernel_bundle get_kernel_bundle(const context& ctxt, ---- _Returns:_ A kernel bundle in state [code]#State# which contains all of the -<> in the application which are compatible with at least one of -the devices in [code]#devs#. This does not include any device built-in kernels. +<> in the application which are compatible with at least one +of the devices in [code]#devs#. +This does not include any device built-in kernels. The bundle's set of associated devices is [code]#devs# (with any duplicate devices removed). 
Since the implementation may not represent all kernels in [code]#bundle_state::input# or [code]#bundle_state::object#, calling this -function with one of those states may return a bundle that is missing some of -the application's kernels. +function with one of those states may return a bundle that is missing some +of the application's kernels. _Throws:_ * An [code]#exception# with the [code]#errc::invalid# error code if any of - the devices in [code]#devs# is not one of devices contained by the context - [code]#ctxt# or is not a <> of some device in + the devices in [code]#devs# is not one of devices contained by the + context [code]#ctxt# or is not a <> of some device in [code]#ctxt#. * An [code]#exception# with the [code]#errc::invalid# error code if the [code]#devs# vector is empty. * An [code]#exception# with the [code]#errc::invalid# error code if - [code]#State# is [code]#bundle_state::input# and any device in [code]#devs# - does not have [code]#aspect::online_compiler#. + [code]#State# is [code]#bundle_state::input# and any device in + [code]#devs# does not have [code]#aspect::online_compiler#. * An [code]#exception# with the [code]#errc::invalid# error code if - [code]#State# is [code]#bundle_state::object# and any device in [code]#devs# - does not have [code]#aspect::online_linker#. + [code]#State# is [code]#bundle_state::object# and any device in + [code]#devs# does not have [code]#aspect::online_linker#. * An [code]#exception# with the [code]#errc::build# error code if [code]#State# is [code]#bundle_state::object# or - [code]#bundle_state::executable#, if the implementation needs to perform an - online compile or link, and if the online compile or link fails. + [code]#bundle_state::executable#, if the implementation needs to perform + an online compile or link, and if the online compile or link fails. 
[source] ---- @@ -15106,19 +15279,21 @@ kernel_bundle get_kernel_bundle(const context& ctxt, _Returns:_ A kernel bundle in state [code]#State# which contains all of the device images that are compatible with at least one of the devices in -[code]#devs#, further filtered to contain only those device images that contain -at least one of the kernels with the given identifiers. These identifiers may -represent kernels that are defined in the application, device built-in kernels, -or a mixture of the two. Since the device images may group many kernels -together, the returned bundle may contain additional kernels beyond those that -are requested in [code]#kernelIds#. The bundle's set of associated devices is -[code]#devs# (with duplicate devices removed). +[code]#devs#, further filtered to contain only those device images that +contain at least one of the kernels with the given identifiers. +These identifiers may represent kernels that are defined in the application, +device built-in kernels, or a mixture of the two. +Since the device images may group many kernels together, the returned bundle +may contain additional kernels beyond those that are requested in +[code]#kernelIds#. +The bundle's set of associated devices is [code]#devs# (with duplicate +devices removed). Since the implementation may not represent all kernels in [code]#bundle_state::input# or [code]#bundle_state::object#, calling this -function with one of those states may return a bundle that is missing some of -the kernels in [code]#kernelIds#. The application can test for this via -[code]#kernel_bundle::has_kernel()#. +function with one of those states may return a bundle that is missing some +of the kernels in [code]#kernelIds#. +The application can test for this via [code]#kernel_bundle::has_kernel()#. _Throws:_ @@ -15126,21 +15301,21 @@ _Throws:_ the kernels identified by [code]#kernelIds# are incompatible with all devices in [code]#devs#. 
* An [code]#exception# with the [code]#errc::invalid# error code if any of - the devices in [code]#devs# is not one of devices contained by the context - [code]#ctxt# or is not a <> of some device in + the devices in [code]#devs# is not one of devices contained by the + context [code]#ctxt# or is not a <> of some device in [code]#ctxt#. * An [code]#exception# with the [code]#errc::invalid# error code if the [code]#devs# vector is empty. * An [code]#exception# with the [code]#errc::invalid# error code if - [code]#State# is [code]#bundle_state::input# and any device in [code]#devs# - does not have [code]#aspect::online_compiler#. + [code]#State# is [code]#bundle_state::input# and any device in + [code]#devs# does not have [code]#aspect::online_compiler#. * An [code]#exception# with the [code]#errc::invalid# error code if - [code]#State# is [code]#bundle_state::object# and any device in [code]#devs# - does not have [code]#aspect::online_linker#. + [code]#State# is [code]#bundle_state::object# and any device in + [code]#devs# does not have [code]#aspect::online_linker#. * An [code]#exception# with the [code]#errc::build# error code if [code]#State# is [code]#bundle_state::object# or - [code]#bundle_state::executable#, if the implementation needs to perform an - online compile or link, and if the online compile or link fails. + [code]#bundle_state::executable#, if the implementation needs to perform + an online compile or link, and if the online compile or link fails. [source] ---- @@ -15151,47 +15326,50 @@ kernel_bundle get_kernel_bundle(const context& ctxt, ---- _Preconditions:_ The [code]#selector# must be a unary predicate whose return -value is convertible to [code]#bool# and whose parameter is -[code]#const device_image&#. +value is convertible to [code]#bool# and whose parameter is [code]#const +device_image&#. 
_Effects:_ The predicate function [code]#selector# is called once for every device image in the application of state [code]#State# which is compatible -with at least one of the devices in [code]#devs#. The function's return value -determines whether a device image is included in the new kernel bundle. The -[code]#selector# is called only for device images that contain kernels defined -in the application, not for device images that contain device built-in kernels. +with at least one of the devices in [code]#devs#. +The function's return value determines whether a device image is included in +the new kernel bundle. +The [code]#selector# is called only for device images that contain kernels +defined in the application, not for device images that contain device +built-in kernels. _Returns:_ A kernel bundle in state [code]#State# which contains all of the -device images for which the [code]#selector# returns [code]#true#. The -bundle's set of associated devices is [code]#devs# (with duplicate devices -removed). +device images for which the [code]#selector# returns [code]#true#. +The bundle's set of associated devices is [code]#devs# (with duplicate +devices removed). _Throws:_ * An [code]#exception# with the [code]#errc::invalid# error code if any of - the devices in [code]#devs# is not one of devices contained by the context - [code]#ctxt# or is not a <> of some device in + the devices in [code]#devs# is not one of devices contained by the + context [code]#ctxt# or is not a <> of some device in [code]#ctxt#. * An [code]#exception# with the [code]#errc::invalid# error code if the [code]#devs# vector is empty. * An [code]#exception# with the [code]#errc::invalid# error code if - [code]#State# is [code]#bundle_state::input# and any device in [code]#devs# - does not have [code]#aspect::online_compiler#. + [code]#State# is [code]#bundle_state::input# and any device in + [code]#devs# does not have [code]#aspect::online_compiler#. 
* An [code]#exception# with the [code]#errc::invalid# error code if [code]#State# is [code]#bundle_state::object# and any device in [code]#devs# does not have [code]#aspect::online_linker#. [NOTE] ==== -This function is intended to be used in conjunction with backend specific APIs -that allow the application to choose device images based on backend specific -criteria. - -This function does not call the online compiler or linker to translate device -images into state [code]#State#. If the application wants to select specific -device images and also compile or link them into the desired state, it can do -this by calling [code]#compile()# or [code]#link()# and then optionally joining -several bundles together with [code]#join()#. +This function is intended to be used in conjunction with backend specific +APIs that allow the application to choose device images based on backend +specific criteria. + +This function does not call the online compiler or linker to translate +device images into state [code]#State#. +If the application wants to select specific device images and also compile +or link them into the desired state, it can do this by calling +[code]#compile()# or [code]#link()# and then optionally joining several +bundles together with [code]#join()#. ==== [source] @@ -15207,11 +15385,12 @@ template // (3) kernel_bundle get_kernel_bundle(const context& ctxt, Selector selector); ---- - . Equivalent to [code]#get_kernel_bundle(ctxt, ctxt.get_devices())#. - . Equivalent to - [code]#get_kernel_bundle(ctxt, ctxt.get_devices(), kernelIds)#. - . Equivalent to - [code]#get_kernel_bundle(ctxt, ctxt.get_devices(), selector)#. + . Equivalent to [code]#get_kernel_bundle(ctxt, + ctxt.get_devices())#. + . Equivalent to [code]#get_kernel_bundle(ctxt, ctxt.get_devices(), + kernelIds)#. + . Equivalent to [code]#get_kernel_bundle(ctxt, ctxt.get_devices(), + selector)#. 
[source] ---- @@ -15224,24 +15403,26 @@ kernel_bundle get_kernel_bundle(const context& ctxt, ---- _Preconditions:_ The template parameter [code]#KernelName# must be the -<> of a kernel that is defined in the <>. +<> of a kernel that is defined in the +<>. Since lambda functions have no standard type name, kernels defined as lambda functions must specify a [code]#KernelName# in their -<> in order to use these functions. Applications -which call these functions for a [code]#KernelName# that is not defined are ill -formed, and the implementation must issue a diagnostic in this case. +<> in order to use these functions. +Applications which call these functions for a [code]#KernelName# that is not +defined are ill formed, and the implementation must issue a diagnostic in +this case. - . Equivalent to - [code]#get_kernel_bundle(ctxt, ctxt.get_devices(), {get_kernel_id()})#. - . Equivalent to - [code]#get_kernel_bundle(ctxt, devs, {get_kernel_id()})#. + . Equivalent to [code]#get_kernel_bundle(ctxt, ctxt.get_devices(), + {get_kernel_id()})#. + . Equivalent to [code]#get_kernel_bundle(ctxt, devs, + {get_kernel_id()})#. === Querying if a kernel bundle exists -Most overloads of [code]#get_kernel_bundle()# have a matching overload of the -free function [code]#has_kernel_bundle()# which checks to see if a kernel -bundle with the requested characteristics exists. +Most overloads of [code]#get_kernel_bundle()# have a matching overload of +the free function [code]#has_kernel_bundle()# which checks to see if a +kernel bundle with the requested characteristics exists. 
[source] ---- @@ -15262,8 +15443,8 @@ _Returns:_ [code]#true# only if all of the following are true: _Throws:_ * An [code]#exception# with the [code]#errc::invalid# error code if any of - the devices in [code]#devs# is not one of devices contained by the context - [code]#ctxt# or is not a <> of some device in + the devices in [code]#devs# is not one of devices contained by the + context [code]#ctxt# or is not a <> of some device in [code]#ctxt#. * An [code]#exception# with the [code]#errc::invalid# error code if the [code]#devs# vector is empty. @@ -15289,8 +15470,8 @@ _Returns:_ [code]#true# only if all of the following are true: _Throws:_ * An [code]#exception# with the [code]#errc::invalid# error code if any of - the devices in [code]#devs# is not one of devices contained by the context - [code]#ctxt# or is not a <> of some device in + the devices in [code]#devs# is not one of devices contained by the + context [code]#ctxt# or is not a <> of some device in [code]#ctxt#. * An [code]#exception# with the [code]#errc::invalid# error code if the [code]#devs# vector is empty. @@ -15306,8 +15487,8 @@ bool has_kernel_bundle(const context& ctxt, ---- . Equivalent to [code]#has_kernel_bundle(ctxt, ctxt.get_devices())#. - . Equivalent to - [code]#has_kernel_bundle(ctxt, ctxt.get_devices(), kernelIds)#. + . Equivalent to [code]#has_kernel_bundle(ctxt, ctxt.get_devices(), + kernelIds)#. [source] ---- @@ -15319,35 +15500,38 @@ bool has_kernel_bundle(const context& ctxt, const std::vector& devs); ---- _Preconditions:_ The template parameter [code]#KernelName# must be the -<> of a kernel that is defined in the <>. +<> of a kernel that is defined in the +<>. Since lambda functions have no standard type name, kernels defined as lambda functions must specify a [code]#KernelName# in their -<> in order to use these functions. 
Applications -which call these functions for a [code]#KernelName# that is not defined are ill -formed, and the implementation must issue a diagnostic in this case. +<> in order to use these functions. +Applications which call these functions for a [code]#KernelName# that is not +defined are ill formed, and the implementation must issue a diagnostic in +this case. - . Equivalent to - [code]#has_kernel_bundle(ctxt, {get_kernel_id()})#. - . Equivalent to - [code]#has_kernel_bundle(ctxt, devs, {get_kernel_id()})#. + . Equivalent to [code]#has_kernel_bundle(ctxt, + {get_kernel_id()})#. + . Equivalent to [code]#has_kernel_bundle(ctxt, devs, + {get_kernel_id()})#. === Querying if a kernel is compatible with a device -The following free functions allow an application to test whether a particular -kernel is compatible with a device. A kernel that is defined in the -application is compatible with a device unless: +The following free functions allow an application to test whether a +particular kernel is compatible with a device. +A kernel that is defined in the application is compatible with a device +unless: -* It uses optional features which are not supported on the device, as described - in <>; or +* It uses optional features which are not supported on the device, as + described in <>; or * It is decorated with a [code]#[[sycl::device_has()]]# {cpp} attribute that lists an aspect that is not supported by the device, as described in <>; or * The translation unit containing the kernel was compiled in a compilation - environment that does not support the device. Each implementation defines - the specific criteria for which devices are supported in its compilation - environment. For example, this might be dependent on options passed to the - compiler. + environment that does not support the device. + Each implementation defines the specific criteria for which devices are + supported in its compilation environment. 
+ For example, this might be dependent on options passed to the compiler. A device built-in kernel is only compatible with the device for which it is built-in. @@ -15357,8 +15541,8 @@ built-in. bool is_compatible(const std::vector& kernelIds, const device& dev); ---- -_Returns:_ [code]#true# if all of the kernels identified by [code]#kernelIds# -are compatible with the device [code]#dev#. +_Returns:_ [code]#true# if all of the kernels identified by +[code]#kernelIds# are compatible with the device [code]#dev#. [source] ---- @@ -15366,25 +15550,28 @@ template bool is_compatible(const device& dev); ---- _Preconditions:_ The template parameter [code]#KernelName# must be the -<> of a kernel that is defined in the <>. +<> of a kernel that is defined in the +<>. Since lambda functions have no standard type name, kernels defined as lambda functions must specify a [code]#KernelName# in their -<> in order to use this function. Applications -which call this function for a [code]#KernelName# that is not defined are ill -formed, and the implementation must issue a diagnostic in this case. +<> in order to use this function. +Applications which call this function for a [code]#KernelName# that is not +defined are ill formed, and the implementation must issue a diagnostic in +this case. -Equivalent to -[code]#is_compatible({get_kernel_id()}, dev)#. +Equivalent to [code]#is_compatible({get_kernel_id()}, +dev)#. === Joining kernel bundles Two or more kernel bundles of the same state may be joined together into a -single composite bundle. Joining bundles together is not the same as online -compiling or linking because it produces a new bundle in the same state as its -inputs. Rather, joining creates the union of all the devices images from the -input bundles, eliminates duplicate copies of the same device image, and -creates a new bundle from the result. +single composite bundle. 
+Joining bundles together is not the same as online compiling or linking +because it produces a new bundle in the same state as its inputs. +Rather, joining creates the union of all the devices images from the input +bundles, eliminates duplicate copies of the same device image, and creates a +new bundle from the result. [source] ---- @@ -15392,10 +15579,10 @@ template kernel_bundle join(const std::vector>& bundles); ---- -_Returns:_ A new kernel bundle that contains a copy of all the device images in -the input [code]#bundles# with duplicates removed. The new bundle has the same -associated context and the same set of associated devices as those in -[code]#bundles#. +_Returns:_ A new kernel bundle that contains a copy of all the device images +in the input [code]#bundles# with duplicates removed. +The new bundle has the same associated context and the same set of +associated devices as those in [code]#bundles#. _Throws:_ @@ -15414,14 +15601,15 @@ state [code]#bundle_state::object# or to transform a bundle from [code]#bundle_state::object# into a bundle of state [code]#bundle_state::executable#. -An application can query whether the implementation provides an online compiler -or linker by querying a device for [code]#aspect::online_compiler# or -[code]#aspect::online_linker#. +An application can query whether the implementation provides an online +compiler or linker by querying a device for [code]#aspect::online_compiler# +or [code]#aspect::online_linker#. -All of the functions in this section accept a [code]#property_list# parameter, -which can affect the semantics of the compilation or linking operation. The -<> does not currently define any such properties, but vendors may -specify these properties as an extension. +All of the functions in this section accept a [code]#property_list# +parameter, which can affect the semantics of the compilation or linking +operation. 
+The <> does not currently define any such properties, but vendors +may specify these properties as an extension. [source] ---- @@ -15430,17 +15618,18 @@ compile(const kernel_bundle& inputBundle, const std::vector& devs, const property_list& propList = {}); ---- -_Effects:_ The device images from [code]#inputBundle# are translated into one -or more new device images of state [code]#bundle_state::object#, and a new -kernel bundle is created to contain these new device images. The new bundle -represents all of the <> in [code]#inputBundles# that are -compatible with at least one of the devices in [code]#devs#. Any remaining -kernels (those that are not compatible with any of the devices [code]#devs#) -are not compiled and not represented in the new kernel bundle. +_Effects:_ The device images from [code]#inputBundle# are translated into +one or more new device images of state [code]#bundle_state::object#, and a +new kernel bundle is created to contain these new device images. +The new bundle represents all of the <> in +[code]#inputBundles# that are compatible with at least one of the devices in +[code]#devs#. +Any remaining kernels (those that are not compatible with any of the devices +[code]#devs#) are not compiled and not represented in the new kernel bundle. -The new bundle has the same associated context as [code]#inputBundle#, and the -new bundle's set of associated devices is [code]#devs# (with duplicate devices -removed). +The new bundle has the same associated context as [code]#inputBundle#, and +the new bundle's set of associated devices is [code]#devs# (with duplicate +devices removed). _Returns:_ The new kernel bundle. @@ -15448,10 +15637,10 @@ _Throws:_ * An [code]#exception# with the [code]#errc::invalid# error code if any of the devices in [code]#devs# are not in the set of associated devices for - [code]#inputBundle# (as defined by [code]#kernel_bundle::get_devices()#) or - if the [code]#devs# vector is empty. 
- * An [code]#exception# with the [code]#errc::build# error code if the online - compile operation fails. + [code]#inputBundle# (as defined by [code]#kernel_bundle::get_devices()#) + or if the [code]#devs# vector is empty. + * An [code]#exception# with the [code]#errc::build# error code if the + online compile operation fails. [source] ---- @@ -15461,14 +15650,15 @@ link(const std::vector>& objectBundles, ---- _Effects:_ Duplicate device images from [code]#objectBundles# are eliminated -as though they were joined via [code]#join()#, then the remaining device images -are translated into one or more new device images of state -[code]#bundle_state::executable#, and a new kernel bundle is created to contain -these new device images. The new bundle represents all of the -<> in [code]#objectBundles# that are compatible with at least -one of the devices in [code]#devs#. Any remaining kernels (those that are not -compatible with any of the devices in [code]#devs#) are not linked and not -represented in the new bundle. +as though they were joined via [code]#join()#, then the remaining device +images are translated into one or more new device images of state +[code]#bundle_state::executable#, and a new kernel bundle is created to +contain these new device images. +The new bundle represents all of the <> in +[code]#objectBundles# that are compatible with at least one of the devices +in [code]#devs#. +Any remaining kernels (those that are not compatible with any of the devices +in [code]#devs#) are not linked and not represented in the new bundle. The new bundle has the same associated context as those in [code]#objectBundles#, and the new bundle's set of associated devices is @@ -15486,8 +15676,8 @@ _Throws:_ any of the bundles in [code]#objectBundles# (as defined by [code]#kernel_bundle::get_devices()#) or if the [code]#devs# vector is empty. - * An [code]#exception# with the [code]#errc::build# error code if the online - link operation fails. 
+ * An [code]#exception# with the [code]#errc::build# error code if the + online link operation fails. [source] ---- @@ -15497,19 +15687,21 @@ build(const kernel_bundle& inputBundle, ---- _Effects:_ This function performs both an online compile and link operation, -translating a kernel bundle of state [code]#bundle_state::input# into a bundle -of state [code]#bundle_state::executable#. The device images from -[code]#inputBundle# are translated into one or more new device images of state -[code]#bundle_state::executable#, and a new bundle is created to contain these -new device images. The new bundle represents all of the <> in +translating a kernel bundle of state [code]#bundle_state::input# into a +bundle of state [code]#bundle_state::executable#. +The device images from [code]#inputBundle# are translated into one or more +new device images of state [code]#bundle_state::executable#, and a new +bundle is created to contain these new device images. +The new bundle represents all of the <> in [code]#inputBundle# that are compatible with at least one of the devices in -[code]#devs#. Any remaining kernels (those that are not compatible with any of -the devices [code]#devs#) are not compiled or linked and are not represented in -the new bundle. +[code]#devs#. +Any remaining kernels (those that are not compatible with any of the devices +[code]#devs#) are not compiled or linked and are not represented in the new +bundle. -The new bundle has the same associated context as [code]#inputBundle#, and the -new bundle's set of associated devices is [code]#devs# (with duplicate devices -removed). +The new bundle has the same associated context as [code]#inputBundle#, and +the new bundle's set of associated devices is [code]#devs# (with duplicate +devices removed). _Returns:_ The new kernel bundle. 
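Reviewer sketch (not part of the patch): the compile, link and build progression described in the hunks above can be modelled in plain C++ without the SYCL headers. Every name below (`toy_state`, `toy_kernel`, `toy_bundle`, `toy_compile`, `toy_link`, `toy_build`) is hypothetical, devices are reduced to integer ids, and the toy link takes a single bundle rather than a vector of bundles. The model covers only two rules from the text: kernels that are not compatible with any requested device are dropped from the new bundle, and duplicate devices are removed from the associated device set. It does not model the `errc::invalid` or `errc::build` exceptions.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins only; these are not the SYCL classes or free functions.
enum class toy_state { input, object, executable };

struct toy_kernel {
  std::string name;
  std::set<int> compatible_devices;  // hypothetical integer device ids
};

struct toy_bundle {
  toy_state state;
  std::vector<int> devices;         // associated devices
  std::vector<toy_kernel> kernels;  // kernels represented in the bundle
};

// Move a bundle to the next state, keeping only the kernels that are
// compatible with at least one requested device. Duplicate devices are
// removed from the new bundle's device set.
inline toy_bundle toy_translate(const toy_bundle& in, std::vector<int> devs,
                                toy_state next) {
  std::sort(devs.begin(), devs.end());
  devs.erase(std::unique(devs.begin(), devs.end()), devs.end());
  toy_bundle out{next, devs, {}};
  for (const toy_kernel& k : in.kernels) {
    const bool compatible =
        std::any_of(devs.begin(), devs.end(), [&k](int d) {
          return k.compatible_devices.count(d) != 0;
        });
    if (compatible) out.kernels.push_back(k);
  }
  return out;
}

// input -> object
inline toy_bundle toy_compile(const toy_bundle& in,
                              const std::vector<int>& devs) {
  assert(in.state == toy_state::input);
  return toy_translate(in, devs, toy_state::object);
}

// object -> executable
inline toy_bundle toy_link(const toy_bundle& in,
                           const std::vector<int>& devs) {
  assert(in.state == toy_state::object);
  return toy_translate(in, devs, toy_state::executable);
}

// input -> executable, modelled as compile followed by link.
inline toy_bundle toy_build(const toy_bundle& in,
                            const std::vector<int>& devs) {
  return toy_link(toy_compile(in, devs), devs);
}
```

For example, building a two-kernel input bundle for only one of its two devices yields an executable bundle holding just the kernel that is compatible with that device.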
@@ -15517,10 +15709,10 @@ _Throws:_ * An [code]#exception# with the [code]#errc::invalid# error code if any of the devices in [code]#devs# are not in the set of associated devices for - [code]#inputBundle# (as defined by [code]#kernel_bundle::get_devices()#) or - if the [code]#devs# vector is empty. - * An [code]#exception# with the [code]#errc::build# error code if the online - compile or link operations fail. + [code]#inputBundle# (as defined by [code]#kernel_bundle::get_devices()#) + or if the [code]#devs# vector is empty. + * An [code]#exception# with the [code]#errc::build# error code if the + online compile or link operations fail. [source] ---- @@ -15545,8 +15737,8 @@ build(const kernel_bundle& inputBundle, const property_list& propList = {}); ---- - . Equivalent to - [code]#compile(inputBundle, inputBundle.get_devices(), propList)#. + . Equivalent to [code]#compile(inputBundle, inputBundle.get_devices(), + propList)#. . Equivalent to [code]#link({objectBundle}, devs, propList)#. @@ -15554,25 +15746,26 @@ build(const kernel_bundle& inputBundle, [code]#devs# is the intersection of associated devices in common for all bundles in [code]#objectBundles#. - . Equivalent to - [code]#link({objectBundle}, objectBundle.get_devices(), propList)#. + . Equivalent to [code]#link({objectBundle}, objectBundle.get_devices(), + propList)#. - . Equivalent to - [code]#build(inputBundle, inputBundle.get_devices(), propList)#. + . Equivalent to [code]#build(inputBundle, inputBundle.get_devices(), + propList)#. === The [code]#kernel_bundle# class -A synopsis of the [code]#kernel_bundle# class is shown below. Additionally, -this class provides the common special member functions and common member -functions that are listed in <> in +A synopsis of the [code]#kernel_bundle# class is shown below. +Additionally, this class provides the common special member functions and +common member functions that are listed in <> in <> and <>, respectively. 
As with all SYCL objects that have the common reference semantics, kernel -bundles are equality comparable. Two bundles of the same <> are -considered to be equal if they are associated with the same context, have the -same set of associated devices, and contain the same set of device images. +bundles are equality comparable. +Two bundles of the same <> are considered to be equal if they +are associated with the same context, have the same set of associated +devices, and contain the same set of device images. There is no public default constructor for this class. @@ -15584,7 +15777,8 @@ include::{header_dir}/bundle/kernelBundleClass.h[lines=4..-1] [[sec:bundles.query]] ==== Queries -The following member functions provide various queries for a <>. +The following member functions provide various queries for a +<>. [source] ---- @@ -15636,18 +15830,20 @@ bool has_kernel(const device& dev) const noexcept; // (2) ---- _Preconditions:_ The template parameter [code]#KernelName# must be the -<> of a kernel that is defined in the <>. +<> of a kernel that is defined in the +<>. Since lambda functions have no standard type name, kernels defined as lambda functions must specify a [code]#KernelName# in their -<> in order to use these functions. Applications -which call these functions for a [code]#KernelName# that is not defined are ill -formed, and the implementation must issue a diagnostic in this case. +<> in order to use these functions. +Applications which call these functions for a [code]#KernelName# that is not +defined are ill formed, and the implementation must issue a diagnostic in +this case. . _Returns:_ [code]#true# only if the kernel bundle contains the kernel identified by [code]#KernelName#. . _Returns:_ [code]#true# only if the kernel bundle contains the kernel - identified by [code]#KernelName# and if that kernel is compatible with the - device [code]#dev#. + identified by [code]#KernelName# and if that kernel is compatible with + the device [code]#dev#. 
[source] ---- @@ -15662,8 +15858,8 @@ the kernel bundle. kernel get_kernel(const kernel_id& kernelId) const; ---- -_Preconditions:_ This member function is only available if the kernel bundle's -state is [code]#bundle_state::executable#. +_Preconditions:_ This member function is only available if the kernel +bundle's state is [code]#bundle_state::executable#. _Returns:_ A [code]#kernel# object representing the kernel identified by [code]#kernelId#, which resides in the bundle. @@ -15671,22 +15867,24 @@ _Returns:_ A [code]#kernel# object representing the kernel identified by _Throws:_ * An [code]#exception# with the [code]#errc::invalid# error code if the - kernel bundle does not contain the kernel identified by [code]#kernelId#. + kernel bundle does not contain the kernel identified by + [code]#kernelId#. [source] ---- template kernel get_kernel() const; ---- -_Preconditions:_ This member function is only available if the kernel bundle's -state is [code]#bundle_state::executable#. The template parameter -[code]#KernelName# must be the <> of a kernel that is defined -in the <>. Since lambda functions have no standard type -name, kernels defined as lambda functions must specify a [code]#KernelName# in -their <> in order to use this function. +_Preconditions:_ This member function is only available if the kernel +bundle's state is [code]#bundle_state::executable#. +The template parameter [code]#KernelName# must be the <> +of a kernel that is defined in the <>. +Since lambda functions have no standard type name, kernels defined as lambda +functions must specify a [code]#KernelName# in their +<> in order to use this function. Applications which call this function for a [code]#KernelName# that is not -defined are ill formed, and the implementation must issue a diagnostic in this -case. +defined are ill formed, and the implementation must issue a diagnostic in +this case. 
_Returns:_ A [code]#kernel# object representing the kernel identified by [code]#KernelName#, which resides in the bundle. @@ -15694,27 +15892,31 @@ _Returns:_ A [code]#kernel# object representing the kernel identified by _Throws:_ * An [code]#exception# with the [code]#errc::invalid# error code if the - kernel bundle does not contain the kernel identified by [code]#KernelName#. + kernel bundle does not contain the kernel identified by + [code]#KernelName#. ==== Specialization constant support The following member functions allow an application to manipulate <> that are used in the -device images of a <>. Applications can set the value of -specialization constants in a kernel bundle whose state is -[code]#bundle_state::input# and then online compile that bundle into -[code]#bundle_state::object# or [code]#bundle_state::executable#. The value of -the specialization constants then become fixed in the compiled bundle and -cannot be changed. Specialization constants that have not had their values set -by the time the bundle is compiled take their default values. +device images of a <>. +Applications can set the value of specialization constants in a kernel +bundle whose state is [code]#bundle_state::input# and then online compile +that bundle into [code]#bundle_state::object# or +[code]#bundle_state::executable#. +The values of the specialization constants then become fixed in the compiled +bundle and cannot be changed. +Specialization constants that have not had their values set by the time the +bundle is compiled take their default values. [NOTE] ==== It is expected that many implementations will use an intermediate language representation for a bundle in state [code]#bundle_state::input# such as SPIR-V, and the intermediate language will have native support for -specialization constants. However, implementations that do not have such -native support must still support specialization constants in some other way. +specialization constants.
+However, implementations that do not have such native support must still +support specialization constants in some other way. ==== [source] @@ -15722,17 +15924,17 @@ native support must still support specialization constants in some other way. bool contains_specialization_constants() const noexcept; ---- -_Returns:_ [code]#true# only if the kernel bundle contains at least one device -image which uses a specialization constant. +_Returns:_ [code]#true# only if the kernel bundle contains at least one +device image which uses a specialization constant. [source] ---- bool native_specialization_constant() const noexcept; ---- -_Returns:_ [code]#true# only if the kernel bundle contains at least one device -image which uses a specialization constant and all specialization constants -used in all of the bundle's device images are +_Returns:_ [code]#true# only if the kernel bundle contains at least one +device image which uses a specialization constant and all specialization +constants used in all of the bundle's device images are <>. [source] @@ -15750,16 +15952,18 @@ void set_specialization_constant( typename std::remove_reference_t::value_type value); ---- -_Preconditions:_ This member function is only available if the kernel bundle's -state is [code]#bundle_state::input#. +_Preconditions:_ This member function is only available if the kernel +bundle's state is [code]#bundle_state::input#. -_Effects:_ Sets the value of the <> whose address is -[code]#SpecName# for this bundle. If the specialization constant's value was -previously set in this bundle, the value is overwritten. +_Effects:_ Sets the value of the <> whose address +is [code]#SpecName# for this bundle. +If the specialization constant's value was previously set in this bundle, +the value is overwritten. -The new value applies to all device images in the bundle. 
It is allowed to set -the value of a specialization constant even if no device image in the bundle -uses it; doing so has no effect on the execution of kernels from that bundle. +The new value applies to all device images in the bundle. +It is allowed to set the value of a specialization constant even if no +device image in the bundle uses it; doing so has no effect on the execution +of kernels from that bundle. [source] ---- @@ -15769,14 +15973,17 @@ get_specialization_constant() const; ---- _Returns:_ The value of the <> whose address is -[code]#SpecName# for this kernel bundle. The value returned is as follows: +[code]#SpecName# for this kernel bundle. +The value returned is as follows: * If the value of this specialization constant was previously set in this - bundle, that value is returned. Otherwise, + bundle, that value is returned. + Otherwise, * If this bundle is the result of compiling, linking or joining another bundle and this specialization constant was set in that other bundle prior - to compiling, linking or joining; then that value is returned. Otherwise, + to compiling, linking or joining; then that value is returned. + Otherwise, * The specialization constant's default value is returned. @@ -15791,9 +15998,9 @@ using device_image_iterator = __unspecified__; ---- An iterator type that satisfies the {cpp} requirements of -[code]#LegacyForwardIterator#. The iterator's referenced type is -[code]#const device_image#, where [code]#State# is the same state as the -containing [code]#kernel_bundle#. +[code]#LegacyForwardIterator#. +The iterator's referenced type is [code]#const device_image#, where +[code]#State# is the same state as the containing [code]#kernel_bundle#. [source] ---- @@ -15801,17 +16008,17 @@ device_image_iterator begin() const; // (1) device_image_iterator end() const; // (2) ---- - . _Returns:_ An iterator to the first <> contained by the + . _Returns:_ An iterator to the first <> contained by the kernel bundle. - . 
_Returns:_ An iterator to one past the last <> contained by - the kernel bundle. + . _Returns:_ An iterator to one past the last <> contained + by the kernel bundle. === The [code]#kernel# class -A synopsis of the [code]#kernel# class is shown below. Additionally, -this class provides the common special member functions and common member -functions that are listed in <> in +A synopsis of the [code]#kernel# class is shown below. +Additionally, this class provides the common special member functions and +common member functions that are listed in <> in <> and <>, respectively. @@ -15857,8 +16064,8 @@ _Preconditions:_ The [code]#Param# must be one of the [code]#info::kernel# descriptors defined in <>, and the type alias [code]#Param::return_type# must be defined in accordance with that table. -_Returns:_ Information about the kernel that is not specific to the device on -which it is invoked. +_Returns:_ Information about the kernel that is not specific to the device +on which it is invoked. [source] ---- @@ -15871,8 +16078,8 @@ _Preconditions:_ The [code]#Param# must be one of the <>, and the type alias [code]#Param::return_type# must be defined in accordance with that table. -_Returns:_ Information about the kernel that applies when the kernel is invoked -on the device [code]#dev#. +_Returns:_ Information about the kernel that applies when the kernel is +invoked on the device [code]#dev#. _Throws:_ @@ -15888,22 +16095,23 @@ template typename Param::return_type get_backend_info() const; _Preconditions:_ The [code]#Param# must be one of a descriptor defined by a <> specification. -_Returns:_ Backend specific information about the kernel that is not specific -to the device on which it is invoked. +_Returns:_ Backend specific information about the kernel that is not +specific to the device on which it is invoked. 
_Throws:_ - * An [code]#exception# with the [code]#errc::backend_mismatch# error code if - the <> that corresponds with [code]#Param# is different from the - <> that is associated with this kernel bundle. + * An [code]#exception# with the [code]#errc::backend_mismatch# error code + if the <> that corresponds with [code]#Param# is different from + the <> that is associated with this kernel bundle. ==== Kernel information descriptors A <> can be queried for information using the [code]#get_info()# -member function, specifying one of the info parameters in [code]#info::kernel#. +member function, specifying one of the info parameters in +[code]#info::kernel#. All info parameters in [code]#info::kernel# are specified in -<> and the synopsis for [code]#info::kernel# is described in -<>. +<> and the synopsis for [code]#info::kernel# is described +in <>. [[table.kernel.info]] .Kernel class information descriptors @@ -15940,10 +16148,10 @@ info::kernel::attributes A <> can also be queried for device specific information using the [code]#get_info()# member function, specifying one of the info parameters in -[code]#info::kernel_device_specific#. All info parameters in -[code]#info::kernel_device_specific# are specified in -<>. The synopsis for -[code]#info::kernel_device_specific# is described in +[code]#info::kernel_device_specific#. +All info parameters in [code]#info::kernel_device_specific# are specified in +<>. +The synopsis for [code]#info::kernel_device_specific# is described in <>. [[table.kernel.devicespecificinfo]] @@ -16053,9 +16261,9 @@ info::kernel_device_specific::compile_sub_group_size === The [code]#device_image# class -A synopsis of the [code]#device_image# class is shown below. Additionally, -this class provides the common special member functions and common member -functions that are listed in <> in +A synopsis of the [code]#device_image# class is shown below. 
+Additionally, this class provides the common special member functions and +common member functions that are listed in <> in <> and <>, respectively. @@ -16083,17 +16291,19 @@ bool has_kernel(const kernel_id& kernelId, === Example usage This section provides some examples showing typical use cases for kernel -bundles. These examples are intended to clarify the definition of the kernel -bundle interfaces, but the content of this section is non-normative. +bundles. +These examples are intended to clarify the definition of the kernel bundle +interfaces, but the content of this section is non-normative. ==== Controlling the timing of online compilation In some cases an application may want to pre-compile its kernels before -submitting them to a device. This gives the application control over when the -overhead of online compilation happens, rather than relying on the default -behavior (which may cause the online compilation to happen at the point when -the kernel is submitted to a device). The following example shows how this can -be achieved. +submitting them to a device. +This gives the application control over when the overhead of online +compilation happens, rather than relying on the default behavior (which may +cause the online compilation to happen at the point when the kernel is +submitted to a device). +The following example shows how this can be achieved. [source,,linenums] ---- @@ -16132,45 +16342,51 @@ include::{code_dir}/bundle-builtin-kernel.cpp[lines=4..-1] == Defining kernels -In SYCL, functions that are executed on a SYCL device are referred to -as <>. A <> containing such a -<> is enqueued on a device queue in order to -be executed on that particular device. +In SYCL, functions that are executed on a SYCL device are referred to as +<>. +A <> containing such a <> is enqueued on a +device queue in order to be executed on that particular device. 
-The return type of the <> is [code]#void#, and all memory -accesses between host and device are through <> or through -<>. +The return type of the <> is [code]#void#, and all +memory accesses between host and device are through <> +or through <>. There are two ways of defining kernels: as named function objects or as -lambda functions. A backend may also provide interoperability interfaces for -defining kernels. +lambda functions. +A backend may also provide interoperability interfaces for defining kernels. [[sec:interfaces.kernels.as.function-objects]] === Defining kernels as named function objects -A kernel can be defined as a named function object type. These function objects -provide the same functionality as any {cpp} function object, with the -restriction that they need to follow SYCL rules to be <>. -The kernel function can be templated via templating the kernel -function object type. For details on restrictions for kernel naming, -please refer to <>. - -The [code]#operator()# member function must be const-qualified, and it may take -different parameters depending on the data accesses defined for the specific -kernel. If the [code]#operator()# function writes to any of the member variables, +A kernel can be defined as a named function object type. +These function objects provide the same functionality as any {cpp} function +object, with the restriction that they need to follow SYCL rules to be +<>. +The kernel function can be templated via templating the kernel function +object type. +For details on restrictions for kernel naming, please refer to +<>. + +The [code]#operator()# member function must be const-qualified, and it may +take different parameters depending on the data accesses defined for the +specific kernel. +If the [code]#operator()# function writes to any of the member variables, the behavior is undefined. -The following example defines a <>, -_RandomFiller_, which initializes a buffer with a random number. 
The -random number is generated during the construction of the function object -while processing the command group. The [code]#operator()# member -function of the function object receives an [code]#item# object. This -member function will be called for each work-item of the execution range. The value -of the random number will be assigned to each element of the buffer. In this -case, the accessor and the scalar random number are members of the function -object and therefore will be arguments to the device kernel. Usual -restrictions of passing arguments to kernels apply. +The following example defines a <>, _RandomFiller_, +which initializes a buffer with a random number. +The random number is generated during the construction of the function +object while processing the command group. +The [code]#operator()# member function of the function object receives an +[code]#item# object. +This member function will be called for each work-item of the execution +range. +The value of the random number will be assigned to each element of the +buffer. +In this case, the accessor and the scalar random number are members of the +function object and therefore will be arguments to the device kernel. +Usual restrictions of passing arguments to kernels apply. [source,,linenums] ---- @@ -16181,30 +16397,32 @@ include::{code_dir}/myfunctor.cpp[lines=4..-1] [[sec:interfaces.kernels.as.lambdas]] === Defining kernels as lambda functions -In {cpp}, function objects can be defined using lambda functions. Kernels may be -defined as lambda functions in SYCL. The name of a lambda function -in SYCL may optionally be specified by passing it as a template parameter to the invoking -member function, and in that case, the lambda name is a [keyword]#{cpp} typename# which must -be forward declarable at namespace scope. 
If the lambda -function relies on template arguments, then if specified, -the name of the lambda function must contain those template arguments which must -also be forward declarable at namespace scope. The -class used for the name of a lambda function is only used for naming purposes -and is not required to be defined. For details on restrictions for kernel -naming, please refer to <>. +In {cpp}, function objects can be defined using lambda functions. +Kernels may be defined as lambda functions in SYCL. +The name of a lambda function in SYCL may optionally be specified by passing +it as a template parameter to the invoking member function, and in that +case, the lambda name is a [keyword]#{cpp} typename# which must be forward +declarable at namespace scope. +If the lambda function relies on template arguments, then if specified, the +name of the lambda function must contain those template arguments which must +also be forward declarable at namespace scope. +The class used for the name of a lambda function is only used for naming +purposes and is not required to be defined. +For details on restrictions for kernel naming, please refer to +<>. The kernel function for the lambda function is the lambda function itself. -The kernel lambda must use copy for all of its captures (i.e. [code]#[=]#), and -the lambda must not use the [code]#mutable# specifier. +The kernel lambda must use copy for all of its captures (i.e. [code]#[=]#), +and the lambda must not use the [code]#mutable# specifier. [source,,linenums] ---- include::{code_dir}/mykernel.cpp[lines=4..-1] ---- -Explicit lambda naming is shown in the following code example, -including an illegal case that uses a class within the kernel -name which is not forward declarable ([code]#std::complex#). +Explicit lambda naming is shown in the following code example, including an +illegal case that uses a class within the kernel name which is not forward +declarable ([code]#std::complex#). 
[source,,linenums] ---- @@ -16223,8 +16441,8 @@ namespace sycl { }; .... -[code]#is_device_copyable# is a user specializable class template to indicate -that a type [code]#T# is <>. +[code]#is_device_copyable# is a user specializable class template to +indicate that a type [code]#T# is <>. * [code]#is_device_copyable# must meet the Cpp17UnaryTrait requirements. * If [code]#is_device_copyable# is specialized such that @@ -16232,34 +16450,36 @@ that a type [code]#T# is <>. satisfy all the requirements of a device copyable type, the results are unspecified. -If the application defines a type [code]#UDT# that satisfies the requirements of -a <> type (as defined in <>) but the type -is not implicitly device copyable as defined in that section, then the -application must provide a specialization of [code]#is_device_copyable# that -derives from [code]#std:true_type# in order to use that type in a context that -requires a device copyable type. Such a specialization can be declared like -this: +If the application defines a type [code]#UDT# that satisfies the +requirements of a <> type (as defined in +<>) but the type is not implicitly device copyable as +defined in that section, then the application must provide a specialization +of [code]#is_device_copyable# that derives from [code]#std::true_type# in +order to use that type in a context that requires a device copyable type. +Such a specialization can be declared like this: .... template<> struct sycl::is_device_copyable<UDT> : std::true_type {}; .... -It is legal to provide this specialization even if the implementation does not -define [code]#SYCL_DEVICE_COPYABLE# to [code]#1#, but the type cannot be used -as a device copyable type in that case and the specialization is ignored. +It is legal to provide this specialization even if the implementation does +not define [code]#SYCL_DEVICE_COPYABLE# to [code]#1#, but the type cannot be +used as a device copyable type in that case and the specialization is +ignored.
[[sec:kernel.parameter.passing]] === Rules for parameter passing to kernels -A SYCL application passes parameters to a kernel in different ways depending on -whether the kernel is a named function object or a lambda function. If the -kernel is a named function object, the [code]#operator()# member function (or -other member functions that it calls) may reference member variables inside the -same named function object. Any such member variables become parameters to the -kernel. If the kernel is a lambda function, any variables captured by the -lambda become parameters to the kernel. +A SYCL application passes parameters to a kernel in different ways depending +on whether the kernel is a named function object or a lambda function. +If the kernel is a named function object, the [code]#operator()# member +function (or other member functions that it calls) may reference member +variables inside the same named function object. +Any such member variables become parameters to the kernel. +If the kernel is a lambda function, any variables captured by the lambda +become parameters to the kernel. Regardless of how the parameter is passed, the following rules define the allowable types for a kernel parameter: @@ -16282,22 +16502,24 @@ allowable types for a kernel parameter: - [code]#marray# when [code]#T# is <>; - [code]#vec#. -* An array of element types [code]#T# is a legal parameter type if [code]#T# is - a legal parameter type. +* An array of element types [code]#T# is a legal parameter type if [code]#T# + is a legal parameter type. -* A class type [code]#S# with a non-static member variable of type [code]#T# is - a legal parameter type if [code]#T# is a legal parameter type and if +* A class type [code]#S# with a non-static member variable of type [code]#T# + is a legal parameter type if [code]#T# is a legal parameter type and if [code]#S# would otherwise be a legal parameter type aside from this member variable. 
-* A class type [code]#S# with a non-virtual base class of type [code]#T# is a - legal parameter type if [code]#T# is a legal parameter type and if [code]#S# - would otherwise be a legal parameter type aside from this base class. +* A class type [code]#S# with a non-virtual base class of type [code]#T# is + a legal parameter type if [code]#T# is a legal parameter type and if + [code]#S# would otherwise be a legal parameter type aside from this base + class. [NOTE] ==== Pointer types are trivially copyable, so they may be passed as kernel -parameters. However, only the pointer value itself is passed to the kernel. +parameters. +However, only the pointer value itself is passed to the kernel. Dereferencing the pointer on the kernel results in undefined behavior unless the pointer points to an address within a <> memory region that is accessible on the device. @@ -16308,9 +16530,9 @@ kernel parameters. [NOTE] ==== -The [code]#reducer# class is a special type of kernel parameter which is passed -to a kernel in a different way. <> describes how this parameter -type is used. +The [code]#reducer# class is a special type of kernel parameter which is +passed to a kernel in a different way. +<> describes how this parameter type is used. ==== // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end expressingParallelism %%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -16321,61 +16543,67 @@ type is used. === Error handling rules -Error handling in a SYCL application (host code) uses {cpp} exceptions. If an error -occurs, it will be thrown by the API function call and may be caught by the user -through standard {cpp} exception handling mechanisms. +Error handling in a SYCL application (host code) uses {cpp} exceptions. +If an error occurs, it will be thrown by the API function call and may be +caught by the user through standard {cpp} exception handling mechanisms. -SYCL applications are asynchronous in the sense that host and device code executions -are decoupled from one another except at specific points. 
For example, device code -executions often begin when dependencies in the SYCL task graph are satisfied, which -occurs asynchronously from host code execution. As a result of this the errors -that occur on a device cannot be thrown directly from a host API call, because the call -enqueueing a device action has typically already returned by the time that the error -occurs. Such errors are not detected until the error-causing task executes or tries to -execute, and we refer to these as <>. +SYCL applications are asynchronous in the sense that host and device code +executions are decoupled from one another except at specific points. +For example, device code executions often begin when dependencies in the +SYCL task graph are satisfied, which occurs asynchronously from host code +execution. +As a result of this the errors that occur on a device cannot be thrown +directly from a host API call, because the call enqueueing a device action +has typically already returned by the time that the error occurs. +Such errors are not detected until the error-causing task executes or tries +to execute, and we refer to these as <>. [[subsubsec:exception.async]] ==== Asynchronous error handler -The queue and context classes can optionally take an asynchronous handler object -<> on construction, which is a callable such as a function class -or lambda, with an [code]#exception_list# as a parameter. -Invocation of an <> may be triggered by the queue member functions -[code]#queue::wait_and_throw()# or [code]#queue::throw_asynchronous()#, by -the event member function [code]#event::wait_and_throw()#, or -automatically on destruction of a queue or context that contains unconsumed -asynchronous errors. When invoked, an <> is called and receives an -[code]#exception_list# argument containing a list of exception objects representing -any unconsumed <> associated with the queue or context. 
- -When an <> instance has been passed to an <>, then -that instance of the error has been consumed for handling and is not reported on -any subsequent invocations of the <>. - -The <> may be a named function object type, a lambda -function or a [code]#std::function#. The [code]#exception_list# -object passed to the <> is constructed by the <>. +The queue and context classes can optionally take an asynchronous handler +object <> on construction, which is a callable such as a +function class or lambda, with an [code]#exception_list# as a parameter. +Invocation of an <> may be triggered by the queue member +functions [code]#queue::wait_and_throw()# or +[code]#queue::throw_asynchronous()#, by the event member function +[code]#event::wait_and_throw()#, or automatically on destruction of a queue +or context that contains unconsumed asynchronous errors. +When invoked, an <> is called and receives an +[code]#exception_list# argument containing a list of exception objects +representing any unconsumed <> associated +with the queue or context. + +When an <> instance has been passed to an <>, +then that instance of the error has been consumed for handling and is not +reported on any subsequent invocations of the <>. + +The <> may be a named function object type, a lambda function +or a [code]#std::function#. +The [code]#exception_list# object passed to the <> is +constructed by the <>. [[subsubsec:exception.nohandler]] ==== Behavior without an <> -If an asynchronous error occurs in a queue or context that has no user-supplied -asynchronous error handler object <>, then an implementation-defined -default <> is called to handle the error in the same situations that -a user-supplied <> would be, as defined in -<>. The default <> must in some way -report all errors passed to it, when possible, and must then invoke -[code]#std::terminate# or equivalent. 
+If an asynchronous error occurs in a queue or context that has no +user-supplied asynchronous error handler object <>, then an +implementation-defined default <> is called to handle the +error in the same situations that a user-supplied <> would +be, as defined in <>. +The default <> must in some way report all errors passed to +it, when possible, and must then invoke [code]#std::terminate# or +equivalent. ==== Priorities of <> If the SYCL runtime can associate an <> with a specific queue, then: - * If the queue was constructed with an <>, that handler - is invoked to handle the error. + * If the queue was constructed with an <>, that handler is + invoked to handle the error. * Otherwise if the context enclosed by the queue was constructed with an <>, that handler is invoked to handle the error. * Otherwise when no handler was passed to either queue or context on @@ -16384,8 +16612,8 @@ then: * All handler invocations in this list occur at times as defined by <>. -If the SYCL runtime cannot associate an <> with a specific queue, -then: +If the SYCL runtime cannot associate an <> with a specific +queue, then: * If the context in which the error occurred was constructed with an <>, then that handler is invoked to handle the error. @@ -16398,29 +16626,31 @@ then: ==== Asynchronous errors with a secondary queue -If an <> occurs when running or enqueuing a command group which has -a secondary queue specified, then the command group may be enqueued -to the secondary queue instead of the primary queue. The error handling in this -case is also configured using the <> provided for both -queues. If there is no <> given on any of the queues, -then the asynchronous error handling proceeds through the contexts -associated with the queues, and if they were also constructed without -<>s, then the default handler will be used. 
-If the primary queue fails and there is an <> given at -this queue's construction, which populates the [code]#exception_list# -parameter, then any errors will be added and can be thrown whenever the user -chooses to handle those exceptions. Since there were errors on the primary -queue and a secondary queue was given, then the execution of the kernel is -re-scheduled to the secondary queue and any error reporting for the kernel -execution on that queue is done through that queue, in the same way as -described above. The secondary queue may fail as well, and the errors will be -thrown if there is an <> and either -[code]#wait_and_throw()# or [code]#throw()# are called on that queue. If no -<> was specified, then the one associated with the queue's context -will be used and if the context was also constructed without an <>, +If an <> occurs when running or enqueuing a command group which +has a secondary queue specified, then the command group may be enqueued to +the secondary queue instead of the primary queue. +The error handling in this case is also configured using the +<> provided for both queues. +If there is no <> given on any of the queues, then the +asynchronous error handling proceeds through the contexts associated with +the queues, and if they were also constructed without <>s, then the default handler will be used. -The <> event returned by that function will be -relevant to the queue where the kernel has been enqueued. +If the primary queue fails and there is an <> given at this +queue's construction, which populates the [code]#exception_list# parameter, +then any errors will be added and can be thrown whenever the user chooses to +handle those exceptions. +Since there were errors on the primary queue and a secondary queue was +given, then the execution of the kernel is re-scheduled to the secondary +queue and any error reporting for the kernel execution on that queue is done +through that queue, in the same way as described above. 
+The secondary queue may fail as well, and the errors will be thrown if there +is an <> and either [code]#wait_and_throw()# or +[code]#throw()# are called on that queue. +If no <> was specified, then the one associated with the +queue's context will be used and if the context was also constructed without +an <>, then the default handler will be used. +The <> event returned by that function will +be relevant to the queue where the kernel has been enqueued. Below is an example of catching a SYCL [code]#exception# and printing out the error message. @@ -16447,18 +16677,19 @@ include::{code_dir}/handlingErrorCode.cpp[lines=4..-1] include::{header_dir}/exception.h[lines=4..-1] ---- -The SYCL [code]#exception_list# -class is also available in order to provide a list of synchronous and -asynchronous exceptions. +The SYCL [code]#exception_list# class is also available in order to provide +a list of synchronous and asynchronous exceptions. Errors can occur both in the SYCL library and SYCL host side, or may come -directly from a <>. The member functions on these exceptions provide the -corresponding information. -<> can provide additional exception class objects as long as they derive -from [code]#sycl::exception# object, or any of its derived classes. +directly from a <>. +The member functions on these exceptions provide the corresponding +information. +<> can provide additional exception class objects as +long as they derive from [code]#sycl::exception# object, or any of its +derived classes. -A specialization of [code]#std::is_error_code_enum# must be defined -for [code]#sycl::errc# that inherits from [code]#std::true_type#. +A specialization of [code]#std::is_error_code_enum# must be defined for +[code]#sycl::errc# that inherits from [code]#std::true_type#. 
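The requirement on [code]#std::is_error_code_enum# can be illustrated in plain {cpp}, without any SYCL headers.
The sketch below is only an illustration under stated assumptions: [code]#demo_errc#, [code]#demo_category_impl# and [code]#demo_category# are hypothetical names standing in for [code]#sycl::errc# and its error category, and the enumerators are invented for the example.
It shows how a specialization that inherits from [code]#std::true_type#, together with a [code]#make_error_code# overload, allows the enumeration to convert implicitly to [code]#std::error_code#.

```cpp
#include <string>
#include <system_error>
#include <type_traits>

// Hypothetical error enumeration standing in for sycl::errc.
// The enumerators are invented for this sketch.
enum class demo_errc { success = 0, runtime = 1, invalid = 2 };

namespace std {
// The kind of specialization the text requires: inheriting from
// std::true_type enables implicit conversion to std::error_code.
template <>
struct is_error_code_enum<demo_errc> : true_type {};
}  // namespace std

// Hypothetical error category that maps the enumerators to messages.
class demo_category_impl : public std::error_category {
 public:
  const char* name() const noexcept override { return "demo"; }
  std::string message(int ev) const override {
    switch (static_cast<demo_errc>(ev)) {
      case demo_errc::success: return "success";
      case demo_errc::runtime: return "runtime error";
      case demo_errc::invalid: return "invalid argument";
    }
    return "unknown";
  }
};

// Returns the single shared instance of the hypothetical category.
const std::error_category& demo_category() noexcept {
  static const demo_category_impl instance;
  return instance;
}

// Found by argument-dependent lookup when a std::error_code is
// constructed from a demo_errc value.
std::error_code make_error_code(demo_errc e) noexcept {
  return {static_cast<int>(e), demo_category()};
}
```

With these definitions, a statement such as [code]#std::error_code ec = demo_errc::invalid;# compiles, and [code]#ec.message()# returns the text supplied by the category.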
[[table.members.exception]] @@ -16780,23 +17011,24 @@ std::error_code make_error_code(errc e) noexcept; == Data types -SYCL as a {cpp} programming model supports the {cpp} core language data types, -and it also provides the ability for all SYCL applications to be executed on SYCL -compatible devices. The scalar and vector data types that -are supported by the SYCL system are defined below. More details about the SYCL -device compiler support for fundamental and backend interoperability types are found -in <>. +SYCL as a {cpp} programming model supports the {cpp} core language data +types, and it also provides the ability for all SYCL applications to be +executed on SYCL compatible devices. +The scalar and vector data types that are supported by the SYCL system are +defined below. +More details about the SYCL device compiler support for fundamental and +backend interoperability types are found in <>. === Scalar data types -The fundamental {cpp} data types which are supported in SYCL are described in -<>. Note these types are fundamental and therefore -do not exist within the [code]#sycl# namespace. +The fundamental {cpp} data types which are supported in SYCL are described +in <>. +Note these types are fundamental and therefore do not exist within the +[code]#sycl# namespace. Additional scalar data types which are supported by SYCL within the -[code]#sycl# namespace are described in -<>. +[code]#sycl# namespace are described in <>. [[table.types.additional]] @@ -16832,40 +17064,42 @@ half [[sec:vector.type]] === Vector types -SYCL provides a cross-platform class template that works -efficiently on SYCL devices as well as in host {cpp} code. This type -allows sharing of vectors between the host and its SYCL devices. The -vector supports member functions that allow construction of a new vector from a -swizzled set of component elements. 
- -[code]#vec# -is a vector type -that compiles down to a <> built-in vector types on SYCL devices, -where possible, and provides compatible support on the host or when it is -not possible. The [code]#vec# class is templated on its number of -elements and its element type. The number of elements parameter, -_NumElements_, can be one of: 1, 2, 3, 4, 8 or 16. Any other value shall -produce a compilation failure. The element type parameter, _DataT_, must -be one of the basic scalar types supported in device code. +SYCL provides a cross-platform class template that works efficiently on SYCL +devices as well as in host {cpp} code. +This type allows sharing of vectors between the host and its SYCL devices. +The vector supports member functions that allow construction of a new vector +from a swizzled set of component elements. + +[code]#vec# is a vector type that +compiles down to a <> built-in vector types on SYCL devices, where +possible, and provides compatible support on the host or when it is not +possible. +The [code]#vec# class is templated on its number of elements and its element +type. +The number of elements parameter, _NumElements_, can be one of: 1, 2, 3, 4, +8 or 16. +Any other value shall produce a compilation failure. +The element type parameter, _DataT_, must be one of the basic scalar types +supported in device code. The SYCL [code]#vec# class template provides interoperability with the -underlying vector type defined by [code]#vector_t# which is -available only when compiled for the device. The SYCL [code]#vec# class can -be constructed from an instance of [code]#vector_t# and can implicitly -convert to an instance of [code]#vector_t# in order to support -interoperability with native <> functions from a SYCL kernel function. 
- -An instance of the SYCL [code]#vec# class template can also be -implicitly converted to an instance of the data type when the number of -elements is [code]#1# in order to allow single element vectors and -scalars to be convertible with each other. +underlying vector type defined by [code]#vector_t# which is available only +when compiled for the device. +The SYCL [code]#vec# class can be constructed from an instance of +[code]#vector_t# and can implicitly convert to an instance of +[code]#vector_t# in order to support interoperability with native +<> functions from a SYCL kernel function. + +An instance of the SYCL [code]#vec# class template can also be implicitly +converted to an instance of the data type when the number of elements is +[code]#1# in order to allow single element vectors and scalars to be +convertible with each other. ==== Vec interface The constructors, member functions and non-member functions of the SYCL -[code]#vec# class template are listed in -<>, <> and -<> respectively. +[code]#vec# class template are listed in <>, +<> and <> respectively. // Interface for class: vec [source,,linenums] @@ -17508,30 +17742,28 @@ The SYCL programming API provides all permutations of the type alias: [code]#+using = vec<, >+# -where [code]## is [code]#2#, [code]#3#, [code]#4#, -[code]#8# and [code]#16#, and pairings of [code]## and -[code]## for integral types are [code]#char# and -[code]#int8_t#, [code]#uchar# and [code]#uint8_t#, -[code]#short# and [code]#int16_t#, [code]#ushort# and -[code]#uint16_t#, [code]#int# and [code]#int32_t#, -[code]#uint# and [code]#uint32_t#, [code]#long# and -[code]#int64_t#, [code]#ulong# and [code]#uint64_t#, and for -floating point types are both [code]#half#, [code]#float# and -[code]#double#. 
+where [code]## is [code]#2#, [code]#3#, [code]#4#, [code]#8# and +[code]#16#, and pairings of [code]## and [code]## for +integral types are [code]#char# and [code]#int8_t#, [code]#uchar# and +[code]#uint8_t#, [code]#short# and [code]#int16_t#, [code]#ushort# and +[code]#uint16_t#, [code]#int# and [code]#int32_t#, [code]#uint# and +[code]#uint32_t#, [code]#long# and [code]#int64_t#, [code]#ulong# and +[code]#uint64_t#, and for floating point types are both [code]#half#, +[code]#float# and [code]#double#. -For example [code]#uint4# is the alias to [code]#vec# -and [code]#float16# is the alias to [code]#vec#. +For example [code]#uint4# is the alias to [code]#vec# and +[code]#float16# is the alias to [code]#vec#. ==== Swizzles -Swizzle operations can be performed in two ways. Firstly by calling the -[code]#swizzle# member function template, which takes a variadic number -of integer template arguments between [code]#0# and -[code]#NumElements-1#, specifying swizzle indexes. Secondly by calling -one of the simple swizzle member functions defined in -<> as [code]#XYZW_SWIZZLE# and -[code]#RGBA_SWIZZLE#. Note that the simple swizzle functions are only -available for up to 4 element vectors and are only available when the macro +Swizzle operations can be performed in two ways. +Firstly by calling the [code]#swizzle# member function template, which takes +a variadic number of integer template arguments between [code]#0# and +[code]#NumElements-1#, specifying swizzle indexes. +Secondly by calling one of the simple swizzle member functions defined in +<> as [code]#XYZW_SWIZZLE# and [code]#RGBA_SWIZZLE#. +Note that the simple swizzle functions are only available for up to 4 +element vectors and are only available when the macro [code]#SYCL_SIMPLE_SWIZZLES# is defined before including [code]##. 
@@ -17539,21 +17771,21 @@ In both cases the return type is always an instance of [code]#+__swizzled_vec__+#, an implementation-defined temporary class representing the result of the swizzle operation on the original [code]#vec# instance. -Since the swizzle operation may result in a different number of elements, the -[code]#+__swizzled_vec__+# instance may represent a different number of +Since the swizzle operation may result in a different number of elements, +the [code]#+__swizzled_vec__+# instance may represent a different number of elements than the original [code]#vec#. -Both kinds of swizzle member functions must not perform the swizzle operation -themselves, instead the swizzle operation must be performed by the returned -instance of [code]#+__swizzled_vec__+# when used within an expression, -meaning if the returned [code]#+__swizzled_vec__+# is never used in an -expression no swizzle operation is performed. +Both kinds of swizzle member functions must not perform the swizzle +operation themselves, instead the swizzle operation must be performed by the +returned instance of [code]#+__swizzled_vec__+# when used within an +expression, meaning if the returned [code]#+__swizzled_vec__+# is never used +in an expression no swizzle operation is performed. -Both the [code]#swizzle# member function template and the simple -swizzle member functions allow swizzle indexes to be repeated. +Both the [code]#swizzle# member function template and the simple swizzle +member functions allow swizzle indexes to be repeated. -A series of static constexpr values are provided within the -[code]#elem# struct to allow specifying named swizzle indexes when -calling the [code]#swizzle# member function template. +A series of static constexpr values are provided within the [code]#elem# +struct to allow specifying named swizzle indexes when calling the +[code]#swizzle# member function template. 
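The index semantics of the [code]#swizzle# member function template can be sketched with a toy type in plain {cpp}.
This is not the SYCL [code]#vec# class: [code]#toy_vec4# and its nested [code]#elem# struct are hypothetical, and the sketch evaluates the swizzle eagerly into a [code]#std::array#, whereas the specification requires a lazily evaluated [code]#+__swizzled_vec__+#.
It only shows that the result may have a different number of elements than the source and that indexes may be repeated.

```cpp
#include <array>
#include <cstddef>

// Toy 4-element vector used only to illustrate swizzle indexes.
// It is NOT the SYCL vec class template.
struct toy_vec4 {
  std::array<float, 4> data;

  // Named indexes, in the spirit of the elem struct described above.
  struct elem {
    static constexpr int x = 0, y = 1, z = 2, w = 3;
  };

  // Returns the selected components in the requested order.
  // Unlike a real swizzle, this copies the elements immediately.
  // Indexes are assumed to lie between 0 and 3; no checking is done.
  template <int... Indexes>
  std::array<float, sizeof...(Indexes)> swizzle() const {
    return {data[Indexes]...};
  }
};
```

For a [code]#toy_vec4# holding the values 1, 2, 3 and 4, the call [code]#swizzle<elem::w, elem::x, elem::x>()# yields a three-element result holding 4, 1 and 1.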
[[swizzled-vec-class]] @@ -17563,30 +17795,32 @@ The [code]#+__swizzled_vec__+# class must define an unspecified temporary which provides the entire interface of the SYCL [code]#vec# class template, including swizzled member functions, with the additions and alterations described below. -The member functions of the [code]#+__swizzled_vec__+# class behave as though -they operate on a [code]#vec# that is the result of the swizzle operation. +The member functions of the [code]#+__swizzled_vec__+# class behave as +though they operate on a [code]#vec# that is the result of the swizzle +operation. * The [code]#+__swizzled_vec__+# class template must be readable as an - r-value reference on the RHS of an expression. In this case the swizzle - operation is performed on the RHS of the expression and then the result - is applied to the LHS of the expression. - * The [code]#+__swizzled_vec__+# class template must be assignable as - an l-value reference on the LHS of an expression. In this case the RHS - of the expression is applied to the original SYCL [code]#vec# which - the [code]#+__swizzled_vec__+# represents via the swizzle operation. + r-value reference on the RHS of an expression. + In this case the swizzle operation is performed on the RHS of the + expression and then the result is applied to the LHS of the expression. + * The [code]#+__swizzled_vec__+# class template must be assignable as an + l-value reference on the LHS of an expression. + In this case the RHS of the expression is applied to the original SYCL + [code]#vec# which the [code]#+__swizzled_vec__+# represents via the + swizzle operation. Note that a [code]#+__swizzled_vec__+# that is used in an l-value expression may not contain any repeated element indexes. + For example: [code]#f4.xxxx() = fx.wzyx()# would not be valid. 
- * The [code]#+__swizzled_vec__+# class template must be convertible to - an instance of SYCL [code]#vec# with the type [code]#DataT# - and number of elements specified by the swizzle member function, if - [code]#NumElements > 1#, and must be convertible to an instance of - type [code]#DataT#, if [code]#NumElements == 1#. + * The [code]#+__swizzled_vec__+# class template must be convertible to an + instance of SYCL [code]#vec# with the type [code]#DataT# and number of + elements specified by the swizzle member function, if [code]#NumElements + > 1#, and must be convertible to an instance of type [code]#DataT#, if + [code]#NumElements == 1#. * The [code]#+__swizzled_vec__+# class template must be non-copyable, non-moveable, non-user constructible and may not be bound to a l-value - or escape the expression it was constructed in. For example - [code]#auto x = f4.x()# would not be valid. + or escape the expression it was constructed in. + For example [code]#auto x = f4.x()# would not be valid. * The [code]#+__swizzled_vec__+# class template should return [code]#+__swizzled_vec__&+# for each operator inherited from the [code]#vec# class template interface which would return @@ -17657,8 +17891,8 @@ of the element type in bytes multiplied by the number of elements: ++++ The exception to this is when the number of element is three in which case -the SYCL [code]#vec# is aligned to the size of the element type in -bytes multiplied by four: +the SYCL [code]#vec# is aligned to the size of the element type in bytes +multiplied by four: [[vec3-memory-alignment]] [latexmath] @@ -17667,22 +17901,23 @@ bytes multiplied by four: ++++ This is true for both host and device code in order to allow for instances -of the [code]#vec# class template to be passed to SYCL kernel -functions. +of the [code]#vec# class template to be passed to SYCL kernel functions. -In no case, however, is the alignment guaranteed to be greater than 64 bytes. 
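The alignment rule for [code]#vec# can be written as a small compile-time helper.
This is a sketch of one reading of the rule, not part of the SYCL API: the names [code]#guaranteed_vec_alignment#, [code]#padded_count# and [code]#max_guaranteed_alignment# are hypothetical, and the 64-byte limit is modeled by clamping the result, on the assumption that the guarantee stops growing at 64 bytes.

```cpp
#include <cstddef>

// Upper bound on the alignment guarantee, taken from the text.
constexpr std::size_t max_guaranteed_alignment = 64;

// A three-element vec is aligned as if it had four elements.
constexpr std::size_t padded_count(std::size_t num_elements) {
  return num_elements == 3 ? 4 : num_elements;
}

// Hypothetical helper: the alignment in bytes that is guaranteed for a
// vec with the given element size and number of elements.
constexpr std::size_t guaranteed_vec_alignment(std::size_t elem_size,
                                               std::size_t num_elements) {
  return elem_size * padded_count(num_elements) > max_guaranteed_alignment
             ? max_guaranteed_alignment
             : elem_size * padded_count(num_elements);
}

// Compile-time checks of the rule for a few representative cases.
static_assert(guaranteed_vec_alignment(4, 4) == 16, "4-byte x 4 elements");
static_assert(guaranteed_vec_alignment(4, 3) == 16, "3 elements pad to 4");
static_assert(guaranteed_vec_alignment(8, 16) == 64, "clamped to 64 bytes");
```

Under this reading, a vector of sixteen 8-byte elements has a size of 128 bytes, but only 64-byte alignment is guaranteed for it.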
+In no case, however, is the alignment guaranteed to be greater than 64 +bytes. [NOTE] ==== The alignment guarantee is limited to 64 bytes because some host compilers -(e.g. on Microsoft Windows) limit the maximum alignment of function parameters -to this value. +(e.g. on Microsoft Windows) limit the maximum alignment of function +parameters to this value. ==== ==== Performance note -The usage of the subscript [code]#operator[]# may not be efficient on some devices. +The usage of the subscript [code]#operator[]# may not be efficient on some +devices. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end vec_class %%%%%%%%%%%%%%%%%%%%%%%%%%%% @@ -17693,30 +17928,32 @@ The usage of the subscript [code]#operator[]# may not be efficient on some devic [[sec:marray.type]] === Math array types -SYCL provides an [code]#marray# class -template to represent a contiguous fixed-size container. This type allows -sharing of containers between the host and its SYCL devices. +SYCL provides an [code]#marray# +class template to represent a contiguous fixed-size container. +This type allows sharing of containers between the host and its SYCL +devices. The [code]#marray# class is templated on its element type and number of -elements. The number of elements parameter, [code]#NumElements#, is a positive -value of the [code]#std::size_t# type. The element type parameter, [code]#DataT#, -must be a _numeric type_ as it is defined by {cpp} standard. +elements. +The number of elements parameter, [code]#NumElements#, is a positive value +of the [code]#std::size_t# type. +The element type parameter, [code]#DataT#, must be a _numeric type_ as it is +defined by {cpp} standard. -An instance of the [code]#marray# class template can also be -implicitly converted to an instance of the data type when the number of -elements is [code]#1# in order to allow single element arrays and -scalars to be convertible with each other. 
+An instance of the [code]#marray# class template can also be implicitly +converted to an instance of the data type when the number of elements is +[code]#1# in order to allow single element arrays and scalars to be +convertible with each other. -Logical and comparison operators for [code]#marray# class template -return [code]#marray#. +Logical and comparison operators for [code]#marray# class template return +[code]#marray#. ==== Math array interface The constructors, member functions and non-member functions of the SYCL -[code]#marray# class template are listed in -<>, <> and -<> respectively. +[code]#marray# class template are listed in <>, +<> and <> respectively. // Interface for class: vec [source,,linenums] @@ -18180,19 +18417,17 @@ The SYCL programming API provides all permutations of the type alias: [code]#+using m = marray<, >+# -where [code]## is [code]#2#, [code]#3#, [code]#4#, -[code]#8# and [code]#16#, and pairings of [code]## and -[code]## for integral types are [code]#char# and -[code]#int8_t#, [code]#uchar# and [code]#uint8_t#, -[code]#short# and [code]#int16_t#, [code]#ushort# and -[code]#uint16_t#, [code]#int# and [code]#int32_t#, -[code]#uint# and [code]#uint32_t#, [code]#long# and -[code]#int64_t#, [code]#ulong# and [code]#uint64_t#, for -floating point types are both [code]#half#, [code]#float# and -[code]#double#, and for boolean type [code]#bool#. +where [code]## is [code]#2#, [code]#3#, [code]#4#, [code]#8# and +[code]#16#, and pairings of [code]## and [code]## for +integral types are [code]#char# and [code]#int8_t#, [code]#uchar# and +[code]#uint8_t#, [code]#short# and [code]#int16_t#, [code]#ushort# and +[code]#uint16_t#, [code]#int# and [code]#int32_t#, [code]#uint# and +[code]#uint32_t#, [code]#long# and [code]#int64_t#, [code]#ulong# and +[code]#uint64_t#, for floating point types are both [code]#half#, +[code]#float# and [code]#double#, and for boolean type [code]#bool#. 
-For example [code]#muint4# is the alias to [code]#marray# -and [code]#mfloat16# is the alias to [code]#marray#. +For example [code]#muint4# is the alias to [code]#marray# and +[code]#mfloat16# is the alias to [code]#marray#. [[memory-layout-and-alignment.marray]] @@ -18212,19 +18447,19 @@ The available features are: buffer and image data structures to provide points at which underlying queue synchronization primitives must be generated. * Atomic operations: SYCL devices support a restricted subset of {cpp} - atomics and SYCL uses the library syntax from the next {cpp} specification - to make this available. + atomics and SYCL uses the library syntax from the next {cpp} + specification to make this available. * Fences: Fence primitives are made available to order loads and stores. - They are exposed through the [code]#atomic_fence# function. Fences - can have acquire semantics, release semantics or both. + They are exposed through the [code]#atomic_fence# function. + Fences can have acquire semantics, release semantics or both. * Barriers: Barrier primitives are made available to synchronize sets of - work-items within individual <>. They are exposed through the - [code]#group_barrier# function. + work-items within individual <>. + They are exposed through the [code]#group_barrier# function. * Hierarchical parallel dispatch: In the hierarchical parallelism model of describing computations, synchronization within the work-group is made explicit through multiple instances of the - [code]#parallel_for_work_item# function call, rather than through - the use of explicit <> operations. + [code]#parallel_for_work_item# function call, rather than through the + use of explicit <> operations. * Device event: they are used inside SYCL kernel functions to wait for asynchronous operations within a SYCL kernel function to complete. 
@@ -18232,9 +18467,9 @@ The available features are: [[sec:barriers-fences]] === Barriers and fences -A <> or <> provides memory ordering semantics -over both the local address space and global address space. A -<> provides control over the re-ordering of memory load and +A <> or <> provides memory ordering semantics over +both the local address space and global address space. +A <> provides control over the re-ordering of memory load and store operations, subject to the associated memory [code]#order# and memory [code]#scope#, when paired with synchronization through an atomic object. @@ -18243,38 +18478,39 @@ store operations, subject to the associated memory [code]#order# and memory include::{header_dir}/synchronization.h[lines=4..-1] ---- -The effects of a call to [code]#atomic_fence# depend on the value of -the [code]#order# parameter: +The effects of a call to [code]#atomic_fence# depend on the value of the +[code]#order# parameter: * [code]#memory_order::relaxed:# No effect * [code]#memory_order::acquire:# Acquire fence * [code]#memory_order::release:# Release fence - * [code]#memory_order::acq_rel:# Both an acquire fence and a release - fence - * [code]#memory_order::seq_cst:# A sequentially consistent acquire - and release fence + * [code]#memory_order::acq_rel:# Both an acquire fence and a release fence + * [code]#memory_order::seq_cst:# A sequentially consistent acquire and + release fence A <> acts as both an acquire fence and a release fence: all work-items in the group execute a release fence prior to synchronizing at the barrier, and all work-items in the group execute an acquire fence -afterwards. A <> provides implicit atomic synchronization -as if through an internal atomic object, such that the acquire and release fences +afterwards. 
+A <> provides implicit atomic synchronization as if through +an internal atomic object, such that the acquire and release fences associated with the barrier synchronize with each other, without an explicit -atomic operation being required on an atomic object to synchronize the fences. +atomic operation being required on an atomic object to synchronize the +fences. [[device-event-class]] === [code]#device_event# class The SYCL [code]#device_event# class encapsulates a single SYCL device event -which is available only within SYCL kernel functions and can be used to wait for -asynchronous operations within a SYCL kernel function to complete. +which is available only within SYCL kernel functions and can be used to wait +for asynchronous operations within a SYCL kernel function to complete. -All member functions of the [code]#device_event# class must not throw a -SYCL exception. +All member functions of the [code]#device_event# class must not throw a SYCL +exception. -A synopsis of the SYCL [code]#device_event# class is provided below. The -constructors and member functions of the SYCL [code]#device_event# class +A synopsis of the SYCL [code]#device_event# class is provided below. +The constructors and member functions of the SYCL [code]#device_event# class are listed in <> and <> respectively. @@ -18319,44 +18555,47 @@ device_event(___unspecified___) The [code]#sycl::atomic_ref# class provides the ability to perform atomic operations in device code with a syntax similar to the {cpp} standard -[code]#std::atomic_ref#. The [code]#sycl::atomic_ref# class must not be used -in host code. +[code]#std::atomic_ref#. +The [code]#sycl::atomic_ref# class must not be used in host code. Unlike [code]#std::atomic_ref#, [code]#sycl::atomic_ref# does not provide a -default memory ordering for its operations. Instead, the application must -specify a default ordering via the [code]#DefaultOrder# template parameter. +default memory ordering for its operations. 
+Instead, the application must specify a default ordering via the +[code]#DefaultOrder# template parameter. This ordering is used as a default for most of the atomic operations, but most member functions also provide an optional parameter that allows the -application to override this default. The set of supported orderings is -specific to a device, but every device is guaranteed to support at least -[code]#memory_order::relaxed#. If the default order is set to -[code]#memory_order::relaxed#, all memory order arguments default to -[code]#memory_order::relaxed#. If the default order is set to -[code]#memory_order::acq_rel#, memory order arguments default to -[code]#memory_order::acquire# for load operations, +application to override this default. +The set of supported orderings is specific to a device, but every device is +guaranteed to support at least [code]#memory_order::relaxed#. +If the default order is set to [code]#memory_order::relaxed#, all memory +order arguments default to [code]#memory_order::relaxed#. +If the default order is set to [code]#memory_order::acq_rel#, memory order +arguments default to [code]#memory_order::acquire# for load operations, [code]#memory_order::release# for store operations and -[code]#memory_order::acq_rel# for read-modify-write operations. If the -default order is set to [code]#memory_order::seq_cst#, all memory order -arguments default to [code]#memory_order::seq_cst#. +[code]#memory_order::acq_rel# for read-modify-write operations. +If the default order is set to [code]#memory_order::seq_cst#, all memory +order arguments default to [code]#memory_order::seq_cst#. The [code]#sycl::atomic_ref# class has a template parameter -[code]#DefaultScope#, which allows the application to define a default memory -scope for the atomic operations. Most member functions also provide an -optional parameter that allows the application to override this default. 
+[code]#DefaultScope#, which allows the application to define a default +memory scope for the atomic operations. +Most member functions also provide an optional parameter that allows the +application to override this default. The [code]#sycl::atomic_ref# class also has a template parameter -[code]#AddressSpace#, which allows the application to make an assertion about -the address space of the object of type [code]#T# that it references. The -default value for this parameter is -[code]#access::address_space::generic_space#, which indicates that the object -could be in either the global or local address spaces. If the application -knows the address space, it can set this template parameter to either -[code]#access::address_space::global_space# or +[code]#AddressSpace#, which allows the application to make an assertion +about the address space of the object of type [code]#T# that it references. +The default value for this parameter is +[code]#access::address_space::generic_space#, which indicates that the +object could be in either the global or local address spaces. +If the application knows the address space, it can set this template +parameter to either [code]#access::address_space::global_space# or [code]#access::address_space::local_space# as an assertion to the -implementation. Specifying the address space via this template parameter may -allow the implementation to perform certain optimizations. Specifying an -address space that does not match the object's actual address space results in -undefined behavior. +implementation. +Specifying the address space via this template parameter may allow the +implementation to perform certain optimizations. +Specifying an address space that does not match the object's actual address +space results in undefined behavior. The template parameter [code]#T# must be one of the following types: @@ -18369,15 +18608,16 @@ The template parameter [code]#T# must be one of the following types: * [code]#float#, or * [code]#double#. 
-In addition, the type [code]#T# must satisfy one of the following conditions: +In addition, the type [code]#T# must satisfy one of the following +conditions: * [code]#sizeof(T) == 4#, or * [code]#sizeof(T) == 8# and the code containing this [code]#atomic_ref# was submitted to a device that has [code]#aspect::atomic64#. -For floating-point types, the member functions of the [code]#atomic_ref# class -may be emulated, and they may use a different floating-point environment from -those defined by [code]#info::device::single_fp_config# and +For floating-point types, the member functions of the [code]#atomic_ref# +class may be emulated, and they may use a different floating-point +environment from those defined by [code]#info::device::single_fp_config# and [code]#info::device::double_fp_config# (i.e. floating-point atomics may use different rounding modes and may have different exception behavior). @@ -18390,25 +18630,26 @@ include::{header_dir}/atomicref.h[lines=4..-1] The constructors and member functions for instances of the SYCL [code]#atomic_ref# class using any compatible type are listed in -<> -and <> respectively. Additional member -functions for integral, floating-point and pointer types are listed in -<>, -<> -and <> respectively. - -The static member [code]#required_alignment# describes the minimum -required alignment in bytes of an object that can be referenced by an +<> and <> +respectively. +Additional member functions for integral, floating-point and pointer types +are listed in <>, +<> and +<> respectively. + +The static member [code]#required_alignment# describes the minimum required +alignment in bytes of an object that can be referenced by an [code]#atomic_ref#, which must be at least [code]#alignof(T)#. The static member [code]#is_always_lock_free# is true if all atomic -operations for type [code]#T# are always lock-free. A SYCL -implementation is not guaranteed to support atomic operations that are not -lock-free. 
+operations for type [code]#T# are always lock-free. +A SYCL implementation is not guaranteed to support atomic operations that +are not lock-free. -The static members [code]#default_read_order#, [code]#default_write_order# and -[code]#default_read_modify_write_order# reflect the default memory order values for -each type of atomic operation, consistent with the [code]#DefaultOrder# template. +The static members [code]#default_read_order#, [code]#default_write_order# +and [code]#default_read_modify_write_order# reflect the default memory order +values for each type of atomic operation, consistent with the +[code]#DefaultOrder# template. The atomic operations and member functions behave as described in the {cpp} specification, barring the restrictions discussed above. @@ -18417,9 +18658,10 @@ specification, barring the restrictions discussed above. ==== Care must be taken when using atomics for work-item coordination, because work-items are not required to provide stronger than weakly parallel forward -progress guarantees. Operations that block a work-item, such as continuously -checking the value of an atomic variable until some condition holds, or using -atomic operations that are not lock-free, may prevent overall progress. +progress guarantees. +Operations that block a work-item, such as continuously checking the value +of an atomic variable until some condition holds, or using atomic operations +that are not lock-free, may prevent overall progress. ==== [[table.atomic-refs.constructors]] @@ -18883,13 +19125,13 @@ T* operator--() const // Deprecated atomics from SYCL 1.2.1 The atomic types and operations on atomic types provided by SYCL 1.2.1 are -deprecated in SYCL 2020, and will be removed in a future version of SYCL. The -types and operations are made available in the [code]#cl::sycl::# +deprecated in SYCL 2020, and will be removed in a future version of SYCL. 
+The types and operations are made available in the [code]#cl::sycl::# namespace for backwards compatibility. -The constructors and member functions for the [code]#cl::sycl::atomic# -class are listed in <> -and <> respectively. +The constructors and member functions for the [code]#cl::sycl::atomic# class +are listed in <> and <> +respectively. [source,,linenums] ---- @@ -19258,24 +19500,25 @@ Equivalent to calling [code]#object.fetch_max(operand, memoryOrder)#. When a kernel runs on a device that has either [code]#aspect::usm_atomic_host_allocations# or [code]#aspect::usm_atomic_shared_allocations#, the device code and the host -code can concurrently access the same memory. This has a ramification on the -atomic operations because it is possible for device code and host code to -perform atomic operations on the same object _M_ in this shared memory. It -also has a ramification on the fence operations because the {cpp} core language -defines the semantics of these fence operations in relation to atomic -operations on some shared object _M_. The following paragraphs specify the -guarantees that the SYCL implementation provides when the application performs -atomic or fence operations in device code using the memory scope -[code]#memory_scope::system#. +code can concurrently access the same memory. +This has a ramification on the atomic operations because it is possible for +device code and host code to perform atomic operations on the same object +_M_ in this shared memory. +It also has a ramification on the fence operations because the {cpp} core +language defines the semantics of these fence operations in relation to +atomic operations on some shared object _M_. +The following paragraphs specify the guarantees that the SYCL implementation +provides when the application performs atomic or fence operations in device +code using the memory scope [code]#memory_scope::system#. 
Atomic operations in device code using [code]#sycl::atomic_ref# on an object -_M_ are guaranteed to be atomic with respect to atomic operations in host code -using [code]#std::atomic_ref# on that same object _M_. +_M_ are guaranteed to be atomic with respect to atomic operations in host +code using [code]#std::atomic_ref# on that same object _M_. Fence operations in device code using [code]#sycl::atomic_fence# synchronize with fence operations in host code using [code]#std::atomic_thread_fence# if -the fence operations shared the same atomic object _M_ and follow the rules for -fence synchronization defined in the {cpp} core language. +the fence operations shared the same atomic object _M_ and follow the rules +for fence synchronization defined in the {cpp} core language. Fence operations in device code using [code]#sycl::atomic_fence# synchronize with atomic operations in host code using [code]#std::atomic_ref# if the @@ -19284,67 +19527,66 @@ synchronization defined in the {cpp} core language. Atomic operations in device code using [code]#sycl::atomic_ref# synchronize with fence operations in host code using [code]#std::atomic_thread_fence# if -the operations share the same atomic object _M_ and follow the rules for fence -synchronization defined in the {cpp} core language. +the operations share the same atomic object _M_ and follow the rules for +fence synchronization defined in the {cpp} core language. [[subsec:stream]] == Stream class The SYCL [code]#stream# class is a buffered output stream that allows -outputting the values of built-in, vector and SYCL types to the console. The -implementation of how values are streamed to the console is left as an +outputting the values of built-in, vector and SYCL types to the console. +The implementation of how values are streamed to the console is left as an implementation detail. -The way in which values are output by an instance of the SYCL -[code]#stream# class can also be altered using a range of manipulators. 
+The way in which values are output by an instance of the SYCL [code]#stream# +class can also be altered using a range of manipulators. -There are two limits that are relevant for the [code]#stream# class. The -[code]#totalBufferSize# limit specifies the maximum size of the overall +There are two limits that are relevant for the [code]#stream# class. +The [code]#totalBufferSize# limit specifies the maximum size of the overall character stream that can be output during a kernel invocation, and the -[code]#workItemBufferSize# limit specifies the maximum size of the -character stream that can be output within a work-item before a flush must be -performed. Both of these limits are specified in bytes. The -[code]#totalBufferSize# limit must be sufficient to contain the characters -output by all stream statements during execution of a kernel invocation (the -aggregate of outputs from all work-items), and the +[code]#workItemBufferSize# limit specifies the maximum size of the character +stream that can be output within a work-item before a flush must be +performed. +Both of these limits are specified in bytes. +The [code]#totalBufferSize# limit must be sufficient to contain the +characters output by all stream statements during execution of a kernel +invocation (the aggregate of outputs from all work-items), and the [code]#workItemBufferSize# limit must be sufficient to contain the characters output within a work-item between stream flush operations. -If the [code]#totalBufferSize# or [code]#workItemBufferSize# -limits are exceeded, it is implementation-defined whether the streamed -characters exceeding the limit are output, or silently ignored/discarded, -and if output it is implementation-defined whether those extra characters -exceeding the [code]#workItemBufferSize# limit count toward the -[code]#totalBufferSize# limit. 
Regardless of this implementation -defined behavior of output exceeding the limits, no undefined or erroneous -behavior is permitted of an implementation when the limits are exceeded. +If the [code]#totalBufferSize# or [code]#workItemBufferSize# limits are +exceeded, it is implementation-defined whether the streamed characters +exceeding the limit are output, or silently ignored/discarded, and if output +it is implementation-defined whether those extra characters exceeding the +[code]#workItemBufferSize# limit count toward the [code]#totalBufferSize# +limit. +Regardless of this implementation defined behavior of output exceeding the +limits, no undefined or erroneous behavior is permitted of an implementation +when the limits are exceeded. Unused characters within [code]#workItemBufferSize# (any portion of the -[code]#workItemBufferSize# capacity that has not been used at the time -of a stream flush) do not count toward the [code]#totalBufferSize# -limit, in that only characters flushed count toward the -[code]#totalBufferSize# limit. +[code]#workItemBufferSize# capacity that has not been used at the time of a +stream flush) do not count toward the [code]#totalBufferSize# limit, in that +only characters flushed count toward the [code]#totalBufferSize# limit. -The SYCL [code]#stream# class provides the common reference semantics -(see <>). +The SYCL [code]#stream# class provides the common reference semantics (see +<>). === Stream class interface -The constructors and member functions of the SYCL [code]#stream# class -are listed in <>, -<>, and <> respectively. The -additional common special member functions and common member functions are -listed in <> and +The constructors and member functions of the SYCL [code]#stream# class are +listed in <>, <>, and +<> respectively. +The additional common special member functions and common member functions +are listed in <> and <>, respectively. 
The operand types that are supported by the SYCL [code]#stream# class -[code]#operator<<()# operator are listed in -<>. +[code]#operator<<()# operator are listed in <>. The manipulators that are supported by the SYCL [code]#stream# class -[code]#operator<<()# operator are listed in -<>. +[code]#operator<<()# operator are listed in <>. // Interface of the device class [source,,linenums] @@ -19598,33 +19840,39 @@ template const stream& operator<<(const stream& os, const T& rhs) === Synchronization -An instance of the SYCL [code]#stream# class is required to synchronize with the host, and must output -everything that is streamed to it via the [code]#operator<<()# operator before a flush operation (that -doesn't exceed the [code]#workItemBufferSize# or [code]#totalBufferSize# limits) within a SYCL -kernel function by the time that the event associated with a command group submission enters the completed -state. The point at which this synchronization occurs and the member function by which this synchronization is -performed are implementation-defined. For example it is valid for an implementation to use -[code]#printf()#. - -The SYCL [code]#stream# class is required to output the content of each stream, between flushes (up to -[code]#workItemBufferSize)#, without mixing with content from the same stream in other work-items. -There are no other output order guarantees between work-items or between streams. The stream flush -operation therefore delimits the unit of output that is guaranteed to be displayed without mixing with -other work-items, with respect to a single stream. 
+An instance of the SYCL [code]#stream# class is required to synchronize with +the host, and must output everything that is streamed to it via the +[code]#operator<<()# operator before a flush operation (that doesn't exceed +the [code]#workItemBufferSize# or [code]#totalBufferSize# limits) within a +SYCL kernel function by the time that the event associated with a command +group submission enters the completed state. +The point at which this synchronization occurs and the member function by +which this synchronization is performed are implementation-defined. +For example it is valid for an implementation to use [code]#printf()#. + +The SYCL [code]#stream# class is required to output the content of each +stream, between flushes (up to [code]#workItemBufferSize)#, without mixing +with content from the same stream in other work-items. +There are no other output order guarantees between work-items or between +streams. +The stream flush operation therefore delimits the unit of output that is +guaranteed to be displayed without mixing with other work-items, with +respect to a single stream. === Implicit flush -There is guaranteed to be an implicit flush of each stream used by a -kernel, at the end of kernel execution, from the perspective of each -work-item. There is also an implicit flush when the endl stream -manipulator is executed. No other implicit flushes are permitted in -an implementation. +There is guaranteed to be an implicit flush of each stream used by a kernel, +at the end of kernel execution, from the perspective of each work-item. +There is also an implicit flush when the endl stream manipulator is +executed. +No other implicit flushes are permitted in an implementation. === Performance note -The usage of the [code]#stream# class is designed for debugging purposes and is therefore not recommended for performance critical applications. 
+The usage of the [code]#stream# class is designed for debugging purposes and +is therefore not recommended for performance critical applications. // \input{builtin_functions} @@ -19636,30 +19884,33 @@ The usage of the [code]#stream# class is designed for debugging purposes and is // Intentional OpenCL reference SYCL kernels may execute on any SYCL device, which requires the functions -used in the kernels to be compiled and linked for both device and host. In -the SYCL programming model, the built-ins are available for the entire SYCL -application within the [code]#sycl# namespace, although their semantics -may be different. This section follows the OpenCL 1.2 specification document -<> - except that for SYCL, all functions are located -within the [code]#sycl# namespace - and describes the behavior of these -functions for SYCL host and device. The expected precision and any other -semantic requirements are defined in the backend specification. +used in the kernels to be compiled and linked for both device and host. +In the SYCL programming model, the built-ins are available for the entire +SYCL application within the [code]#sycl# namespace, although their semantics +may be different. +This section follows the OpenCL 1.2 specification document <> - except that for SYCL, all functions are located within the +[code]#sycl# namespace - and describes the behavior of these functions for +SYCL host and device. +The expected precision and any other semantic requirements are defined in +the backend specification. The SYCL built-in functions are available throughout the SYCL application, and depending on where they execute, they are either implemented using their -host implementation or the device implementation. The SYCL system guarantees -that all of the built-in functions fulfill the same requirements for both -host and device. +host implementation or the device implementation. 
+The SYCL system guarantees that all of the built-in functions fulfill the +same requirements for both host and device. [[sec:function-objects]] === Function objects -SYCL provides a number of function objects in the [code]#sycl# namespace -on host and device. All function objects obey {cpp} conversion and promotion -rules. Each function object is additionally specialized for [code]#void# -as a _transparent_ function object that deduces its parameter types -and return type. +SYCL provides a number of function objects in the [code]#sycl# namespace on +host and device. +All function objects obey {cpp} conversion and promotion rules. +Each function object is additionally specialized for [code]#void# as a +_transparent_ function object that deduces its parameter types and return +type. [source,,linenums] ---- @@ -19797,33 +20048,35 @@ T operator()(const T& x, const T& y) const [[sec:group-functions]] === Group functions -SYCL provides a number of functions that expose functionality tied to groups of -work-items (such as <> and collective operations). -These group functions act as synchronization points and must be encountered in -converged <> by all work-items in the group. If one work-item in -a group calls a group function, then all work-items in that group must call -exactly the same function under the same set of conditions --- calling the same -function under different conditions (e.g. in different iterations of a loop, or -different branches of a conditional statement) results in undefined behavior. +SYCL provides a number of functions that expose functionality tied to groups +of work-items (such as <> and collective +operations). +These group functions act as synchronization points and must be encountered +in converged <> by all work-items in the group. 
+If one work-item in a group calls a group function, then all work-items in +that group must call exactly the same function under the same set of +conditions --- calling the same function under different conditions (e.g. in +different iterations of a loop, or different branches of a conditional +statement) results in undefined behavior. Additionally, restrictions may be placed on the arguments passed to each function in order to ensure that all work-items in the group agree on the -operation that is being performed. Any such restrictions on the arguments -passed to a function are defined within the descriptions of those functions. +operation that is being performed. +Any such restrictions on the arguments passed to a function are defined +within the descriptions of those functions. Violating these restrictions results in undefined behavior. -All group functions are supported for the fundamental scalar types supported by -SYCL (see <>) and instances of the SYCL +All group functions are supported for the fundamental scalar types supported +by SYCL (see <>) and instances of the SYCL [code]#vec# and [code]#marray# classes. -Using a group function inside of a kernel may introduce additional -limits on the resources available to user code inside the same kernel. The -behavior of these limits is implementation-defined, but must be reflected by -calls to kernel querying functions (such as +Using a group function inside of a kernel may introduce additional limits on +the resources available to user code inside the same kernel. +The behavior of these limits is implementation-defined, but must be +reflected by calls to kernel querying functions (such as [code]#kernel::get_info#) as described in <>. It is undefined behavior for any group function to be invoked within a -[code]#parallel_for_work_group# or [code]#parallel_for_work_item# -context. +[code]#parallel_for_work_group# or [code]#parallel_for_work_item# context. ==== Group type trait @@ -19832,17 +20085,17 @@ context. 
include::{header_dir}/algorithms/is_group.h[lines=4..-1] ---- -The [code]#is_group# type trait is used to determine which types of groups are -supported by group functions, and to control when group functions participate -in overload resolution. +The [code]#is_group# type trait is used to determine which types of groups +are supported by group functions, and to control when group functions +participate in overload resolution. [code]#is_group# inherits from [code]#std::true_type# if [code]#T# is the type of a standard SYCL group ([code]#group# or [code]#sub_group#) and it -inherits from [code]#std::false_type# otherwise. A SYCL implementation may -introduce additional specializations of [code]#is_group# for -implementation-defined group types, if the interface of those types supports all -member functions and static members common to the [code]#group# and -[code]#sub_group# classes. +inherits from [code]#std::false_type# otherwise. +A SYCL implementation may introduce additional specializations of +[code]#is_group# for implementation-defined group types, if the interface +of those types supports all member functions and static members common to +the [code]#group# and [code]#sub_group# classes. ==== [code]#group_broadcast# @@ -19859,8 +20112,8 @@ include::{header_dir}/groups/broadcast.h[lines=4..-1] trivially copyable type. + -- -_Returns:_ The value of [code]#x# from the work-item with the smallest linear -id within group [code]#g#. +_Returns:_ The value of [code]#x# from the work-item with the smallest +linear id within group [code]#g#. -- . _Constraints:_ Available only if @@ -19868,11 +20121,11 @@ id within group [code]#g#. trivially copyable type. + -- -_Preconditions:_ [code]#local_linear_id# must be the same for all work-items in -the group and must be in the range [code]#[0, get_local_linear_range())#. +_Preconditions:_ [code]#local_linear_id# must be the same for all work-items +in the group and must be in the range [code]#[0, get_local_linear_range())#. 
-_Returns:_ The value of [code]#x# from the work-item with the specified linear -id within group [code]#g#. +_Returns:_ The value of [code]#x# from the work-item with the specified +linear id within group [code]#g#. -- . _Constraints:_ Available only if @@ -19882,8 +20135,8 @@ id within group [code]#g#. -- _Preconditions:_ [code]#local_id# must be the same for all work-items in the group, and its dimensionality must match the dimensionality of the group. -The value of [code]#local_id# in each dimension must be greater than or equal -to 0 and less than the value of [code]#get_local_range()# in the same +The value of [code]#local_id# in each dimension must be greater than or +equal to 0 and less than the value of [code]#get_local_range()# in the same dimension. _Returns:_ The value of [code]#x# from the work-item with the specified id @@ -19904,51 +20157,55 @@ include::{header_dir}/groups/barrier.h[lines=4..-1] [code]#sycl::is_group_v># is true. + -- -_Effects:_ Synchronizes all work-items in group [code]#g#. The current -work-item will wait at the barrier until all work-items in group [code]#g# have -reached the barrier. In addition, the barrier performs <> operations ensuring that -memory accesses issued before the barrier are not re-ordered with those issued -after the barrier: all work-items in group [code]#g# execute a release fence -prior to synchronizing at the barrier, all work-items in group [code]#g# -execute an acquire fence afterwards, and there is an implicit synchronization -of these fences as if provided by an explicit atomic operation on an atomic -object. - -By default, the scope of these fences is set to the narrowest -scope including all work-items in group [code]#g# (as reported by -[code]#Group::fence_scope#). This scope may be optionally overridden -with a wider scope, specified by the [code]#fence_scope# argument. +_Effects:_ Synchronizes all work-items in group [code]#g#. 
+The current work-item will wait at the barrier until all work-items in group +[code]#g# have reached the barrier. +In addition, the barrier performs <> operations ensuring that +memory accesses issued before the barrier are not re-ordered with those +issued after the barrier: all work-items in group [code]#g# execute a +release fence prior to synchronizing at the barrier, all work-items in group +[code]#g# execute an acquire fence afterwards, and there is an implicit +synchronization of these fences as if provided by an explicit atomic +operation on an atomic object. + +By default, the scope of these fences is set to the narrowest scope +including all work-items in group [code]#g# (as reported by +[code]#Group::fence_scope#). +This scope may be optionally overridden with a wider scope, specified by the +[code]#fence_scope# argument. -- [[sec:algorithms]] === Group algorithms library -SYCL provides an algorithms library based on the functions described -in Section 28 of the {cpp17} specification. The first argument to each function -is a <>, and data ranges can be described using pointers, iterators or -instances of the [code]#multi_ptr# class. The functions defined in this -section are free functions available in the [code]#sycl# namespace. - -Any restrictions from the standard algorithms library apply. Some of the -functions in the SYCL algorithms library introduce additional restrictions -in order to maximize portability across different devices and to minimize -the chances of encountering unexpected behavior. - -All algorithms are supported for the fundamental scalar types supported by SYCL -(see <>) and instances of the SYCL -[code]#vec# and [code]#marray# classes. - -The <> argument to a SYCL algorithm denotes that it should be performed -collaboratively by the work-items in the specified group. All algorithms -act as group functions (as defined in <>), inheriting all -restrictions of group functions. 
Unless the description of a function says -otherwise, how the elements of a range are processed by the work-items in a -group is undefined. - -SYCL provides separate functions for algorithms which use the work-items in a -group to execute an operation over a range of iterators and algorithms which -are applied to data held directly by the work-items in a group. An example -of the usage of these functions is given below: +SYCL provides an algorithms library based on the functions described in +Section 28 of the {cpp17} specification. +The first argument to each function is a <>, and data ranges can be +described using pointers, iterators or instances of the [code]#multi_ptr# +class. +The functions defined in this section are free functions available in the +[code]#sycl# namespace. + +Any restrictions from the standard algorithms library apply. +Some of the functions in the SYCL algorithms library introduce additional +restrictions in order to maximize portability across different devices and +to minimize the chances of encountering unexpected behavior. + +All algorithms are supported for the fundamental scalar types supported by +SYCL (see <>) and instances of the SYCL [code]#vec# +and [code]#marray# classes. + +The <> argument to a SYCL algorithm denotes that it should be +performed collaboratively by the work-items in the specified group. +All algorithms act as group functions (as defined in +<>), inheriting all restrictions of group functions. +Unless the description of a function says otherwise, how the elements of a +range are processed by the work-items in a group is undefined. + +SYCL provides separate functions for algorithms which use the work-items in +a group to execute an operation over a range of iterators and algorithms +which are applied to data held directly by the work-items in a group. 
+An example of the usage of these functions is given below: [[listing.group.algorithms]] .Using the group algorithms library to perform a work-group reduce @@ -19959,9 +20216,9 @@ include::{code_dir}/algorithms.cpp[lines=4..-1] ==== [code]#any_of#, [code]#all_of# and [code]#none_of# -The [code]#any_of#, [code]#all_of# and [code]#none_of# functions from standard -{cpp} test whether Boolean conditions hold for any of, all of or none of the -values in a range, respectively. +The [code]#any_of#, [code]#all_of# and [code]#none_of# functions from +standard {cpp} test whether Boolean conditions hold for any of, all of or +none of the values in a range, respectively. SYCL provides two sets of similar algorithms: @@ -19969,7 +20226,8 @@ SYCL provides two sets of similar algorithms: work-items in a group to execute the corresponding algorithm in parallel. . [code]#any_of_group#, [code]#all_of_group# and [code]#none_of_group# test -Boolean conditions applied to data held directly by the work-items in a group. +Boolean conditions applied to data held directly by the work-items in a +group. [source,,linenums] ---- @@ -19977,13 +20235,13 @@ include::{header_dir}/algorithms/any_of.h[lines=4..-1] ---- . _Constraints:_ Available only if - [code]#sycl::is_group_v># is true and [code]#Ptr# is a - pointer. + [code]#sycl::is_group_v># is true and [code]#Ptr# is + a pointer. + -- _Preconditions:_ [code]#first# and [code]#last# must be the same for all -work-items in group [code]#g#, and [code]#pred# must be an immutable callable -with the same type and state for all work-items in group [code]#g#. +work-items in group [code]#g#, and [code]#pred# must be an immutable +callable with the same type and state for all work-items in group [code]#g#. _Returns:_ true if [code]#pred# returns true when applied to the result of dereferencing any iterator in the range [code]#[first, last)#. @@ -19993,8 +20251,8 @@ dereferencing any iterator in the range [code]#[first, last)#. 
[code]#sycl::is_group_v># is true. + -- -_Preconditions:_ [code]#pred# must be an immutable callable with the same type -and state for all work-items in group [code]#g#. +_Preconditions:_ [code]#pred# must be an immutable callable with the same +type and state for all work-items in group [code]#g#. _Returns:_ true if [code]#pred(x)# returns true for any work-item in group [code]#g#. @@ -20004,7 +20262,8 @@ _Returns:_ true if [code]#pred(x)# returns true for any work-item in group [code]#sycl::is_group_v># is true. + -- -_Returns:_ true if [code]#pred# is true for any work-item in group [code]#g#. +_Returns:_ true if [code]#pred# is true for any work-item in group +[code]#g#. -- [source,,linenums] @@ -20013,13 +20272,13 @@ include::{header_dir}/algorithms/all_of.h[lines=4..-1] ---- . _Constraints:_ Available only if - [code]#sycl::is_group_v># is true and [code]#Ptr# is a - pointer. + [code]#sycl::is_group_v># is true and [code]#Ptr# is + a pointer. + -- _Preconditions:_ [code]#first# and [code]#last# must be the same for all -work-items in group [code]#g#, and [code]#pred# must be an immutable callable -with the same type and state for all work-items in group [code]#g#. +work-items in group [code]#g#, and [code]#pred# must be an immutable +callable with the same type and state for all work-items in group [code]#g#. _Returns:_ true if [code]#pred# returns true when applied to the result of dereferencing all iterators in the range [code]#[first, last)#. @@ -20029,8 +20288,8 @@ dereferencing all iterators in the range [code]#[first, last)#. [code]#sycl::is_group_v># is true. + -- -_Preconditions:_ [code]#pred# must be an immutable callable with the same type -and state for all work-items in group [code]#g#. +_Preconditions:_ [code]#pred# must be an immutable callable with the same +type and state for all work-items in group [code]#g#. _Returns:_ true if [code]#pred(x)# returns true for all work-items in group [code]#g#. 
@@ -20040,7 +20299,8 @@ _Returns:_ true if [code]#pred(x)# returns true for all work-items in group [code]#sycl::is_group_v># is true. + -- -_Returns:_ true if [code]#pred# is true for all work-items in group [code]#g#. +_Returns:_ true if [code]#pred# is true for all work-items in group +[code]#g#. -- [source,,linenums] @@ -20049,13 +20309,13 @@ include::{header_dir}/algorithms/none_of.h[lines=4..-1] ---- . _Constraints:_ Available only if - [code]#sycl::is_group_v># is true and [code]#Ptr# is a - pointer. + [code]#sycl::is_group_v># is true and [code]#Ptr# is + a pointer. + -- _Preconditions:_ [code]#first# and [code]#last# must be the same for all -work-items in group [code]#g#, and [code]#pred# must be an immutable callable -with the same type and state for all work-items in group [code]#g#. +work-items in group [code]#g#, and [code]#pred# must be an immutable +callable with the same type and state for all work-items in group [code]#g#. _Returns:_ true if [code]#pred# returns false when applied to the result of dereferencing all iterators in the range [code]#[first, last)#. @@ -20065,8 +20325,8 @@ dereferencing all iterators in the range [code]#[first, last)#. [code]#sycl::is_group_v># is true. + -- -_Preconditions:_ [code]#pred# must be an immutable callable with the same type -and state for all work-items in group [code]#g#. +_Preconditions:_ [code]#pred# must be an immutable callable with the same +type and state for all work-items in group [code]#g#. _Returns:_ true if [code]#pred(x)# returns false for all work-items in group [code]#g#. @@ -20076,7 +20336,8 @@ _Returns:_ true if [code]#pred(x)# returns false for all work-items in group [code]#sycl::is_group_v># is true. + -- -_Returns:_ true if [code]#pred# is false for all work-items in group [code]#g#. +_Returns:_ true if [code]#pred# is false for all work-items in group +[code]#g#. 
-- ==== [code]#shift_left# and [code]#shift_right# @@ -20084,11 +20345,12 @@ _Returns:_ true if [code]#pred# is false for all work-items in group [code]#g#. The [code]#shift_left# and [code]#shift_right# functions from standard {cpp} move values in a range down (to the left) or up (to the right) respectively. -SYCL provides similar algorithms compatible with the [code]#sub_group# class: +SYCL provides similar algorithms compatible with the [code]#sub_group# +class: . [code]#shift_group_left# and [code]#shift_group_right# move values held by -the work-items in a group directly to another work-item in group [code]#g#, by -shifting values a fixed number of work-items to the left or right. +the work-items in a group directly to another work-item in group [code]#g#, +by shifting values a fixed number of work-items to the left or right. [source,,linenums] ---- @@ -20129,8 +20391,8 @@ SYCL provides an algorithm to permute the values held by work-items in a sub-group: . [code]#permute_group_by_xor# permutes values by exchanging values held by pairs -of work-items identified by computing the bitwise exclusive OR of the work-item -id and some fixed mask. +of work-items identified by computing the bitwise exclusive OR of the +work-item id and some fixed mask. [source,,linenums] ---- @@ -20145,16 +20407,17 @@ include::{header_dir}/algorithms/permute.h[lines=4..-1] _Preconditions:_ [code]#mask# must be the same for all work-items in the group. -_Returns:_ the value of [code]#x# from the work-item whose group local id -is equal to the bitwise exclusive OR of the calling work-item's group local id -and [code]#mask#. The result of the exclusive OR may be greater than or equal to -the group's linear size, but the value returned in this case is unspecified. +_Returns:_ the value of [code]#x# from the work-item whose group local id is +equal to the bitwise exclusive OR of the calling work-item's group local id +and [code]#mask#. 
+The result of the exclusive OR may be greater than or equal to the group's +linear size, but the value returned in this case is unspecified. -- ==== [code]#select# -SYCL provides an algorithm to directly exchange the values held by work-items in -a sub-group: +SYCL provides an algorithm to directly exchange the values held by +work-items in a sub-group: . [code]#select_from_group# allows work-items to obtain a copy of a value held by any other work-item in group [code]#g#. @@ -20170,17 +20433,18 @@ include::{header_dir}/algorithms/select.h[lines=4..-1] + -- _Returns:_ the value of [code]#x# from the work-item with the group local id -specified by [code]#remote_local_id#. The value of [code]#remote_local_id# may -be outside of the group, but the value returned in this case is unspecified. +specified by [code]#remote_local_id#. +The value of [code]#remote_local_id# may be outside of the group, but the +value returned in this case is unspecified. -- ==== [code]#reduce# -The [code]#reduce# function from standard {cpp} combines the values in a range in -an unspecified order using a binary operator. +The [code]#reduce# function from standard {cpp} combines the values in a +range in an unspecified order using a binary operator. -SYCL provides two similar algorithms that compute the same generalized sum as -defined by standard {cpp}: +SYCL provides two similar algorithms that compute the same generalized sum +as defined by standard {cpp}: . [code]#joint_reduce# uses the work-items in a group to execute a [code]#reduce# operation in parallel. @@ -20189,10 +20453,10 @@ defined by standard {cpp}: a group. The result of a call to these functions is non-deterministic if the binary -operator is not commutative and associative. Only the binary operators defined -in <> are supported by the [code]#reduce# functions in -SYCL 2020, but the standard {cpp} syntax is used for forward compatibility with -future SYCL versions. +operator is not commutative and associative. 
+Only the binary operators defined in <> are supported +by the [code]#reduce# functions in SYCL 2020, but the standard {cpp} syntax +is used for forward compatibility with future SYCL versions. [source,,linenums] ---- @@ -20208,14 +20472,14 @@ include::{header_dir}/algorithms/reduce.h[lines=4..-1] _Mandates:_ [code]#binary_op(*first, *first)# must return a value of type [code]#std::iterator_traits::value_type#. -_Preconditions:_ [code]#first#, [code]#last# and the type of [code]#binary_op# -must be the same for all work-items in group [code]#g#. [code]#binary_op# must -be an instance of a SYCL function object. +_Preconditions:_ [code]#first#, [code]#last# and the type of +[code]#binary_op# must be the same for all work-items in group [code]#g#. +[code]#binary_op# must be an instance of a SYCL function object. -_Returns:_ The result of combining the values resulting from dereferencing all -iterators in the range [code]#[first, last)# using the operator -[code]#binary_op#, where the values are combined according to the generalized -sum defined in standard {cpp}. +_Returns:_ The result of combining the values resulting from dereferencing +all iterators in the range [code]#[first, last)# using the operator +[code]#binary_op#, where the values are combined according to the +generalized sum defined in standard {cpp}. -- . _Constraints:_ Available only if @@ -20231,10 +20495,10 @@ _Preconditions:_ [code]#first#, [code]#last#, [code]#init# and the type of [code]#binary_op# must be the same for all work-items in group [code]#g#. [code]#binary_op# must be an instance of a SYCL function object. -_Returns:_ The result of combining the values resulting from dereferencing all -iterators in the range [code]#[first, last)# and the initial value -[code]#init# using the operator [code]#binary_op#, where the values are combined -according to the generalized sum defined in standard {cpp}. 
+_Returns:_ The result of combining the values resulting from dereferencing +all iterators in the range [code]#[first, last)# and the initial value +[code]#init# using the operator [code]#binary_op#, where the values are +combined according to the generalized sum defined in standard {cpp}. -- . _Constraints:_ Available only if @@ -20249,8 +20513,9 @@ _Preconditions:_ [code]#binary_op# must be an instance of a SYCL function object. _Returns:_ The result of combining all the values of [code]#x# specified by -each work-item in group [code]#g# using the operator [code]#binary_op#, where -the values are combined according to the generalized sum defined in standard {cpp}. +each work-item in group [code]#g# using the operator [code]#binary_op#, +where the values are combined according to the generalized sum defined in +standard {cpp}. -- . _Constraints:_ Available only if @@ -20259,42 +20524,45 @@ the values are combined according to the generalized sum defined in standard {cp function object type. + -- -_Mandates:_ [code]#binary_op(init, x)# must return a value of type [code]#T#. +_Mandates:_ [code]#binary_op(init, x)# must return a value of type +[code]#T#. _Preconditions:_ [code]#binary_op# must be an instance of a SYCL function object. _Returns:_ The result of combining all the values of [code]#x# specified by -each work-item in group [code]#g# and the initial value [code]#init# using the -operator [code]#binary_op#, where the values are combined according to the -generalized sum defined in standard {cpp}. +each work-item in group [code]#g# and the initial value [code]#init# using +the operator [code]#binary_op#, where the values are combined according to +the generalized sum defined in standard {cpp}. -- ==== [code]#exclusive_scan# and [code]#inclusive_scan# The [code]#exclusive_scan# and [code]#inclusive_scan# functions in standard -{cpp} compute a prefix sum using a binary operator. 
For a scan of elements -_[x~0~, {ldots}, x~n~]_, the _i_ th result in an exclusive scan is the -generalized noncommutative sum of all elements preceding _x~i~_ (excluding -_x~i~_ itself), whereas the _i_ th result in an inclusive scan is the -generalized noncommutative sum of all elements preceding _x~i~_ (including -_x~i~_ itself). +{cpp} compute a prefix sum using a binary operator. +For a scan of elements _[x~0~, {ldots}, x~n~]_, the _i_ th result in an +exclusive scan is the generalized noncommutative sum of all elements +preceding _x~i~_ (excluding _x~i~_ itself), whereas the _i_ th result in an +inclusive scan is the generalized noncommutative sum of all elements +preceding _x~i~_ (including _x~i~_ itself). -SYCL provides two similar sets of algorithms that compute the same prefix sums -using the generalized noncommutative sum as defined by standard {cpp}: +SYCL provides two similar sets of algorithms that compute the same prefix +sums using the generalized noncommutative sum as defined by standard {cpp}: . [code]#joint_exclusive_scan# and [code]#joint_inclusive_scan# use the -work-items in a group to execute the corresponding algorithm in parallel, and -intermediate partial prefix sums are written to memory as in standard {cpp}. +work-items in a group to execute the corresponding algorithm in parallel, +and intermediate partial prefix sums are written to memory as in standard +{cpp}. . [code]#exclusive_scan_over_group# and [code]#inclusive_scan_over_group# -perform a scan over values held directly by the work-items in a group, and the -result returned to each work-item represents a partial prefix sum. +perform a scan over values held directly by the work-items in a group, and +the result returned to each work-item represents a partial prefix sum. -The result of a call to a scan is non-deterministic if the binary operator is not -associative. 
Only the binary operators defined in <> are -supported by the scan functions in SYCL 2020, but the standard {cpp} syntax is -used for forward compatibility with future SYCL versions. +The result of a call to a scan is non-deterministic if the binary operator +is not associative. +Only the binary operators defined in <> are supported +by the scan functions in SYCL 2020, but the standard {cpp} syntax is used +for forward compatibility with future SYCL versions. [source,,linenums] ---- @@ -20312,19 +20580,20 @@ _Mandates:_ [code]#binary_op(*first, *first)# must return a value of type _Preconditions:_ [code]#first#, [code]#last#, [code]#result# and the type of [code]#binary_op# must be the same for all work-items in group [code]#g#. -[code]#binary_op# must be an instance of a SYCL function object. +[code]#binary_op# must be an instance of a SYCL function object. [NOTE] ==== Note that [code]#first# may be equal to [code]#result#. ==== -_Effects:_ The value written to [code]#result# + _i_ is the exclusive scan of -the values resulting from dereferencing the first _i_ values in the range +_Effects:_ The value written to [code]#result# + _i_ is the exclusive scan +of the values resulting from dereferencing the first _i_ values in the range [code]#[first, last)# and the identity value of [code]#binary_op# (as identified by [code]#sycl::known_identity#), using the operator -[code]#binary_op#. The scan is computed using a generalized noncommutative sum -as defined in standard {cpp}. +[code]#binary_op#. +The scan is computed using a generalized noncommutative sum as defined in +standard {cpp}. _Returns:_ A pointer to the end of the output range. -- @@ -20339,20 +20608,22 @@ _Returns:_ A pointer to the end of the output range. _Mandates:_ [code]#binary_op(init, *first)# must return a value of type [code]#T#. -_Preconditions:_ [code]#first#, [code]#last#, [code]#result#, [code]#init# and the -type of [code]#binary_op# must be the same for all work-items in group -[code]#g#. 
[code]#binary_op# must be an instance of a SYCL function object. +_Preconditions:_ [code]#first#, [code]#last#, [code]#result#, [code]#init# +and the type of [code]#binary_op# must be the same for all work-items in +group [code]#g#. +[code]#binary_op# must be an instance of a SYCL function object. [NOTE] ==== Note that [code]#first# may be equal to [code]#result#. ==== -_Effects:_ The value written to [code]#result# + _i_ is the exclusive scan of -the values resulting from dereferencing the first _i_ values in the range +_Effects:_ The value written to [code]#result# + _i_ is the exclusive scan +of the values resulting from dereferencing the first _i_ values in the range [code]#[first, last)# and an initial value specified by [code]#init#, using -the operator [code]#binary_op#. The scan is computed using a generalized -noncommutative sum as defined in standard {cpp}. +the operator [code]#binary_op#. +The scan is computed using a generalized noncommutative sum as defined in +standard {cpp}. _Returns:_ A pointer to the end of the output range. -- @@ -20368,12 +20639,14 @@ _Mandates:_ [code]#binary_op(x, x)# must return a value of type [code]#T#. _Preconditions:_ [code]#binary_op# must be an instance of a SYCL function object. -_Returns:_ The value returned on work-item _i_ is the exclusive scan of -the first _i_ values in group [code]#g# and the identity value of +_Returns:_ The value returned on work-item _i_ is the exclusive scan of the +first _i_ values in group [code]#g# and the identity value of [code]#binary_op# (as identified by [code]#sycl::known_identity#), using the -operator [code]#binary_op#. The scan is computed using a generalized -noncommutative sum as defined in standard {cpp}. For multi-dimensional groups, -the order of work-items in group [code]#g# is determined by their linear id. +operator [code]#binary_op#. +The scan is computed using a generalized noncommutative sum as defined in +standard {cpp}. 
+For multi-dimensional groups, the order of work-items in group [code]#g# is +determined by their linear id. -- . _Constraints:_ Available only if @@ -20382,16 +20655,18 @@ the order of work-items in group [code]#g# is determined by their linear id. function object type. + -- -_Mandates:_ [code]#binary_op(init, x)# must return a value of type [code]#T#. +_Mandates:_ [code]#binary_op(init, x)# must return a value of type +[code]#T#. _Preconditions:_ [code]#binary_op# must be an instance of a SYCL function object. -_Returns:_ The value returned on work-item _i_ is the exclusive scan of -the first _i_ values in group [code]#g# and an initial value specified by -[code]#init#, using the operator [code]#binary_op#. The scan is computed using -a generalized noncommutative sum as defined in standard {cpp}. For -multi-dimensional groups, the order of work-items in group [code]#g# is +_Returns:_ The value returned on work-item _i_ is the exclusive scan of the +first _i_ values in group [code]#g# and an initial value specified by +[code]#init#, using the operator [code]#binary_op#. +The scan is computed using a generalized noncommutative sum as defined in +standard {cpp}. +For multi-dimensional groups, the order of work-items in group [code]#g# is determined by their linear id. -- @@ -20411,44 +20686,48 @@ _Mandates:_ [code]#binary_op(*first, *first)# must return a value of type _Preconditions:_ [code]#first#, [code]#last#, [code]#result# and the type of [code]#binary_op# must be the same for all work-items in group [code]#g#. -[code]#binary_op# must be an instance of a SYCL function object. +[code]#binary_op# must be an instance of a SYCL function object. [NOTE] ==== Note that [code]#first# may be equal to [code]#result#. ==== -_Effects:_ The value written to [code]#result# + _i_ is the inclusive scan of -the values resulting from dereferencing the first _i_ values in the range -[code]#[first, last)#, using the operator [code]#binary_op#. 
The scan is -computed using a generalized noncommutative sum as defined in standard {cpp}. +_Effects:_ The value written to [code]#result# + _i_ is the inclusive scan +of the values resulting from dereferencing the first _i_ values in the range +[code]#[first, last)#, using the operator [code]#binary_op#. +The scan is computed using a generalized noncommutative sum as defined in +standard {cpp}. _Returns:_ A pointer to the end of the output range. -- . _Constraints:_ Available only if [code]#sycl::is_group_v># is true, [code]#InPtr# and - [code]#OutPtr# are pointers to fundamental types, [code]#BinaryOperation# - is a SYCL function object type, and [code]#T# is a fundamental type. + [code]#OutPtr# are pointers to fundamental types, + [code]#BinaryOperation# is a SYCL function object type, and [code]#T# is + a fundamental type. + -- _Mandates:_ [code]#binary_op(init, *first)# must return a value of type [code]#T#. -_Preconditions:_ [code]#first#, [code]#last#, [code]#result#, [code]#init# and the -type of [code]#binary_op# must be the same for all work-items in group -[code]#g#. [code]#binary_op# must be an instance of a SYCL function object. +_Preconditions:_ [code]#first#, [code]#last#, [code]#result#, [code]#init# +and the type of [code]#binary_op# must be the same for all work-items in +group [code]#g#. +[code]#binary_op# must be an instance of a SYCL function object. [NOTE] ==== Note that [code]#first# may be equal to [code]#result#. ==== -_Effects:_ The value written to [code]#result# + _i_ is the inclusive scan of -the values resulting from dereferencing the first _i_ values in the range -[code]#[first, last)# and an initial value specified by -[code]#init#, using the operator [code]#binary_op#. The scan is computed using -a generalized noncommutative sum as defined in standard {cpp}. 
+_Effects:_ The value written to [code]#result# + _i_ is the inclusive scan +of the values resulting from dereferencing the first _i_ values in the range +[code]#[first, last)# and an initial value specified by [code]#init#, using +the operator [code]#binary_op#. +The scan is computed using a generalized noncommutative sum as defined in +standard {cpp}. _Returns:_ A pointer to the end of the output range. -- @@ -20464,29 +20743,32 @@ _Mandates:_ [code]#binary_op(x, x)# must return a value of type [code]#T#. _Preconditions:_ [code]#binary_op# must be an instance of a SYCL function object. -_Returns:_ The value returned on work-item _i_ is the inclusive scan of -the first _i_ values in group [code]#g#, using the operator [code]#binary_op#. +_Returns:_ The value returned on work-item _i_ is the inclusive scan of the +first _i_ values in group [code]#g#, using the operator [code]#binary_op#. The scan is computed using a generalized noncommutative sum as defined in -standard {cpp}. For multi-dimensional groups, the order of work-items in group -[code]#g# is determined by their linear id. +standard {cpp}. +For multi-dimensional groups, the order of work-items in group [code]#g# is +determined by their linear id. -- . _Constraints:_ Available only if [code]#sycl::is_group_v># is true, [code]#V# is a - fundamental type, [code]#BinaryOperation# is a SYCL function object type, - and [code]#T# is a fundamental type. + fundamental type, [code]#BinaryOperation# is a SYCL function object + type, and [code]#T# is a fundamental type. + -- -_Mandates:_ [code]#binary_op(init, x)# must return a value of type [code]#T#. +_Mandates:_ [code]#binary_op(init, x)# must return a value of type +[code]#T#. _Preconditions:_ [code]#binary_op# must be an instance of a SYCL function object. -_Returns:_ The value returned on work-item _i_ is the inclusive scan of -the first _i_ values in group [code]#g# and an initial value specified by -[code]#init#, using the operator [code]#binary_op#. 
The scan is computed using -a generalized noncommutative sum as defined in standard {cpp}. For -multi-dimensional groups, the order of work-items in group [code]#g# is +_Returns:_ The value returned on work-item _i_ is the inclusive scan of the +first _i_ values in group [code]#g# and an initial value specified by +[code]#init#, using the operator [code]#binary_op#. +The scan is computed using a generalized noncommutative sum as defined in +standard {cpp}. +For multi-dimensional groups, the order of work-items in group [code]#g# is determined by their linear id. -- @@ -20494,11 +20776,11 @@ determined by their linear id. === Math functions In SYCL the OpenCL math functions are available in the namespace -[code]#sycl# on host and device with the same precision -guarantees as defined in the OpenCL 1.2 specification document -<> for host and device. For a SYCL platform the -numerical requirements for host need to match the numerical -requirements of the OpenCL math built-in functions. +[code]#sycl# on host and device with the same precision guarantees as +defined in the OpenCL 1.2 specification document <> for host and device. +For a SYCL platform the numerical requirements for host need to match the +numerical requirements of the OpenCL math built-in functions. The built-in functions available for SYCL host and device, with the same precision requirements for both host and device, are described in @@ -23182,9 +23464,9 @@ corresponding [code]#vec#. === Native precision math functions -In SYCL the implementation-defined precision math functions are -defined in the namespace [code]#sycl::native#. The functions -that are available within this namespace are specified in +In SYCL the implementation-defined precision math functions are defined in +the namespace [code]#sycl::native#. +The functions that are available within this namespace are specified in <>. 
The range of valid input values and the maximum error for these functions is @@ -23645,12 +23927,12 @@ corresponding [code]#vec#. === Half precision math functions -In SYCL the half precision math functions are defined in -the namespace [code]#sycl::half_precision#. The functions that are -available within this namespace are specified in -<>. These functions are -implemented with a minimum of 10-bits of accuracy i.e. the maximum error is -less than or equal to 8192 ulp. +In SYCL the half precision math functions are defined in the namespace +[code]#sycl::half_precision#. +The functions that are available within this namespace are specified in +<>. +These functions are implemented with a minimum of 10-bits of accuracy i.e. +the maximum error is less than or equal to 8192 ulp. [[table.half.math.functions]] .Half precision math functions @@ -24126,8 +24408,8 @@ corresponding [code]#vec#. <> describes the integer math functions that are available in the [code]#sycl# namespace in both host and device code. -The function descriptions in this section use the term _generic integer type_ -to represent the following types: +The function descriptions in this section use the term _generic integer +type_ to represent the following types: * [code]#char# * [code]#signed char# @@ -24975,12 +25257,13 @@ corresponding [code]#vec#. === Common functions In SYCL the OpenCL [keyword]#common functions# are available in the -namespace [code]#sycl# on host and device as defined in the -OpenCL 1.2 specification document <>. They -are described here in <>. +namespace [code]#sycl# on host and device as defined in the OpenCL 1.2 +specification document <>. +They are described here in <>. 
-The function descriptions in this section use the term _generic floating point -type_ to represent the following types: +The function descriptions in this section use the term _generic floating +point type_ to represent the following types: * [code]#float# * [code]#double# @@ -25439,16 +25722,17 @@ corresponding [code]#vec#. In SYCL the OpenCL [keyword]#geometric functions# are available in the namespace [code]#sycl# on host and device as defined in the OpenCL 1.2 -specification document <>. On the host the -vector types use the [code]#vec# class and on an SYCL device use the -corresponding native <> vector types. All of the geometric functions -use round-to-nearest-even rounding mode. +specification document <>. +On the host the vector types use the [code]#vec# class and on an SYCL device +use the corresponding native <> vector types. +All of the geometric functions use round-to-nearest-even rounding mode. <> contains the definitions of supported geometric functions. The function descriptions in this section use two terms that refer to a -specific list of types. The term _generic geometric type_ represents the -following types: +specific list of types. +The term _generic geometric type_ represents the following types: * [code]#float# * [code]#double# @@ -25695,8 +25979,8 @@ else with the following exceptions: -- - . If the sum of squares is greater than [code]#FLT_MAX# then the - value of the floating-point values in the result vector are undefined. + . If the sum of squares is greater than [code]#FLT_MAX# then the value of + the floating-point values in the result vector are undefined. . If the sum of squares is less than [code]#FLT_MIN# then the implementation may return back [code]#p#. -- @@ -25709,20 +25993,20 @@ corresponding [code]#vec#. === Relational functions -The functions in <> are defined in the [code]#sycl# -namespace and are available on both host and device. 
These functions perform -various relational comparisons on [code]#vec#, [code]#marray#, and scalar -types. +The functions in <> are defined in the +[code]#sycl# namespace and are available on both host and device. +These functions perform various relational comparisons on [code]#vec#, +[code]#marray#, and scalar types. The comparisons performed by [code]#isequal#, [code]#isgreater#, [code]#isgreaterequal#, [code]#isless#, [code]#islessequal#, and -[code]#islessgreater# are false when one or both operands are NaN. The -comparison performed by [code]#isnotequal# is true when one or both operands -are NaN. +[code]#islessgreater# are false when one or both operands are NaN. +The comparison performed by [code]#isnotequal# is true when one or both +operands are NaN. The function descriptions in this section use two terms that refer to a -specific list of types. The term _generic scalar type_ represents the -following types: +specific list of types. +The term _generic scalar type_ represents the following types: * [code]#char# * [code]#signed char# diff --git a/adoc/chapters/references.adoc b/adoc/chapters/references.adoc index 750c3980..542d14f1 100644 --- a/adoc/chapters/references.adoc +++ b/adoc/chapters/references.adoc @@ -3,7 +3,8 @@ = References [[cpp17]] International Organization for Standardization (ISO). -"`Programming Languages — {cpp}`". ISO/IEC 14882:2017, 2017. +"`Programming Languages — {cpp}`". +ISO/IEC 14882:2017, 2017. [[dr2325]] International Organization for Standardization (ISO). @@ -27,6 +28,5 @@ https://www.khronos.org/registry/OpenCL/specs/opencl-2.0.pdf . [[cpp20]] International Organization for Standardization (ISO). -" Programming Languages — {cpp}, Langages de programmation -— C++ ", International Standard ISO/IEC 14882:2020(E), -Sixth edition 2020-12, 2020. +" Programming Languages — {cpp}, Langages de programmation — C++ ", +International Standard ISO/IEC 14882:2020(E), Sixth edition 2020-12, 2020. 
diff --git a/adoc/chapters/what_changed.adoc b/adoc/chapters/what_changed.adoc index 090d23f9..34b0641b 100644 --- a/adoc/chapters/what_changed.adoc +++ b/adoc/chapters/what_changed.adoc @@ -7,30 +7,32 @@ [[sec:what-changed-between]] == What has changed from SYCL 1.2.1 to SYCL 2020 -The SYCL runtime moved from namespace [code]#cl::sycl# provided -by [code]#{hash}include # to namespace [code]#sycl# -provided by [code]#{hash}include # as explained in -<>. The old header file is still -available for compatibility with SYCL 1.2.1 applications. +The SYCL runtime moved from namespace [code]#cl::sycl# provided by +[code]#{hash}include # to namespace [code]#sycl# provided by +[code]#{hash}include # as explained in +<>. +The old header file is still available for compatibility with SYCL 1.2.1 +applications. The SYCL specification is now based on the core language of {cpp17}, as -described in <>. Features of -{cpp17} are now enabled within the specification, such as deduction guides -for class template argument deduction. +described in <>. +Features of {cpp17} are now enabled within the specification, such as +deduction guides for class template argument deduction. Naming of lambda functions passed to kernel invocations is now optional. Changes to buffers, images and accessors: - * The [code]#image# class has been removed. There are now new classes - [code]#unsampled_image# and [code]#sampled_image# which represent sampled - and unsampled images. The [code]#sampler# class has been removed and - replaced with the new [code]#image_sampler# structure. + * The [code]#image# class has been removed. + There are now new classes [code]#unsampled_image# and + [code]#sampled_image# which represent sampled and unsampled images. + The [code]#sampler# class has been removed and replaced with the new + [code]#image_sampler# structure. * Support for image arrays has been removed. - * The type name [code]#access::target# has been deprecated and replaced with - the type [code]#target#. 
+ * The type name [code]#access::target# has been deprecated and replaced + with the type [code]#target#. * The type name [code]#access::mode# has been deprecated and replaced with the type [code]#access_mode#. @@ -39,46 +41,52 @@ Changes to buffers, images and accessors: has been deprecated and replaced with [code]#target::device#. * Support for the [code]#accessor# target [code]#target::host_buffer# has - been deprecated. There is now a new accessor class [code]#host_accessor# - which provides equivalent functionality. + been deprecated. + There is now a new accessor class [code]#host_accessor# which provides + equivalent functionality. * The [code]#buffer# member functions which return an [code]#accessor# of - type [code]#target::host_buffer# have been deprecated. A new member - function [code]#get_host_access()# has been added which returns a - [code]#host_accessor#. + type [code]#target::host_buffer# have been deprecated. + A new member function [code]#get_host_access()# has been added which + returns a [code]#host_accessor#. * The [code]#buffer# class has a new variadic overload of the [code]#get_access()# member function which allows construction of an [code]#accessor# with various parameters. * Support for the [code]#accessor# target [code]#target::local# has been - deprecated. There is now a new accessor class [code]#local_accessor# which - provides equivalent functionality. + deprecated. + There is now a new accessor class [code]#local_accessor# which provides + equivalent functionality. * Support for the [code]#accessor# targets [code]#target::image# and - [code]#target::host_image# have been removed. There are now new accessor - classes for sampled and unsampled images: [code]#sampled_image_accessor#, - [code]#host_sampled_image_accessor#, [code]#unsampled_image_accessor# and + [code]#target::host_image# have been removed. 
+ There are now new accessor classes for sampled and unsampled images: + [code]#sampled_image_accessor#, [code]#host_sampled_image_accessor#, + [code]#unsampled_image_accessor# and [code]#host_unsampled_image_accessor#. * A new [code]#accessor# target [code]#target::host_task# has been added, which allows access to a [code]#buffer# from a <>. - * Support for the [code]#accessor# modes [code]#access_mode::discard_write# - and [code]#access_mode::discard_read_write# has been deprecated. Accessors - can now be constructed with a property list, and the new property - [code]#property::no_init# provides equivalent functionality. - - * Support for the [code]#accessor# mode [code]#access_mode::atomic# and the - member functions that return an instance of the [code]#atomic# class have - been deprecated in favor of using the new [code]#atomic_ref# class instead. - - * Support for the [code]#accessor# template parameter [code]#isPlaceholder# - has been deprecated, and the value of this parameter no longer has any - bearing on whether the accessor is a placeholder. The enumerated type - [code]#access::placeholder# is also deprecated. A placeholder - accessor can now be constructed by calling the appropriate constructor, - without regard to the template parameter. + * Support for the [code]#accessor# modes + [code]#access_mode::discard_write# and + [code]#access_mode::discard_read_write# has been deprecated. + Accessors can now be constructed with a property list, and the new + property [code]#property::no_init# provides equivalent functionality. + + * Support for the [code]#accessor# mode [code]#access_mode::atomic# and + the member functions that return an instance of the [code]#atomic# class + have been deprecated in favor of using the new [code]#atomic_ref# class + instead. 
+ + * Support for the [code]#accessor# template parameter + [code]#isPlaceholder# has been deprecated, and the value of this + parameter no longer has any bearing on whether the accessor is a + placeholder. + The enumerated type [code]#access::placeholder# is also deprecated. + A placeholder accessor can now be constructed by calling the appropriate + constructor, without regard to the template parameter. * The return type of [code]#accessor::is_placeholder()# is no longer [code]#constexpr#. @@ -92,38 +100,42 @@ Changes to buffers, images and accessors: parameter, which allows the class template parameters to be inferred via {cpp} class template argument deduction (CTAD). - * The [code]#buffer# member function [code]#get_access()# now has a default - value for the [code]#target# template parameter, so it is no longer - necessary to provide any template parameters in order to get a + * The [code]#buffer# member function [code]#get_access()# now has a + default value for the [code]#target# template parameter, so it is no + longer necessary to provide any template parameters in order to get a [code]#access_mode::read_write# accessor. * The [code]#accessor# template parameters [code]#Dimensions# and - [code]#AccessMode# now have default values, so the only required template - parameter is [code]#DataT#. Moreover, the default access mode is either - [code]#access_mode::read_write# or [code]#access_mode::read#, - depending on the constness of the [code]#DataT# type. This makes it - possible to declare a read-only accessor by simply using a [code]#const# - qualified type. + [code]#AccessMode# now have default values, so the only required + template parameter is [code]#DataT#. + Moreover, the default access mode is either + [code]#access_mode::read_write# or [code]#access_mode::read#, depending + on the constness of the [code]#DataT# type. + This makes it possible to declare a read-only accessor by simply using a + [code]#const# qualified type. 
* Implicit conversions have been added between the two forms of read-only [code]#accessor# (one form has [code]#const DataT# and [code]#access_mode::read# and the other has non-const [code]#DataT# and - [code]#access_mode::read#). There is also an implicit conversion from - a read-write [code]#accessor# to either of the read-only forms. + [code]#access_mode::read#). + There is also an implicit conversion from a read-write [code]#accessor# + to either of the read-only forms. * Member functions of [code]#accessor# which return a reference to an element have been changed to return a [code]#const# reference for - read-only accessors. The [code]#get_pointer()# member function has also - been changed to return a [code]#const# pointer for read-only accessors. + read-only accessors. + The [code]#get_pointer()# member function has also been changed to + return a [code]#const# pointer for read-only accessors. The [code]#value_type# and [code]#reference# member types of - [code]#accessor# have been changed to be [code]#const# types for read-only - accessors. + [code]#accessor# have been changed to be [code]#const# types for + read-only accessors. * The [code]#accessor# class now meets the {cpp} requirement of - [code]#ReversibleContainer#. This includes (but is not limited to) - returning [code]#begin# and [code]#end# iterators, specifying a default - constructible accessor that can be passed to a kernel but not dereferenced, - and making them equality comparable. + [code]#ReversibleContainer#. + This includes (but is not limited to) returning [code]#begin# and + [code]#end# iterators, specifying a default constructible accessor that + can be passed to a kernel but not dereferenced, and making them equality + comparable. * Many of the [code]#accessor# member functions have been marked [code]#noexcept#. @@ -132,210 +144,225 @@ Changes to buffers, images and accessors: outside of its range; attempting to do so produces undefined behavior. 
* The semantics of the subscript operator have been changed for a - <> which has an offset. Calling [code]#operator[](0)# now - returns a reference to the first element in the range, rather than a - reference to the first element in the underlying buffer. + <> which has an offset. + Calling [code]#operator[](0)# now returns a reference to the first + element in the range, rather than a reference to the first element in + the underlying buffer. - * The behavior of buffers and accessors with a zero-sized range has been clarified. + * The behavior of buffers and accessors with a zero-sized range has been + clarified. Constant memory no longer appears in the SYCL device memory model in SYCL 2020. -The {cpp} attributes that decorate kernels are now better described, and their -position has changed so that they are applied directly to the kernel function. -(Previously, they were applied to a device function that the kernel calls, and -the implementation needed to propagate the information up to the enclosing -kernel.) The old {cpp} attribute form is no longer included in the SYCL -specification. +The {cpp} attributes that decorate kernels are now better described, and +their position has changed so that they are applied directly to the kernel +function. +(Previously, they were applied to a device function that the kernel calls, +and the implementation needed to propagate the information up to the +enclosing kernel.) The old {cpp} attribute form is no longer included in the +SYCL specification. Changes to the built-in functions specified in <>: * The specification no longer uses pseudo "generic type names" to describe these functions, and it now lists the exact synopsis for each function. - * The return type of the integer [code]#abs# and [code]#abs_diff# functions - has changed. The return type is now the same as the input type, matching - the {cpp} [code]#std::abs# function. + * The return type of the integer [code]#abs# and [code]#abs_diff# + functions has changed. 
+ The return type is now the same as the input type, matching the {cpp} + [code]#std::abs# function. * The geometric functions specified in <> now support the [code]#half# data type. * The [code]#ctz# function was added to <>. - * The specification of [code]#clz# was clarified for the case when the input - is zero. + * The specification of [code]#clz# was clarified for the case when the + input is zero. The classes [code]#vector_class#, [code]#string_class#, -[code]#function_class#, [code]#mutex_class#, -[code]#shared_ptr_class#, [code]#weak_ptr_class#, -[code]#hash_class# and [code]#exception_ptr_class# have been -removed from the API and the standard classes -[code]#std::vector#, [code]#std::string#, -[code]#std::function#, [code]#std::mutex#, -[code]#std::shared_ptr#, [code]#std::weak_ptr#, -[code]#std::hash# and [code]#std::exception_ptr# are used -instead. - -The specific [code]#sycl::buffer# API taking -[code]#std::unique_ptr# has been removed. The behavior is the -same as in SYCL 1.2.1 but with a simplified API. Since there is still -the API taking [code]#std::shared_ptr# and there is an implicit -conversion from a [code]#std::unique_ptr# prvalue to a -[code]#std::shared_ptr#, the API can still be used as before with -a [code]#std::unique_ptr# to give away memory ownership. - -Offsets to [code]#parallel_for#, [code]#nd_range#, [code]#nd_item# and [code]#item# classes have been deprecated. -As such, the parallel iteration spaces all begin at [code]#(0,0,0)# and developers are now required to handle any offset arithmetic themselves. -The behavior of [code]#nd_item.get_global_linear_id()# and [code]#nd_item.get_local_linear_id()# has been clarified accordingly. - -Unified Shared Memory (USM), in <>, has been added as a pointer-based strategy -for data management. It defines several types of allocations with various -accessibility rules for host and devices. USM is meant to complement -buffers, not replace them. 
- -The [code]#queue# class received a new [code]#property# -that requires in-order semantics for a queue where operations are -executed in the order in which they are submitted. - -The [code]#queue# class received several new member functions to -invoke kernels directly on a queue objects instead of inside a -command group handler in the [code]#submit# member function. - -The [code]#queue# constructor overloads that accept both a [code]#context# and -a [code]#device# parameter have been broadened to allow the device to be either -a device that is in the context or a <> of a device that is -in the context. +[code]#function_class#, [code]#mutex_class#, [code]#shared_ptr_class#, +[code]#weak_ptr_class#, [code]#hash_class# and [code]#exception_ptr_class# +have been removed from the API and the standard classes [code]#std::vector#, +[code]#std::string#, [code]#std::function#, [code]#std::mutex#, +[code]#std::shared_ptr#, [code]#std::weak_ptr#, [code]#std::hash# and +[code]#std::exception_ptr# are used instead. + +The specific [code]#sycl::buffer# API taking [code]#std::unique_ptr# has +been removed. +The behavior is the same as in SYCL 1.2.1 but with a simplified API. +Since there is still the API taking [code]#std::shared_ptr# and there is an +implicit conversion from a [code]#std::unique_ptr# prvalue to a +[code]#std::shared_ptr#, the API can still be used as before with a +[code]#std::unique_ptr# to give away memory ownership. + +Offsets to [code]#parallel_for#, [code]#nd_range#, [code]#nd_item# and +[code]#item# classes have been deprecated. +As such, the parallel iteration spaces all begin at [code]#(0,0,0)# and +developers are now required to handle any offset arithmetic themselves. +The behavior of [code]#nd_item.get_global_linear_id()# and +[code]#nd_item.get_local_linear_id()# has been clarified accordingly. + +Unified Shared Memory (USM), in <>, has been added as a +pointer-based strategy for data management. 
+It defines several types of allocations with various accessibility rules for +host and devices. +USM is meant to complement buffers, not replace them. + +The [code]#queue# class received a new [code]#property# that requires +in-order semantics for a queue where operations are executed in the order in +which they are submitted. + +The [code]#queue# class received several new member functions to invoke +kernels directly on a queue object instead of inside a command group +handler in the [code]#submit# member function. + +The [code]#queue# constructor overloads that accept both a [code]#context# +and a [code]#device# parameter have been broadened to allow the device to be +either a device that is in the context or a <> of a +device that is in the context. The [code]#program# class has been removed and replaced with a new class -[code]#kernel_bundle#, which provides similar functionality in a type-safe and -thread-safe way. The [code]#kernel# class has changed, and some member -functions have been removed. - -Support has been added for <>, -which allow a <> to use constant variables whose values -aren't known until the kernel is invoked. A <> can now -take an optional parameter of type [code]#kernel_handler#, which allows the -kernel to read the values of +[code]#kernel_bundle#, which provides similar functionality in a type-safe +and thread-safe way. +The [code]#kernel# class has changed, and some member functions have been +removed. + +Support has been added for +<>, which allow a +<> to use constant variables whose values aren't known +until the kernel is invoked. +A <> can now take an optional parameter of type +[code]#kernel_handler#, which allows the kernel to read the values of <>. -The constructors for SYCL [code]#context# and [code]#queue# -are made [code]#explicit# to prevent ambiguities in the selected -constructor resulting from implicit type conversion.
+The constructors for SYCL [code]#context# and [code]#queue# are made +[code]#explicit# to prevent ambiguities in the selected constructor +resulting from implicit type conversion. -The requirement for {cpp} standard layout for data shared between host -and devices has been relaxed. SYCL now requires data shared between -host and devices to be <> as defined <>. +The requirement for {cpp} standard layout for data shared between host and +devices has been relaxed. +SYCL now requires data shared between host and devices to be +<> as defined <>. -The concept of a <> of <> was generalized to include -<> and <>. A <> is represented -by the [code]#sycl::group# class as in SYCL 1.2.1, and a <> -is represented by the new [code]#sycl::sub_group# class. +The concept of a <> of <> was generalized to +include <> and <>. +A <> is represented by the [code]#sycl::group# class as in SYCL +1.2.1, and a <> is represented by the new [code]#sycl::sub_group# +class. The [code]#host_task# member function for the [code]#queue# has been -introduced for en-queueing <> on a <> to schedule the -<> to invoke native {cpp} functions, conforming to the SYCL memory -model. <> also support interoperability with the native -<> objects associated at that point in the DAG using -the optional [code]#interop_handle# class. - -A library of algorithms based on the {cpp17} algorithms library -was introduced in <>. These algorithms -provide a simple way for developers to apply common parallel algorithms -using the work-items of a group. - -The definition of the [code]#sycl::group# class was modified to -support the new group functions in <>. +introduced for en-queueing <> on a <> to +schedule the <> to invoke native {cpp} functions, conforming +to the SYCL memory model. +<> also support interoperability with the native +<> objects associated at that point in the DAG using the optional +[code]#interop_handle# class. + +A library of algorithms based on the {cpp17} algorithms library was +introduced in <>. 
+These algorithms provide a simple way for developers to apply common +parallel algorithms using the work-items of a group. + +The definition of the [code]#sycl::group# class was modified to support the +new group functions in <>. New member types and variables were added to enable generic programming, and member functions were updated to encapsulate all functionality tied to -<> in the [code]#sycl::group# class. See -<> for details. +<> in the [code]#sycl::group# class. +See <> for details. The [code]#barrier# and [code]#mem_fence# member functions of the -[code]#nd_item# class have been removed. The [code]#barrier# member -function has been replaced by the [code]#group_barrier()# function, which -can be used to synchronize either <> or <>. The -[code]#mem_fence# member function has been replaced by the +[code]#nd_item# class have been removed. +The [code]#barrier# member function has been replaced by the +[code]#group_barrier()# function, which can be used to synchronize either +<> or <>. +The [code]#mem_fence# member function has been replaced by the [code]#atomic_fence# function, which is more closely aligned with -[code]#std::atomic_thread_fence# and offers control over memory ordering -and scope. +[code]#std::atomic_thread_fence# and offers control over memory ordering and +scope. -Changes in the SYCL [code]#vec# class described in -<>: +Changes in the SYCL [code]#vec# class described in <>: * [code]#operator[]# was added; * unary [code]#pass:[operator+()]# and [code]#operator-()# were added; -The device selection now relies on a simpler API based on ranking -functions used as <> described in +The device selection now relies on a simpler API based on ranking functions +used as <> described in <>. -A new device selector utility has been added to <>, -the [code]#aspect_selector#, which returns a selector object -that only selects devices that have all the requested aspects. 
+A new device selector utility has been added to <>, the +[code]#aspect_selector#, which returns a selector object that only selects +devices that have all the requested aspects. The device query [code]#info::fp_config::correctly_rounded_divide_sqrt# has been deprecated. A new reduction library consisting of the [code]#reduction# function and [code]#reducer# class was introduced to simplify the expression of variables -with <> semantics in SYCL kernels. See <>. +with <> semantics in SYCL kernels. +See <>. The [code]#atomic# class from SYCL 1.2.1 was deprecated in favor of a new [code]#atomic_ref# interface. -The SYCL exception class hierarchy has been condensed into a single exception -type: [code]#exception#. -[code]#exception# now derives from -[code]#std::exception#. The variety of errors are now provided via error -codes, which aligns with the {cpp} error code mechanism. +The SYCL exception class hierarchy has been condensed into a single +exception type: [code]#exception#. +[code]#exception# now derives from [code]#std::exception#. +The variety of errors are now provided via error codes, which aligns with +the {cpp} error code mechanism. The new error code mechanism now also generalizes the previous [code]#get_cl_code# interface to provide a generic interface way for querying backend-specific error codes. -Default asynchronous error handling behavior is now defined, so that asynchronous -errors will cause abnormal program termination even if a user-defined -asynchronous handler function is not defined. This prevents asynchronous errors -from being silently lost during early stages of application development. - -Kernel invocation functions, such as [code]#parallel_for#, now take -kernel functions by [code]#const# reference. Kernel functions must now have -a [code]#const#-qualified [code]#operator()#, and are allowed to be copied zero -or more times by an implementation. 
These clarifications allow implementations -to have flexibility for specific devices, and define what users should expect -with kernel functors. Specifically, kernel functors can not be marked as -[code]#mutable#, and sharing of data between work-items should not be -attempted through state stored within a kernel functor. - -A new concept called device <> has been added, which tells the set -of optional features a device supports. This new mechanism replaces the -[code]#has_extension()# function and some uses of [code]#get_info()#. - -There is a new <> which describes how extensions -to the SYCL language can be added by vendors and by the Khronos Group. - -A [code]#queue# constructor has been added that takes both a -[code]#device# and [code]#context#, to simplify interfacing -with libraries. - -The [code]#parallel_for# interface has been simplified in some forms -to accept a braced initializer list in place of a [code]#range#, and -to always take [code]#item# arguments. Kernel invocation functions have -also been modified to accept generic lambda expressions. Implicit conversions -from one-dimensional [code]#item# and one-dimensional [code]#id# to scalar types -have been defined. All of these modifications lead to simpler SYCL code in common -use cases. +Default asynchronous error handling behavior is now defined, so that +asynchronous errors will cause abnormal program termination even if a +user-defined asynchronous handler function is not defined. +This prevents asynchronous errors from being silently lost during early +stages of application development. + +Kernel invocation functions, such as [code]#parallel_for#, now take kernel +functions by [code]#const# reference. +Kernel functions must now have a [code]#const#-qualified [code]#operator()#, +and are allowed to be copied zero or more times by an implementation. +These clarifications allow implementations to have flexibility for specific +devices, and define what users should expect with kernel functors. 
+Specifically, kernel functors cannot be marked as [code]#mutable#, and +sharing of data between work-items should not be attempted through state +stored within a kernel functor. + +A new concept called device <> has been added, which describes +the set of optional features a device supports. +This new mechanism replaces the [code]#has_extension()# function and some +uses of [code]#get_info()#. + +There is a new <> which describes how extensions to the +SYCL language can be added by vendors and by the Khronos Group. + +A [code]#queue# constructor has been added that takes both a [code]#device# +and [code]#context#, to simplify interfacing with libraries. + +The [code]#parallel_for# interface has been simplified in some forms to +accept a braced initializer list in place of a [code]#range#, and to always +take [code]#item# arguments. +Kernel invocation functions have also been modified to accept generic lambda +expressions. +Implicit conversions from one-dimensional [code]#item# and one-dimensional +[code]#id# to scalar types have been defined. +All of these modifications lead to simpler SYCL code in common use cases. The behaviour of executing a kernel over a [code]#range# or [code]#nd_range# with index space of zero has been clarified. -Some device-specific queries have been renamed to more clearly be "`device-specific -kernel`" [code]#get_info# queries ([code]#info::kernel_device_specific#) -instead of "`work-group`" ([code]#get_workgroup_info#) and sub-group -([code]#get_sub_group_info#) queries. +Some device-specific queries have been renamed to more clearly be +"`device-specific kernel`" [code]#get_info# queries +([code]#info::kernel_device_specific#) instead of "`work-group`" +([code]#get_workgroup_info#) and sub-group ([code]#get_sub_group_info#) +queries. -A new math array type [code]#marray# has been defined to begin disambiguation -of the multiple possible interpretations of how [code]#sycl::vec# should be -interpreted and implemented.
+A new math array type [code]#marray# has been defined to begin +disambiguation of the multiple possible interpretations of how +[code]#sycl::vec# should be interpreted and implemented. Changes in SYCL address spaces: @@ -343,11 +370,11 @@ Changes in SYCL address spaces: * the generic address space was introduced; * the constant address space was deprecated; * behavior of unannotated pointer/reference (raw pointer/reference) is now - dependent on the compilation mode. The compiler can either interpret - unannotated pointer/reference has addressing the generic address space - or to be deduced; - * some ambiguities in the address space deduction were clarified. Notably - that deduced type does not affect the user-provided type. + dependent on the compilation mode. + The compiler can either interpret unannotated pointer/reference as + addressing the generic address space or deduce the address space; + * some ambiguities in the address space deduction were clarified. + Notably, the deduced type does not affect the user-provided type. Changes in [code]#multi_ptr# interface: @@ -355,24 +382,25 @@ Changes in [code]#multi_ptr# interface: the generic address space; * deprecation of [code]#access::address_space::constant_space#; * an extra template parameter to allow to select a flavor of the - [code]#multi_ptr# interface. There are now 3 different interfaces: - ** interface exposing undecorated types. Returned pointer and reference - are not annotated by an address space; - ** interface exposing decorated types. Returned pointer and reference are - annotated by an address space; + [code]#multi_ptr# interface. + There are now 3 different interfaces: + ** interface exposing undecorated types. + Returned pointer and reference are not annotated by an address space; + ** interface exposing decorated types. + Returned pointer and reference are annotated by an address space; ** legacy 1.2.1 interface (deprecated).
* deprecation of the 1.2.1 interface; * deprecation of [code]#constant_ptr#; - * [code]#global_ptr#, [code]#local_ptr# and - [code]#private_ptr# alias take the new extra parameter; + * [code]#global_ptr#, [code]#local_ptr# and [code]#private_ptr# alias take + the new extra parameter; * addition of the [code]#address_space_cast# free function to cast undecorated pointer to [code]#multi_pointer#; * addition of construction/conversion operator for the generic address space; * removal of the constructor and assignment operator taking an unannotated pointer; - * implicit conversion to a pointer is now deprecated. [code]#get# should - be used instead; + * implicit conversion to a pointer is now deprecated. + [code]#get# should be used instead; * the return type of the member function [code]#get# now depends on the selected interface. * addition of the member function [code]#get_raw# which returns the @@ -385,64 +413,64 @@ The [code]#cl::sycl::byte# has been deprecated and now the {cpp17} [code]#std::byte# should be used instead. A SYCL implementation is no longer required to provide a host device. -Instead, an implementation is only required to provide at least one -device. Implementations are still allowed to provide devices that are -implemented on the host, but it is no longer required. The specification -no longer defines any special semantics for a "host device" and APIs -specific to the host device have been removed. +Instead, an implementation is only required to provide at least one device. +Implementations are still allowed to provide devices that are implemented on +the host, but it is no longer required. +The specification no longer defines any special semantics for a "host +device" and APIs specific to the host device have been removed. The default constructors for the [code]#device# and [code]#platform# classes -have been changed to construct a copy of the default device and a copy of the -platform containing the default device. 
Previously, they returned a copy of -the host device and a copy of the platform containing the host device. The -default constructor for the [code]#event# class has also been changed to +have been changed to construct a copy of the default device and a copy of +the platform containing the default device. +Previously, they returned a copy of the host device and a copy of the +platform containing the host device. +The default constructor for the [code]#event# class has also been changed to construct an event that comes from a default-constructed [code]#queue#. Previously, it constructed an event that used the host backend. -Explicit copy functions of the handler class -have also been introduced to the queue class as shortcuts for the handler ones. -This is enabled by the improved placeholder accessors -to help reduce code verbosity in certain cases -because the shortcut functions implicitly create a command group -and call [code]#handler::require#. +Explicit copy functions of the handler class have also been introduced to +the queue class as shortcuts for the handler ones. +This is enabled by the improved placeholder accessors to help reduce code +verbosity in certain cases because the shortcut functions implicitly create +a command group and call [code]#handler::require#. -Information query descriptors have been changed to structures under namespaces -named accordingly. [code]#param_traits# has been removed and the return type of -an information query is now contained in the descriptor. -The [code]#sycl::info::device::max_work_item_sizes# is now a -template that takes a dimension parameter corresponding to the number of -dimensions of the work-item size maxima. +Information query descriptors have been changed to structures under +namespaces named accordingly. +[code]#param_traits# has been removed and the return type of an information +query is now contained in the descriptor. 
+The [code]#sycl::info::device::max_work_item_sizes# is now a template that +takes a dimension parameter corresponding to the number of dimensions of the +work-item size maxima. Changes to retrieving size information: - * all [code]#get_size()# member functions have been deprecated - and replaced with [code]#byte_size()#, which is marked [code]#noexcept#; - * all [code]#get_count()# member functions have been deprecated - and replaced with [code]#size()#, which is marked [code]#noexcept#; - * in the [code]#vec# class the functions [code]#byte_size()# and [code]#size()# - are now static member functions; - * in the [code]#stream# class [code]#get_size()# has been deprecated - in favor of [code]#size()#, - whereas [code]#stream::byte_size()# is not available; + * all [code]#get_size()# member functions have been deprecated and + replaced with [code]#byte_size()#, which is marked [code]#noexcept#; + * all [code]#get_count()# member functions have been deprecated and + replaced with [code]#size()#, which is marked [code]#noexcept#; + * in the [code]#vec# class the functions [code]#byte_size()# and + [code]#size()# are now static member functions; + * in the [code]#stream# class [code]#get_size()# has been deprecated in + favor of [code]#size()#, whereas [code]#stream::byte_size()# is not + available; * accessors for sampled and unsampled images only define [code]#size()# and not [code]#byte_size()#. The device descriptors [code]#info::device::max_constant_buffer_size# and [code]#info::device::max_constant_args# are deprecated in SYCL 2020. -The [code]#buffer_allocator# is now templated on the data type -and follows the C++ named requirement [code]#Allocator#. +The [code]#buffer_allocator# is now templated on the data type and follows +the C++ named requirement [code]#Allocator#. 
// Expose various workarounds showing how to typeset +, ++ and -- The -The SYCL [code]#id# and [code]#range# have now unary -pass:quotes[[code\]#+#] and [code]#-# operations, prefix -[code]#++# and [code]#--# operations, postfix -pass:quotes[[code\]#++#] and pass:quotes[[code\]#--#] operations which -were forgotten in SYCL 1.2.1. - -In SYCL 1.2.1, the [code]#handler::copy()# overload with two [code]#accessor# -parameters did not clearly specify which accessor's size determines the amount -of memory that is copied. The spec now clarifies that the [code]#src# -accessor's size is used. +The SYCL [code]#id# and [code]#range# now have unary pass:quotes[[code\]#+#] +and [code]#-# operations, prefix [code]#++# and +[code]#--# operations, postfix pass:quotes[[code\]#++#] and +pass:quotes[[code\]#--#] operations which were forgotten in SYCL 1.2.1. + +In SYCL 1.2.1, the [code]#handler::copy()# overload with two +[code]#accessor# parameters did not clearly specify which accessor's size +determines the amount of memory that is copied. +The spec now clarifies that the [code]#src# accessor's size is used. // %%%%%%%%%%%%%%%%%%%%%%%%%%%% end what_changed %%%%%%%%%%%%%%%%%%%%%%%%%%%%
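
As a side note on the removed [code]#std::unique_ptr# buffer overload discussed earlier in this section, the following is a minimal standard {cpp} sketch of the implicit conversion from a [code]#std::unique_ptr# prvalue to a [code]#std::shared_ptr# that the text relies on.
The [code]#shared_storage# type and the [code]#make_storage_from_unique# helper are hypothetical names invented for this illustration and are not part of the SYCL API; only the [code]#std::shared_ptr# parameter type mirrors the situation described.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for an API that takes shared ownership of host data.
// This is NOT sycl::buffer; only the std::shared_ptr parameter mirrors it.
struct shared_storage {
  std::shared_ptr<int> data;

  explicit shared_storage(std::shared_ptr<int> host_data)
      : data(std::move(host_data)) {
    assert(data != nullptr);  // this sketch expects a non-null pointer
  }
};

// Hypothetical helper: the std::unique_ptr prvalue returned by
// std::make_unique converts implicitly to std::shared_ptr<int>,
// so ownership is given away without a dedicated unique_ptr overload.
inline shared_storage make_storage_from_unique(int value) {
  return shared_storage{std::make_unique<int>(value)};
}
```

In this sketch, a named (lvalue) [code]#std::unique_ptr# would need an explicit [code]#std::move# at the call site, because the converting constructor of [code]#std::shared_ptr# only accepts rvalues.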