[SYCL][DOC] Add an overload for memory_required in joint_sorter (intel#11727)

From the implementation perspective, it's important for `memory_required`
in `default_sorters` to depend on the SYCL device. This is because
`default_sorters` does not specify an algorithm, so the algorithm used
inside it can depend on the specific SYCL device. `radix_sorters`, in
contrast, uses a specific algorithm (radix sort) that doesn't depend on
the backend.

Removed `constexpr` from `memory_required` for default sorters, since it
isn't needed there. It is still fine for radix sorters, since their
algorithm is fixed and can't depend on the backend.
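A minimal, self-contained C++ sketch of this reasoning follows. It does not use SYCL. `memory_scope`, `device`, `fixed_algorithm_sorter` (standing in for `radix_sorters`), `device_dependent_sorter` (standing in for `default_sorters`), and the byte-count formulas are all hypothetical. They only show why a fixed algorithm permits a `constexpr` answer while a device-selected one needs a runtime device argument.

```cpp
#include <cstddef>

// Hypothetical stand-ins, NOT the real sycl::memory_scope / sycl::device.
enum class memory_scope { work_item, sub_group, work_group };

struct device {
  // Hypothetical per-device property: extra scratch bytes per element that
  // the algorithm chosen for this device needs.
  std::size_t scratch_bytes_per_element;
};

// Stand-in for a sorter with a fixed algorithm (like radix_sorters): the
// result depends only on T and range_size, so it can be constexpr.
struct fixed_algorithm_sorter {
  template <typename T>
  static constexpr std::size_t memory_required(memory_scope /*scope*/,
                                               std::size_t range_size) {
    // Placeholder formula (one temporary copy of the range), not the real
    // radix sort requirement. The scope is ignored in this toy model.
    return range_size * sizeof(T);
  }
};

// Stand-in for a sorter whose algorithm is picked per device (like
// default_sorters): the result depends on a runtime device object, so the
// function is an ordinary (non-constexpr) static member.
struct device_dependent_sorter {
  template <typename T>
  static std::size_t memory_required(const device& d, memory_scope /*scope*/,
                                     std::size_t range_size) {
    // Placeholder formula: a temporary copy plus device-specific scratch.
    return range_size * (sizeof(T) + d.scratch_bytes_per_element);
  }
};

// Only the fixed-algorithm query can be evaluated at compile time.
static_assert(fixed_algorithm_sorter::memory_required<int>(
                  memory_scope::work_group, 256) == 256 * sizeof(int),
              "fixed-algorithm size is a compile-time constant");
```

With this split, code that only needs a fixed-algorithm size can keep it in a `constexpr` variable. A default-sorter query, by contrast, has to be made on the host with the device that will run the kernel, as in the updated examples (`memory_required<T>(d, sycl::memory_scope::work_group, n)`).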

Signed-off-by: Fedorov, Andrey <andrey.fedorov@intel.com>

andreyfe1 authored Feb 7, 2024
1 parent 774a662 commit 653af67
Showing 1 changed file with 18 additions and 13 deletions.
sycl/doc/extensions/proposed/sycl_ext_oneapi_group_sort.asciidoc (31 changes: 18 additions, 13 deletions)
@@ -514,8 +514,8 @@ namespace sycl::ext::oneapi::experimental {
void operator()(Group g, Ptr first, Ptr last); // (2)
template<typename T>
-static constexpr size_t
-memory_required(sycl::memory_scope scope, std::size_t range_size); // (3)
+static size_t
+memory_required(sycl::device d, sycl::memory_scope scope, std::size_t range_size); // (3)
};
template<typename T,
@@ -534,8 +534,8 @@ namespace sycl::ext::oneapi::experimental {
sycl::span<T, ElementsPerWorkItem> values,
Properties properties); // (6)
-static constexpr size_t
-memory_required(sycl::memory_scope scope, std::size_t range_size); // (7)
+static size_t
+memory_required(sycl::device d, sycl::memory_scope scope, std::size_t range_size); // (7)
};
template<typename T,
@@ -557,8 +557,8 @@ namespace sycl::ext::oneapi::experimental {
sycl::span<U, ElementsPerWorkItem> values,
Properties property); // (10)
-static constexpr std::size_t
-memory_required(sycl::memory_scope scope, std::size_t range_size); // (11)
+static std::size_t
+memory_required(sycl::device d, sycl::memory_scope scope, std::size_t range_size); // (11)
};
}
@@ -671,9 +671,11 @@ the `joint_sort` algorithm.
_Complexity_: Let `N` be `last - first`. `O(N*log(N)*log(N))` comparisons.

(3) Returns size of temporary memory (in bytes) that is required by
-the default sorting algorithm defined by the sorter calling by `joint_sort`.
+the default sorting algorithm defined by the sorter calling by `joint_sort`
+depending on `d`.
`range_size` represents a range size for sorting,
e.g. `last-first` from `operator()` arguments.
+It mustn't be called within a SYCL kernel, only on host.
Result depends on the `scope` parameter:
use `sycl::memory_scope::work_group` to get memory size required
for each work-group;
@@ -694,9 +696,11 @@ _Complexity_: Let `N` be the `Group` size multiplied by `ElementsPerWorkItem`.
`O(N*log(N)*log(N))` comparisons.

(7) Returns the size of temporary memory (in bytes) that is required by the default
-sorting algorithm defined by the sorter calling by `sort_over_group`.
+sorting algorithm defined by the sorter calling by `sort_over_group`
+depending on `d`.
`ElementsPerWorkItem` is the extent parameter for `sycl::span`
that is an input parameter for `sort_over_group`.
+It mustn't be called within a SYCL kernel, only on host.
If `scope == sycl::memory_scope::work_group`,
`range_size` is the size of the local range for `sycl::nd_range`
that was used to run the kernel;
@@ -719,7 +723,9 @@ _Complexity_: Let `N` be the `Group` size multiplied by `ElementsPerWorkItem`.

(11) Returns size of temporary memory (in bytes) that is required by
the default key-value
-sorting algorithm defined by the sorter calling by `sort_key_value_over_group`.
+sorting algorithm defined by the sorter calling by `sort_key_value_over_group`
+depending on `d`.
+It mustn't be called within a SYCL kernel, only on host.
If `scope == sycl::memory_scope::work_group`,
`range_size` is the size of the local range for `sycl::nd_range`
that was used to run the kernel;
@@ -998,7 +1004,7 @@ namespace my_sycl = sycl::ext::oneapi::experimental;
// calculate required local memory size
size_t temp_memory_size =
my_sycl::default_sorters::joint_sorter<>::memory_required<T>(
-sycl::memory_scope::work_group, n);
+d, sycl::memory_scope::work_group, n);
q.submit([&](sycl::handler& h) {
auto acc = sycl::accessor(buf, h);
@@ -1075,7 +1081,7 @@ using TupleType =
// calculate required local memory size
size_t temp_memory_size =
my_sycl::default_sorters::joint_sorter<>::memory_required<TupleType>(
-sycl::memory_scope::work_group, n);
+d, sycl::memory_scope::work_group, n);
q.submit([&](sycl::handler& h) {
auto keys_acc = sycl::accessor(keys_buf, h);
@@ -1185,8 +1191,6 @@ because it's easy to pass different comparator types.
. Think about reducing overloads for sorting functions. The thing is that
overloads with `Compare` objects seems extra and overloads with sorters,
without sorters are enough.
-. It would be better if `memory_required` methods had a `sycl::device` parameter
-because different devices can require different amount of memory.

== Non-implemented features
Please, note that following is not inplemented yet for the open-source repo:
@@ -1206,4 +1210,5 @@ Please, note that following is not inplemented yet for the open-source repo:
|3|2021-12-16|Andrey Fedorov|Some refactoring, sections reordering,
making the entire extension experimental
|4|2022-11-14|Andrey Fedorov|Fixed size arrays, key-value sorting and properties
+|5|2023-11-09|Andrey Fedorov|Changed `memory_required` functions for default sorters
|========================================
