Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[group] Refine scan_over_group for sub-group #839

Merged
merged 5 commits into from
Dec 1, 2023

Conversation

Nuullll
Copy link
Contributor

@Nuullll Nuullll commented Nov 24, 2023

Use global linear id to index input data. Store both the sub-group id and item linear id within the sub-group to recover sub-group construction when verifying, so that we don't make any assumption on the implementation-defined sub-group partitioning and ordering.

Should use `get_max_local_range` to get the maximum sub-group size for
the executing kernel.

Signed-off-by: Yilong Guo <yilong.guo@intel.com>
@Nuullll Nuullll requested a review from a team as a code owner November 24, 2023 08:20
@Nuullll Nuullll marked this pull request as draft November 24, 2023 08:21
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks right. It is easy to forget not all the groups have the same size.

tests/group_functions/group_scan.h Outdated Show resolved Hide resolved
@Nuullll
Copy link
Contributor Author

Nuullll commented Nov 24, 2023

This looks right. It is easy to forget not all the groups have the same size.

Thanks for reviewing. Please allow me to finish some extra testings.

@Nuullll
Copy link
Contributor Author

Nuullll commented Nov 24, 2023

Neither sub_group.get_local_linear_range() nor sub_group.get_max_local_range() is reliable:

  1. Not all the sub-groups are guaranteed to have the same size, so we shouldn't use get_local_linear_range().
  2. Considering the 2-dimensional NDRange (3, 4) with reqd_sub_group_size=16, then the implementation may construct 3 sub-groups (in the highest dimension) where get_max_local_range() == 16 for all the sub-groups. It's a little counter-intuitive, but as noted in OpenCL C Spec - cl_intel_required_subgroup_size:

The optional attribute((intel_reqd_sub_group_size())) can be used to indicate that the kernel must be compiled and executed with the specified sub-group size. When this attribute is present, get_max_sub_group_size() is guaranteed to return the specified integer value. This is important for the correctness of many sub-group algorithms, and in some cases may be used by the compiler to generate more optimal code.

So using the global linear id directly is probably safer.

@Nuullll Nuullll marked this pull request as ready for review November 24, 2023 09:30
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking this to avoid regression.

@bader bader added the help wanted Extra attention is needed label Nov 25, 2023
@Nuullll Nuullll marked this pull request as draft November 29, 2023 03:44
Reorder ref_input according to the actual sub-group partitioning and
ordering.
@Nuullll Nuullll changed the title [group] Fix subgroup index calculation [group] Refine scan_over_group for sub-group Nov 29, 2023
@Nuullll Nuullll marked this pull request as ready for review November 29, 2023 08:11
Copy link
Contributor

@steffenlarsen steffenlarsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @Nuullll ! LGTM!

tests/group_functions/group_scan.h Outdated Show resolved Hide resolved
@Nuullll
Copy link
Contributor Author

Nuullll commented Dec 1, 2023

Any further comments? Can we merge this?

Copy link
Member

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@bader bader merged commit 4993dc6 into KhronosGroup:SYCL-2020 Dec 1, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants