Reduction functors (`cuco::static_reduction_map` refactoring 1/N) #187

sleeepyjack · 2022-07-11T13:23:02Z

This PR is part 1/N of the refactoring effort for PR #98

New design for reduction functors that can be used by `cuco::static_reduction_map`.

Implements the following ideas from @jrhemstad (link):

Here's what I was thinking. A person has 3 options for the `ReductionOp`:

1. Use one of the provided `cuco::reduce_*` types.
   - No additional work should be required. Partial specialization could/should remove the `ReductionOp` argument from the constructor.
2. Provide an unsynchronized binary callable `T F(T, T)` and `Identity` value.
   - This needs to be wrapped by `custom_op` to apply `F` in a CAS loop.
   - Ideally we could detect this kind of callable and implicitly wrap it in `custom_op`.
3. Provide a synchronized binary callable `T F(atomic_ref<T, Scope>, T)` and `Identity` value.
   - The user is responsible for correct synchronization through `atomic_ref`.

Examples:
// 1.
// no need to provide `reduce_add{}` 
// No need to provide identity value
cuco::static_reduction_map<cuco::reduce_add<int>, int, int> add_map{capacity, empty_key, alloc}; 

// 2. Unsynchronized binary callable must be wrapped in `custom_op`
struct unsync_add{ 
   int identity = 0; // Must provide identity value
   int operator()(int a, int b){ return a + b; }
};

// internally should wrap `unsync_add` in `custom_op`
cuco::static_reduction_map<unsync_add, int, int> custom_unsync_add_map(capacity, empty_key, unsync_add{}, alloc);

// 3.
struct sync_add{
   int identity = 0; // Must provide identity value
   template <thread_scope Scope>
   int operator()(atomic_ref<int, Scope> a, int b){ return a.fetch_add(b, memory_order_relaxed); }
};

cuco::static_reduction_map<sync_add, int, int> custom_sync_add_map(capacity, empty_key, sync_add{}, alloc);
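
The difference between options 2 and 3 can be sketched on the host with `std::atomic` standing in for `cuda::atomic_ref`. Everything named here (`is_synchronized_v`, `apply_with_cas`, `reduce_into`) is hypothetical and only illustrates the idea; it is not the cuco API and not necessarily how `custom_op` is implemented in this PR. The sketch assumes a callable counts as synchronized only if it accepts a mutable atomic and rejects a `const` one, because a plain `T F(T, T)` would otherwise also match through the atomic's implicit conversion to `T`.

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// Hypothetical trait (not part of cuco): a callable is treated as
// "synchronized" if it accepts a mutable atomic but rejects a const one.
// An unsynchronized T F(T, T) accepts both via the atomic's conversion to T.
template <typename T, typename F>
inline constexpr bool is_synchronized_v =
  std::is_invocable_r_v<T, F, std::atomic<T>&, T> &&
  !std::is_invocable_r_v<T, F, std::atomic<T> const&, T>;

// Hypothetical stand-in for the role of `custom_op`: applies an
// unsynchronized binary callable to an atomic slot in a CAS loop.
template <typename T, typename F>
T apply_with_cas(std::atomic<T>& slot, T value, F op)
{
  T expected = slot.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak refreshes `expected` with the current
  // value, so op(expected, value) is recomputed on the next iteration.
  while (!slot.compare_exchange_weak(expected, op(expected, value), std::memory_order_relaxed)) {}
  return expected;  // previous value, mirroring what fetch_add returns
}

// Option 2: unsynchronized callable plus identity value.
struct unsync_add {
  int identity = 0;
  int operator()(int a, int b) const { return a + b; }
};

// Option 3: the callable synchronizes through the atomic itself.
struct sync_add {
  int identity = 0;
  int operator()(std::atomic<int>& slot, int b) const
  {
    return slot.fetch_add(b, std::memory_order_relaxed);
  }
};

// Dispatch: call synchronized callables directly, wrap everything else.
template <typename T, typename F>
T reduce_into(std::atomic<T>& slot, T value, F op)
{
  if constexpr (is_synchronized_v<T, F>) {
    return op(slot, value);
  } else {
    static_assert(std::is_invocable_r_v<T, F, T, T>, "expected a callable of the form T F(T, T)");
    return apply_with_cas(slot, value, op);
  }
}
```

With this dispatch a caller passes either kind of functor to `reduce_into` without wrapping it by hand, which is the implicit wrapping the second option asks for.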

Includes changes from PR #186

PointKernel · 2022-07-11T16:33:55Z

@sleeepyjack which version of clang-format are you using locally?

sleeepyjack · 2022-07-11T16:37:59Z

> @sleeepyjack which version of clang-format are you using locally?

@PointKernel

$ clang-format --version
Ubuntu clang-format version 14.0.0-1ubuntu1

but I don't have pre-commit setup locally since I thought our CI does this task automatically.

PointKernel · 2022-07-11T16:56:03Z

> > @sleeepyjack which version of clang-format are you using locally?
>
> @PointKernel
>
> $ clang-format --version
> Ubuntu clang-format version 14.0.0-1ubuntu1
>
> but I don't have pre-commit setup locally since I thought our CI does this task automatically.

No worries. CI will indeed fix style issues. I'm just curious about which version introduces the formatting difference.

sleeepyjack · 2022-07-12T13:09:53Z

I marked this PR as a draft since I might need to add some more changes while developing the other parts of the static_reduction_map. It's good to go for review though.

sleeepyjack · 2022-07-12T15:25:10Z

Thanks @PointKernel for the review! I have incorporated your suggestions into the newest commits.

PointKernel · 2022-07-12T22:14:30Z

rerun tests

sleeepyjack · 2022-08-01T10:08:34Z

include/cuco/reduction_functors.cuh

+  static constexpr bool atomic_const_invocable_ =
+    cuda::std::is_invocable_r_v<value_type,
+                                Func,
+                                cuda::atomic<value_type, cuda::thread_scope_system> const&,
+                                value_type> ||
+    cuda::std::is_invocable_r_v<value_type,
+                                Func,
+                                cuda::atomic<value_type, cuda::thread_scope_device> const&,
+                                value_type> ||
+    cuda::std::is_invocable_r_v<value_type,
+                                Func,
+                                cuda::atomic<value_type, cuda::thread_scope_block> const&,
+                                value_type> ||
+    cuda::std::is_invocable_r_v<value_type,
+                                Func,
+                                cuda::atomic<value_type, cuda::thread_scope_thread> const&,
+                                value_type>;


Listing all possible values for cuda::thread_scope is very ugly. Does anyone have a better solution?
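
One possible way to shorten the check is to keep the scopes in a single type list and fold over it. The sketch below is plain C++17 with stand-in types so that it compiles without libcu++: `thread_scope`, `atomic_stub`, `scope_list`, `all_scopes`, and `atomic_const_invocable_v` are all hypothetical names, with `atomic_stub<T, Scope>` playing the role of `cuda::atomic<T, Scope>`. The scopes still have to be written out once, but only in one place.

```cpp
#include <type_traits>

// Stand-ins (assumptions): these mimic cuda::thread_scope and
// cuda::atomic<T, Scope> only closely enough to exercise the trait.
enum class thread_scope { system, device, block, thread };

template <typename T, thread_scope Scope>
struct atomic_stub {
  T value;
};

// The scopes are listed once here and nowhere else.
template <thread_scope... Scopes>
struct scope_list {};

using all_scopes = scope_list<thread_scope::system,
                              thread_scope::device,
                              thread_scope::block,
                              thread_scope::thread>;

template <typename T, typename Func, typename List>
struct atomic_const_invocable;

// Fold expression: true if Func is invocable with a const atomic of any scope.
template <typename T, typename Func, thread_scope... Scopes>
struct atomic_const_invocable<T, Func, scope_list<Scopes...>>
  : std::bool_constant<(std::is_invocable_r_v<T, Func, atomic_stub<T, Scopes> const&, T> || ...)> {};

template <typename T, typename Func>
inline constexpr bool atomic_const_invocable_v =
  atomic_const_invocable<T, Func, all_scopes>::value;

// Example functors used to exercise the trait.
struct takes_const_block {
  int operator()(atomic_stub<int, thread_scope::block> const& a, int b) const
  {
    return a.value + b;
  }
};

struct takes_mutable_block {
  int operator()(atomic_stub<int, thread_scope::block>& a, int b) const
  {
    a.value += b;
    return a.value;
  }
};
```

The same `scope_list` could be reused for the non-const variant of the check, so adding or removing a scope would touch only the `all_scopes` alias.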

sleeepyjack · 2023-04-06T02:07:08Z

Closing this for now as the design will change a lot with the new refactoring

sleeepyjack added 2 commits July 11, 2022 12:46

Update Catch2 to v2.13.9.

2998d9f

Added reduction functors that can be used by static_reduction_map.

2a8a50f

sleeepyjack force-pushed the feature/reduction_functors branch from ef15401 to 2a8a50f Compare July 11, 2022 13:43

[pre-commit.ci] auto code formatting

e4adb05

sleeepyjack added 2 commits July 11, 2022 16:34

Prevent extended __host__ __device__/__device__ lambdas from being used as reduction functors.

bb78c49

Remove spurious include.

905edd8

sleeepyjack force-pushed the feature/reduction_functors branch from 16e4ed2 to 905edd8 Compare July 11, 2022 16:38

[pre-commit.ci] auto code formatting

7d3aff3

PointKernel added the `type: feature request` label Jul 11, 2022

Workaround for compiler warning where nvcc is not able to see the return statement in an `if constexpr else` statement.

9cbe890

sleeepyjack force-pushed the feature/reduction_functors branch from 31a85e6 to 9cbe890 Compare July 12, 2022 10:13

sleeepyjack marked this pull request as draft July 12, 2022 13:06

PointKernel reviewed Jul 12, 2022

PointKernel added the `Needs Review` label Jul 12, 2022

sleeepyjack added 3 commits July 12, 2022 15:14

Fix includes and use type trait aliases.

cef8906

Reduction ops should use relaxed memory order.

80c1544

Make identity_value ctor explicit.

6fc5ff2

Internally-synced reduction functors cannot have immutable target object.

99945f4

Use human-readable boolean operators.

4435d05

sleeepyjack force-pushed the feature/reduction_functors branch from 60d0aab to 4435d05 Compare August 1, 2022 10:10

sleeepyjack closed this Apr 6, 2023
