Implement internal iteration for `cartesian_product` #92

brycelelbach · 2023-07-11T22:38:57Z

This PR implements internal iteration for cartesian_product[_with], which enables compiler vectorization of multidimensional iteration patterns constructed with cartesian_product[_with] (such as this one).

Core tasks:

codecov · 2023-07-11T22:51:22Z

Codecov Report

Patch coverage: 88.63% and project coverage change: -0.09% ⚠️

Comparison is base (1864f09) 97.67% compared to head (0d3116f) 97.58%.
Report is 3 commits behind head on main.

❗ Current head 0d3116f differs from pull request most recent head abba492. Consider uploading reports for the commit abba492 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #92      +/-   ##
==========================================
- Coverage   97.67%   97.58%   -0.09%     
==========================================
  Files          66       66              
  Lines        2236     2278      +42     
==========================================
+ Hits         2184     2223      +39     
- Misses         52       55       +3

Files Changed	Coverage Δ
include/flux/core/numeric.hpp	`83.33% <66.66%> (-6.67%)`	⬇️
include/flux/op/cartesian_product.hpp	`98.87% <96.87%> (-1.13%)`	⬇️

... and 11 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

tcbrindle · 2023-07-12T12:12:23Z

Thanks very much for the drive-by fixes, they look great!

In for_each_while_impl, we need to eventually call the predicate with a tuple of elements, not a tuple of cursors. It looks like you are building up a tuple of elements (despite the param name), but constructing the tuple on line 271 doesn't look right to me -- I think it should be element_t rather than cursor_t?

(This happens to work for the particular case of a cartesian product of iotas, because for an iota sequence the element type and the cursor type happen to be the same.)

Other than that I think this looks like a reasonable approach -- I wish the keep_going variable wasn't necessary at every level, but I can't think of a way to get rid of it, and if it doesn't affect vectorisation then I guess we're okay.

brycelelbach · 2023-07-12T13:31:24Z

In for_each_while_impl, we need to eventually call the predicate with a tuple of elements, not a tuple of cursors. It looks like you are building up a tuple of elements (despite the param name), but constructing the tuple on line 271 doesn't look right to me -- I think it should be element_t rather than cursor_t?

(This happens to work for the particular case of a cartesian product of iotas, because for an iota sequence the element type and the cursor type happen to be the same.)

Ah-ha, I think I suffered from a misunderstanding here. I believe you are correct.

I'm surprised the tests didn't uncover this, but I just realized we don't have any tests that actually call for_each on cartesian_product. I was just testing with my benchmark.

test/test_cartesian_product.cpp

brycelelbach · 2023-07-12T13:43:12Z

Other than that I think this looks like a reasonable approach -- I wish the keep_going variable wasn't necessary at every level, but I can't think of a way to get rid of it, and if it doesn't affect vectorisation then I guess we're okay.

I was a little bit concerned that the control flow of this would cause issues, but it seems to be fine.

The one idea I had for working around this was to not use flux::for_each_while to iterate each of the underlying sequences, however that seems like a gross violation of the basic principles here. If we did that, it would break more deeply nested internal iteration (for example, cartesian_product(filter(...), ...)).

brycelelbach · 2023-07-12T14:20:43Z

I'm surprised the tests didn't uncover this, but I just realized we don't have any tests that actually call for_each on cartesian_product. I was just testing with my benchmark.

Ironically, the unpack test that you added does a for_each on cartesian_product, but it's on ints so it doesn't catch this, and I didn't run it before committing anyways.

brycelelbach · 2023-07-12T15:11:43Z

Ah-ha, I think I suffered from a misunderstanding here. I believe you are correct.

I'm surprised the tests didn't uncover this, but I just realized we don't have any tests that actually call for_each on cartesian_product. I was just testing with my benchmark.

I'm having trouble writing a test that covers this issue. Any ideas?

EDIT: All the cartesian_product tests currently use things that are convertible to the index types, which is probably why I'm not seeing the failures. I can probably engineer something with a mock type.

example/word_count.cpp

include/flux/op/cartesian_product.hpp

brycelelbach · 2023-07-12T15:36:41Z

In for_each_while_impl, we need to eventually call the predicate with a tuple of elements, not a tuple of cursors. It looks like you are building up a tuple of elements (despite the param name), but constructing the tuple on line 271 doesn't look right to me -- I think it should be element_t rather than cursor_t?

(This happens to work for the particular case of a cartesian product of iotas, because for an iota sequence the element type and the cursor type happen to be the same.)

Tests added in a240dcc, fixed in 02e73b3.

tcbrindle · 2023-07-12T15:36:41Z

Ah-ha, I think I suffered from a misunderstanding here. I believe you are correct.
I'm surprised the tests didn't uncover this, but I just realized we don't have any tests that actually call for_each on cartesian_product. I was just testing with my benchmark.

I'm having trouble writing a test that covers this issue. Any ideas?

I think what you want is some sequences where the element type and cursor type are distinct and can't be converted to one another? What about something like:

auto seq1 = flux::ints(1).take(3);
auto seq2 = std::array<std::string_view, 2>{"aaa", "bbb"};

auto cart = flux::cartesian_product(seq1, seq2);

auto cur = flux::find_if(cart, flux::unpack([](auto i, auto s) { return i == 2 && s == "bbb"; }));

STATIC_CHECK(std::get<0>(cart[cur]) == 2);
STATIC_CHECK(std::get<1>(cart[cur]) == "bbb");

test/test_cartesian_product.cpp

benchmark/multidimensional_memset_benchmark_kernels.cpp

include/flux/core/numeric.hpp

include/flux/op/cartesian_product.hpp

test/test_cartesian_product.cpp

benchmark/multidimensional_memset_benchmark_kernels.cpp

tcbrindle

Numbers on my laptop:

With Clang 16:

relative	ns/op	op/s	err%	total	benchmark
100.0%	637,413.32	1,568.84	5.8%	0.30	〰️ `memset_2d_reference` (Unstable with ~43.5 iters. Increase `minEpochIterations` to e.g. 435)
16.1%	3,953,071.73	252.97	0.8%	1.89	`memset_2d_std_cartesian_product_iota`
117.2%	543,979.24	1,838.31	1.0%	0.26	`memset_2d_flux_cartesian_product_iota`
100.0%	434,128.10	2,303.47	2.2%	0.21	`memset_diagonal_2d_reference`
16.3%	2,659,737.14	375.98	0.3%	1.28	`memset_diagonal_2d_std_cartesian_product_iota_filter`
100.3%	432,654.76	2,311.31	1.9%	0.21	`memset_diagonal_2d_flux_cartesian_product_iota_filter`

With GCC 13.1:

relative	ns/op	op/s	err%	total	benchmark
100.0%	910,021.74	1,098.87	5.9%	0.44	〰️ `memset_2d_reference` (Unstable with ~43.5 iters. Increase `minEpochIterations` to e.g. 435)
100.6%	904,318.18	1,105.81	3.2%	0.44	`memset_2d_std_cartesian_product_iota`
105.0%	866,658.54	1,153.86	1.7%	0.41	`memset_2d_flux_cartesian_product_iota`
100.0%	585,775.00	1,707.14	1.7%	0.28	`memset_diagonal_2d_reference`
26.1%	2,246,652.17	445.11	0.5%	1.07	`memset_diagonal_2d_std_cartesian_product_iota_filter`
34.0%	1,723,046.51	580.37	1.0%	0.83	`memset_diagonal_2d_flux_cartesian_product_iota_filter`

It looks like GCC isn't able to vectorise the filter case, but Clang is doing great.

include/flux/op/cartesian_product.hpp

compile-time reporting of test failures.

… from stdin and will hang.

…access `inc` to fix bugs.

`distance_t` instead of `long`, as `distance_t` isn't always `long` (for example on MSVC).

…s that a tuple of elements not a tuple of cursors is passed to the `for_each` operation.

…nstead of a tuple of cursors to the predicate.

* Change `cartesian_product::ra_inc` to not treat cursors as numeric types. * Add new `num::checked_(div|mod)` and associated divide-by-zero policies.

@brevzin

… builds the tuple in one pass, as suggested by @brevzin.

…sts.

* Add 2D diagonal memset benchmark, which evaluates the performance of the composition of `cartesian_product` of `iota`s and `filter`ing. * Add a reference implementation of C++23's `cartesian_product`. * Add comparisons against C++ Standard Library ranges to all of the memset benchmarks.

…lementation details to avoid confusing with `flux::drop`.

Co-authored-by: Tristan Brindle <t.c.brindle@gmail.com>

…or and `keep_going` variable as parameters instead of returning them.

include/flux/op/cartesian_product.hpp

tcbrindle reviewed Jul 12, 2023

View reviewed changes

test/test_cartesian_product.cpp Outdated Show resolved Hide resolved

tcbrindle reviewed Jul 12, 2023

View reviewed changes

example/word_count.cpp Outdated Show resolved Hide resolved

include/flux/op/cartesian_product.hpp Outdated Show resolved Hide resolved

include/flux/op/cartesian_product.hpp Outdated Show resolved Hide resolved

brevzin reviewed Jul 12, 2023

View reviewed changes

include/flux/op/cartesian_product.hpp Outdated Show resolved Hide resolved

include/flux/op/cartesian_product.hpp Outdated Show resolved Hide resolved

brycelelbach force-pushed the cartesian_product_internal_iteration branch from 02e73b3 to d1e7921 Compare July 12, 2023 19:31

brycelelbach commented Jul 12, 2023

View reviewed changes

test/test_cartesian_product.cpp Outdated Show resolved Hide resolved

brycelelbach requested review from tcbrindle and brevzin July 12, 2023 19:34

brycelelbach commented Jul 12, 2023

View reviewed changes

benchmark/multidimensional_memset_benchmark_kernels.cpp Outdated Show resolved Hide resolved

tcbrindle reviewed Jul 12, 2023

View reviewed changes

benchmark/multidimensional_memset_benchmark_kernels.cpp Outdated Show resolved Hide resolved

tcbrindle reviewed Jul 12, 2023

View reviewed changes

brycelelbach requested a review from tcbrindle July 13, 2023 15:46

brycelelbach force-pushed the cartesian_product_internal_iteration branch 2 times, most recently from f9fb22c to 0d3116f Compare July 13, 2023 15:56

tcbrindle reviewed Jul 13, 2023

View reviewed changes

include/flux/op/cartesian_product.hpp Show resolved Hide resolved

include/flux/op/cartesian_product.hpp Outdated Show resolved Hide resolved

tcbrindle mentioned this pull request Jul 14, 2023

Update the internal iteration protocol #100

Closed

28 tasks

brycelelbach added 7 commits July 31, 2023 08:48

Change STATIC_CHECK to throw instead of returning false to improve

fbb25d4

compile-time reporting of test failures.

Don't add the word_count example to CTest, because it expects input…

cc6067a

… from stdin and will hang.

cartesian_product: Implement internal iteration and rewrite random-…

c3d7880

…access `inc` to fix bugs.

Fix element type tests for cartesian_product of ints tests to use

070d607

`distance_t` instead of `long`, as `distance_t` isn't always `long` (for example on MSVC).

Add cartesian_product for_each tests, including one that validate…

be018dd

…s that a tuple of elements not a tuple of cursors is passed to the `for_each` operation.

Fix cartesian_product::for_each_while to pass a tuple of elements i…

ec1831e

…nstead of a tuple of cursors to the predicate.

Add multidimensional memset vectorization benchmark.

53d051e

brycelelbach and others added 9 commits July 31, 2023 08:48

* Change cartesian_product::ra_inc to use checked numeric operations.

8cb17a1

* Change `cartesian_product::ra_inc` to not treat cursors as numeric types. * Add new `num::checked_(div|mod)` and associated divide-by-zero policies.

Add cartesian_product for_each_while short circuiting test.

bce0c2f

Replace cartesian_product_partial_cursor_t with a cleaner form that…

fdc6c43

… builds the tuple in one pass, as suggested by @brevzin.

Update copyright notices.

b54ee9c

Add cartesian_product empty sequence, 1D, and higher dimensional te…

4406237

…sts.

s/drop/cartesian_product_tuple_drop/ in the cartesian_product imp…

fe8ff5d

…lementation details to avoid confusing with `flux::drop`.

Correct element type tests for const 1D cartesian_products.

42381ac

Co-authored-by: Tristan Brindle <t.c.brindle@gmail.com>

Refactor cartesian_product's for_each_while_impl to pass the curs…

abba492

…or and `keep_going` variable as parameters instead of returning them.

brycelelbach force-pushed the cartesian_product_internal_iteration branch from 0d3116f to abba492 Compare July 31, 2023 17:37

brycelelbach requested a review from tcbrindle July 31, 2023 17:38

tcbrindle approved these changes Aug 1, 2023

View reviewed changes

include/flux/op/cartesian_product.hpp Show resolved Hide resolved

include/flux/op/cartesian_product.hpp Show resolved Hide resolved

brycelelbach merged commit d906839 into tcbrindle:main Aug 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement internal iteration for `cartesian_product` #92

Implement internal iteration for `cartesian_product` #92

brycelelbach commented Jul 11, 2023 •

edited

Loading

codecov bot commented Jul 11, 2023 •

edited

Loading

tcbrindle commented Jul 12, 2023

brycelelbach commented Jul 12, 2023

brycelelbach commented Jul 12, 2023

brycelelbach commented Jul 12, 2023

brycelelbach commented Jul 12, 2023 •

edited

Loading

brycelelbach commented Jul 12, 2023

tcbrindle commented Jul 12, 2023

tcbrindle left a comment

Implement internal iteration for cartesian_product #92

Implement internal iteration for cartesian_product #92

Conversation

brycelelbach commented Jul 11, 2023 • edited Loading

codecov bot commented Jul 11, 2023 • edited Loading

Codecov Report

tcbrindle commented Jul 12, 2023

brycelelbach commented Jul 12, 2023

brycelelbach commented Jul 12, 2023

brycelelbach commented Jul 12, 2023

brycelelbach commented Jul 12, 2023 • edited Loading

brycelelbach commented Jul 12, 2023

tcbrindle commented Jul 12, 2023

tcbrindle left a comment

Choose a reason for hiding this comment

Implement internal iteration for `cartesian_product` #92

Implement internal iteration for `cartesian_product` #92

brycelelbach commented Jul 11, 2023 •

edited

Loading

codecov bot commented Jul 11, 2023 •

edited

Loading

brycelelbach commented Jul 12, 2023 •

edited

Loading