Implement internal iteration for cartesian_product
#92
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## main #92 +/- ##
==========================================
- Coverage 97.67% 97.58% -0.09%
==========================================
Files 66 66
Lines 2236 2278 +42
==========================================
+ Hits 2184 2223 +39
- Misses 52 55 +3
Thanks very much for the drive-by fixes, they look great! In … (This happens to work for the particular case of a cartesian product of iotas, because for an iota sequence the element type and the cursor type happen to be the same.) Other than that I think this looks like a reasonable approach -- I wish the …
Ah-ha, I think I suffered from a misunderstanding here. I believe you are correct. I'm surprised the tests didn't uncover this, but I just realized we don't have any tests that actually call …
I was a little bit concerned that the control flow of this would cause issues, but it seems to be fine. The one idea I had for working around this was to not use …
Ironically, the …
I'm having trouble writing a test that covers this issue. Any ideas? EDIT: All the …
I think what you want is some sequences where the element type and cursor type are distinct and can't be converted to one another? What about something like:
auto seq1 = flux::ints(1).take(3);
auto seq2 = std::array<std::string_view, 2>{"aaa", "bbb"};
auto cart = flux::cartesian_product(seq1, seq2);
auto cur = flux::find_if(cart, flux::unpack([](auto i, auto s) { return i == 2 && s == "bbb"; }));
STATIC_CHECK(std::get<0>(cart[cur]) == 2);
STATIC_CHECK(std::get<1>(cart[cur]) == "bbb");
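To make the cursor/element distinction in the suggested test concrete, here is a minimal sketch in plain C++17 that does not use flux at all. The names `cursor_t`, `product_find_if` and `product_find_if_example` are hypothetical stand-ins, and the index-pair cursor is an assumption made only for illustration. The point is that the predicate receives the elements, while the search returns a cursor that is then used to read the elements back.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical stand-in for a two-sequence product: a cursor is a pair of
// indices, an element is a pair of values read through those indices.
using cursor_t = std::pair<std::size_t, std::size_t>;

template <typename Seq1, typename Seq2, typename Pred>
constexpr std::optional<cursor_t>
product_find_if(const Seq1& a, const Seq2& b, Pred pred)
{
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = 0; j < b.size(); ++j) {
            // Pass the elements to the predicate, never the cursor (i, j).
            if (pred(a[i], b[j])) {
                return cursor_t{i, j};
            }
        }
    }
    return std::nullopt;
}

inline void product_find_if_example()
{
    const std::array<int, 3> ints{1, 2, 3};
    const std::array<std::string_view, 2> strs{"aaa", "bbb"};

    auto cur = product_find_if(ints, strs, [](int i, std::string_view s) {
        return i == 2 && s == "bbb";
    });

    // The cursor is (1, 1); the element it designates is (2, "bbb").
    assert(cur.has_value());
    assert(cur->first == 1 && cur->second == 1);
    assert(ints[cur->first] == 2 && strs[cur->second] == "bbb");
}
```

Because a `std::string_view` cannot be compared with an index, a version that passed cursors to the predicate would fail to compile here, which is the property the suggested test relies on.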
Numbers on my laptop:
With Clang 16:
relative | ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|---|
100.0% | 637,413.32 | 1,568.84 | 5.8% | 0.30 | 〰️ memset_2d_reference (Unstable with ~43.5 iters. Increase minEpochIterations to e.g. 435) |
16.1% | 3,953,071.73 | 252.97 | 0.8% | 1.89 | memset_2d_std_cartesian_product_iota |
117.2% | 543,979.24 | 1,838.31 | 1.0% | 0.26 | memset_2d_flux_cartesian_product_iota |
100.0% | 434,128.10 | 2,303.47 | 2.2% | 0.21 | memset_diagonal_2d_reference |
16.3% | 2,659,737.14 | 375.98 | 0.3% | 1.28 | memset_diagonal_2d_std_cartesian_product_iota_filter |
100.3% | 432,654.76 | 2,311.31 | 1.9% | 0.21 | memset_diagonal_2d_flux_cartesian_product_iota_filter |
With GCC 13.1:
relative | ns/op | op/s | err% | total | benchmark |
---|---|---|---|---|---|
100.0% | 910,021.74 | 1,098.87 | 5.9% | 0.44 | 〰️ memset_2d_reference (Unstable with ~43.5 iters. Increase minEpochIterations to e.g. 435) |
100.6% | 904,318.18 | 1,105.81 | 3.2% | 0.44 | memset_2d_std_cartesian_product_iota |
105.0% | 866,658.54 | 1,153.86 | 1.7% | 0.41 | memset_2d_flux_cartesian_product_iota |
100.0% | 585,775.00 | 1,707.14 | 1.7% | 0.28 | memset_diagonal_2d_reference |
26.1% | 2,246,652.17 | 445.11 | 0.5% | 1.07 | memset_diagonal_2d_std_cartesian_product_iota_filter |
34.0% | 1,723,046.51 | 580.37 | 1.0% | 0.83 | memset_diagonal_2d_flux_cartesian_product_iota_filter |
It looks like GCC isn't able to vectorise the filter case, but Clang is doing great.
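For readers without the benchmark sources at hand, the following is a rough sketch of the kind of kernels these rows might compare. It is an assumption, not the actual benchmark code: the function names, the `double` element type and the row-major layout are all hypothetical. It contrasts a hand-written loop nest with an internal-iteration helper that owns the loops and calls back with each index pair, which is the shape a product of two iotas takes when iterated internally.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference kernel: zero an n-by-m row-major buffer with plain loops.
inline void zero_2d_loops(std::vector<double>& buf, std::size_t n, std::size_t m)
{
    for (std::size_t i = 0; i != n; ++i) {
        for (std::size_t j = 0; j != m; ++j) {
            buf[i * m + j] = 0.0;
        }
    }
}

// Hypothetical internal-iteration helper: it owns the loop nest and calls
// f(i, j) for every index pair, as a product of two iotas would.
template <typename F>
void for_each_index_pair(std::size_t n, std::size_t m, F f)
{
    for (std::size_t i = 0; i != n; ++i) {
        for (std::size_t j = 0; j != m; ++j) {
            f(i, j);
        }
    }
}

// Same work as zero_2d_loops, expressed through the helper.
inline void zero_2d_product(std::vector<double>& buf, std::size_t n, std::size_t m)
{
    for_each_index_pair(n, m, [&](std::size_t i, std::size_t j) {
        buf[i * m + j] = 0.0;
    });
}

// Diagonal variant: the filter becomes a branch inside the innermost loop.
inline void zero_diagonal_2d_product(std::vector<double>& buf, std::size_t n)
{
    for_each_index_pair(n, n, [&](std::size_t i, std::size_t j) {
        if (i == j) {
            buf[i * n + j] = 0.0;
        }
    });
}

inline void zero_2d_example()
{
    std::vector<double> buf(12, 1.0);
    zero_2d_product(buf, 3, 4);
    for (double x : buf) {
        assert(x == 0.0);
    }
}
```

Once the callback is inlined, the helper version reduces to the same loop nest as the reference, which is what gives the optimiser a chance to vectorise it. Whether it does is compiler-dependent, as the numbers above show.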
compile-time reporting of test failures.
… from stdin and will hang.
…access `inc` to fix bugs.
`distance_t` instead of `long`, as `distance_t` isn't always `long` (for example on MSVC).
…s that a tuple of elements not a tuple of cursors is passed to the `for_each` operation.
…nstead of a tuple of cursors to the predicate.
* Change `cartesian_product::ra_inc` to not treat cursors as numeric types.
* Add new `num::checked_(div|mod)` and associated divide-by-zero policies.
… builds the tuple in one pass, as suggested by @brevzin.
* Add 2D diagonal memset benchmark, which evaluates the performance of the composition of `cartesian_product` of `iota`s and `filter`ing.
* Add a reference implementation of C++23's `cartesian_product`.
* Add comparisons against C++ Standard Library ranges to all of the memset benchmarks.
…lementation details to avoid confusing with `flux::drop`.
Co-authored-by: Tristan Brindle <t.c.brindle@gmail.com>
…or and `keep_going` variable as parameters instead of returning them.
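The `ra_inc` and `num::checked_(div|mod)` commits above concern random-access jumps through a product. The arithmetic behind such a jump can be sketched as a mixed-radix carry, shown below in plain C++17. This is an illustration under stated assumptions, not the flux implementation: `checked_div`, `checked_mod` and `product_advance` are hypothetical free functions, throwing is only one possible divide-by-zero policy, and positions are plain integers here purely for simplicity, which is precisely what a generic adaptor cannot assume of its cursors.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical checked helpers: throw on a zero divisor instead of invoking
// undefined behaviour. (Overflow of PTRDIFF_MIN / -1 is not handled here.)
constexpr std::ptrdiff_t checked_div(std::ptrdiff_t a, std::ptrdiff_t b)
{
    if (b == 0) throw std::domain_error("division by zero");
    return a / b;
}

constexpr std::ptrdiff_t checked_mod(std::ptrdiff_t a, std::ptrdiff_t b)
{
    if (b == 0) throw std::domain_error("modulo by zero");
    return a % b;
}

// Move a product position by `offset` steps. idx[k] is the position within
// sequence k and sizes[k] its length; the last sequence varies fastest.
template <std::size_t N>
constexpr std::array<std::ptrdiff_t, N>
product_advance(std::array<std::ptrdiff_t, N> idx,
                const std::array<std::ptrdiff_t, N>& sizes,
                std::ptrdiff_t offset)
{
    static_assert(N > 0, "need at least one sequence");
    for (std::size_t k = N - 1; k > 0; --k) {
        const std::ptrdiff_t pos = idx[k] + offset;
        std::ptrdiff_t carry = checked_div(pos, sizes[k]);
        std::ptrdiff_t rem = checked_mod(pos, sizes[k]);
        if (rem < 0) {          // convert truncating division to flooring
            rem += sizes[k];
            --carry;
        }
        idx[k] = rem;
        offset = carry;         // carry into the next-slower sequence
    }
    idx[0] += offset;           // outermost index may reach sizes[0] (the end)
    return idx;
}

inline void product_advance_example()
{
    const std::array<std::ptrdiff_t, 3> sizes{2, 3, 4};
    const std::array<std::ptrdiff_t, 3> start{0, 0, 0};
    const auto p = product_advance(start, sizes, 23);   // 23 = 1*12 + 2*4 + 3
    assert(p[0] == 1 && p[1] == 2 && p[2] == 3);
}
```

An empty inner sequence makes the divisor zero, which is presumably why a divide-by-zero policy is needed at all. A generic adaptor would obtain each position through the sequence's own distance operation and apply the result with that sequence's increment, rather than doing arithmetic on the cursor directly.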
This PR implements internal iteration for `cartesian_product[_with]`, which enables compiler vectorization of multidimensional iteration patterns constructed with `cartesian_product[_with]` (such as this one).

Core tasks:
- … `cartesian_product`.
- … `cartesian_product` internal iteration to invoke the predicate with a tuple of elements, not a tuple of indices (review feedback).
- … `cartesian_product_with`.
- … `cartesian_product[_with]` random access indexing.
- … `cartesian_product[_with]` in general.
- … `cartesian_product` with 3 underlying sequences.
- … `cartesian_product` with less than 2 and more than 3 underlying sequences.
- … `cartesian_product` random access indexing.
- … `cartesian_product` `for_each_while` short circuiting (review feedback).
- … `cartesian_product[_with]` with `for_each[_while]`.
- … `cartesian_product` of `ints` to fix MSVC build (review feedback).
- … `cartesian_product` and `iota`.
- … `cartesian_product` of `iota` multi-dimensional iteration pattern.
- … `STATIC_CHECK` to throw instead of returning to improve compile-time diagnostics.
- … `word_count` example CTest as it will always hang waiting for stdin.
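As a rough illustration of what internal iteration over a product means, the sketch below walks any number of containers with a recursive loop nest and stops as soon as the predicate returns `false`. It is plain C++17 and does not reflect how flux implements this; `for_each_while_product` and its argument order are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Base case: every sequence has contributed an element, so call the predicate.
template <typename Pred>
constexpr bool for_each_while_product(Pred&& pred)
{
    return pred();
}

// Recursive case: loop over the first sequence, bind its current element,
// and recurse into the remaining sequences. Returns false if the predicate
// asked to stop, so the stop request propagates out through every loop level.
template <typename Pred, typename First, typename... Rest>
constexpr bool for_each_while_product(Pred&& pred, const First& first,
                                      const Rest&... rest)
{
    for (const auto& elem : first) {
        auto bound = [&](const auto&... inner) { return pred(elem, inner...); };
        if (!for_each_while_product(bound, rest...)) {
            return false;   // short circuit
        }
    }
    return true;
}

inline void for_each_while_product_example()
{
    const std::vector<int> xs{1, 2, 3};
    const std::array<int, 2> ys{10, 20};

    int visited = 0;
    const bool completed = for_each_while_product(
        [&](int x, int y) {
            ++visited;
            return !(x == 2 && y == 10);   // stop at the pair (2, 10)
        },
        xs, ys);

    // Visits (1,10), (1,20), (2,10) and then stops.
    assert(!completed);
    assert(visited == 3);
}
```

The predicate is invoked with the elements themselves, outermost sequence first, and the whole traversal is an ordinary loop nest once the lambdas are inlined. That loop-nest shape is what the PR description credits with enabling vectorization.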