Add permutations and permutations_sized adaptors #230

nwad123 · 2025-01-26T23:27:10Z

This PR adds the permutations and permutations_sized adaptors, as described in #138.

The permutations adaptor takes in any input sequence and produces an ouput
of all possible permutations of the input. If the input sequence has a length of $l$,
the output sequence is length $l!$. The output permutations are in lexographical
order according to the original order of the input. If the original input is in order,
the output permutations will also be in order.

auto x = flux::ints()         // 0, 1, 2, 3, ...
              .take(3)        // 0, 1, 2
              .permutations() // [0, 1, 2], [0, 2, 1],
                              // [1, 0, 2], [1, 2, 0],
                              // [2, 0, 1], [2, 1, 0]
              // ...

The permutations_sized adaptor differs in operation in that it is parameterized
with a additional input -- SubsequenceSize -- that defines how long each output
permutation should be. If the length of the input sequence is $l$ and the value of
SubsequenceSize is represented by $r$ then the length of the output is calculated
by:

$$ \frac{l!}{(l-r)!} $$

auto x = flux::ints()                  // 0, 1, 2, 3, ...
              .take(3)                 // 0, 1, 2
              .permutations_sized<2>() // [0, 1], [0, 2],
                                       // [1, 0], [1, 2],
                                       // [2, 0], [2, 1]
              // ...

The implementation of both adaptors is spread across three files:

permutations_base.cpp: Contains shared functionality between
the two types of permutations adaptors.
permutations.cpp: The permutations adaptor.
permutations_sized.cpp: The sized permutations adaptor.

The implementations are based off the implementation of both
Rust and
Python's itertools
permutations adaptors.

The operations of both adaptors is similar. First, they cache the input sequence so that the values can
be repeatedly copied into each output element. Then, each cursor contains a vector, indices containing
the permuted indices of the input vector. These indices are used along with the cached input sequence
to generate the permutation at each element (see example below). When read_at is called, the cached
input sequence is copied into the output vector using the indices stored in the cursor for each value.
When a cursor is advanced, the next permutations of the indices is generated, and then when read_at
is called again, the next permutation of the input is returned.

Here's an example demonstrating the steps. Note that this example is slight pseudocode in order to
better show the steps that are happening.

// Sequence 
auto permuted_sequence = flux::ref({2, 1, 3}).permutations();

// Cursor 
auto c = flux::first(permuted_sequence);

assert(c.indices == [0, 1, 2]);
assert(permuted_sequence.cache == [2, 1, 3]);

// Read
auto x = flux::read_at(permuted_sequence, c);
         // index into the cached values using the cursor indices:
         // x[0] == permuted_sequence.cache[c.indices[0]];
         // x[1] == permuted_sequence.cache[c.indices[1]];
         // x[2] == permuted_sequence.cache[c.indices[2]];

assert(x == [2, 1, 3]);

// Advance
flux::inc(permuted_sequence, c);
      // c.indices = std::next_permutation(c.indices)

assert(c.indices = [0, 2, 1]);

// Read again 
auto y = flux::read_at(permuted_sequence, c);
         // y[0] == permuted_sequence.cache[c.indices[0]];
         // y[1] == permuted_sequence.cache[c.indices[1]];
         // y[2] == permuted_sequence.cache[c.indices[2]];

assert(y == [2, 3, 1]);

I'm marking this as a draft because I'm looking for feedback on several design decisions I made. Once
these are answered I can finish up the documentation, testing, and implementation of these adaptors and
hopefully have them merged in. Also, there are optimizations that could be implemented for different
types of input sequences that I plan on adding over time. Here are the specific areas of feedback I'm
looking for:

Does the decision to split the functionality of the permutations and permutations_sized into
two adaptors make sense?
Are vectors a good element type for the output? I've thought about using an array if we
know the size of the input at compile time, but I started with a vector because it's more flexible to
use. Also, if we have long input sequences, I wasn't sure if it would be good to use a stack-based
element type for the output.
Are the cursors too expensive to create to justify satisfying flux::regular_cursor for the cursors?
The documentation says that "cursors may be copied within algorithm implementations, so copying should
be 'cheap'" (flux::regular_cursor docs).

I'm also happy to hear any other feedback that you have! I'm still pretty new to contributing to C++
projects, but I'm enjoying what I'm learning. Also, December and January were busy with other tasks, but
I should be able to contribute more time to these adaptors for the next several weeks.

laid out basics of permutation adaptor, still researching impl strategies
Renamed branch to follow other branch names
Removed unnecessary comments
Added permutations adaptor and started tests
Added string-based permutations comparison test
Fixed string based comparison test
Removed unused imports
Added sized permutations adaptor
Fixed sized template parameters on sized permutations
Added sized permutations test
Renamed 'sized_permutations' to 'permutations_sized'
Added outline for documentation
Renamed and refactored for first draft

…gies

codecov · 2025-01-27T00:22:20Z

Codecov Report

Attention: Patch coverage is 96.19048% with 4 lines in your changes missing coverage. Please review.

Project coverage is 98.20%. Comparing base (f539c11) to head (0b65e24).
Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
include/flux/adaptor/permutations_sized.hpp	95.91%	2 Missing ⚠️
include/flux/adaptor/permutations.hpp	97.67%	1 Missing ⚠️
include/flux/adaptor/permutations_base.hpp	92.30%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #230      +/-   ##
==========================================
- Coverage   98.60%   98.20%   -0.40%     
==========================================
  Files          69       72       +3     
  Lines        2578     2737     +159     
==========================================
+ Hits         2542     2688     +146     
- Misses         36       49      +13

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

tcbrindle · 2025-01-27T13:07:08Z

Hi @nwad123, thanks so much for working on this!

I haven't looked too much at the code yet as you've said it's still a draft, but please let me know if you have any questions about how things work in Flux :)

Regarding the design of the adaptors, I think it would be better to focus only on fixed-size pemutations<N> for now. The "full length" version would need to perform allocations in various places (both in the cursor ops and on each call to read_at) which makes me a bit worried about hidden performance traps -- we don't do any allocations in any other adaptors in the library. People have also noted performance problems with the Rust version.

For the fixed-size version, because we know a priori how many terms we're dealing with we can use a stack array for storage, as we do with the cartesian_power<N> adaptor (in fact, permutations<N> is a subset of this). I appreciate the concerns about stack size, but in general N isn't likely to be all that large, and cursors should be small objects anyway, so storing them in an array should be fine.

To answer your design questions specifically:

Does the decision to split the functionality of the permutations and permutations_sized into
two adaptors make sense?

Yes, but I'd recommend focusing on the sized variant only for now.

Are vectors a good element type for the output?

No :)

I've thought about using an array if we know the size of the input at compile time, but I started with a vector because it's more flexible to use. Also, if we have long input sequences, I wasn't sure if it would be good to use a stack-based
element type for the output.

Vector has two problems: firstly the allocations we've discussed, and secondly it cannot hold references, so you'd need to copy each input element into the vector for each permutation which is a lot of copying! So I'd rather not do that. Unfortunately arrays can't hold references either, so they're out too.

I think it would be best if permutations<N>::read_at() returned an N-tuple std::tuple<element_t<S>, element_t<S>, ...>, which is what the cartesian_power<N> adaptor yields. (There's some code in cartesian_base.cpp which will give you an alias to such a tuple type). This will correctly handle references and object types, at the cost of being a bit unwieldy to program with in C++.

Are the cursors too expensive to create to justify satisfying flux::regular_cursor for the cursors?
The documentation says that "cursors may be copied within algorithm implementations, so copying should
be 'cheap'" (flux::regular_cursor docs).

I would say that we definitely don't want a cursor which holds a vector, but a cursor that holds a fixed-size array of cursors is fine (we do it in several other places in the library).

Regarding caching the input data: if we require that the input is a multipass sequence (which is reasonable, as we're going to be performing multiple passes over the elements!) then I don't think we'd need to take an initial copy of the input?

I hope this helps! I have to say you've chosen a bit of a monster as your first Flux adaptor :) But I'm happy to help as much as I can.

nwad123 · 2025-01-28T20:28:04Z

Thanks! This helps greatly. I'll let you know how working on this goes. Haha, in regards to this being a tricky adaptor, I am realizing that more and more as I work on it, but it's great fun.

nwad123 and others added 15 commits November 26, 2024 23:04

laid out basics of permutation adaptor, still researching impl strate…

94fa017

…gies

Renamed branch to follow other branch names

a2db007

Merge branch 'tcbrindle:main' into pr/permutations

2bebc2a

Removed unnecessary comments

ecdb648

Added permutations adaptor and started tests

0ecd86e

Added string-based permutations comparison test

ed291fe

Fixed string based comparison test

81472e2

Removed unused imports

6da6562

Added sized permutations adaptor

3483301

Fixed sized template parameters on sized permutations

d877876

Added sized permutations test

5ec9003

Renamed 'sized_permutations' to 'permutations_sized'

349f299

Added outline for documentation

477d9b7

Renamed and refactored for first draft

62dfd56

Fixed braced-initializer cursor initialization

0b65e24

nwad123 added 4 commits January 28, 2025 13:31

Removed dynamically sized permutations adaptor

21dbe3f

Renamed permutations_sized to permutations

c908145

Added todo notes for next steps

ffcc80f

Removed code using std::vector

359b5bc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add permutations and permutations_sized adaptors #230

Add permutations and permutations_sized adaptors #230

nwad123 commented Jan 26, 2025

codecov bot commented Jan 27, 2025

tcbrindle commented Jan 27, 2025 •

edited

Loading

nwad123 commented Jan 28, 2025

Add permutations and permutations_sized adaptors #230

Are you sure you want to change the base?

Add permutations and permutations_sized adaptors #230

Conversation

nwad123 commented Jan 26, 2025

codecov bot commented Jan 27, 2025

Codecov Report

tcbrindle commented Jan 27, 2025 • edited Loading

nwad123 commented Jan 28, 2025

tcbrindle commented Jan 27, 2025 •

edited

Loading