-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add permutations and permutations_sized adaptors #230
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #230 +/- ##
==========================================
- Coverage 98.60% 98.20% -0.40%
==========================================
Files 69 72 +3
Lines 2578 2737 +159
==========================================
+ Hits 2542 2688 +146
- Misses 36 49 +13 ☔ View full report in Codecov by Sentry. |
Hi @nwad123, thanks so much for working on this! I haven't looked too much at the code yet as you've said it's still a draft, but please let me know if you have any questions about how things work in Flux :) Regarding the design of the adaptors, I think it would be better to focus only on fixed-size For the fixed-size version, because we know a priori how many terms we're dealing with we can use a stack array for storage, as we do with the To answer your design questions specifically:
Yes, but I'd recommend focusing on the sized variant only for now.
No :)
Vector has two problems: firstly the allocations we've discussed, and secondly it cannot hold references, so you'd need to copy each input element into the vector for each permutation which is a lot of copying! So I'd rather not do that. Unfortunately arrays can't hold references either, so they're out too. I think it would be best if
I would say that we definitely don't want a cursor which holds a vector, but a cursor that holds a fixed-size array of cursors is fine (we do it in several other places in the library). Regarding caching the input data: if we require that the input is a multipass sequence (which is reasonable, as we're going to be performing multiple passes over the elements!) then I don't think we'd need to take an initial copy of the input? I hope this helps! I have to say you've chosen a bit of a monster as your first Flux adaptor :) But I'm happy to help as much as I can. |
Thanks! This helps greatly. I'll let you know how working on this goes. Haha, in regards to this being a tricky adaptor, I am realizing that more and more as I work on it, but it's great fun. |
This PR adds the
permutations
andpermutations_sized
adaptors, as described in #138.The$l$ ,$l!$ . The output permutations are in lexographical
permutations
adaptor takes in any input sequence and produces an ouputof all possible permutations of the input. If the input sequence has a length of
the output sequence is length
order according to the original order of the input. If the original input is in order,
the output permutations will also be in order.
The$l$ and the value of
$r$ then the length of the output is calculated
permutations_sized
adaptor differs in operation in that it is parameterizedwith a additional input --
SubsequenceSize
-- that defines how long each outputpermutation should be. If the length of the input sequence is
SubsequenceSize
is represented byby:
The implementation of both adaptors is spread across three files:
permutations_base.cpp: Contains shared functionality between
the two types of permutations adaptors.
permutations.cpp: The permutations adaptor.
permutations_sized.cpp: The sized permutations adaptor.
The implementations are based off the implementation of both
Rust and
Python's itertools
permutations
adaptors.The operations of both adaptors is similar. First, they cache the input sequence so that the values can
be repeatedly copied into each output element. Then, each cursor contains a vector,
indices
containingthe permuted indices of the input vector. These indices are used along with the cached input sequence
to generate the permutation at each element (see example below). When
read_at
is called, the cachedinput sequence is copied into the output vector using the indices stored in the cursor for each value.
When a cursor is advanced, the next permutations of the
indices
is generated, and then whenread_at
is called again, the next permutation of the input is returned.
Here's an example demonstrating the steps. Note that this example is slight pseudocode in order to
better show the steps that are happening.
I'm marking this as a draft because I'm looking for feedback on several design decisions I made. Once
these are answered I can finish up the documentation, testing, and implementation of these adaptors and
hopefully have them merged in. Also, there are optimizations that could be implemented for different
types of input sequences that I plan on adding over time. Here are the specific areas of feedback I'm
looking for:
Does the decision to split the functionality of the
permutations
andpermutations_sized
intotwo adaptors make sense?
Are vectors a good element type for the output? I've thought about using an array if we
know the size of the input at compile time, but I started with a vector because it's more flexible to
use. Also, if we have long input sequences, I wasn't sure if it would be good to use a stack-based
element type for the output.
Are the cursors too expensive to create to justify satisfying
flux::regular_cursor
for the cursors?The documentation says that "cursors may be copied within algorithm implementations, so copying should
be 'cheap'" (
flux::regular_cursor
docs).I'm also happy to hear any other feedback that you have! I'm still pretty new to contributing to C++
projects, but I'm enjoying what I'm learning. Also, December and January were busy with other tasks, but
I should be able to contribute more time to these adaptors for the next several weeks.