-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Crypto] improve shuffle test #4844
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #4844 +/- ##
==========================================
- Coverage 55.76% 55.76% -0.01%
==========================================
Files 955 955
Lines 88867 88867
==========================================
- Hits 49555 49554 -1
- Misses 35574 35578 +4
+ Partials 3738 3735 -3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
// input `perm` is assumed to be a correct permutation of the set [0,n-1] | ||
// (not checked in this function). | ||
func EncodePermutation(perm []int) int { | ||
r := make([]int, len(perm)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r := make([]int, len(perm)) | |
// We first construct r, which contains elements with the following constraints: | |
// r = { r(0) ∈ [0,n-1], r(1) ∈ [0,n-2], ... , r(n-1) ∈ [0,1], r(n) ∈ [0] } | |
r := make([]int, len(perm)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggested comment is the start of explaining what a Lehmer code is. However it doesn't fully explain how r
is obtained, or why there is a bijection between all permutations and the set [0,n-1] x [0,n-2] x ... x [0,1] x [0]
, so IMO it doesn't add much information.
In the original comment I only mentioned "compute Lehmer code", maybe I can add a link to what Lehmer code is (that contains all the details).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
} | ||
// Convert to an integer following the factorial number system | ||
m := 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
m := 0 | |
// We construct `m` by taking the sum of all elements of `r`, each multiplied by the factorial of their inverted index: | |
// m = ∑ r(i) * (n-i)!, i ∈ [0,n-1] | |
m := 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is more explanatory but it doesn't explain why it makes sense to compute m
(why there is a bijection between [0,n-1] x [0,n-2] x ... x [0,1] x [0]
and `[0, n! -1] ).
I suggest to add a link to the full explanation instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BasicDistributionTest(t, factN, 1, permEncoding) | ||
}) | ||
|
||
t.Run("shuffle a same permutation", func(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why we have this test, which looks identical to the previous one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- the first test shuffles an array multiple times. The resulting array from the previous shuffle is shuffled again. The distribution is computed based on all intermediate arrays from each loop.
- the second test shuffles the same array [0,1, .. , n-1] multiple times. The resulting array after each loop is used for the distribution, but it is then ignored. The next loop shuffle is computed using the same starting array.
The shuffling should be uniform in both scenarios. Some buggy implementations may be uniform in one scenario and not the other.
Improve a statistical test that evaluates the uniformity of a random shuffling.
The previous and new versions are both sanity checks of uniformity that can detect bugs in the implementation, but are not extended statistical tests.
The previous version chooses a random element of the original array, and tracks the distribution of indices where this element falls in the shuffled array. If the shuffling is uniform, the distribution of indices is uniform too. Uniformity is simply evaluated by comparing the standard deviation to a low threshold.
The new version uses a known bijection between the set of all permutations (shuffles) and a set of integers, using the Lehmer code the factorial number radix. If the shuffling is uniform, the distribution of the permutation encodings is uniform too. Uniformity is still evaluated using the same basic standard deviation.
The new version evaluates the permutation as a whole instead of isolating one element only.
The drawback is that the evaluated distribution in the new version is of size
n!
(compared ton
in the previous version), so the tested shuffled array has to remain small.Side change:
Export the
EncodePermutation
function so that other permutation tests can use the tool.