Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Crypto] improve shuffle test #4844

Merged
merged 8 commits into from
Nov 8, 2023
Merged

[Crypto] improve shuffle test #4844

merged 8 commits into from
Nov 8, 2023

Conversation

tarakby
Copy link
Contributor

@tarakby tarakby commented Oct 19, 2023

Improve a statistical test that evaluates the uniformity of a random shuffling.
The previous and new versions are both sanity checks of uniformity that can detect bugs in the implementation, but are not extended statistical tests.

The previous version chooses a random element of the original array, and tracks the distribution of indices where this element falls in the shuffled array. If the shuffling is uniform, the distribution of indices is uniform too. Uniformity is simply evaluated by comparing the standard deviation to a low threshold.
The new version uses a known bijection between the set of all permutations (shuffles) and a set of integers, using the Lehmer code the factorial number radix. If the shuffling is uniform, the distribution of the permutation encodings is uniform too. Uniformity is still evaluated using the same basic standard deviation.

The new version evaluates the permutation as a whole instead of isolating one element only.
The drawback is that the evaluated distribution in the new version is of size n! (compared to n in the previous version), so the tested shuffled array has to remain small.

Side change:
Export the EncodePermutation function so that other permutation tests can use the tool.

@codecov-commenter
Copy link

codecov-commenter commented Oct 19, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (6591e54) 55.76% compared to head (9ce23e0) 55.76%.
Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4844      +/-   ##
==========================================
- Coverage   55.76%   55.76%   -0.01%     
==========================================
  Files         955      955              
  Lines       88867    88867              
==========================================
- Hits        49555    49554       -1     
- Misses      35574    35578       +4     
+ Partials     3738     3735       -3     
Flag Coverage Δ
unittests 55.76% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

see 10 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

// input `perm` is assumed to be a correct permutation of the set [0,n-1]
// (not checked in this function).
func EncodePermutation(perm []int) int {
r := make([]int, len(perm))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
r := make([]int, len(perm))
// We first construct r, which contains elements with the following constraints:
// r = { r(0) ∈ [0,n-1], r(1) ∈ [0,n-2], ... , r(n-1) ∈ [0,1], r(n) ∈ [0] }
r := make([]int, len(perm))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggested comment is the start of explaining what a Lehmer code is. However it doesn't fully explain how r is obtained, or why there is a bijection between all permutations and the set [0,n-1] x [0,n-2] x ... x [0,1] x [0], so IMO it doesn't add much information.

In the original comment I only mentioned "compute Lehmer code", maybe I can add a link to what Lehmer code is (that contains all the details).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
}
// Convert to an integer following the factorial number system
m := 0
Copy link
Member

@jordanschalm jordanschalm Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
m := 0
// We construct `m` by taking the sum of all elements of `r`, each multiplied by the factorial of their inverted index:
// m = ∑ r(i) * (n-i)!, i ∈ [0,n-1]
m := 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is more explanatory but it doesn't explain why it makes sense to compute m (why there is a bijection between [0,n-1] x [0,n-2] x ... x [0,1] x [0] and `[0, n! -1] ).
I suggest to add a link to the full explanation instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BasicDistributionTest(t, factN, 1, permEncoding)
})

t.Run("shuffle a same permutation", func(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we have this test, which looks identical to the previous one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • the first test shuffles an array multiple times. The resulting array from the previous shuffle is shuffled again. The distribution is computed based on all intermediate arrays from each loop.
  • the second test shuffles the same array [0,1, .. , n-1] multiple times. The resulting array after each loop is used for the distribution, but it is then ignored. The next loop shuffle is computed using the same starting array.

The shuffling should be uniform in both scenarios. Some buggy implementations may be uniform in one scenario and not the other.

@tarakby tarakby enabled auto-merge November 7, 2023 01:18
@tarakby tarakby added this pull request to the merge queue Nov 7, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 7, 2023
@tarakby tarakby added this pull request to the merge queue Nov 7, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 7, 2023
@tarakby tarakby added this pull request to the merge queue Nov 7, 2023
@tarakby tarakby removed this pull request from the merge queue due to a manual request Nov 7, 2023
@tarakby tarakby added this pull request to the merge queue Nov 7, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 7, 2023
@gomisha gomisha added this pull request to the merge queue Nov 7, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 7, 2023
@tarakby tarakby added this pull request to the merge queue Nov 8, 2023
Merged via the queue into master with commit fe6714e Nov 8, 2023
36 checks passed
@tarakby tarakby deleted the tarak/improve-shuffle-test branch November 8, 2023 01:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants