Add support for subsampling without replacement #43
CC @wasade
Computing PCOA of EMP with n=200 subsampling without replacement:
Using ssu with 100 subsampling:
// use persistent buffer to minimize allocation costs
std::vector<uint64_t> data;        // original values
std::vector<uint32_t> sample_out;  // random output buffer
std::vector<uint32_t> data_out;    // computed values
Does this mean the max count for a given feature in a given sample is 2**32?
Sort of.
These are the output counts, not the input counts.
The output cannot be larger than n, and n is 32-bit, so 32 bits are enough for the output.
Input counts are 64-bit.
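
To illustrate the width argument, here is a minimal sketch of subsampling without replacement with 64-bit input counts and 32-bit output counts. It is not the code in this PR: the function name `subsample_without_replacement`, its signature, and the simple draw-and-decrement loop are assumptions made for the example, and it allocates fresh vectors instead of reusing persistent buffers.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not the PR's implementation).
// Draws n individual counts without replacement from one sample.
// Inputs are 64-bit; each output entry is at most n, so 32 bits suffice.
// Assumes the sum of the input counts fits in 64 bits.
std::vector<uint32_t> subsample_without_replacement(
    std::vector<uint64_t> remaining,  // copy of the input counts, modified locally
    uint32_t n,
    std::mt19937_64 &rng) {
  uint64_t total =
      std::accumulate(remaining.begin(), remaining.end(), uint64_t{0});
  if (total < n) {
    throw std::invalid_argument("sample has fewer than n counts");
  }

  std::vector<uint32_t> out(remaining.size(), 0);
  for (uint32_t drawn = 0; drawn < n; ++drawn) {
    // Pick one of the counts that are still available, uniformly at random.
    std::uniform_int_distribution<uint64_t> pick(0, total - 1);
    uint64_t r = pick(rng);

    // Walk the features to find the one that owns position r.
    std::size_t i = 0;
    while (r >= remaining[i]) {
      r -= remaining[i];
      ++i;
    }

    // Remove that count from the pool and record it in the output.
    --remaining[i];
    --total;
    ++out[i];
  }
  return out;
}
```

Calling it with, say, counts `{5000000000, 3, 0, 7}` and `n = 200` returns four 32-bit values that sum to 200, even though the first input count does not fit in 32 bits. The loop costs O(n × features), which is fine for a sketch but is not meant to be efficient.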