Add support for subsampling without replacement #43

Merged: 13 commits, Jun 9, 2023

sfiligoi (Collaborator) commented Jun 8, 2023

No description provided.

sfiligoi (Collaborator, Author) commented Jun 8, 2023

CC @wasade

sfiligoi (Collaborator, Author) commented Jun 9, 2023

Computing PCoA of EMP with subsampling without replacement at n=200.
Subsampling the biom file with biom-format, then running ssu (once):

PC1    955.209717
PC2    565.497986
PC3    417.900940
PC4    332.818481

Using ssu with 100 subsamplings:

PC1 mean:           956.358329 std:             1.165118
PC2 mean:           565.383715 std:             1.068292
PC3 mean:           417.587330 std:             1.433543
PC4 mean:           331.851700 std:             4.373472
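
For readers who want to reproduce this kind of summary, here is a minimal sketch of how per-axis mean and std rows could be computed from repeated subsampling runs. The helper name `mean_std` and the struct `MeanStd` are hypothetical and are not part of ssu. The thread does not say which estimator was used for std, so the n-1 (sample) denominator below is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: not part of ssu.
struct MeanStd {
  double mean;
  double std_dev;
};

// Hypothetical helper: mean and sample standard deviation of one PC axis
// across repeated subsampling runs.
// Assumption: n-1 denominator; the thread does not state which one was used.
MeanStd mean_std(const std::vector<double>& values) {
  const std::size_t n = values.size();
  if (n == 0) return {0.0, 0.0};

  double sum = 0.0;
  for (double v : values) sum += v;
  const double mean = sum / static_cast<double>(n);

  // A single value has no spread to estimate.
  if (n < 2) return {mean, 0.0};

  double sq = 0.0;
  for (double v : values) sq += (v - mean) * (v - mean);
  return {mean, std::sqrt(sq / static_cast<double>(n - 1))};
}
```

Calling `mean_std` once per axis, with one value per subsampling run, would give one mean/std row per PC.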

// use persistent buffer to minimize allocation costs
std::vector<uint64_t> data; // original values
std::vector<uint32_t> sample_out; // random output buffer
std::vector<uint32_t> data_out; // computed values

Review comment from a Member:

Does this mean the max count for a given feature in a given sample is 2**32?

sfiligoi (Collaborator, Author) commented Jun 9, 2023

Sort of.
These are the output counts, not the input counts.
n is 32-bit, and an output count cannot be larger than n, so 32 bits are enough.
Input counts are 64-bit.
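
The bound described above can be illustrated with a minimal sketch of subsampling without replacement. This is not the code from this PR: the function `subsample_without_replacement` is hypothetical, and it uses a simple one-draw-at-a-time scheme chosen for clarity, not speed. It takes 64-bit input counts and a 32-bit n, and each output count is bounded by both n and the corresponding input count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <stdexcept>
#include <vector>

// Hypothetical helper, not the PR's implementation.
// Draws n items without replacement from one sample whose per-feature
// counts are 64-bit, and returns 32-bit per-feature output counts.
std::vector<uint32_t> subsample_without_replacement(
    const std::vector<uint64_t>& counts, uint32_t n, std::mt19937_64& rng) {
  uint64_t remaining = 0;
  for (uint64_t c : counts) remaining += c;  // assumes the total fits in 64 bits
  if (remaining < n) {
    throw std::invalid_argument("sample has fewer than n counts");
  }

  std::vector<uint64_t> left(counts);           // counts not yet drawn
  std::vector<uint32_t> out(counts.size(), 0);  // output counts, each <= n

  for (uint32_t drawn = 0; drawn < n; ++drawn) {
    // Pick one of the remaining items uniformly at random.
    std::uniform_int_distribution<uint64_t> pick(0, remaining - 1);
    uint64_t r = pick(rng);

    // Find the feature that owns item r by walking the remaining counts.
    std::size_t i = 0;
    while (r >= left[i]) {
      r -= left[i];
      ++i;
      assert(i < left.size());
    }

    --left[i];  // without replacement: this item cannot be drawn again
    --remaining;
    ++out[i];
  }
  return out;
}
```

Because the loop runs exactly n times and each iteration increments one output entry, no entry can exceed n, which is why a 32-bit output buffer suffices even when an input count is larger than 2**32.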

wasade merged commit ba11b1b into biocore:main on Jun 9, 2023