Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for subsampling with replacement #33

Merged
merged 41 commits into from
Mar 6, 2023

Conversation

sfiligoi
Copy link
Collaborator

No description provided.

@sfiligoi sfiligoi requested a review from wasade February 17, 2023 01:34
Copy link
Member

@wasade wasade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor questions but please note the questions on the RNG and the filtering

src/biom_inmem.hpp Outdated Show resolved Hide resolved
src/biom_subsampled.cpp Show resolved Hide resolved
src/biom_subsampled.cpp Outdated Show resolved Hide resolved
src/biom_subsampled.cpp Outdated Show resolved Hide resolved
src/biom_subsampled.cpp Outdated Show resolved Hide resolved
}
}

void linked_sparse_transposed::transposed_subsample_with_replacement(const uint32_t n, const uint32_t random_seed) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In looking close at this, I realized there is an added constraint when using "with replacement": it is necessary to filter out any sample whose counts are below the sampling depth. I realize this is not done by default in the biom.Table.subsample method but it arguably should be. The current work around is that we filter the table (i.e., biom.Table.filter) in advance of a with replacement call, and that is still viable to do here. What I recommend we do is get this merged first, then consider how to address filtering in a subsequent PR, does that sound good?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this just implements subsampling with replacement as per biom.Table.
If you want a different semantics, we can change it to whatever is the most useful.
We do have the samples_count already, so it would be cheap.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wasade I think I have implemented what you requested.

@wasade wasade merged commit 5dfd954 into biocore:main Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants