Add sampling with replacement to Table.subsample #774
Comments
Quick question on reproducibility: looking at the tests for Table.subsample, I don't see any uses of numpy random seeds, and I'm having trouble making consistent unit tests when using Table.subsample. @wasade, any thoughts on setting random seeds? |
No strong feelings
…On Sun, May 6, 2018, 10:25 PM Jamie Morton ***@***.***> wrote:
Quick question on reproducibility
Looking at the tests for [Table.subsample](https://github.com/biocore/biom-format/blob/master/tests/test_table.py#L2619)
I don't see any uses of numpy random seeds. And I'm having trouble making
consistent unittests when using Table.subsample.
@wasade <https://github.com/wasade> , any thoughts on setting random
seeds?
|
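For anyone hitting the same reproducibility problem, here is a minimal sketch of pinning the seed in a test. It assumes the code under test draws from NumPy's global random state, which is not confirmed in this thread for Table.subsample; `draw` is a hypothetical stand-in, not a biom function.

```python
import numpy as np

counts = np.array([10, 0, 5, 85])


def draw(counts, n):
    # Hypothetical stand-in for a subsampling call.
    # Assumption: the real code uses NumPy's global RNG, as this does.
    return np.random.multinomial(n, counts / counts.sum())


# Re-seeding before each call makes the two draws identical.
np.random.seed(42)
first = draw(counts, 20)

np.random.seed(42)
second = draw(counts, 20)
```

If the code under test uses its own generator rather than the global state, seeding this way would have no effect and the generator would need to be passed in instead.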
Do either of you have a pre-PR branch for this that I could try to pick up and carry forward? |
@stevendbrown, thank you for the inquiry! If you have bandwidth, we'd love a PR. The change should be relatively small, as it should just require a branch and call to |
@wasade OK, I made a basic implementation but I did it in the Cython code since that logic is already worked out, and doesn't require using dense data. I'm checking my work now to make sure I didn't bungle something non-obviously, but on the surface it looks pretty clean. Are there reasons I shouldn't do this and should do it instead using pure Python (e.g. maybe writing tests is harder)? |
No reason not to! Just as a heads up, appropriate unit tests will be necessary for merge. One example for |
Note that the numpy implementation of multinomial is already written in C -- not entirely sure how much faster it will be in a cython implementation ... |
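As a rough illustration of the multinomial approach mentioned above, the sketch below subsamples a count vector with replacement without ever expanding it into individual observations, which is why it stays cheap when the vector sum is massive. The function name `subsample_with_replacement` is made up for this example and is not the biom or scikit-bio API.

```python
import numpy as np


def subsample_with_replacement(counts, n, rng):
    """Draw n observations with replacement, proportional to counts."""
    counts = np.asarray(counts, dtype=np.float64)
    total = counts.sum()
    if total == 0:
        raise ValueError("cannot subsample a vector whose sum is zero")
    # One multinomial draw; memory use depends on len(counts), not on total.
    return rng.multinomial(n, counts / total)


rng = np.random.default_rng(0)
# A vector whose sum (1e13) is far too large to expand element by element.
huge = [3e12, 1e12, 0, 6e12]
result = subsample_with_replacement(huge, 1000, rng)
```

Because the draw is with replacement, `n` may exceed the vector sum, which is not possible when sampling without replacement.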
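@mortonjt Agreed. For me it's less about speed and more about piggybacking on the existing Cython code to manipulate sparse data as input to multinomial. It's the "record-keeping" code around the subsampling of which I'm trying to take advantage, rather than recapitulating the existing Cython gymnastics (e.g. looping over idxptr) for handling the sparse matrix data a level up in table.py. I'll get this into a PR and maybe we can discuss options there? |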
👍 Look forward to seeing your PR!
…On Tue, Sep 4, 2018, 1:49 PM Steven Brown ***@***.***> wrote:
@mortonjt <https://github.com/mortonjt> Agreed. For me it's less about
speed and more about piggyback on the existing Cython code to manipulate
sparse data as input to multinomial. It's the "record-keeping" code
around the subsampling of which I'm trying to take advantage, rather than
recapitulating the existing Cython gymnastics (e.g. looping over idxptr)
for handling the sparse matrix data a level up in table.py. I'll get this
into a PR and maybe we can discuss options there?
|
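To make the "record-keeping" point concrete, here is a pure-NumPy sketch of looping over the index pointer of a compressed sparse matrix and replacing each vector's stored values with a multinomial draw. It is only an assumption about the general shape of such a loop, not the actual Cython code in biom; `subsample_compressed` and the `data`/`indptr` arrays are illustrative names following the usual compressed sparse layout.

```python
import numpy as np


def subsample_compressed(data, indptr, n, rng):
    """Resample each compressed vector with replacement to depth n."""
    out = np.zeros_like(data)
    for i in range(len(indptr) - 1):
        start, end = indptr[i], indptr[i + 1]
        vec = data[start:end].astype(np.float64)
        total = vec.sum()
        if total == 0:
            continue  # empty vector: nothing to draw from
        # Only the stored (nonzero) entries are touched; no dense conversion.
        out[start:end] = rng.multinomial(n, vec / total)
    return out


# Three vectors stored back to back: [5, 3, 2], [10], [1, 1, 8].
data = np.array([5, 3, 2, 10, 1, 1, 8], dtype=np.int64)
indptr = np.array([0, 3, 4, 7])
new_data = subsample_compressed(data, indptr, 4, np.random.default_rng(1))
```

Entries that come back as zero would still be stored explicitly, so a real implementation would likely need to drop them afterwards.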
Presumably this can be closed? |
Yes, thanks. Used to the issues getting closed automagically :) |
This could likely adapt the strategy used in scikit-bio and is important for instances where the sum of a vector is massive.
Cc @mortonjt (couldn't assign for some reason)