In memory one off #2
Looks good to me
src/biom.cpp
Outdated
obs_ids = std::vector<std::string>();
obs_ids.reserve(n_obs);

for(int i = 0; i < n_obs; i++) {
Performance note:
If n_obs and n_samples are large, we may want to create the various vectors in parallel.
src/biom.cpp
Outdated
sample_ids = std::vector<std::string>();
sample_ids.reserve(n_samples);
obs_ids = std::vector<std::string>();
obs_ids.reserve(n_obs);
Performance note:
Instead of doing reserve + for{push_back},
you should probably do
resize + memcpy.
Likely much faster.
(The buffer of a vector is guaranteed to be contiguous, so it can be used as a "simple C array", too.)
I'm unsure how to resolve this if using std::string, as the individual char*s would not be instantiated as std::string. For the moment, I've wrapped this in a parallel block. I think the right thing to do here would be to remove the use of std::string, which I'll examine when looking at removing copies.
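For reference, a minimal sketch of one way to fill the ID vectors without a reserve + push_back loop while still keeping std::string. The helper name ids_from_c_strings is hypothetical and not part of this PR. A raw memcpy into a std::vector<std::string> is not valid because std::string is not trivially copyable, but the iterator-range constructor allocates once and constructs each string in place:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// std::string manages its own storage, so copying raw bytes into a
// std::vector<std::string> with memcpy would be undefined behavior.
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string must be constructed, not memcpy'd");

// Hypothetical helper (not part of the PR): build an ID vector from a
// C array of NUL-terminated strings.
std::vector<std::string> ids_from_c_strings(const char* const* ids, std::size_t n) {
    assert(ids != nullptr || n == 0);
    // The iterator-range constructor sizes the vector once and constructs
    // each std::string directly from the corresponding const char*.
    return std::vector<std::string>(ids, ids + n);
}
```

This still copies the character data once per ID; avoiding that copy entirely would need a non-owning representation, which is the separate "removing copies" question mentioned above.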
Good call. I originally did resize, but was still pushing back (!). I didn’t realize memcpy worked with vector, I’ll look at that
… On Mar 17, 2022, at 8:43 AM, Igor Sfiligoi ***@***.***> wrote:
@sfiligoi commented on this pull request.
In src/biom.cpp <#2 (comment)>:
> +biom::biom(char** obs_ids_in,
+ char** samp_ids_in,
+ const int32_t* indices,
+ const int32_t* indptr,
+ const double* data,
+ const int n_obs,
+ const int n_samples,
+ const int nnz) : has_hdf5_backing(false) {
+ this->nnz = nnz;
+ this->n_samples = n_samples;
+ this->n_obs = n_obs;
+
+ sample_ids = std::vector<std::string>();
+ sample_ids.reserve(n_samples);
+ obs_ids = std::vector<std::string>();
+ obs_ids.reserve(n_obs);
for(unsigned int i = 0; i < n_obs; i++) {
free(obs_indices_resident[i]);
free(obs_data_resident[i]);
// not using const on indices/indptr/data as the pointers are being borrowed
...we could probably still use const, but I think it would require updating the object definition so that the member variables themselves are const.
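A rough sketch of what const-qualified borrowed members could look like. The class borrowed_csr and all of its member names are hypothetical stand-ins, not the actual biom definition, and the layout assumes CSR-style indices/indptr/data buffers as in the constructor signature:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a class that borrows (does not own) CSR-style
// buffers; the names are illustrative, not the PR's actual members.
class borrowed_csr {
public:
    borrowed_csr(const std::int32_t* indices, const std::int32_t* indptr,
                 const double* data, int n_rows)
        : indices_(indices), indptr_(indptr), data_(data), n_rows_(n_rows) {}

    // Sum the stored values of one row, reading through the borrowed pointers.
    double row_sum(int row) const {
        assert(row >= 0 && row < n_rows_);
        double total = 0.0;
        for (std::int32_t k = indptr_[row]; k < indptr_[row + 1]; ++k) {
            total += data_[k];
        }
        return total;
    }

    // Column index of the k-th stored value.
    std::int32_t column_of(std::int32_t k) const { return indices_[k]; }

private:
    // Pointer-to-const members: the object can read the caller's buffers
    // but cannot modify or free them through these pointers.
    const std::int32_t* indices_;
    const std::int32_t* indptr_;
    const double* data_;
    int n_rows_;
};
```

With members declared this way, the constructor can accept const pointers without a cast, at the cost of the class no longer being able to write through them.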
@@ -309,7 +427,12 @@ unsigned int biom::get_sample_data_direct(const std::string &id, uint32_t *& cur
}

double* biom::get_sample_counts() {
double *sample_counts = (double*)calloc(sizeof(double), n_samples);
Why not use calloc directly?
It is shorter and probably faster.
Good point, left over from when attempting to parallelize this
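For illustration, a small sketch of the calloc form. The helper name zeroed_counts is hypothetical. Note that calloc is declared as calloc(count, size); the line quoted above passes the arguments in the opposite order, which requests the same number of bytes but reads less conventionally. The sketch also assumes IEEE 754 doubles, where all-zero bytes represent 0.0:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: return a zero-initialized buffer of n doubles.
// The caller owns the result and must release it with free().
double* zeroed_counts(std::size_t n) {
    // calloc takes (element count, element size) and zero-fills the memory,
    // so no separate memset or initialization loop is needed.
    double* counts = static_cast<double*>(std::calloc(n, sizeof(double)));
    assert(counts != nullptr || n == 0);
    return counts;
}
```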
Looks OK, but see my 3 minor comments above.
Thanks! Just addressed, I believe. Once green, I'll merge and see if/what's needed for bioconda.
Looks good now
Should be internal — thanks
… On Mar 16, 2022, at 10:23 AM, Igor Sfiligoi ***@***.***> wrote:
@sfiligoi commented on this pull request.
In src/api.hpp <#2 (comment)>:
> + * alpha <double> GUniFrac alpha, only relevant if method == generalized.
+ * bypass_tips <bool> disregard tips, reduces compute by about 50%
+ * threads <uint> the number of threads to use.
+ * result <mat_t**> the resulting distance matrix in condensed form, this is initialized within the method so using **
+ *
+ * one_off_inmem returns the following error codes:
+ *
+ * okay : no problems encountered
+ * unknown_method : the requested method is unknown.
+ * table_empty : the table does not have any entries
+ */
+EXTERN ComputeStatus one_off_inmem(support_biom_t *table_data, support_bptree_t *tree_data,
+ const char* unifrac_method, bool variance_adjust, double alpha,
+ bool bypass_tips, unsigned int threads, mat_t** result);
+
+/* define this thing... */
Do we really need to define it here?
Or would it be better as an internal function, only?
This pull request provides a new API method to allow computation on existing biom and BPTree structures, and additionally exposes a means to construct a biom from in-memory objects. Note that BPTree already supports instantiation from in-memory constructs.