Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In memory one off #2

Merged
merged 31 commits into from
Mar 21, 2022
Merged

In memory one off #2

merged 31 commits into from
Mar 21, 2022

Conversation

wasade
Copy link
Member

@wasade wasade commented Mar 12, 2022

This pull request provides a new API method to allow computation on existing biom and BPTree structures, and additionally exposes a means to construct a biom from in memory objects. Note that BPTree already has support instantiation from in memory constructs.

src/api.cpp Outdated Show resolved Hide resolved
src/api.hpp Outdated Show resolved Hide resolved
src/biom.hpp Outdated Show resolved Hide resolved
src/api.hpp Outdated Show resolved Hide resolved
src/biom.cpp Outdated Show resolved Hide resolved
src/biom.cpp Outdated Show resolved Hide resolved
src/biom.cpp Outdated Show resolved Hide resolved
src/api.hpp Outdated Show resolved Hide resolved
src/api.hpp Outdated Show resolved Hide resolved
src/Makefile Outdated Show resolved Hide resolved
src/Makefile Outdated Show resolved Hide resolved
src/api.hpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@sfiligoi sfiligoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

src/biom.cpp Outdated
obs_ids = std::vector<std::string>();
obs_ids.reserve(n_obs);

for(int i = 0; i < n_obs; i++) {
Copy link
Collaborator

@sfiligoi sfiligoi Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance note:
If n_objs and n_samples is large, we may want to create the various vector in parallel.

src/biom.cpp Outdated
sample_ids = std::vector<std::string>();
sample_ids.reserve(n_samples);
obs_ids = std::vector<std::string>();
obs_ids.reserve(n_obs);
Copy link
Collaborator

@sfiligoi sfiligoi Mar 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance note:
Instead of doing reserse + for{push_back},
You probably should do
resize+memcopy
Likely way faster.
(buffer is guaranteed to be consecutive in a vector, so can be used as a "simple C array", too)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure how to resolve this if using std::string as the individual char*s would not be instantiated std::string. For the moment, I've wrapped this in a parallel block. I think the right thing to do here would be to remove use of std::string, which I'll examine when looking at removing copies.

@wasade
Copy link
Member Author

wasade commented Mar 17, 2022 via email

src/api.cpp Show resolved Hide resolved
src/api.cpp Outdated Show resolved Hide resolved
for(unsigned int i = 0; i < n_obs; i++) {
free(obs_indices_resident[i]);
free(obs_data_resident[i]);
// not using const on indices/indptr/data as the pointers are being borrowed
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...we probably could still use const but i think it would require updating the object definition to change the member variables themselves to be const

src/biom.cpp Outdated Show resolved Hide resolved
@@ -309,7 +427,12 @@ unsigned int biom::get_sample_data_direct(const std::string &id, uint32_t *& cur
}

double* biom::get_sample_counts() {
double *sample_counts = (double*)calloc(sizeof(double), n_samples);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use directly calloc?
It is shorter and probably faster.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, left over from when attempting to parallelize this

src/test_su.cpp Outdated Show resolved Hide resolved
src/test_su.cpp Outdated Show resolved Hide resolved
@sfiligoi
Copy link
Collaborator

Looks OK, but see my 3 minor comments above.

@wasade
Copy link
Member Author

wasade commented Mar 21, 2022

Thanks! Just addressed I believe

Once green, I'll merge and see if/whats needed for bioconda

Copy link
Collaborator

@sfiligoi sfiligoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now

@wasade wasade merged commit 864cc46 into main Mar 21, 2022
@wasade
Copy link
Member Author

wasade commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants