Add zero-copy array access to ReferenceSequence #1989

Open
jeromekelleher opened this issue Dec 2, 2021 · 1 comment
Labels
Python API Issue is about the Python API

Comments

@jeromekelleher
Member

The ReferenceSequence.data attribute returns the reference sequence data as a string. For large references we almost certainly don't want to do this, as every access creates a new Python string and therefore a full copy of the data. So, it would be good to have a numpy array view of the data.

We should first see how we might use this, though. The only place we're using the data attribute at the moment is in the alignments method. In this case we can definitely sidestep the full Python string, because we immediately turn the data into a numpy array there. So, it'll be quite easy to have an internal API using something like data_array, which is a view.
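To make the copy-versus-view distinction concrete, here's a minimal sketch. It is not tskit code: the bytearray stands in for the buffer that would hold the reference, the name data_array is borrowed from the suggestion above, and the int8 dtype is an assumption.

```python
import numpy as np

# Stand-in for the memory holding the reference sequence (hypothetical;
# in the real library this memory would be owned by the C layer).
buffer = bytearray(b"ACGTACGTTTGA")

# Copying access: decoding builds a brand new Python str.
as_string = buffer.decode("ascii")

# Zero-copy access: np.frombuffer wraps the existing memory.
data_array = np.frombuffer(buffer, dtype=np.int8)
# Make the view read-only so callers can't modify the reference through it.
data_array.flags.writeable = False

# Changing the underlying buffer is visible through the view,
# but not through the string, which is an independent copy.
buffer[0] = ord("T")
print(chr(data_array[0]), as_string[0])
```

The view costs a small array header regardless of the reference length, whereas the string costs one byte per base each time it is created.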

However, it might not be worth doing this because we'll have to implement alignments in C fairly soon anyway (#1589).

If it's easy I'll implement the data_array when we're in read-only mode for #1935, which is soon on the menu.

In general, I don't think we'll be accessing the data attribute directly much, as we'll want to present a higher-level interface in Python. For example, we could implement __getitem__ to support pulling out a slice of a reference, which can operate on either the data or url (see #1988).
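A rough sketch of what such a slicing interface might look like. The class name ReferenceView and its constructor are hypothetical and only illustrate the idea; the url path is deliberately left unimplemented since how it would be fetched isn't decided here.

```python
class ReferenceView:
    """Hypothetical wrapper for illustration; not the ReferenceSequence class."""

    def __init__(self, data=None, url=None):
        if data is None and url is None:
            raise ValueError("Need either data or url")
        self._data = data
        self._url = url

    def __getitem__(self, key):
        # Serve the slice from in-memory data when we have it.
        if self._data is not None:
            return self._data[key]
        # A url-backed reference would fetch only the requested range;
        # that part is not sketched here.
        raise NotImplementedError("url-backed slicing is not sketched")


ref = ReferenceView(data="ACGTACGTTTGA")
segment = ref[2:6]
print(segment)
```

The point of the design is that callers ask for the bases they need and never touch the full data attribute.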

@jeromekelleher added the Python API label Dec 2, 2021
@jeromekelleher
Member Author

Thinking about the alignments issue some more, I'm not sure this is worth optimising for. We're currently storing n copies of the reference sequence anyway, in the form of the alignments, so avoiding one more copy would be a very minor optimisation.

@jeromekelleher jeromekelleher added this to the Python upcoming milestone Dec 2, 2021