The ReferenceSequence.data attribute returns the reference sequence data as a string. For large references we almost certainly don't want to do this, as it creates a new Python string holding a full copy of the data. So, it would be good to have a numpy array view of the data.
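A minimal sketch of the kind of view meant here, assuming the reference is held internally as a bytes object with one ASCII character per base. The data_as_array helper is hypothetical and not part of tskit; np.frombuffer wraps the existing buffer rather than copying it, and because bytes are immutable the resulting view is read-only.

```python
import numpy as np


def data_as_array(data: bytes) -> np.ndarray:
    """Hypothetical helper: zero-copy, read-only uint8 view over reference bytes."""
    # np.frombuffer borrows the existing buffer instead of allocating a copy.
    return np.frombuffer(data, dtype=np.uint8)


reference_bytes = b"ACGTACGTNN"
view = data_as_array(reference_bytes)

# The array does not own its memory: it borrows the bytes object's buffer.
print(view.flags.owndata)  # False
# bytes is immutable, so the view is read-only too.
print(view.flags.writeable)  # False
# Decoding to str (as a string-returning .data would) makes a separate copy.
as_str = reference_bytes.decode("ascii")
print(len(as_str))  # 10
```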
We should see first how we might use this, though. The only place we're using this at the moment is in the alignments method. In this case we can definitely sidestep the full Python string because we're immediately turning the data into a numpy array here. So, it'll be quite easy to have an internal API using something like data_array which is a view.
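To illustrate how an internal view could feed into alignments without going through a full Python string, here is a rough sketch. It is not the actual tskit alignments implementation: alignments_sketch, its arguments, and the data layout (one row of genotypes per site, single-character alleles, no missing data) are all assumptions made for the example.

```python
import numpy as np


def alignments_sketch(reference, positions, genotypes, alleles):
    """Hypothetical sketch: build one aligned sequence per sample.

    reference: bytes, one ASCII character per base.
    positions: site positions (ints) into the reference.
    genotypes: int array of shape (num_sites, num_samples), no missing data.
    alleles:   per-site sequences of single-character allele strings.
    """
    # Zero-copy, read-only view of the reference data.
    ref = np.frombuffer(reference, dtype=np.uint8)
    num_samples = genotypes.shape[1]
    # np.tile allocates a new writable (num_samples, len(ref)) array,
    # i.e. one full copy of the reference per sample.
    out = np.tile(ref, (num_samples, 1))
    for pos, site_genotypes, site_alleles in zip(positions, genotypes, alleles):
        # Encode this site's alleles as byte codes, then look up each sample's allele.
        codes = np.frombuffer("".join(site_alleles).encode("ascii"), dtype=np.uint8)
        out[:, pos] = codes[site_genotypes]
    return [row.tobytes().decode("ascii") for row in out]


genotypes = np.array([[0, 1, 0], [1, 1, 0]])
result = alignments_sketch(b"AAAAAAAAAA", [2, 7], genotypes, [("A", "T"), ("A", "G")])
print(result)  # ['AAAAAAAGAA', 'AATAAAAGAA', 'AAAAAAAAAA']
```

Note that the output array itself holds one copy of the reference per sample, so the only copy the view saves is the initial string conversion.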
However, it might not be worth doing this because we'll have to implement alignments in C fairly soon anyway (#1589).
If it's easy I'll implement the data_array when we're in read-only mode for #1935, which is soon on the menu.
In general, I don't think we'll be accessing the data attribute directly much, as we'll want to present a higher-level interface in Python (for example, we implement __getitem__ to support pulling out a slice of a reference, which can operate on either the data or the url; see #1988).
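A rough sketch of what such a higher-level slicing interface might look like, assuming a slice can be served either from in-memory bytes or from a remote source. ReferenceSketch and the fetch_range callable (standing in for something like a ranged download from the url) are hypothetical and not the tskit ReferenceSequence API.

```python
from typing import Callable, Optional


class ReferenceSketch:
    """Hypothetical stand-in for ReferenceSequence; not the tskit class."""

    def __init__(
        self,
        data: Optional[bytes] = None,
        fetch_range: Optional[Callable[[int, int], bytes]] = None,
    ):
        # Exactly one backing source: in-memory data or a remote fetcher.
        if (data is None) == (fetch_range is None):
            raise ValueError("provide exactly one of data or fetch_range")
        self._data = data
        self._fetch_range = fetch_range

    def __getitem__(self, key: slice) -> str:
        if not isinstance(key, slice) or key.step not in (None, 1):
            raise TypeError("this sketch only supports contiguous slices")
        if self._data is not None:
            # Resolve negative or omitted bounds against the known length.
            start, stop, _ = key.indices(len(self._data))
            # Slicing bytes copies only the requested range, not the whole reference.
            return self._data[start:stop].decode("ascii")
        # Remote case: the total length is unknown here, so require
        # explicit non-negative bounds.
        start = 0 if key.start is None else key.start
        if key.stop is None or start < 0 or key.stop < 0:
            raise ValueError("remote slices need explicit non-negative bounds")
        return self._fetch_range(start, key.stop).decode("ascii")


local = ReferenceSketch(data=b"ACGTACGT")
print(local[2:5])  # GTA
```

Keeping the slicing logic behind __getitem__ means callers never need to touch the full data attribute, whichever backing source is in use.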
Thinking about the alignments issue some more, I'm not sure this is worth optimising for. We're already storing n copies of the reference sequence (the alignments themselves), so avoiding one more copy would be a very minor optimisation.
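The arithmetic behind this: with n alignments of length L, the output already takes about nL bytes, so skipping one extra copy of the reference saves L / (nL + L) = 1 / (n + 1) of the total. A small sketch, assuming one byte per base; the sample count and sequence length below are illustrative, not measurements.

```python
def relative_saving(num_samples: int, sequence_length: int) -> float:
    """Fraction of total memory saved by avoiding one extra reference copy."""
    # Assumes one byte per base for both the alignments and the reference copy.
    alignment_bytes = num_samples * sequence_length
    extra_copy_bytes = sequence_length
    return extra_copy_bytes / (alignment_bytes + extra_copy_bytes)


# Illustrative numbers: 1000 samples, 100 Mb reference -> roughly 0.1% saved.
print(relative_saving(1000, 100_000_000))
```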