Hello,

Apologies if there is an easier way to extract the data that I have missed, but:

**Is your feature request related to a problem? Please describe.**
The way strawC currently reports data requires heavy conversion before it is useful: while normal straw returns a list of lists, strawC returns objects whose fields cannot be accessed efficiently. Although the extraction itself is many times faster than the normal version, the added overhead of converting the data makes it as slow as, or slower than, normal straw:

```python
%%timeit
data = strawC.strawC('NONE', hic_folder+files[1], 'chr22', 'chr22', 'BP', 10000)
extract = lambda x: (x.binX, x.binY, x.counts)
converted_data = np.array(list(map(extract, data)), dtype=np.int64)
matrix = scipy.sparse.coo_matrix((converted_data[:, 2], (converted_data[:, 0] // 10000, converted_data[:, 1] // 10000)))
```

707 ms ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```python
%%timeit
data = straw.straw('NONE', hic_folder+files[1], 'chr22', 'chr22', 'BP', 10000)
matrix = scipy.sparse.coo_matrix((data[2], (np.array(data[0]) // 10000, np.array(data[1]) // 10000)))
```

673 ms ± 19.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

**Describe the solution you'd like**
Could the data be reported either like normal straw, as a numpy array, or even directly as a scipy sparse matrix? If I understand correctly, pybind11 can expose numpy structures from C++, so perhaps a version designed around that?

Thanks!
This is a very good idea. It should be possible to extract the data as a numpy array or a scipy sparse matrix. We probably won't be able to get to this for a few weeks and would welcome any contributions from the community.
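In the meantime, some of the Python-side conversion overhead can likely be trimmed by filling one numpy array per column with `np.fromiter` instead of building an intermediate list of tuples. A minimal sketch, where `records` is a hypothetical stand-in for the pybind11 objects strawC returns (real records expose `binX`, `binY`, and `counts` the same way):

```python
import numpy as np
import scipy.sparse
from types import SimpleNamespace

# Hypothetical stand-in for strawC contact records; the real call would be
# records = strawC.strawC('NONE', path, 'chr22', 'chr22', 'BP', 10000)
records = [
    SimpleNamespace(binX=0, binY=10000, counts=5.0),
    SimpleNamespace(binX=10000, binY=20000, counts=3.0),
]
res = 10000  # bin resolution in BP

# One typed pass per column; count= lets numpy preallocate the arrays.
n = len(records)
rows = np.fromiter((r.binX // res for r in records), dtype=np.int64, count=n)
cols = np.fromiter((r.binY // res for r in records), dtype=np.int64, count=n)
vals = np.fromiter((r.counts for r in records), dtype=np.float64, count=n)

# Assemble the sparse contact matrix directly from the three columns.
matrix = scipy.sparse.coo_matrix((vals, (rows, cols)))
print(matrix.toarray())
```

Note that keeping `counts` as `float64` also avoids the silent truncation that `dtype=np.int64` applies to fractional counts in the snippet above.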