Hello,

Apologies if there is an easier way to extract the data that I have missed, but:

**Is your feature request related to a problem? Please describe.**
The way strawC currently reports data requires heavy conversion before it is useful: while normal straw returns a list of lists, strawC returns objects whose fields cannot be accessed efficiently. Although the extraction itself is many times faster than the normal version, the added overhead of converting the data makes it as slow as, or slower than, normal straw:

```python
%%timeit
data = strawC.strawC('NONE', hic_folder+files[1], 'chr22', 'chr22', 'BP', 10000)
extract = lambda x: (x.binX, x.binY, x.counts)
converted_data = np.array(list(map(extract, data)), dtype=np.int64)
matrix = scipy.sparse.coo_matrix((converted_data[:, 2], (converted_data[:, 0] // 10000, converted_data[:, 1] // 10000)))
```

707 ms ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```python
%%timeit
data = straw.straw('NONE', hic_folder+files[1], 'chr22', 'chr22', 'BP', 10000)
matrix = scipy.sparse.coo_matrix((data[2], (np.array(data[0]) // 10000, np.array(data[1]) // 10000)))
```

673 ms ± 19.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

**Describe the solution you'd like**
Could the data be reported either like normal straw, as a numpy array, or even directly as a scipy sparse matrix? If I understand correctly, pybind11 can expose numpy structures from C++, so perhaps a version designed around that?

Thanks!
This is a very good idea. It should be possible to extract the data as a numpy array or a scipy sparse matrix. We probably won't be able to get to this for a few weeks and would welcome any contributions from the community.
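In the meantime, some of the Python-side conversion overhead can likely be trimmed by filling one numpy array per column with `np.fromiter` instead of building an intermediate list of tuples. A minimal sketch, where `records` is a hypothetical stand-in for the pybind11 objects strawC returns (real records expose `binX`, `binY`, and `counts` the same way):

```python
import numpy as np
import scipy.sparse
from types import SimpleNamespace

# Hypothetical stand-in for strawC contact records; the real call would be
# records = strawC.strawC('NONE', path, 'chr22', 'chr22', 'BP', 10000)
records = [
    SimpleNamespace(binX=0, binY=10000, counts=5.0),
    SimpleNamespace(binX=10000, binY=20000, counts=3.0),
]
res = 10000  # bin resolution in BP

# One typed pass per column; count= lets numpy preallocate the arrays.
n = len(records)
rows = np.fromiter((r.binX // res for r in records), dtype=np.int64, count=n)
cols = np.fromiter((r.binY // res for r in records), dtype=np.int64, count=n)
vals = np.fromiter((r.counts for r in records), dtype=np.float64, count=n)

# Assemble the sparse contact matrix directly from the three columns.
matrix = scipy.sparse.coo_matrix((vals, (rows, cols)))
print(matrix.toarray())
```

Note that keeping `counts` as `float64` also avoids the silent truncation that `dtype=np.int64` applies to fractional counts in the snippet above.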