No way of manipulating data tables in memory #273

Closed
as2875 opened this issue Jul 20, 2020 · 8 comments

@as2875

as2875 commented Jul 20, 2020

@lwinfree, @sje30

When converting to Frictionless from another data format (e.g. HDF5), scripts have to

  1. import the package descriptor,
  2. make a package with the descriptor,
import datapackage
import pkg_resources

# Locate the descriptor bundled with the package and build a Package from it.
datapackage_path = pkg_resources.resource_filename(
    __package__,
    "datapackage.json")
package = datapackage.Package(
    base_path=base,  # base is the directory that will hold the CSV files
    descriptor=datapackage_path)
  3. make some CSV files with the right names, and finally
  4. call package.save (see the sketch below).
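
A rough sketch of steps 3 and 4 (the file name, the rows variable, and the target path are illustrative):

import csv

# Step 3: write each in-memory table to a CSV file whose name matches
# the resource path declared in datapackage.json.
with open("table.csv", "w", newline="") as fp:
    writer = csv.writer(fp)
    writer.writerows(rows)  # rows is a 2-D list held in memory

# Step 4: zip the descriptor and the CSV files into a data package.
package.save("datapackage.zip")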

It would be useful to skip the step of writing CSV files to disk. If this functionality already exists and I am missing something, please let me know.


Please preserve this line to notify @roll (lead of this repository)

@lwinfree
Member

Hi @roll, when you are back next week, can you please look at this?

@roll roll added the question label Jul 27, 2020
@roll
Member

roll commented Jul 27, 2020

Hi @as2875,

Could you please elaborate a little bit? Do you mean you don't want package.save to save the data, only the descriptor?

@as2875
Author

as2875 commented Jul 27, 2020

Hi @roll. I mean that I don't want to write CSV files to disk, just the final zipped data package. Say I have some tables stored in Python data structures in memory. Rather than writing the tables to CSV files and then calling package.save, I would like to create some Resource objects, point them at the data structures, and then call package.save.

The example comes from converting multiple HDF5 files to Frictionless data packages. At the moment, I have to create a package, read in the HDF5 datasets, store them as 2-D lists, write the contents of the lists to CSV files, call package.save, and then delete the original CSV files. I want to skip the operations involving CSV files.
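
The HDF5 side of the pipeline looks roughly like this (a sketch; the file path and dataset name are illustrative):

import h5py

# Read an HDF5 dataset into a plain 2-D list; at the moment this is what
# has to be written back out as CSV before calling package.save.
with h5py.File("measurements.h5", "r") as f:
    rows = f["dataset"][:].tolist()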

@roll roll added feature and removed question labels Jul 27, 2020
@roll
Member

roll commented Jul 27, 2020

Thanks. I think it's not possible at the moment. I've marked it as a feature request.

@as2875
Author

as2875 commented Jul 27, 2020

Thanks @roll. This would make data conversion pipelines and parallel processing a lot simpler.

@roll
Member

roll commented Jul 27, 2020

@as2875
BTW, there is dataflows (https://github.com/datahq/dataflows); I'm wondering whether you can achieve this goal using a flow.
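
A minimal sketch of that idea, assuming the rows are available as an iterable of dicts and using the dump_to_zip processor from the dataflows docs (the file name is illustrative):

from dataflows import Flow, dump_to_zip

rows = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
]

# Stream the in-memory rows straight into a zipped data package,
# without hand-writing intermediate CSV files.
Flow(rows, dump_to_zip("datapackage.zip")).process()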

@as2875
Author

as2875 commented Jul 27, 2020

Thanks for the suggestion @roll. dataflows looks promising.

@roll
Member

roll commented Sep 26, 2020

MERGED into frictionlessdata/frictionless-py#439

More info about Frictionless Framework
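
For anyone finding this later, the in-memory workflow requested above maps onto the new framework roughly like this (a sketch assuming the frictionless-py v4-style API; names may differ in later releases):

from frictionless import Package, Resource

# Build a resource directly from in-memory rows; no CSV files on disk.
rows = [["id", "name"], [1, "alpha"], [2, "beta"]]
resource = Resource(name="table", data=rows)

# Bundle the descriptor and data into a single zipped data package.
package = Package(resources=[resource])
package.to_zip("datapackage.zip")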

@roll roll closed this as completed Sep 26, 2020