No way of manipulating data tables in memory #273

Closed
as2875 opened this issue Jul 20, 2020 · 8 comments

@as2875

as2875 commented Jul 20, 2020

@lwinfree, @sje30

When converting to Frictionless from another data format (e.g. HDF5), scripts have to

  1. import the package descriptor,
  2. make a package with the descriptor,
import datapackage
import pkg_resources

# Locate the descriptor bundled with the package and build a Package from it.
datapackage_path = pkg_resources.resource_filename(
    __package__,
    "datapackage.json")
package = datapackage.Package(
    base_path=base,  # base is the directory that will hold the CSV files
    descriptor=datapackage_path)
  3. make some CSV files with the right names, and finally
  4. call package.save (see the sketch below).
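
A rough sketch of steps 3 and 4 (the file name, the rows variable, and the target path are illustrative):

import csv

# Step 3: write each in-memory table to a CSV file whose name matches
# the resource path declared in datapackage.json.
with open("table.csv", "w", newline="") as fp:
    writer = csv.writer(fp)
    writer.writerows(rows)  # rows is a 2-D list held in memory

# Step 4: zip the descriptor and the CSV files into a data package.
package.save("datapackage.zip")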

It would be useful to skip the step of writing CSV files to disk. If this functionality already exists and I am missing something, please let me know.


Please preserve this line to notify @roll (lead of this repository)

@lwinfree
Member

Hi @roll, when you are back next week, can you please look at this?

@roll roll added the question label Jul 27, 2020
@roll
Member

roll commented Jul 27, 2020

Hi @as2875,

Could you please elaborate a little bit? Do you mean you don't want package.save to save the data, only the descriptor?

@as2875
Author

as2875 commented Jul 27, 2020

Hi @roll. I mean that I don't want to write CSV files to disk, just the final zipped data package. Say I have some tables stored in Python data structures in memory. Rather than writing the tables to CSV files and then calling package.save, I would like to create some Resource objects, point them at the data structures, and then call package.save.

The example comes from converting multiple HDF5 files to Frictionless data packages. At the moment, I have to create a package, read in the HDF5 datasets, store them as 2-D lists, write the contents of the lists to CSV files, call package.save, and then delete the original CSV files. I want to skip the operations involving CSV files.
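
The HDF5 side of the pipeline looks roughly like this (a sketch; the file path and dataset name are illustrative):

import h5py

# Read an HDF5 dataset into a plain 2-D list; at the moment this is what
# has to be written back out as CSV before calling package.save.
with h5py.File("measurements.h5", "r") as f:
    rows = f["dataset"][:].tolist()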

@roll roll added feature and removed question labels Jul 27, 2020
@roll
Member

roll commented Jul 27, 2020

Thanks. I think it's not possible at the moment. I've marked it as a feature request.

@as2875
Author

as2875 commented Jul 27, 2020

Thanks @roll. This would make data conversion pipelines and parallel processing a lot simpler.

@roll
Member

roll commented Jul 27, 2020

@as2875
BTW, there is dataflows (https://github.com/datahq/dataflows); I'm wondering whether you can achieve this goal using a flow.
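
A minimal sketch of that idea, assuming the rows are available as an iterable of dicts and using the dump_to_zip processor from the dataflows docs (the file name is illustrative):

from dataflows import Flow, dump_to_zip

rows = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
]

# Stream the in-memory rows straight into a zipped data package,
# without hand-writing intermediate CSV files.
Flow(rows, dump_to_zip("datapackage.zip")).process()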

@as2875
Author

as2875 commented Jul 27, 2020

Thanks for the suggestion @roll. dataflows looks promising.

@roll
Member

roll commented Sep 26, 2020

MERGED into frictionlessdata/frictionless-py#439

More info about Frictionless Framework
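
For anyone finding this later, the in-memory workflow requested above maps onto the new framework roughly like this (a sketch assuming the frictionless-py v4-style API; names may differ in later releases):

from frictionless import Package, Resource

# Build a resource directly from in-memory rows; no CSV files on disk.
rows = [["id", "name"], [1, "alpha"], [2, "beta"]]
resource = Resource(name="table", data=rows)

# Bundle the descriptor and data into a single zipped data package.
package = Package(resources=[resource])
package.to_zip("datapackage.zip")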

@roll roll closed this as completed Sep 26, 2020