Allow tarball as input for pygac-fdr-run #4

carloshorn · 2020-05-20T11:32:29Z

It would be nice if we could process entire tarballs full of gzipped files.

The pygac-fdr-run script could be adapted to open the tarballs in a similar way to https://github.com/pytroll/pygac/blob/master/bin/pygac-run.

Then we could pass the open file object as a reader keyword argument to the Scene constructor.
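A minimal sketch of the tarball loop, using only the Python standard library. The helper name `iter_gzipped_members` is hypothetical (not part of pygac or pygac-fdr). It assumes every `.gz` member is one gzipped input file and drops the suffix so that filename-based logic sees the plain name. Each decompressed file object is only valid until the loop advances to the next member.

```python
import gzip
import tarfile
from typing import IO, Iterator, Tuple


def iter_gzipped_members(tar_path: str) -> Iterator[Tuple[str, IO[bytes]]]:
    """Yield (name without .gz, decompressed file object) per gzipped member.

    Hypothetical helper for illustration; not part of pygac or pygac-fdr.
    """
    # Mode "r" detects tarball compression (none, gz, bz2, xz) automatically.
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            # Skip directories, links and anything that is not gzipped.
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            raw = tar.extractfile(member)
            # Drop the .gz suffix so filename pattern matching sees the plain name.
            name = member.name[: -len(".gz")]
            # Decompress on the fly; the stream is closed when the loop moves on.
            with gzip.GzipFile(fileobj=raw) as fileobj:
                yield name, fileobj
```

Each `(name, fileobj)` pair would then be handed to the reader, giving one NetCDF output per member. How exactly the file object reaches the reader (keyword name, satpy side) is left open here.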


mraspaud · 2020-05-20T12:53:14Z

So the expected output would be multiple NetCDF files, right?

sfinkens · 2020-05-22T09:19:06Z

Yes, I think this should be possible (with an update of the satpy reader to make use of that new argument).

carloshorn · 2020-08-04T14:31:26Z

Hi @sfinkens and @mraspaud,

I have a hot fix for this issue, but I wanted to discuss the general concept with you.

I made a small change to the pygac-fdr-run script so that it opens tarballs and passes the file objects as a reader keyword argument. In addition, I drop the gzip suffix from the filename to avoid trouble with satpy. In pygac.reader, I added the file object as a reader argument and attribute, and in the read method of the KLM/POD readers I check whether a file object is passed as an argument or stored as a reader attribute; otherwise, the filename is used as the path to the file.
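The fallback order described above (file object argument, then reader attribute, then path on disk) could look roughly like this. `SketchReader`, `_open_source` and the `fileobj` keyword are hypothetical names chosen for illustration, not the actual pygac reader API.

```python
import contextlib
from typing import IO, Iterator, Optional


class SketchReader:
    """Hypothetical stand-in for a pygac KLM/POD reader; names are illustrative."""

    def __init__(self, fileobj: Optional[IO[bytes]] = None):
        # File object stored on the reader, if one was given at construction.
        self.fileobj = fileobj

    @contextlib.contextmanager
    def _open_source(self, filename: str,
                     fileobj: Optional[IO[bytes]] = None) -> Iterator[IO[bytes]]:
        if fileobj is not None:
            # 1. A file object passed to read() takes precedence.
            yield fileobj
        elif self.fileobj is not None:
            # 2. Otherwise fall back to the file object stored on the reader.
            yield self.fileobj
        else:
            # 3. Only then treat the filename as a path on disk.
            with open(filename, "rb") as fd:
                yield fd

    def read(self, filename: str, fileobj: Optional[IO[bytes]] = None) -> bytes:
        with self._open_source(filename, fileobj) as fd:
            return fd.read()
```

File objects supplied by the caller are not closed by the reader, since the caller (for example the tarball loop in the script) owns them; only files the reader opens itself are closed.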

It works without too many changes, but actually, I don't like it. I think the confusion comes from using filename and file location synonymously. As a result, satpy has too many expectations of the filename: it has to be a path, and it needs to follow some pattern so that a dedicated reader can be found. Instead, the user could pass a file location (either a path or a file object) together with a user-defined reader; the user gets what they ask for, and choosing the right reader is the user's responsibility. Help in choosing the reader based on a filename heuristic should be an additional feature, offered only on explicit user request. This is definitely a ticket I should open on satpy, but I don't know its code well, and it could take a long time to get it working for all readers. However, I can imagine many use cases where a file only exists in memory and you don't want to dump it to disk.
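A sketch of the proposed split between file location and reader choice: the caller passes a path or a file object plus a reader name, and the filename heuristic runs only when the caller explicitly asks for it with `reader="auto"`. The `READERS` registry, the `PATTERNS` regex and `load` are all hypothetical; `avhrr_l1b_gaclac` is only an illustrative reader name, and the regex is an assumed GAC naming convention, not satpy's actual file pattern.

```python
import os
import re
from typing import BinaryIO, Callable, Dict, Union

# A file location is either a path or an already opened binary stream.
FileLocation = Union[str, os.PathLike, BinaryIO]

# Hypothetical registry: each reader takes an open binary stream.
READERS: Dict[str, Callable[[BinaryIO], bytes]] = {
    "avhrr_l1b_gaclac": lambda stream: stream.read(),
}

# Hypothetical filename patterns, consulted only for reader="auto".
PATTERNS = {
    "avhrr_l1b_gaclac": re.compile(r"^NSS\.GHRR\."),
}


def load(location: FileLocation, reader: str) -> bytes:
    """Read `location` with the given reader; guess it only if reader == "auto"."""
    is_path = isinstance(location, (str, os.PathLike))
    if reader == "auto":
        # The heuristic needs a name, so it cannot work for bare file objects.
        if not is_path:
            raise ValueError("a file object needs an explicit reader")
        basename = os.path.basename(os.fspath(location))
        matches = [name for name, pattern in PATTERNS.items() if pattern.match(basename)]
        if len(matches) != 1:
            raise ValueError(f"cannot infer a reader for {basename!r}")
        reader = matches[0]
    read = READERS[reader]
    if is_path:
        with open(location, "rb") as stream:
            return read(stream)
    return read(location)
```

With this split, an in-memory file works as long as the reader is named, and the filename heuristic stays an opt-in convenience.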

Should I push my hot fix to pygac and pygac-fdr (maybe to some dev branches that never find their way into master), or tackle the issue in satpy, which should leave pygac unchanged but carries the risk that I have no idea how long it could take?
What do you think? Any estimates of the workload from your side?

sfinkens · 2020-08-05T10:11:48Z

@carloshorn Good question. I can see the advantages of your proposal. Just a couple of thoughts off the top of my head on why satpy is the way it is. @mraspaud can probably explain this better than me, but I'll try.

The majority of satpy readers use dask to read the data from the files in chunks. Having all the data in memory is a less common use case, I would say. That's why the strong dependence on filenames is kind of natural for satpy.
Furthermore, there are cases where data from the same instrument come in a variety of formats with varying contents. Here satpy uses the filename to determine the file type and to provide the user with information on the datasets expected in the files. I can imagine that determining the file type from file object(s) could be hard in some cases. It would certainly be a lot of work to update all the readers in this regard.
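To illustrate why content-based detection only goes so far, here is a sketch that sniffs the container format from a file object's leading bytes using well-known signatures (gzip, HDF5/netCDF-4, netCDF classic). `sniff_container` is a hypothetical helper. It can tell containers apart, but two HDF5 files with different product layouts look identical to it, which is the kind of distinction satpy currently gets from the filename.

```python
from typing import BinaryIO, Optional

# Well-known file signatures ("magic numbers").
SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"\x89HDF\r\n\x1a\n": "hdf5",  # also used by netCDF-4
    b"CDF\x01": "netcdf3-classic",
    b"CDF\x02": "netcdf3-64bit-offset",
}


def sniff_container(stream: BinaryIO) -> Optional[str]:
    """Guess the container format from the leading bytes; requires a seekable stream."""
    start = stream.tell()
    head = stream.read(8)
    # Rewind so the actual reader starts at the original position.
    stream.seek(start)
    for magic, kind in SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return None
```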

If there is a consensus to move in this direction, I'd estimate it would take several weeks (including discussion, testing, etc.) to get this done.

sfinkens · 2020-08-05T10:15:02Z

Using dev branches would have the disadvantage that we cannot reference a proper software version in the global attributes of the NetCDF files.

carloshorn · 2020-12-15T09:00:12Z

Related: pytroll/pygac#92
Once it is merged, the only thing left is to create a tarball filesystem and use a PathLike object as the filename argument.
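The idea can be sketched with the standard library alone: a hypothetical `TarMemberPath` class implements `os.PathLike`, so filename-based logic sees a name ending in the member's basename, while `open()` reads the member from the tarball. Libraries such as fsspec provide a ready-made tar filesystem; this class is only an illustration, and it reads the whole member into memory, which assumes members are reasonably small.

```python
import io
import os
import tarfile


class TarMemberPath(os.PathLike):
    """Hypothetical PathLike pointing at a member inside a tarball."""

    def __init__(self, tar_path: str, member: str):
        self.tar_path = tar_path
        self.member = member

    def __fspath__(self) -> str:
        # Filename matching only needs the basename; this path does not exist on disk.
        return os.path.join(self.tar_path, self.member)

    def open(self) -> io.BytesIO:
        # Read the member into memory so the tarball can be closed right away.
        with tarfile.open(self.tar_path) as tar:
            extracted = tar.extractfile(self.member)
            if extracted is None:
                raise ValueError(f"{self.member!r} is not a regular file")
            return io.BytesIO(extracted.read())

    def __repr__(self) -> str:
        return f"TarMemberPath({self.tar_path!r}, {self.member!r})"
```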

carloshorn mentioned this issue Aug 5, 2020: Allow reading files passing file objects pytroll/satpy#1299 (Open)

sfinkens added the hacktoberfest label Oct 6, 2020

carloshorn mentioned this issue Dec 15, 2020: add tarfile support #82 (Merged)

mraspaud closed this as completed in #82 Jan 19, 2021