GLAD dataset adapter #61
Comments
So will we also have gdp, gdp6h, parcels, etc. adapters? |
Yes, and anything else that we want and that our users ask for, assuming it's in scope. |
Hi guys, this is an interesting project I recently came across. I found that the ragged array data structure could also be applied to Lagrangian-type data like tropical cyclone best-track datasets (see my repo). I once tried to design a data struct (basically a wrapper of pandas.DataFrame) and adapt it to the GDP drifter dataset (6hr version, not hourly, see here). Since your ragged data struct follows the CF convention, I feel it would be much better to use this data struct to refactor my tropical cyclone repo. A further thought: is it possible to isolate the Lagrangian data struct as a standalone package, like Once very large datasets are being handled, how efficient is the ragged array? Pandas and xarray have many capabilities for dealing with huge datasets (like out-of-core computation). Since the docs are still in development, I don't know many details of your design. Just some thoughts on this great package. |
The main class of the package is designed to be used with any datasets. Look at the example notebooks here, https://github.com/Cloud-Drift/clouddrift-examples/tree/main/notebooks, in particular I think the numerical data could be adapted to your needs! Happy to help if you have any questions. PS: we are changing the name of the class from |
Thanks @miniufo for your interest and ideas. To clarify the You're welcome to use clouddrift's Alternatively, we can also implement these adapters directly in clouddrift; we could work on that together if you'd like. |
@philippemiron Thanks for pointing me to the notebooks. I've spent some time trying with the I feel a little confused about why we need an internal Just trying to understand your design. I'd like to help if I can. |
This is correct. Most of the analysis functions are based on
The idea of the In your case, if I understand correctly, you can probably just reshape the data, and create a Once you have this object, there are functions to easily convert to either an
|
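For readers following the ragged-array discussion above, here is a minimal sketch of the contiguous ragged layout (one flat data array plus a per-trajectory row-size array). It uses plain NumPy only and is not clouddrift's API; the names `lon`, `rowsize`, and `get_row`, and all values, are made up for illustration.

```python
import numpy as np

# Three hypothetical trajectories of different lengths (made-up values).
trajectories = [
    np.array([10.0, 10.5, 11.0]),
    np.array([20.0, 20.5]),
    np.array([30.0, 30.5, 31.0, 31.5]),
]

# Contiguous ragged layout: all observations in one flat array,
# plus the number of observations in each trajectory.
lon = np.concatenate(trajectories)
rowsize = np.array([len(t) for t in trajectories])

# Start index of each trajectory within the flat array.
starts = np.concatenate(([0], np.cumsum(rowsize)[:-1]))


def get_row(data, rowsize, i):
    """Return the i-th trajectory as a slice of the flat array."""
    start = int(np.sum(rowsize[:i]))
    return data[start:start + rowsize[i]]


# Per-trajectory mean computed on the flat array, without a Python loop.
means = np.add.reduceat(lon, starts) / rowsize
```

No padding is stored, so memory scales with the total number of observations rather than with the longest trajectory.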
I haven't found a way to download the dataset (https://data.gulfresearchinitiative.org/data/R1.x134.073:0004) from the code. This is because there is no static dataset URL, but instead it's resolved dynamically via JavaScript (and quite likely server calls). We have a few options:
Option 2 would allow for a better user experience. Since the dataset is DOI'd and finalized, we could serve a copy from a place we control without worrying that the upstream dataset may change. @selipot do we have an S3 bucket for the project that we could use? |
We do not have a bucket but we could create one. We need to figure out the cost? |
S3 Standard is $0.023 per GB, so for GLAD that would be $0.00345 per download, or 290 downloads per $1. |
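The arithmetic in the estimate above can be checked with a short sketch. The per-GB rate and per-download cost are the figures quoted in the comment; the dataset size is not stated in the thread and is only inferred here from those two numbers. Whether that per-GB rate is the one that actually applies to downloads is taken as given, not verified.

```python
# Figures quoted in the comment above.
price_per_gb = 0.023         # USD per GB, as quoted
cost_per_download = 0.00345  # USD per download, as quoted

# Dataset size implied by the two quoted figures (an inference, not a stated fact).
implied_size_gb = cost_per_download / price_per_gb

# Number of downloads that one dollar would cover at that cost.
downloads_per_dollar = 1 / cost_per_download

print(f"implied size: {implied_size_gb:.2f} GB")
print(f"downloads per dollar: {downloads_per_dollar:.0f}")
```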
I now see that @philippemiron already had extracted a static URL from the backend in the GLAD example notebook. I'll check that it still works and we'll just use that if so. |
It works; all good. |
I think I looked at the Developer tools -> Network tab at the time to find this direct link...! Glad to see it still works! |
@philippemiron that's smart, I hadn't thought of that; I only looked in the page source. :) |
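The approach settled on in this thread (fetch the dataset from a static URL and reuse it) could be sketched roughly as below. This is a generic standard-library sketch, not the code in clouddrift or in the example notebook; `download_if_missing` is a hypothetical helper name, and the static URL itself is not given in this thread, so none is shown.

```python
import os
import urllib.request


def download_if_missing(url, path):
    """Download `url` to `path` unless `path` already exists; return `path`.

    Hypothetical helper for illustration. The file is first written to a
    temporary name so that an interrupted download does not leave a
    partial file at `path`.
    """
    if not os.path.exists(path):
        tmp_path = path + ".part"
        urllib.request.urlretrieve(url, tmp_path)
        os.replace(tmp_path, path)
    return path


# Usage (the URL would be the static link found via the browser's Network tab):
# local_file = download_if_missing(static_url, "glad.dat")
```

Caching the file locally means repeated calls do not hit the upstream server again, which matters if the copy is ever served from a bucket billed per download.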
Part of #53.
Can be adapted from clouddrift-examples/data/glad.py into clouddrift/adapters.py.