[Beam-Refactor] Edits to intro_tutorial_part[*]
#485
Thanks a lot @norlandrhagen. You're asking exactly the right question here. In pangeo-forge/pangeo-forge-runner#48, the working idea is that

Something @yuvipanda has said for a while, though, is that we should eventually probably totally forgo the manual running of recipes using the recipe objects themselves, in favor of having a unified CLI that does production deployment and manual testing runs. So if that tool were called (as Yuvi once suggested to me, simply

$ pf run recipe.py --target=/local/dir

At the risk of "mission creep" and piling on further blockers to the goal of Getting beam-refactor merged, I do think we may want to be asking ourselves at this point, what minimal feature set is required, not just to merge

Per @sharkinsspatial's comment at yesterday's meeting, API churn makes it challenging for developers to contribute, and also I'd say it limits user adoption, insofar as a constantly changing API makes it hard to become a long-term dedicated user. In that vein, I would be hesitant to, for the sake of expedience, release

This recasts your question in a broader frame, which perhaps is unhelpful to the more specific question of how you can be helpful immediately.

In terms of actual documentation rewrites today, I'd say the notebooks in https://github.com/pangeo-forge/pangeo-forge-recipes/tree/master/docs/pangeo_forge_recipes/tutorials/xarray_zarr present an easier entry point because they do not touch this (harder) question of the relationship to the Pangeo Forge Cloud contribution flow.
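To make the shape of the command line floated above a bit more concrete, here is a minimal sketch of an argument parser that would accept `pf run recipe.py --target=/local/dir`. This is purely illustrative: no such `pf` tool is described as existing in this thread, and the names `build_parser`, `recipe`, and `target` are assumptions made for the example, not part of pangeo-forge-runner or pangeo-forge-recipes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical `pf` command with a `run` subcommand."""
    parser = argparse.ArgumentParser(prog="pf")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # `pf run <recipe> [--target=<dir>]` (hypothetical interface)
    run = subparsers.add_parser("run", help="run a recipe module")
    run.add_argument("recipe", help="path to the recipe file, e.g. recipe.py")
    run.add_argument(
        "--target",
        default=None,
        help="where outputs should be written; left unset if not given",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"would run {args.recipe} with target {args.target}")
```

The sketch only parses arguments; how a recipe would actually be loaded and executed is left open, since that is exactly what is still under discussion here.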
Hi @cisaacstern, I created #483 last week which is relevant for your last point. I was hoping to add a new CCMP recipe tutorial here that includes writing a custom processor, i.e. a replacement for the
If you're running PFR outside of PF Cloud, you will always have to provide a target for the pipeline to store data to. The way we handle this now is by automatically creating a temporary directory for the outputs if it is not specified. That's why there is no

In retrospect, I'm not sure this is a good idea. It makes the tutorials simpler, at the expense of user confusion.

@norlandrhagen - I'm curious what you would like to have happen. Where would you like your data to end up for this tutorial?
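The "fall back to a temporary directory when no target is given" behavior described above can be sketched with the standard library alone. This is not the pangeo-forge-recipes implementation; `resolve_target` and the `recipe-target-` prefix are hypothetical names used only to illustrate the idea and why it can surprise users.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_target(target: Optional[str] = None) -> Path:
    """Return the output directory, creating a temporary one if none is given."""
    if target is None:
        # No target specified: silently create a fresh temporary directory.
        # (Hypothetical prefix, chosen for this example only.)
        return Path(tempfile.mkdtemp(prefix="recipe-target-"))
    # An explicit target was given: make sure it exists and use it as-is.
    path = Path(target)
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling `resolve_target()` with no argument returns a new directory somewhere under the system temp location, which keeps a tutorial short but leaves the reader unsure where the data ended up; passing an explicit path makes the destination obvious.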
Thanks for the detailed explanation @cisaacstern! Just to clarify my understanding a bit.
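import apache_beam as beam
from pangeo_forge_recipes.transforms import OpenURLWithFSSpec, OpenWithXarray, StoreToZarr

# `pattern` is assumed to be the FilePattern defined earlier in the tutorial
transforms = (
    beam.Create(pattern.items())
    | OpenURLWithFSSpec()
    | OpenWithXarray(file_type=pattern.file_type)
    | StoreToZarr(
        combine_dims=pattern.combine_dim_keys,
    )
)
transforms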
@rabernat In the

Thanks again for the information! I haven't looked around the beam-refactor branch much at all, so I have a lot of learning to do.
Yes. With the caveat that pangeo-forge/pangeo-forge-runner#48 (which defines this behavior) hasn't been merged yet, so is still subject to change.
Also yes. And sounds like the only caveat here is just to coordinate with @derekocallaghan on #483 to avoid duplicating effort.
I've created #487 today that contains an initial port of the NetCDF Zarr Multi-Variable Sequential Recipe: NOAA World Ocean Atlas notebook to use the

In the meantime, the ported NOAA World Ocean Atlas notebook in #487 may be useful as an example for porting other tutorial notebooks. It's possible these may take a number of iterations depending on API changes.
Hi there,
I tried running the beam-refactor intro_tutorial docs and ran into a few issues. I made some edits to the docs (removing old imports etc.); however, in part3, for the example recipe.py, I'm not sure what to specify for target_path. Should the StoreToZarr transform be removed in a production recipe?
Are there any examples of recipes that have been successfully run using the beam runner?
Thanks!
@rabernat or @cisaacstern