# Manually running `beam-refactor` on Dataflow (#450)
Adapting kwargs from here:

```python
job_name = ...           # define manually
temp_gcs_location = ...  # (TBD) GCS bucket path
container_image = ...    # (TBD) URI of image I'll build for this experiment

dict(
    runner="DataflowRunner",
    project="pangeo-forge-4967",
    job_name=job_name,
    temp_location=temp_gcs_location,
    use_public_ips=False,
    region="us-central1",
    experiments=["use_runner_v2"],
    sdk_container_image=container_image,
    save_main_session=True,
    pickle_library="cloudpickle",
    # for NOAA OISST the machine can probably be smaller,
    # but this should be fine for now
    machine_type="n1-highmem-2",
)
```

(Recording this here so others can see the thought process, and as a convenient place to keep notes for myself.)
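For reference, a minimal sketch of how a kwargs dict like the one above is typically consumed by the Beam Python SDK: unpack it into `PipelineOptions` and hand the result to `beam.Pipeline`. The bucket/image placeholders and the trivial `Create`/`Map` pipeline below are illustrative assumptions, not the actual recipe deployment.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder values (hypothetical); the real ones are defined manually above.
job_name = "beam-refactor-test"
temp_gcs_location = "gs://<some-bucket>/tmp"
container_image = "gcr.io/<project>/<image>"

# Each keyword maps to the Dataflow pipeline flag of the same name
# (e.g. sdk_container_image -> --sdk_container_image).
options = PipelineOptions(
    runner="DataflowRunner",
    project="pangeo-forge-4967",
    job_name=job_name,
    temp_location=temp_gcs_location,
    use_public_ips=False,
    region="us-central1",
    experiments=["use_runner_v2"],
    sdk_container_image=container_image,
    save_main_session=True,
    pickle_library="cloudpickle",
    machine_type="n1-highmem-2",
)

# A trivial pipeline stands in for the recipe; running this would submit a
# Dataflow job configured with the options above.
with beam.Pipeline(options=options) as p:
    p | beam.Create([1, 2, 3]) | beam.Map(print)
```

Passing the options as keyword arguments is equivalent to supplying the corresponding command-line flags.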
I just built:

```Dockerfile
FROM pangeo/forge:5e51a29
RUN mamba run -n notebook pip install -U git+https://github.com/pangeo-forge/pangeo-forge-recipes.git@beam-refactor
```

I then ran the example code linked in the first comment above in a Python interpreter inside a container started with that image, and confirmed that the NOAA OISST example runs successfully on it. From this point forward, we can be confident that any issues we encounter (perhaps there won't even be any 🤷) are Dataflow-specific. Next step is pushing this image to gcr.io and then deploying the example recipe to Dataflow.
Completed following the method used in
🎉 TL;DR: it works! Details below...

Deployment and worker envs must be the same, so I deployed the job from a container running the `beam-refactor` image:

```console
$ docker run -it \
>   -v "${DATAFLOW_KEYFILE}":"/opt/storage_key.json" \
>   -e GOOGLE_APPLICATION_CREDENTIALS="/opt/storage_key.json" \
>   --entrypoint=/bin/bash \
>   gcr.io/pangeo-forge-4967/beam-refactor
```

Within the running container, I deployed the recipe to Dataflow. The job succeeded, and the dataset is openable with:

```python
import xarray as xr

target_path = "gs://beam-dataflow-test/beam-refactor-oisst-0.zarr"
oisst_zarr = xr.open_dataset(target_path, engine="zarr")
```

I'll make some further notes below re: next steps. cc @rabernat
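As a quick, hypothetical sanity check on the store opened above (the dimension names here are assumptions about the OISST layout, not taken from the job output):

```python
import xarray as xr

target_path = "gs://beam-dataflow-test/beam-refactor-oisst-0.zarr"
oisst_zarr = xr.open_dataset(target_path, engine="zarr")

# Print the lazy dataset summary (dims, coords, data variables) without loading data.
print(oisst_zarr)

# Assumed NOAA OISST dimension names; adjust to whatever the store actually contains.
expected_dims = {"time", "lat", "lon"}
missing = expected_dims - set(oisst_zarr.dims)
print("missing expected dims:", missing or "none")
```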
Now that the manual run works, here are my notes on getting `beam-refactor` running on Pangeo Forge Cloud. I'm sure I've oversimplified some things, but this is the basic landscape as far as I see it right now. cc @yuvipanda, in case you have suggestions. In particular, wondering what you think re: GCP deployment for `pangeo-forge-runner`.
pangeo-forge/pangeo-forge-runner#48 is what implements this.
I've updated this issue's title to reflect what is documented here: a process for manually running `beam-refactor` on Dataflow.
As documented in the notebook linked in #445 (comment), the `beam-refactor` branch runs end-to-end for a real-world application (NOAA OISST) when run locally with Beam's DirectRunner. I'll now run this same recipe on Dataflow, to hopefully surface any DataflowRunner-specific issues, in preparation for merging `beam-refactor`. I've opened this issue to track my work plan, and record findings.

Notes on how I'll be proceeding:

- Reference `pangeo-forge-runner` for our current selection of Dataflow pipeline options (which we know to work, based on recent successful jobs).
- Build a container with `pangeo-forge-recipes` installed from the `beam-refactor` branch; `pip`-installing on top of our current base container would be lighter-weight, but I prefer to just push a dedicated container for this, because it more closely matches how we've been working on Dataflow thus far.
- Deploy the recipe to Dataflow with the `beam-refactor` image. (Do this manually, using the Beam Python SDK.)

Next steps, assuming the above works:

- An issue on `pangeo-forge-runner` documenting what will be required to support the `beam-refactor` branch on Pangeo Forge Cloud (main point: no more `.to_beam()` compilation; see the sketch at the end of this comment).
- `traitlets` work in pangeo-forge-orchestrator#197 to support multiple `pangeo-forge-recipes` parsing environments simultaneously on Pangeo Forge Cloud.

Updates to follow on this thread.
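To illustrate the `.to_beam()` point in the next-steps list above, here is a minimal sketch of why no separate compilation step is needed once a recipe is itself a Beam `PTransform`. `PlaceholderRecipe` is a hypothetical stand-in for the real recipe object; the pre-refactor flow is only described in a comment.

```python
import apache_beam as beam

# Hypothetical stand-in: in the beam-refactor world, a recipe is (or expands to)
# a Beam PTransform, so it composes directly into a pipeline.
class PlaceholderRecipe(beam.PTransform):
    def expand(self, pcoll):
        return pcoll | beam.Map(lambda x: x)

# Pre-refactor (schematically): the recipe object first had to be compiled into
# a Beam construct via a `.to_beam()` step -- the "compilation" referenced above.
# Post-refactor, the recipe slots straight into the pipeline:
with beam.Pipeline() as p:  # DirectRunner by default; swap in Dataflow options to deploy
    p | beam.Create([0]) | PlaceholderRecipe()
```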