[GCP] Setup Ray cluster on GCP so that it can read / write to Google Storage (GCS) #35140
Comments
@yuduber can you post some stacktraces?
This is on a Ray node provisioned in our GKE project, with the service account for the gcs:// bucket configured properly. We verified the setting with a regular google.cloud.storage.Client.
The above exception was the direct cause of the following exception: ray::_get_read_tasks() (pid=799, ip=172.24.0.22)
@yuduber Have you tried Ray with Python 3.7+ (or, is the problem specific to when you use Ray with Python 3.6)?
@zhe-thoughts thanks for the reminder. I was able to find the issue: aiohttp needs to be >= 3.8.4. After upgrading the library, gcsfs worked properly and could get the parquet file info. However, when I use ray.data.read_parquet() to read it, it throws a serialization error. Any ideas? cc @jjyao @richardliaw
error from the above code: The above exception was the direct cause of the following exception: ray::BaseHorovodWorker.execute() (pid=2269, ip=10.207.72.32, repr=<horovod.ray.worker.BaseHorovodWorker object at 0x7ffb871884a8>)
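As a side note on the aiohttp constraint mentioned above, a small version-comparison helper (illustrative only, not part of gcsfs or Ray) makes the >= 3.8.4 check explicit:

```python
def at_least(version: str, minimum: tuple) -> bool:
    """Return True if a dotted version string meets the minimum version tuple."""
    parts = tuple(int(p) for p in version.split(".")[: len(minimum)])
    return parts >= minimum

# gcsfs needs aiohttp >= 3.8.4, per the comment above.
print(at_least("3.8.4", (3, 8, 4)))  # True
print(at_least("3.7.0", (3, 8, 4)))  # False
```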
It looks like it's the
I found the issue. The proper way to set up gcsfs is to first store the token file locally on each distributed node, and then on each node set the env var GOOGLE_APPLICATION_CREDENTIALS to the token file path.
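A minimal sketch of that fix, assuming the key file has already been copied to the same path on every node (the path, bucket name, and the runtime_env approach are illustrative, not taken from this issue):

```python
import os

# Placeholder path: wherever the service-account key file was stored
# on each distributed node.
TOKEN_PATH = "/home/ray/gcs-key.json"

# Point Google auth libraries (google-auth, gcsfs, pyarrow's GCS
# filesystem) at the key file on this process.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = TOKEN_PATH

# One common way to propagate the same variable to every Ray worker
# process, assuming the file exists at TOKEN_PATH on every node:
#
#   import ray
#   ray.init(runtime_env={
#       "env_vars": {"GOOGLE_APPLICATION_CREDENTIALS": TOKEN_PATH},
#   })
#   ds = ray.data.read_parquet("gs://my-bucket/data.parquet")  # hypothetical bucket
```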
Description
We are provisioning a Ray cluster on GCP without applying the --autoscaling-config flag. In this case, how do we set up the Ray cluster so that pyarrow can read / write to GCS?
We have already configured our GCP project with the proper GCS bucket, service account, role, etc. When attached to the Ray node via kubectl, we are able to use the service account's key (a JSON file) to create credentials and access GCS, with scripts like the one below:
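The script itself was not captured above; a minimal sketch of that kind of direct-access check (the key path and bucket name are placeholders, not from this issue):

```python
import os

# Placeholder: path to the service account's key (a JSON file).
KEY_PATH = "/path/to/service-account-key.json"

# With google-cloud-storage installed, a client can be built straight
# from the key file and used to list or read objects:
#
#   from google.cloud import storage
#   client = storage.Client.from_service_account_json(KEY_PATH)
#   for blob in client.list_blobs("my-bucket", prefix="data/"):  # hypothetical bucket
#       print(blob.name)
#
# Most Google client libraries will also pick the key up automatically
# from this environment variable:
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = KEY_PATH
```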
But I haven't gotten Python 3.6 to work with gcsfs, or with ray.data directly. So what's the proper way to set things up so that ray.data can access Google Cloud Storage on GCP?
Links
https://discuss.ray.io/t/google-cloud-storage-access-from-worker/1899/5
ray-project/kuberay#969
https://docs.ray.io/en/master/cluster/vms/user-guides/launching-clusters/gcp.html
https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/gcp/example-full.yaml
ray/python/ray/autoscaler/gcp/example-full.yaml (line 42 at eacc763)