
[GCP] Setup Ray cluster on GCP so that it can read / write to Google Storage (GCS) #35140

Closed

yuduber opened this issue May 8, 2023 · 9 comments
Labels: data (Ray Data-related issues), triage (Needs triage)

@yuduber (Contributor) commented May 8, 2023

Description

We are provisioning a Ray cluster on GCP without applying the --autoscaling-config flag. In that case, how do we set up the Ray cluster so that pyarrow can read / write to Google Cloud Storage (GCS)?

We have already configured our GCP project with the proper GCS bucket, service account, roles, etc. When we kubectl-attach to the Ray node, we are able to use the service account's key (a JSON file) to create credentials and access GCS, with scripts like the ones below:

# working in py3.6 and py3.7
project = "uber-ma"
gcs_bucket = 'pyarrow_data_access_test'

from google.oauth2 import service_account
from google.cloud import storage

credentials = service_account.Credentials.from_service_account_file('/tmp/training_token.json')
client = storage.Client(credentials=credentials)
blobs = client.list_blobs(gcs_bucket)
for blob in blobs:
    print(blob.name)
 
# working in python3.7
project = "uber-ma"
gcs_bucket = 'pyarrow_data_access_test'

from google.oauth2 import service_account
from gcsfs import GCSFileSystem

SCOPES = ['https://www.googleapis.com/auth/devstorage.full_control']
# also tried: ['https://www.googleapis.com/auth/cloud-platform'] and ["read_only"]
credentials = service_account.Credentials.from_service_account_file('/tmp/training_token.json', scopes=SCOPES)
fs = GCSFileSystem(project=project, token=credentials)
fs_ls = fs.ls(gcs_bucket)
fs.listdir(fs_ls[0])
# [{'kind': 'storage#object', 'id': 'pyarrow_data_access_test/boston_housing/_SUCCESS/1683348176291089', 'selfLink': 'https://www.googleapis.com/storage/v1/b/pyarrow_data_access_test/o/boston_housing%2F_SUCCESS', 'mediaLink':

But I haven't gotten Python 3.6 to work with gcsfs, or with ray.data directly. So what's the proper way to set things up so that ray.data can access Google Cloud Storage on GCP?

Links

https://discuss.ray.io/t/google-cloud-storage-access-from-worker/1899/5
ray-project/kuberay#969
https://docs.ray.io/en/master/cluster/vms/user-guides/launching-clusters/gcp.html
https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/gcp/example-full.yaml

project_id: null # Globally unique project id

@yuduber yuduber added triage Needs triage (eg: priority, bug/not-bug, and owning component) docs An issue or change related to documentation labels May 8, 2023
@yuduber yuduber changed the title [GCP] Setup Ray cluster on GCP so that it can read / write to google storage [GCP] Setup Ray cluster on GCP so that it can read / write to Google Storage May 8, 2023
@yuduber yuduber changed the title [GCP] Setup Ray cluster on GCP so that it can read / write to Google Storage [GCP] Setup Ray cluster on GCP so that it can read / write to Google Storage (GCS) May 8, 2023
@richardliaw (Contributor) commented:

@yuduber can you post some stacktraces?

@yuduber (Contributor, Author) commented May 8, 2023

This is on a Ray node provisioned in our GKE project with proper service-account settings for the gcs:// bucket. We verified the setup with the regular google.cloud.storage.Client:

[root@/ml-code #]python3
Python 3.6.9 (default, May 2 2023, 07:12:53)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

project = "uber-ma"
gcs_bucket = 'pyarrow_data_access_test'
gcs_file = 'pyarrow_data_access_test/boston_housing/part-00000-c4c05002-0eb0-4fb6-96e6-5bdbc1cee40d-c000.snappy.parquet'
from google.oauth2 import service_account
SCOPES = ['https://www.googleapis.com/auth/cloud-platform']
credentials = service_account.Credentials.from_service_account_file('/tmp/training_token.json', scopes=SCOPES)

from gcsfs import GCSFileSystem
fs = GCSFileSystem(project=project, token=credentials)
import ray
ds = ray.data.read_parquet(gcs_file, filesystem=fs)
Usage stats collection is enabled by default for nightly wheels. To disable this, run the following command: ray disable-usage-stats before starting Ray. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details.
2023-05-08 21:55:52,378 WARNING services.py:1741 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67108864 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=5.08gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2023-05-08 21:55:52,514 INFO worker.py:1524 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.6/site-packages/ray/data/read_api.py", line 397, in read_parquet
**arrow_parquet_args,
File "/usr/lib/python3.6/site-packages/ray/data/read_api.py", line 279, in read_datasource
_wrap_and_register_arrow_serialization_workaround(read_args),
File "/usr/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
return func(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/ray/_private/worker.py", line 2288, in get
raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ClientConnectorError): ray::_get_read_tasks() (pid=799, ip=172.24.0.22)
File "/usr/lib/python3.6/asyncio/base_events.py", line 794, in create_connection
raise exceptions[0]
File "/usr/lib/python3.6/asyncio/base_events.py", line 781, in create_connection
yield from self.sock_connect(sock, address)
File "/usr/lib/python3.6/asyncio/selector_events.py", line 439, in sock_connect
return (yield from fut)
File "/usr/lib/python3.6/asyncio/selector_events.py", line 444, in _sock_connect
sock.connect(address)
OSError: [Errno 99] Cannot assign requested address

The above exception was the direct cause of the following exception:

ray::_get_read_tasks() (pid=799, ip=172.24.0.22)
File "/usr/lib/python3.6/site-packages/ray/data/read_api.py", line 1380, in _get_read_tasks
reader = ds.create_reader(**kwargs)
File "/usr/lib/python3.6/site-packages/ray/data/datasource/parquet_datasource.py", line 165, in create_reader
return _ParquetDatasourceReader(**kwargs)
File "/usr/lib/python3.6/site-packages/ray/data/datasource/parquet_datasource.py", line 193, in __init__
_handle_read_os_error(e, paths)
File "/usr/lib/python3.6/site-packages/ray/data/datasource/file_meta_provider.py", line 357, in _handle_read_os_error
raise error
File "/usr/lib/python3.6/site-packages/ray/data/datasource/parquet_datasource.py", line 190, in __init__
paths, **dataset_kwargs, filesystem=filesystem, use_legacy_dataset=False
File "/usr/lib/python3.6/site-packages/pyarrow/parquet.py", line 1322, in __new__
metadata_nthreads=metadata_nthreads
File "/usr/lib/python3.6/site-packages/pyarrow/parquet.py", line 1698, in __init__
if filesystem.get_file_info(path_or_paths).is_file:
File "pyarrow/_fs.pyx", line 439, in pyarrow._fs.FileSystem.get_file_info
File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/_fs.pyx", line 1101, in pyarrow._fs._cb_get_file_info
File "/usr/lib/python3.6/site-packages/pyarrow/fs.py", line 307, in get_file_info
info = self.fs.info(path)
File "/usr/lib/python3.6/site-packages/fsspec/asyn.py", line 91, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/usr/lib/python3.6/site-packages/fsspec/asyn.py", line 71, in sync
raise return_result
File "/usr/lib/python3.6/site-packages/fsspec/asyn.py", line 25, in _runner
result[0] = await coro
File "/usr/lib/python3.6/site-packages/gcsfs/core.py", line 674, in _info
return await self._get_object(path)
File "/usr/lib/python3.6/site-packages/gcsfs/core.py", line 452, in _get_object
res = await self._call("GET", "b/{}/o/{}", bucket, key, json_out=True)
File "/usr/lib/python3.6/site-packages/gcsfs/core.py", line 387, in _call
method, path, *args, **kwargs
File "/usr/lib/python3.6/site-packages/decorator.py", line 221, in fun
return await caller(func, *(extras + args), **kw)
File "/usr/lib/python3.6/site-packages/gcsfs/retry.py", line 147, in retry_request
raise e
File "/usr/lib/python3.6/site-packages/gcsfs/retry.py", line 115, in retry_request
return await func(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/gcsfs/core.py", line 370, in _request
timeout=self.requests_timeout,
File "/usr/lib/python3.6/site-packages/aiohttp/client.py", line 1083, in __aenter__
self._resp = await self._coro
File "/usr/lib/python3.6/site-packages/aiohttp/client.py", line 493, in _request
timeout=real_timeout
File "/usr/lib/python3.6/site-packages/aiohttp/connector.py", line 528, in connect
proto = await self._create_connection(req, traces, timeout)
File "/usr/lib/python3.6/site-packages/aiohttp/connector.py", line 869, in _create_connection
req, traces, timeout)
File "/usr/lib/python3.6/site-packages/aiohttp/connector.py", line 1023, in _create_direct_connection
raise last_exc
File "/usr/lib/python3.6/site-packages/aiohttp/connector.py", line 1005, in _create_direct_connection
req=req, client_error=client_error)
File "/usr/lib/python3.6/site-packages/aiohttp/connector.py", line 953, in _wrap_create_connection
raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host storage.googleapis.com:443 ssl:default [Cannot assign requested address]

(_get_read_tasks pid=799) _request out of retries on exception: Cannot connect to host storage.googleapis.com:443 ssl:default [Cannot assign requested address]
(_get_read_tasks pid=799) Traceback (most recent call last):
(_get_read_tasks pid=799) File "/usr/lib/python3.6/site-packages/aiohttp/connector.py", line 946, in _wrap_create_connection
(_get_read_tasks pid=799) return await self._loop.create_connection(*args, **kwargs) # type: ignore # noqa
(_get_read_tasks pid=799) File "/usr/lib/python3.6/asyncio/base_events.py", line 794, in create_connection
(_get_read_tasks pid=799) raise exceptions[0]
(_get_read_tasks pid=799) File "/usr/lib/python3.6/asyncio/base_events.py", line 781, in create_connection
(_get_read_tasks pid=799) yield from self.sock_connect(sock, address)
(_get_read_tasks pid=799) File "/usr/lib/python3.6/asyncio/selector_events.py", line 439, in sock_connect
(_get_read_tasks pid=799) return (yield from fut)
(_get_read_tasks pid=799) File "/usr/lib/python3.6/asyncio/selector_events.py", line 444, in _sock_connect
(_get_read_tasks pid=799) sock.connect(address)
(_get_read_tasks pid=799) OSError: [Errno 99] Cannot assign requested address
(_get_read_tasks pid=799)
(_get_read_tasks pid=799) The above exception was the direct cause of the following exception:
(_get_read_tasks pid=799)
(_get_read_tasks pid=799) Traceback (most recent call last):
(_get_read_tasks pid=799) File "/usr/lib/python3.6/site-packages/gcsfs/retry.py", line 115, in retry_request
(_get_read_tasks pid=799) return await func(*args, **kwargs)
(_get_read_tasks pid=799) File "/usr/lib/python3.6/site-packages/gcsfs/core.py", line 370, in _request
(_get_read_tasks pid=799) timeout=self.requests_timeout,
(_get_read_tasks pid=799) File "/usr/lib/python3.6/site-packages/aiohttp/client.py", line 1083, in __aenter__
(_get_read_tasks pid=799) self._resp = await self._coro
(_get_read_tasks pid=799) File "/usr/lib/python3.6/site-packages/aiohttp/client.py", line 493, in _request
(_get_read_tasks pid=799) timeout=real_timeout
(_get_read_tasks pid=799) File "/usr/lib/python3.6/site-packages/aiohttp/connector.py", line 528, in connect
(_get_read_tasks pid=799) proto = await self._create_connection(req, traces, timeout)
(_get_read_tasks pid=799) File "/usr/lib/python3.6/site-packages/aiohttp/connector.py", line 869, in _create_connection
(_get_read_tasks pid=799) req, traces, timeout)
(_get_read_tasks pid=799) File "/usr/lib/python3.6/site-packages/aiohttp/connector.py", line 1023, in _create_direct_connection
(_get_read_tasks pid=799) raise last_exc
(_get_read_tasks pid=799) File "/usr/lib/python3.6/site-packages/aiohttp/connector.py", line 1005, in _create_direct_connection
(_get_read_tasks pid=799) req=req, client_error=client_error)
(_get_read_tasks pid=799) File "/usr/lib/python3.6/site-packages/aiohttp/connector.py", line 953, in _wrap_create_connection
(_get_read_tasks pid=799) raise client_error(req.connection_key, exc) from exc
(_get_read_tasks pid=799) aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host storage.googleapis.com:443 ssl:default [Cannot assign requested address]
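
As an aside, the `OSError: [Errno 99] Cannot assign requested address` at the root of this trace generally means the worker process has no usable network route (egress/NAT) to storage.googleapis.com, rather than a credentials problem. A minimal stdlib probe (a hypothetical helper, not from this thread's code) can confirm whether the node can reach the endpoint at all:

```python
import socket


def can_reach(host: str = "storage.googleapis.com", port: int = 443,
              timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds from this node."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Errno 99 / unreachable / DNS failure: the pod likely lacks
        # egress or NAT to the endpoint.
        return False
```

If this returns False when run inside the Ray worker pod, the problem is cluster networking, not gcsfs or Ray.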

@zhe-thoughts (Collaborator) commented:

@yuduber Have you tried Ray with Python 3.7+ (or, is the problem specific to when you use Ray with Python 3.6)?

@zhe-thoughts zhe-thoughts added data Ray Data-related issues and removed docs An issue or change related to documentation labels May 10, 2023
@zhe-thoughts (Collaborator) commented:

Assigning to @raulchen for now, but let's wait for more information from @yuduber first.

@yuduber (Contributor, Author) commented May 11, 2023

> @yuduber Have you tried Ray with Python 3.7+ (or, is the problem specific to when you use Ray with Python 3.6)?

This is happening under Python 3.6.

@zhe-thoughts (Collaborator) commented:

@yuduber I see. We are actually going to EOL support for Python 3.6 soon. cc @jjyao for details.

@yuduber (Contributor, Author) commented May 12, 2023

@zhe-thoughts thanks for the reminder. I was able to find the issue: aiohttp needs to be >=3.8.4. After upgrading that library, gcsfs works properly and can get the parquet file info. However, when I use ray.data.read_parquet() to read it, it throws a serialization error. Any idea? cc @jjyao @richardliaw

from ray.util import inspect_serializability

gcs_files = [
    'pyarrow_data_access_test/boston_housing/part-00000-c4c05002-0eb0-4fb6-96e6-5bdbc1cee40d-c000.snappy.parquet',
    'pyarrow_data_access_test/boston_housing/part-00001-c4c05002-0eb0-4fb6-96e6-5bdbc1cee40d-c000.snappy.parquet'
]

def ray_read_parqeut():
    project = "uber-ma"
    gcs_bucket = 'pyarrow_data_access_test'
    from google.oauth2 import service_account
    SCOPES = ['https://www.googleapis.com/auth/cloud-platform']
    credentials = service_account.Credentials.from_service_account_file('/tmp/training_token.json', scopes=SCOPES)
    from gcsfs import GCSFileSystem
    fs = GCSFileSystem(project=project, token=credentials)
    print(fs.ls(gcs_bucket)) # properly generated output of bucket content indicating GCSFileSystem working fine
    import ray
    df = ray.data.read_parquet(gcs_files, filesystem=fs)
    print(f'1st try fs:{fs} df_head:{df.head()}')

inspect_serializability(ray_read_parqeut, name="test_ray_read_parqeut")
=========================================================================
Checking Serializability of <function ray_read_parqeut at 0x7fab721a91e0>
=========================================================================
(True, set())

ray_context.run(ray_read_parqeut)
# ray_context is just a wrapper for a generic @ray.remote decorated function,
# which decorates ray_read_parqeut with ray.remote and executes it on the remote Ray cluster head node.

The error from the above code:
(BaseHorovodWorker pid=2268) import cryptography.exceptions
ray_dl ray_executor.run throw:<class 'types.RayTaskError(TypeError)'>:ray::StaticAdapter.run() (pid=2219, ip=10.207.72.32, repr=<horovod.ray.runner.StaticAdapter object at 0x7f86714ff3c8>)
File "/usr/lib/python3.6/site-packages/horovod/ray/runner.py", line 599, in run
return ray.get(self._run_remote(fn=f))
ray.exceptions.RayTaskError(TypeError): ray::BaseHorovodWorker.execute() (pid=2269, ip=10.207.72.32, repr=<horovod.ray.worker.BaseHorovodWorker object at 0x7ffb871884a8>)
return self._serialize_to_msgpack(value)
File "/usr/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "/usr/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 627, in dump
return Pickler.dump(self, obj)
TypeError: can't pickle _cffi_backend.FFI objects

The above exception was the direct cause of the following exception:

ray::BaseHorovodWorker.execute() (pid=2269, ip=10.207.72.32, repr=<horovod.ray.worker.BaseHorovodWorker object at 0x7ffb871884a8>)
File "/usr/lib/python3.6/site-packages/horovod/ray/runner.py", line 598, in
f = lambda w: fn(*args, **kwargs)
File "/ml-code/data/michelangelo/rayre/ray_dl.py", line 300, in handled_func
raise e
File "/ml-code/data/michelangelo/rayre/ray_dl.py", line 292, in handled_func
ret = fn(*args, **kwargs)
File "/ml-code/data/michelangelo/rayre/ray_base.py", line 1026, in handled_func
raise e
File "/ml-code/data/michelangelo/rayre/ray_base.py", line 1021, in handled_func
ret = func(*args, **kwargs)
File "/ml-code/data/michelangelo/rayre/ray_base.py", line 1043, in handled_func
raise e
File "/ml-code/data/michelangelo/rayre/ray_base.py", line 1036, in handled_func
return func(*args, **kwargs)
File "", line 15, in ray_read_parqeut
File "/usr/lib/python3.6/site-packages/ray/data/read_api.py", line 397, in read_parquet
**arrow_parquet_args,
File "/usr/lib/python3.6/site-packages/ray/data/read_api.py", line 279, in read_datasource
_wrap_and_register_arrow_serialization_workaround(read_args),
File "/usr/lib/python3.6/site-packages/ray/remote_function.py", line 122, in _remote_proxy
return self._remote(args=args, kwargs=kwargs, **self._default_options)
File "/usr/lib/python3.6/site-packages/ray/remote_function.py", line 402, in _remote
return invocation(args, kwargs)
File "/usr/lib/python3.6/site-packages/ray/remote_function.py", line 389, in invocation
serialized_runtime_env_info or "{}",
TypeError: Could not serialize the argument {'paths': ['pyarrow_data_access_test/boston_housing/part-00000-c4c05002-0eb0-4fb6-96e6-5bdbc1cee40d-c000.snappy.parquet', 'pyarrow_data_access_test/boston_housing/part-00001-c4c05002-0eb0-4fb6-96e6-5bdbc1cee40d-c000.snappy.parquet'], 'filesystem': <gcsfs.core.GCSFileSystem object at 0x7ffb82193fd0>, 'columns': None, 'meta_provider': <ray.data.datasource.file_meta_provider.DefaultParquetMetadataProvider object at 0x7ffb89362320>} for a task or actor ray.data.read_api._get_read_tasks. Check https://docs.ray.io/en/master/ray-core/objects/serialization.html#troubleshooting for more information.
Traceback (most recent call last):
File "/ml-code/data/michelangelo/rayre/ray_dl.py", line 315, in run
ret = self.ray_executor.run(handled_func, args, kwargs)
File "/usr/lib/python3.6/site-packages/horovod/ray/runner.py", line 376, in run
return self.maybe_call_ray(self.adapter.run, **kwargs)
File "/usr/lib/python3.6/site-packages/horovod/ray/runner.py", line 419, in _maybe_call_ray
return ray.get(driver_func.remote(*args, **kwargs))
File "/usr/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return getattr(ray, func.name)(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/ray/util/client/api.py", line 42, in get
return self.worker.get(vals, timeout=timeout)
File "/usr/lib/python3.6/site-packages/ray/util/client/worker.py", line 434, in get
res = self._get(to_get, op_timeout)
File "/usr/lib/python3.6/site-packages/ray/util/client/worker.py", line 462, in _get
raise err
types.RayTaskError(TypeError): ray::StaticAdapter.run() (pid=2219, ip=10.207.72.32, repr=<horovod.ray.runner.StaticAdapter object at 0x7f86714ff3c8>)
File "/usr/lib/python3.6/site-packages/horovod/ray/runner.py", line 599, in run
return ray.get(self._run_remote(fn=f))
ray.exceptions.RayTaskError(TypeError): ray::BaseHorovodWorker.execute() (pid=2269, ip=10.207.72.32, repr=<horovod.ray.worker.BaseHorovodWorker object at 0x7ffb871884a8>)
return self._serialize_to_msgpack(value)
File "/usr/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 73, in dumps
cp.dump(obj)
File "/usr/lib/python3.6/site-packages/ray/cloudpickle/cloudpickle_fast.py", line 627, in dump
return Pickler.dump(self, obj)
TypeError: can't pickle _cffi_backend.FFI objects

The above exception was the direct cause of the following exception:

ray::BaseHorovodWorker.execute() (pid=2269, ip=10.207.72.32, repr=<horovod.ray.worker.BaseHorovodWorker object at 0x7ffb871884a8>)
File "/usr/lib/python3.6/site-packages/horovod/ray/runner.py", line 598, in
f = lambda w: fn(*args, **kwargs)
File "/ml-code/data/michelangelo/rayre/ray_dl.py", line 300, in handled_func
raise e
File "/ml-code/data/michelangelo/rayre/ray_dl.py", line 292, in handled_func
ret = fn(*args, **kwargs)
File "/ml-code/data/michelangelo/rayre/ray_base.py", line 1026, in handled_func
raise e
File "/ml-code/data/michelangelo/rayre/ray_base.py", line 1021, in handled_func
ret = func(*args, **kwargs)
File "/ml-code/data/michelangelo/rayre/ray_base.py", line 1043, in handled_func
raise e
File "/ml-code/data/michelangelo/rayre/ray_base.py", line 1036, in handled_func
return func(*args, **kwargs)
File "", line 15, in ray_read_parqeut
File "/usr/lib/python3.6/site-packages/ray/data/read_api.py", line 397, in read_parquet
**arrow_parquet_args,
File "/usr/lib/python3.6/site-packages/ray/data/read_api.py", line 279, in read_datasource
_wrap_and_register_arrow_serialization_workaround(read_args),
File "/usr/lib/python3.6/site-packages/ray/remote_function.py", line 122, in _remote_proxy
return self._remote(args=args, kwargs=kwargs, **self._default_options)
File "/usr/lib/python3.6/site-packages/ray/remote_function.py", line 402, in _remote
return invocation(args, kwargs)
File "/usr/lib/python3.6/site-packages/ray/remote_function.py", line 389, in invocation
serialized_runtime_env_info or "{}",
TypeError: Could not serialize the argument {'paths': ['pyarrow_data_access_test/boston_housing/part-00000-c4c05002-0eb0-4fb6-96e6-5bdbc1cee40d-c000.snappy.parquet', 'pyarrow_data_access_test/boston_housing/part-00001-c4c05002-0eb0-4fb6-96e6-5bdbc1cee40d-c000.snappy.parquet'], 'filesystem': <gcsfs.core.GCSFileSystem object at 0x7ffb82193fd0>, 'columns': None, 'meta_provider': <ray.data.datasource.file_meta_provider.DefaultParquetMetadataProvider object at 0x7ffb89362320>} for a task or actor ray.data.read_api._get_read_tasks. Check https://docs.ray.io/en/master/ray-core/objects/serialization.html#troubleshooting for more information.
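
The key frame here is `TypeError: can't pickle _cffi_backend.FFI objects`: the GCSFileSystem instance carries live OS-level state (an aiohttp session, SSL/FFI handles), and cloudpickle cannot serialize such handles when Ray ships the `filesystem` argument to the remote `_get_read_tasks` task. The same class of failure can be reproduced with plain pickle and any handle-holding object, e.g. a threading lock (an illustrative sketch, not this thread's code):

```python
import pickle
import threading

# A lock wraps an OS-level primitive, much like GCSFileSystem wraps an
# event loop and SSL session; neither can cross a pickle boundary.
lock = threading.Lock()
try:
    pickle.dumps(lock)
except TypeError as exc:
    print(f"not picklable: {exc}")
```

This is also why `inspect_serializability` on the outer function passed, while the call still failed: the function itself pickles fine, but the `fs` object constructed inside it is handed to a Ray task as an argument and fails there.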

@raulchen (Contributor) commented:

It looks like it's the fs object that is not serializable. Can you also try inspect_serializability(fs)?

@yuduber (Contributor, Author) commented May 15, 2023

I found the issue. The proper way to set up gcsfs is to first store the token file locally on each distributed node. Then, on each node, set the env var GOOGLE_APPLICATION_CREDENTIALS to the token file's path, and use gcsfs directly without explicitly specifying the token, just like this:

import gcsfs

fs = gcsfs.GCSFileSystem(project=your_project_name)
fs_ls = fs.ls(your_bucket_name)
print(f'fs_ls:{fs_ls}')

import ray

ds = ray.data.read_parquet(fs_ls[0], filesystem=fs)
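
The per-node setup described above can be sketched as a small helper (the path and name here are illustrative assumptions, not from the thread):

```python
import os


def configure_gcp_credentials(token_path: str = "/tmp/training_token.json") -> str:
    """Point Google client libraries on this node at the service-account key.

    With GOOGLE_APPLICATION_CREDENTIALS set, gcsfs.GCSFileSystem() picks up
    credentials automatically, so no token object has to cross the
    serialization boundary when Ray ships tasks to workers.
    """
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = token_path
    return os.environ["GOOGLE_APPLICATION_CREDENTIALS"]
```

Run something like this on each node (e.g. at container start, or at the top of the remote function) before constructing the filesystem.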

@yuduber yuduber closed this as completed May 15, 2023