|
1 | 1 | Lance ❤️ Ray
|
2 | 2 | --------------------
|
3 | 3 |
|
4 |
| -Ray effortlessly scale up ML workload to large distributed compute environment. |
| 4 | +`Ray <https://www.anyscale.com/product/open-source/ray>`_ effortlessly scales ML workloads up to large distributed |
| 5 | +compute environments. |
5 | 6 |
|
6 |
| -`Ray Data <https://docs.ray.io/en/latest/data/data.html>`_ can be directly written in Lance format by using the |
7 |
| -:class:`lance.ray.sink.LanceDatasink` class. For example: |
| 7 | +Lance format is one of the official `Ray data sources <https://docs.ray.io/en/latest/data/api/input_output.html#lance>`_: |
8 | 8 |
|
9 |
| -.. code-block:: bash |
| 9 | +* Lance Data Source :py:meth:`ray.data.read_lance` |
| 10 | +* Lance Data Sink :py:meth:`ray.data.Dataset.write_lance` |
10 | 11 |
|
11 |
| - pip install pylance[ray] |
| 12 | +.. testsetup:: |
12 | 13 |
|
| 14 | + shutil.rmtree("./alice_bob_and_charlie.lance", ignore_errors=True) |
13 | 15 |
|
14 |
| -``Ray Data Dataset`` can be written to Lance format using the following code: |
15 |
| - |
16 |
| -.. code-block:: python |
| 16 | +.. testcode:: |
17 | 17 |
|
18 | 18 | import ray
|
19 |
| - from lance.ray.sink import LanceDatasink |
20 | 19 |
|
21 | 20 | ray.init()
|
22 | 21 |
|
23 |
| - sink = LanceDatasink("s3://bucket/to/data.lance") |
24 |
| - ray.data.range(10).map( |
25 |
| - lambda x: {"id": x["id"], "str": f"str-{x['id']}"} |
26 |
| - ).write_datasink(sink) |
| 22 | + data = [ |
| 23 | + {"id": 1, "name": "alice"}, |
| 24 | + {"id": 2, "name": "bob"}, |
| 25 | + {"id": 3, "name": "charlie"} |
| 26 | + ] |
| 27 | + ray.data.from_items(data).write_lance("./alice_bob_and_charlie.lance") |
| 28 | + |
| 29 | + # It can be read via lance directly |
| 30 | + tbl = lance.dataset("./alice_bob_and_charlie.lance").to_table() |
| 31 | + assert tbl == pa.Table.from_pylist(data) |
27 | 32 |
|
| 33 | + # Or via ray.data.read_lance |
| 34 | + pd_df = ray.data.read_lance("./alice_bob_and_charlie.lance").to_pandas() |
| 35 | + assert tbl == pa.Table.from_pandas(pd_df) |