Commit eddb670

docs: update ray integration and move schema evolution doc to a separate doc (#3530)
* Move `object store config` into a new page
* Update ray doc to include official lance sink / source
* Move `schema evolution` to separate doc
1 parent c12fc3b commit eddb670

8 files changed: +1069 -1059 lines changed

docs/conf.py

+1
@@ -55,6 +55,7 @@ def setup(app):
 "numpy": ("https://numpy.org/doc/stable/", None),
 "pyarrow": ("https://arrow.apache.org/docs/", None),
 "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
+"ray": ("https://docs.ray.io/en/latest/", None),
 }
 
 
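
For context, here is a minimal sketch of how this entry fits into a Sphinx `conf.py`. It assumes the `sphinx.ext.intersphinx` extension is enabled; the `extensions` list is an assumption and not copied from Lance's actual config. Each mapping value is a `(base_url, inventory)` pair, and `None` tells intersphinx to fetch `objects.inv` from the base URL. That is what lets references such as `ray.data.read_lance` in the Ray integration page link to Ray's own documentation.

```python
# Sketch of a Sphinx docs/conf.py fragment (assumed layout, not Lance's full file).
# intersphinx resolves cross-references such as :py:meth:`ray.data.read_lance`
# against other projects' published object inventories.
extensions = [
    "sphinx.ext.intersphinx",  # assumption: needed for intersphinx_mapping to take effect
]

# Each value is (base_url, inventory); None means "download objects.inv from base_url".
intersphinx_mapping = {
    "numpy": ("https://numpy.org/doc/stable/", None),
    "pyarrow": ("https://arrow.apache.org/docs/", None),
    "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
    "ray": ("https://docs.ray.io/en/latest/", None),  # the entry added by this commit
}
```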

docs/index.rst

+3 -1
@@ -43,14 +43,16 @@ Preview releases receive the same level of testing as regular releases.
 :maxdepth: 2
 
 Quickstart <./notebooks/quickstart>
-./read_and_write
+./introduction/read_and_write
+./introduction/schema_evolution
 
 .. toctree::
 :caption: Advanced Usage
 :maxdepth: 1
 
 Lance Format Spec <./format>
 Blob API <./blob>
+Object Store Configuration <./object_store>
 Performance Guide <./performance>
 Tokenizer <./tokenizer>
 Extension Arrays <./arrays>

docs/integrations/ray.rst

+21 -13
@@ -1,27 +1,35 @@
 Lance ❤️ Ray
 --------------------
 
-Ray effortlessly scale up ML workload to large distributed compute environment.
+`Ray <https://www.anyscale.com/product/open-source/ray>`_ effortlessly scales up ML workloads to large distributed
+compute environments.
 
-`Ray Data <https://docs.ray.io/en/latest/data/data.html>`_ can be directly written in Lance format by using the
-:class:`lance.ray.sink.LanceDatasink` class. For example:
+Lance format is one of the official `Ray data sources <https://docs.ray.io/en/latest/data/api/input_output.html#lance>`_:
 
-.. code-block:: bash
+* Lance Data Source :py:meth:`ray.data.read_lance`
+* Lance Data Sink :py:meth:`ray.data.Dataset.write_lance`
 
-pip install pylance[ray]
+.. testsetup::
 
+shutil.rmtree("./alice_bob_and_charlie.lance", ignore_errors=True)
 
-``Ray Data Dataset`` can be written to Lance format using the following code:
-
-.. code-block:: python
+.. testcode::
 
 import ray
-from lance.ray.sink import LanceDatasink
 
 ray.init()
 
-sink = LanceDatasink("s3://bucket/to/data.lance")
-ray.data.range(10).map(
-lambda x: {"id": x["id"], "str": f"str-{x['id']}"}
-).write_datasink(sink)
+data = [
+{"id": 1, "name": "alice"},
+{"id": 2, "name": "bob"},
+{"id": 3, "name": "charlie"}
+]
+ray.data.from_items(data).write_lance("./alice_bob_and_charlie.lance")
+
+# It can be read via lance directly
+tbl = lance.dataset("./alice_bob_and_charlie.lance").to_table()
+assert tbl == pa.Table.from_pylist(data)
 
+# Or via ray.data.read_lance
+pd_df = ray.data.read_lance("./alice_bob_and_charlie.lance").to_pandas()
+assert tbl == pa.Table.from_pandas(pd_df)
