Commit 43cf8a1

Merge branch 'main' of https://github.com/lancedb/lance into add-avg-loss
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
2 parents 52b27c7 + 15420d5 commit 43cf8a1

51 files changed

+3063
-1606
lines changed

Cargo.lock

+39-25

README.md

+13-8
@@ -3,13 +3,13 @@
 
 <img width="257" alt="Lance Logo" src="https://user-images.githubusercontent.com/917119/199353423-d3e202f7-0269-411d-8ff2-e747e419e492.png">
 
-**Modern columnar data format for ML. Convert from Parquet in 2-lines of code for 100x faster random access, a vector index, data versioning, and more.<br/>**
-**Compatible with pandas, DuckDB, Polars, and pyarrow with more integrations on the way.**
+**Modern columnar data format for ML. Convert from Parquet in 2-lines of code for 100x faster random access, zero-cost schema evolution, rich secondary indices, versioning, and more.<br/>**
+**Compatible with Pandas, DuckDB, Polars, Pyarrow, and Ray with more integrations on the way.**
 
 <a href="https://lancedb.github.io/lance/">Documentation</a> •
 <a href="https://blog.lancedb.com/">Blog</a> •
 <a href="https://discord.gg/zMM32dvNtd">Discord</a> •
-<a href="https://twitter.com/lancedb">Twitter</a>
+<a href="https://x.com/lancedb">X</a>
 
 [CI]: https://github.com/lancedb/lance/actions/workflows/rust.yml
 [CI Badge]: https://github.com/lancedb/lance/actions/workflows/rust.yml/badge.svg
@@ -44,7 +44,7 @@ The key features of Lance include:
 
 * **Zero-copy, automatic versioning:** manage versions of your data without needing extra infrastructure.
 
-* **Ecosystem integrations:** Apache Arrow, Pandas, Polars, DuckDB and more on the way.
+* **Ecosystem integrations:** Apache Arrow, Pandas, Polars, DuckDB, Ray, Spark and more on the way.
 
 > [!TIP]
 > Lance is in active development and we welcome contributions. Please see our [contributing guide](docs/contributing.rst) for more information.
@@ -66,7 +66,7 @@ pip install --pre --extra-index-url https://pypi.fury.io/lancedb/ pylance
 > [!TIP]
 > Preview releases are released more often than full releases and contain the
 > latest features and bug fixes. They receive the same level of testing as full releases.
-> We guarantee they will remain published and available for download for at
+> We guarantee they will remain published and available for download for at
 > least 6 months. When you want to pin to a specific version, prefer a stable release.
 
 **Converting to Lance**
@@ -186,8 +186,8 @@ Support both CPUs (``x86_64`` and ``arm``) and GPU (``Nvidia (cuda)`` and ``Appl
 
 **Fast updates** (ROADMAP): Updates will be supported via write-ahead logs.
 
-**Rich secondary indices** (ROADMAP):
-- Inverted index for fuzzy search over many label / annotation fields.
+**Rich secondary indices**: Support `BTree`, `Bitmap`, `Full text search`, `Label list`,
+`NGrams`, and more.
 
 ## Benchmarks
 
@@ -253,11 +253,16 @@ A comparison of different data formats in each stage of ML development cycle.
 
 Lance is currently used in production by:
 * [LanceDB](https://github.com/lancedb/lancedb), a serverless, low-latency vector database for ML applications
+* [LanceDB Enterprise](https://docs.lancedb.com/enterprise/introduction), hyperscale LanceDB with enterprise SLA.
+* Leading multimodal Gen AI companies for training over petabyte-scale multimodal data.
 * Self-driving car company for large-scale storage, retrieval and processing of multi-modal data.
 * E-commerce company for billion-scale+ vector personalized search.
 * and more.
 
-## Presentations and Talks
+## Presentations, Blogs and Talks
 
+* [Designing a Table Format for ML Workloads](https://blog.lancedb.com/designing-a-table-format-for-ml-workloads/), Feb 2025.
+* [Transforming Multimodal Data Management with LanceDB, Ray Summit](https://www.youtube.com/watch?v=xmTFEzAh8ho), Oct 2024.
+* [Lance v2: A columnar container format for modern data](https://blog.lancedb.com/lance-v2/), Apr 2024.
 * [Lance Deep Dive](https://drive.google.com/file/d/1Orh9rK0Mpj9zN_gnQF1eJJFpAc6lStGm/view?usp=drive_link). July 2023.
 * [Lance: A New Columnar Data Format](https://docs.google.com/presentation/d/1a4nAiQAkPDBtOfXFpPg7lbeDAxcNDVKgoUkw3cUs2rE/edit#slide=id.p), [Scipy 2022, Austin, TX](https://www.scipy2022.scipy.org/posters). July, 2022.

deny.toml

+1
@@ -83,6 +83,7 @@ ignore = [
 { id = "RUSTSEC-2021-0153", reason = "`encoding` is used by lindera" },
 { id = "RUSTSEC-2024-0384", reason = "`instant` is used by tantivy" },
 { id = "RUSTSEC-2024-0436", reason = "`paste` is used by datafusion" },
+{ id = "RUSTSEC-2025-0014", reason = "`humantime` is used by object_store" },
 ]
 # If this is true, then cargo deny will use the git executable to fetch advisory database.
 # If this is false, then it uses a built-in git library.

docs/api/api.rst

+3-2
@@ -2,6 +2,7 @@ APIs
 ----
 
 .. toctree::
+:maxdepth: 1
 
-Rust <https://docs.rs/crate/lance/latest>
-Python <./python.rst>
+Rust <https://docs.rs/crate/lance/latest>
+Python <./python.rst>

docs/arrays.rst

+2-2
@@ -14,7 +14,7 @@ a 32-bit float: ~1e-38 to ~1e38. By comparison, a 16-bit float has a range of
 ~5.96e-8 to 65504.
 
 Lance provides an Arrow extension array (:class:`lance.arrow.BFloat16Array`)
-and a Pandas extension array (:class:`lance.pandas.BFloat16Dtype`) for BFloat16.
+and a Pandas extension array (:class:`~lance._arrow.PandasBFloat16Type`) for BFloat16.
 These are compatible with the `ml_dtypes <https://github.com/jax-ml/ml_dtypes>`_
 bfloat16 NumPy extension array.
 
@@ -31,7 +31,7 @@ the array:
 2 3.40625
 dtype: lance.bfloat16
 
-To create an an arrow array, use the :func:`lance.arrow.bfloat16_array` function:
+To create an Arrow array, use the :func:`lance.arrow.bfloat16_array` function:
 
 .. code-block:: python
 
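The range comparison quoted in the first hunk header can be made concrete with a small sketch. This is not Lance's implementation: it emulates bfloat16 by keeping only the upper 16 bits of a float32, which truncates rather than rounding to nearest even as real converters usually do. The helper names `to_bfloat16_bits`, `from_bfloat16_bits`, and `roundtrip` are hypothetical and exist only for this illustration.

```python
import struct


def to_bfloat16_bits(x: float) -> int:
    """Return a 16-bit bfloat16 pattern for x by truncating its float32 form."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Keep the sign bit, the 8 exponent bits, and the top 7 mantissa bits.
    return bits >> 16


def from_bfloat16_bits(b: int) -> float:
    """Expand a 16-bit bfloat16 pattern back into a Python float."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


def roundtrip(x: float) -> float:
    """Pass x through the emulated bfloat16 representation and back."""
    return from_bfloat16_bits(to_bfloat16_bits(x))
```

Because the exponent field is the same width as float32's, a value near 1e38 survives the round trip with under 1% relative error, whereas the mantissa is short enough that 65504.0 (the float16 maximum) comes back as 65280.0 under this truncating scheme. The value 3.40625 from the hunk's sample output is exactly representable.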

docs/blob.rst

-5
@@ -10,11 +10,6 @@ Lance provides a high-level API to store and retrieve large binary objects (blob
 Lance serves large binary data using :py:class:`lance.BlobFile`, which
 is a file-like object that lazily reads large binary objects.
 
-.. autoclass:: lance.BlobFile
-:members:
-:show-inheritance:
-:noindex:
-
 To fetch blobs from a Lance dataset, you can use :py:meth:`lance.dataset.LanceDataset.take_blobs`.
 
 For example, it's easy to use `BlobFile` to extract frames from a video file without

docs/conf.py

+44-21
@@ -1,24 +1,10 @@
 # Configuration file for the Sphinx documentation builder.
 
-import shutil
-from datetime import datetime
-
-
-def run_apidoc(_):
-    from sphinx.ext.apidoc import main
-
-    shutil.rmtree("api/python", ignore_errors=True)
-    main(["-f", "-o", "api/python", "../python/python/lance"])
-
-
-def setup(app):
-    app.connect("builder-inited", run_apidoc)
-
 
 # -- Project information -----------------------------------------------------
 
 project = "Lance"
-copyright = f"{datetime.today().year}, Lance Developer"
+copyright = "%Y, Lance Developer"
 author = "Lance Developer"
 
 
@@ -29,7 +15,8 @@ def setup(app):
 # ones.
 extensions = [
     "breathe",
-    "sphinx_copybutton",
+    "sphinx_immaterial",
+    "sphinx_immaterial.apidoc.python.apigen",
     "sphinx.ext.autodoc",
     "sphinx.ext.doctest",
     "sphinx.ext.githubpages",
@@ -56,25 +43,61 @@ def setup(app):
     "numpy": ("https://numpy.org/doc/stable/", None),
     "pyarrow": ("https://arrow.apache.org/docs/", None),
     "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
+    "ray": ("https://docs.ray.io/en/latest/", None),
 }
 
+python_apigen_modules = {
+    "lance": "api/python/",
+}
+object_description_options = [
+    (
+        "py:.*",
+        dict(
+            include_object_type_in_xref_tooltip=False,
+            include_in_toc=False,
+            include_fields_in_toc=False,
+        ),
+    ),
+]
 
 # -- Options for HTML output -------------------------------------------------
 
-html_theme = "piccolo_theme"
+html_theme = "sphinx_immaterial"
 
 # Add any paths that contain custom static files (such as style sheets) here,
 # relative to this directory. They are copied after the builtin static files,
 # so a file named "default.css" will overwrite the builtin "default.css".
 html_static_path = ["_static"]
 
 html_favicon = "_static/favicon_64x64.png"
-# html_logo = "_static/high-res-icon.png"
+html_logo = "_static/high-res-icon.png"
 html_theme_options = {
-    "source_url": "https://github.com/lancedb/lance",
-    "source_icon": "github",
+    "icon": {
+        "repo": "fontawesome/brands/github",
+        "edit": "material/file-edit-outline",
+    },
+    "site_url": "https://github.com/lancedb/lance",
+    "repo_url": "https://github.com/lancedb/lance",
+    "repo_name": "Lance",
+    "features": [
+        "navigation.expand",
+        # "navigation.tabs",
+        "content.tabs.link",
+        "content.code.copy",
+    ],
+    "social": [
+        {
+            "icon": "fontawesome/brands/github",
+            "link": "https://github.com/jbms/sphinx-immaterial",
+            "name": "Source on github.com",
+        },
+        {
+            "icon": "fontawesome/brands/python",
+            "link": "https://pypi.org/project/pylance/",
+        },
+    ],
 }
-html_css_files = ["custom.css"]
+
 
 # -- doctest configuration ---------------------------------------------------
 
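Read together, the added and unchanged lines of the last `docs/conf.py` hunk leave the HTML section of the file roughly as sketched below. The fragment is assembled only from lines shown in this diff. The indentation is an assumption, since the page scrape dropped leading whitespace, and settings outside the hunk are omitted.

```python
# Post-change HTML settings, assembled from the hunk above.
# Indentation is assumed; settings outside the hunk are not shown.
html_theme = "sphinx_immaterial"
html_static_path = ["_static"]
html_favicon = "_static/favicon_64x64.png"
html_logo = "_static/high-res-icon.png"
html_theme_options = {
    "icon": {
        "repo": "fontawesome/brands/github",
        "edit": "material/file-edit-outline",
    },
    "site_url": "https://github.com/lancedb/lance",
    "repo_url": "https://github.com/lancedb/lance",
    "repo_name": "Lance",
    "features": [
        "navigation.expand",
        # "navigation.tabs",
        "content.tabs.link",
        "content.code.copy",
    ],
    "social": [
        {
            "icon": "fontawesome/brands/github",
            "link": "https://github.com/jbms/sphinx-immaterial",
            "name": "Source on github.com",
        },
        {
            "icon": "fontawesome/brands/python",
            "link": "https://pypi.org/project/pylance/",
        },
    ],
}
```

Note that `html_css_files = ["custom.css"]` and the two `source_*` theme options are removed by the hunk, so they do not appear here.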

docs/index.rst

+25-5
@@ -39,15 +39,35 @@ Preview releases receive the same level of testing as regular releases.
 
 
 .. toctree::
-:maxdepth: 1
+:caption: Introduction
+:maxdepth: 2
 
 Quickstart <./notebooks/quickstart>
-./read_and_write
-Lance Formats <./format>
-Arrays <./arrays>
+./introduction/read_and_write
+./introduction/schema_evolution
+
+.. toctree::
+:caption: Advanced Usage
+:maxdepth: 1
+
+Lance Format Spec <./format>
 Blob API <./blob>
-Integrations <./integrations/integrations>
+Object Store Configuration <./object_store>
 Performance Guide <./performance>
+Tokenizer <./tokenizer>
+Extension Arrays <./arrays>
+
+.. toctree::
+:caption: Integrations
+
+Huggingface <./integrations/huggingface>
+Tensorflow <./integrations/tensorflow>
+PyTorch <./integrations/pytorch>
+Ray <./integrations/ray>
+
+.. toctree::
+:maxdepth: 1
+
 API References <./api/api>
 Contributor Guide <./contributing>
 Examples <./examples/examples>

docs/integrations/integrations.rst

-10
This file was deleted.
