feat: do brute force search on unindexed data #3036

Merged
merged 15 commits into from
Oct 31, 2024
fix
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
BubbleCal committed Oct 24, 2024
commit 79082cb698c7457bb876e722d26253d88c0d7a5d
5 changes: 3 additions & 2 deletions rust/lance-index/src/scalar/inverted/index.rs
@@ -1007,6 +1007,7 @@ fn do_flat_full_text_search<Offset: OffsetSizeTrait>(
Ok(results)
}

+ #[allow(clippy::too_many_arguments)]
pub fn flat_bm25_search(
batch: RecordBatch,
doc_col: &str,
@@ -1017,7 +1018,7 @@ pub fn flat_bm25_search(
avgdl: f32,
num_docs: usize,
) -> std::result::Result<RecordBatch, DataFusionError> {
- let doc_iter = iter_str_array(&batch[&doc_col]);
+ let doc_iter = iter_str_array(&batch[doc_col]);
let mut scores = Vec::with_capacity(batch.num_rows());
for doc in doc_iter {
let doc = match doc {
@@ -1057,7 +1058,7 @@ pub fn flat_bm25_search(

let score_col = Arc::new(Float32Array::from(scores)) as ArrayRef;
let batch = batch
- .drop_column(&doc_col)?
+ .drop_column(doc_col)?
.try_with_column(SCORE_FIELD.clone(), score_col)?;
Ok(batch)
}
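
Both replaced lines in this commit drop a `&` in front of `doc_col`, which is already a `&str`. The sketch below illustrates why the extra borrow is redundant, presumably the kind of thing a Clippy borrow lint flags. It uses only the standard library: `column_len` and `lookup` are hypothetical helpers, and the `HashMap` is a stand-in for a record batch, not the Arrow API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for looking up a column by name.
fn column_len(columns: &HashMap<String, Vec<f32>>, name: &str) -> Option<usize> {
    columns.get(name).map(|col| col.len())
}

/// Looks up the same column twice: once with a redundant borrow, once without.
fn lookup(columns: &HashMap<String, Vec<f32>>, doc_col: &str) -> (Option<usize>, Option<usize>) {
    // `doc_col` is already `&str`, so `&doc_col` has type `&&str`.
    // Deref coercion turns it back into `&str`, which makes the extra `&` noise.
    #[allow(clippy::needless_borrow)]
    let with_extra_borrow = column_len(columns, &doc_col);

    // Passing the `&str` straight through is equivalent and reads more clearly.
    let direct = column_len(columns, doc_col);

    (with_extra_borrow, direct)
}
```

Both calls resolve to the same lookup, so removing the `&` changes no behavior; it only quiets the lint.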

Unchanged files with check annotations

pub fn remove_stream<'a>(
&'a self,
locations: BoxStream<'a, Result<Path>>,
) -> BoxStream<Result<Path>> {

Check warning on line 638 in rust/lance-io/src/object_store.rs

GitHub Actions / linux-build (nightly)

elided lifetime has a name
self.inner
.delete_stream(locations.err_into::<ObjectStoreError>().boxed())
.err_into::<Error>()
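
The "elided lifetime has a name" warnings in these annotations all share one shape: the function names a lifetime `'a` on its inputs, but the return type leaves that lifetime implicit. A minimal sketch of the pattern and one possible fix follows, using only the standard library. `Names` and `Store` are hypothetical stand-ins for `BoxStream` and the annotated types, and this is not necessarily how the warnings were resolved upstream.

```rust
/// Hypothetical stand-in for a borrowed stream type such as `BoxStream<'a, T>`.
struct Names<'a> {
    inner: std::slice::Iter<'a, String>,
}

impl<'a> Iterator for Names<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        self.inner.next().map(|s| s.as_str())
    }
}

/// Hypothetical owner of the data being borrowed.
struct Store {
    items: Vec<String>,
}

impl Store {
    // Shape that draws the warning: `'a` is named on `self`, but the return
    // type hides the fact that it borrows for `'a`:
    //     fn names<'a>(&'a self) -> Names { ... }
    //
    // Writing the lifetime out makes the borrow visible in the signature.
    fn names<'a>(&'a self) -> Names<'a> {
        Names { inner: self.items.iter() }
    }
}
```

Applied to the first annotation, the analogous change would be to return `BoxStream<'a, Result<Path>>` rather than `BoxStream<Result<Path>>`.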
fn cast_dictionary_arrays<'a, T: ArrowDictionaryKeyType + 'static>(
arrays: &'a [&'a ArrayRef],
) -> Vec<&Arc<dyn Array>> {

Check warning on line 489 in rust/lance-file/src/writer/statistics.rs

GitHub Actions / linux-build (nightly)

elided lifetime has a name
arrays
.iter()
.map(|x| x.as_dictionary::<T>().values())
fn search_values<'a>(
&'a self,
values: &'a Vec<ScalarValue>,
) -> BoxStream<Result<RowIdTreeMap>> {

Check warning on line 81 in rust/lance-index/src/scalar/label_list.rs

GitHub Actions / linux-build (nightly)

elided lifetime has a name
futures::stream::iter(values)
.then(move |value| {
let value_query = SargableQuery::Equals(value.clone());