Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: avoid copying of creating memory dist calculator #2219

Merged
merged 4 commits into from
Apr 23, 2024

Conversation

BubbleCal
Copy link
Contributor

No description provided.

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@codecov-commenter
Copy link

codecov-commenter commented Apr 18, 2024

Codecov Report

Attention: Patch coverage is 85.29412% with 5 lines in your changes are missing coverage. Please review.

Project coverage is 81.01%. Comparing base (99008c6) to head (7257e5b).

Files Patch % Lines
rust/lance-linalg/src/matrix.rs 72.72% 3 Missing ⚠️
rust/lance-index/src/vector/graph/memory.rs 87.50% 1 Missing ⚠️
rust/lance/src/index/vector/hnsw.rs 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2219      +/-   ##
==========================================
- Coverage   81.06%   81.01%   -0.05%     
==========================================
  Files         186      186              
  Lines       53745    53746       +1     
  Branches    53745    53746       +1     
==========================================
- Hits        43566    43545      -21     
- Misses       7704     7710       +6     
- Partials     2475     2491      +16     
Flag Coverage Δ
unittests 81.01% <85.29%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@eddyxu eddyxu changed the title perf: erase copying of creating memory dist calculator perf: avoid copying of creating memory dist calculator Apr 18, 2024
metric_type: self.metric_type,
})
}
}

struct InMemoryDistanceCal {
vectors: Arc<MatrixView<Float32Type>>,
query: Vec<f32>,
query: *const f32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use &[f32] with lifetime? Using raw pointer means that the lifetime of query is not guaranteed when the distance() method is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i tried this but found that the dist_calc is created by the trait method of VectorStorage, so it's impossible to add lifetime for this.

any better way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't care about locality
Arc<[f32]> works

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably make queries some sort of Arc<[T]> early in the query, so it's cheaply clone-able.

Copy link
Contributor

@chebbyChefNEQ chebbyChefNEQ Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit more pedantic. we probably want something like this https://github.com/pytorch/pytorch/blob/main/c10/util/intrusive_ptr.h

but I'm not sure if there is an equivalent that is easily implementable in rust

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the vector in Query is Arc<dyn Array>, changed the type from &[f32] to ArrayRef

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Comment on lines +150 to +162
pub fn row(&self, i: usize) -> Option<ArrayRef> {
assert!(
!self.transpose,
"Centroid is not defined for transposed matrix."
);
if i >= self.num_rows() {
None
} else {
let dim = self.num_columns();
Some(self.data.slice(i * dim, dim))
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used?

Copy link
Contributor Author

@BubbleCal BubbleCal Apr 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, see

    pub fn vector(&self, id: u32) -> ArrayRef {
        self.vectors.row(id as usize).unwrap()
    }

@BubbleCal BubbleCal force-pushed the avoid-copying-query branch from d1da882 to 7257e5b Compare April 23, 2024 13:56
@BubbleCal BubbleCal merged commit 1e08425 into lancedb:main Apr 23, 2024
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants