
perf: concurrent loading FTS index files #2787

Merged 1 commit into main on Sep 9, 2024

Conversation

@BubbleCal (Contributor) commented Aug 26, 2024

This gets a ~30% improvement from concurrent loading, which helps reduce the cold-start latency of full-text search.

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@BubbleCal BubbleCal changed the title perf: concurrent load FTS index files perf: concurrent loading FTS index files Aug 26, 2024
@codecov-commenter
Codecov Report

Attention: Patch coverage is 67.85714% with 9 lines in your changes missing coverage. Please review.

Project coverage is 79.28%. Comparing base (144d207) to head (7ced7dc).

Files | Patch % | Lines
rust/lance-index/src/scalar/inverted/index.rs | 67.85% | 0 Missing and 9 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2787      +/-   ##
==========================================
- Coverage   79.28%   79.28%   -0.01%     
==========================================
  Files         227      227              
  Lines       68269    68290      +21     
  Branches    68269    68290      +21     
==========================================
+ Hits        54126    54142      +16     
  Misses      11019    11019              
- Partials     3124     3129       +5     
Flag Coverage Δ
unittests 79.28% <67.85%> (-0.01%) ⬇️


@BubbleCal BubbleCal marked this pull request as ready for review August 26, 2024 13:12
@wjones127 (Contributor) left a comment

I think you could simplify this a bit. If you found there was some speedup from doing the spawn, consider checking whether there is CPU-bound work you could instead submit to the CPU threadpool as a task. Check out Weston's PR documenting our threadpools here: https://github.com/lancedb/lance/pull/2773/files

Comment on lines +232 to +255
let tokens_fut = tokio::spawn({
    let store = store.clone();
    async move {
        let token_reader = store.open_index_file(TOKENS_FILE).await?;
        let tokens = TokenSet::load(token_reader).await?;
        Result::Ok(tokens)
    }
});
let invert_list_fut = tokio::spawn({
    let store = store.clone();
    async move {
        let invert_list_reader = store.open_index_file(INVERT_LIST_FILE).await?;
        let invert_list = InvertedListReader::new(invert_list_reader)?;
        Result::Ok(Arc::new(invert_list))
    }
});
let docs_fut = tokio::spawn({
    let store = store.clone();
    async move {
        let docs_reader = store.open_index_file(DOCS_FILE).await?;
        let docs = DocSet::load(docs_reader).await?;
        Result::Ok(docs)
    }
});
Contributor

It seems like we shouldn't need to spawn these.

Suggested change
let tokens_fut = tokio::spawn({
    let store = store.clone();
    async move {
        let token_reader = store.open_index_file(TOKENS_FILE).await?;
        let tokens = TokenSet::load(token_reader).await?;
        Result::Ok(tokens)
    }
});
let invert_list_fut = tokio::spawn({
    let store = store.clone();
    async move {
        let invert_list_reader = store.open_index_file(INVERT_LIST_FILE).await?;
        let invert_list = InvertedListReader::new(invert_list_reader)?;
        Result::Ok(Arc::new(invert_list))
    }
});
let docs_fut = tokio::spawn({
    let store = store.clone();
    async move {
        let docs_reader = store.open_index_file(DOCS_FILE).await?;
        let docs = DocSet::load(docs_reader).await?;
        Result::Ok(docs)
    }
});
let tokens_fut = store.open_index_file(TOKENS_FILE)
    .and_then(|token_reader| TokenSet::load(token_reader));
let invert_list_fut = store.open_index_file(INVERT_LIST_FILE)
    .and_then(|invert_list_reader| InvertedListReader::new(invert_list_reader))
    .map_ok(Arc::new);
let docs_fut = store.open_index_file(DOCS_FILE)
    .and_then(|docs_reader| DocSet::load(docs_reader));

Comment on lines +257 to +259
let tokens = tokens_fut.await??;
let inverted_list = invert_list_fut.await??;
let docs = docs_fut.await??;
Contributor

You can await multiple futures at the same time with try_join!():

Suggested change
let tokens = tokens_fut.await??;
let inverted_list = invert_list_fut.await??;
let docs = docs_fut.await??;
let (tokens, inverted_list, docs) = try_join!(tokens_fut, invert_list_fut, docs_fut)?;

This has the upside that it will fail on the first failure of any of the three.

@BubbleCal (Contributor, Author) commented Aug 27, 2024

Yes, your solution is what I tried first, but I ran into the "FnOnce is not general enough" error, so I decided to use spawn instead.

It's a good idea to split the IO and CPU operations; right now the load methods mix them.

Contributor

Okay this is fine then.

@BubbleCal BubbleCal mentioned this pull request Aug 27, 2024
@BubbleCal (Contributor, Author) commented Aug 27, 2024

I did try submitting the IO and CPU operations to different runtimes (tokio::spawn / spawn_cpu), but it yielded no perf improvement (no obvious difference from serial execution).
This PR really benefits from parallelism (20% faster), though; my guess is that spawn_cpu executes the CPU operations on a different thread, which leads to cache misses.

@wjones127 any recommendations?

@wjones127 (Contributor) commented
> This PR really benefits from parallelism (20% faster), though; my guess is that spawn_cpu executes the CPU operations on a different thread, which leads to cache misses.
>
> @wjones127 any recommendations?

That might be a good reason to use block_in_place: https://docs.rs/tokio/latest/tokio/task/fn.block_in_place.html

@BubbleCal BubbleCal merged commit 6016917 into lancedb:main Sep 9, 2024
28 checks passed