perf: concurrent loading FTS index files #2787
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2787      +/-   ##
==========================================
- Coverage   79.28%   79.28%   -0.01%
==========================================
  Files         227      227
  Lines       68269    68290      +21
  Branches    68269    68290      +21
==========================================
+ Hits        54126    54142      +16
  Misses      11019    11019
- Partials     3124     3129       +5
I think you could simplify this a bit. If you found there was some speedup from doing the `spawn`, consider checking whether there is CPU-bound work that you could instead submit to the CPU thread pool as a task. Check out Weston's PR documenting our thread pools here: https://github.com/lancedb/lance/pull/2773/files
let tokens_fut = tokio::spawn({
    let store = store.clone();
    async move {
        let token_reader = store.open_index_file(TOKENS_FILE).await?;
        let tokens = TokenSet::load(token_reader).await?;
        Result::Ok(tokens)
    }
});
let invert_list_fut = tokio::spawn({
    let store = store.clone();
    async move {
        let invert_list_reader = store.open_index_file(INVERT_LIST_FILE).await?;
        let invert_list = InvertedListReader::new(invert_list_reader)?;
        Result::Ok(Arc::new(invert_list))
    }
});
let docs_fut = tokio::spawn({
    let store = store.clone();
    async move {
        let docs_reader = store.open_index_file(DOCS_FILE).await?;
        let docs = DocSet::load(docs_reader).await?;
        Result::Ok(docs)
    }
});
It seems like we shouldn't need to spawn these.
- let tokens_fut = tokio::spawn({
-     let store = store.clone();
-     async move {
-         let token_reader = store.open_index_file(TOKENS_FILE).await?;
-         let tokens = TokenSet::load(token_reader).await?;
-         Result::Ok(tokens)
-     }
- });
- let invert_list_fut = tokio::spawn({
-     let store = store.clone();
-     async move {
-         let invert_list_reader = store.open_index_file(INVERT_LIST_FILE).await?;
-         let invert_list = InvertedListReader::new(invert_list_reader)?;
-         Result::Ok(Arc::new(invert_list))
-     }
- });
- let docs_fut = tokio::spawn({
-     let store = store.clone();
-     async move {
-         let docs_reader = store.open_index_file(DOCS_FILE).await?;
-         let docs = DocSet::load(docs_reader).await?;
-         Result::Ok(docs)
-     }
- });
+ let tokens_fut = store.open_index_file(TOKENS_FILE)
+     .and_then(|token_reader| TokenSet::load(token_reader));
+ let invert_list_fut = store.open_index_file(INVERT_LIST_FILE)
+     .and_then(|invert_list_reader| InvertedListReader::new(invert_list_reader))
+     .map_ok(Arc::new);
+ let docs_fut = store.open_index_file(DOCS_FILE)
+     .and_then(|docs_reader| DocSet::load(docs_reader));
let tokens = tokens_fut.await??;
let inverted_list = invert_list_fut.await??;
let docs = docs_fut.await??;
You can await multiple futures at the same time with `try_join!()`:
- let tokens = tokens_fut.await??;
- let inverted_list = invert_list_fut.await??;
- let docs = docs_fut.await??;
+ let (tokens, inverted_list, docs) = try_join!(tokens_fut, invert_list_fut, docs_fut)?;
This has the upside that it will fail on the first failure of any of the three.
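To make the fail-fast behaviour concrete, here is a minimal std-only sketch. It does not use the `futures` or `tokio` crates: `try_join3` and `block_on` are hand-rolled, hypothetical helpers that only approximate what `try_join!` does, and the three futures are stand-ins for the real loaders.

```rust
use std::future::{pending, poll_fn, ready, Future};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough here because `block_on` re-polls in a loop.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, for illustration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}

/// Polls all three futures on every wake-up and returns the first error seen,
/// without waiting for the futures that are still pending.
async fn try_join3<A, B, C, E>(
    a: impl Future<Output = Result<A, E>>,
    b: impl Future<Output = Result<B, E>>,
    c: impl Future<Output = Result<C, E>>,
) -> Result<(A, B, C), E> {
    let mut a = pin!(a);
    let mut b = pin!(b);
    let mut c = pin!(c);
    let (mut ra, mut rb, mut rc) = (None, None, None);
    poll_fn(move |cx| {
        if ra.is_none() {
            if let Poll::Ready(r) = a.as_mut().poll(cx) {
                match r {
                    Ok(v) => ra = Some(v),
                    Err(e) => return Poll::Ready(Err(e)),
                }
            }
        }
        if rb.is_none() {
            if let Poll::Ready(r) = b.as_mut().poll(cx) {
                match r {
                    Ok(v) => rb = Some(v),
                    Err(e) => return Poll::Ready(Err(e)),
                }
            }
        }
        if rc.is_none() {
            if let Poll::Ready(r) = c.as_mut().poll(cx) {
                match r {
                    Ok(v) => rc = Some(v),
                    Err(e) => return Poll::Ready(Err(e)),
                }
            }
        }
        if ra.is_some() && rb.is_some() && rc.is_some() {
            Poll::Ready(Ok((ra.take().unwrap(), rb.take().unwrap(), rc.take().unwrap())))
        } else {
            Poll::Pending
        }
    })
    .await
}

/// The first future never completes, the second fails, the third succeeds.
/// Awaiting them one by one would hang on the first; joining fails fast.
fn demo_fail_fast() -> Result<(u32, u32, u32), String> {
    block_on(try_join3(
        pending::<Result<u32, String>>(),
        ready(Err("inverted list failed to load".to_string())),
        ready(Ok(3)),
    ))
}

/// All three futures succeed, so the results come back as a tuple.
fn demo_all_ok() -> Result<(u32, u32, u32), String> {
    block_on(try_join3(ready(Ok(1)), ready(Ok(2)), ready(Ok(3))))
}
```

Calling `demo_fail_fast()` returns the error from the second future even though the first one never resolves, which is the property the sequential `.await??` lines lack.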
Yes, your solution is what I tried first, but I ran into the "`FnOnce` is not general enough" error, so I decided to use `spawn`.
It's a good idea to split the IO and CPU operations; right now the `load` methods mix them.
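As a rough illustration of that split, the sketch below separates a loader into an IO phase and a CPU phase using only the standard library. Everything in it is hypothetical: `read_all`, `parse_tokens`, `load_tokens`, and the newline-separated token format are invented for the example and are not the `TokenSet::load` implementation, and a plain `std::thread` stands in for a real CPU thread pool.

```rust
use std::io::{self, Read};
use std::thread;

/// IO phase: only pulls raw bytes out of the reader.
fn read_all(mut reader: impl Read) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    Ok(buf)
}

/// CPU phase: a pure function over bytes with no IO. It splits a
/// newline-separated token list (a made-up format for this sketch).
fn parse_tokens(bytes: &[u8]) -> Vec<String> {
    String::from_utf8_lossy(bytes)
        .lines()
        .filter(|line| !line.is_empty())
        .map(str::to_owned)
        .collect()
}

/// Runs the IO phase on the caller's thread, then hands the bytes to a
/// separate thread for the CPU phase and waits for its result.
fn load_tokens(reader: impl Read) -> io::Result<Vec<String>> {
    let bytes = read_all(reader)?;
    let handle = thread::spawn(move || parse_tokens(&bytes));
    handle
        .join()
        .map_err(|_| io::Error::new(io::ErrorKind::Other, "parse thread panicked"))
}
```

For example, `load_tokens("lance\nindex\n".as_bytes())` yields the two tokens. In an async code base the `thread::spawn` call would more likely be a submission to a dedicated CPU pool whose result is awaited rather than joined; the point of the sketch is only that `parse_tokens` no longer touches the reader.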
Okay, this is fine then.
I did try submitting the IO/CPU operations to different runtimes ( @wjones127 any recommendations?
Might be a good reason to use
We get a 30% improvement with concurrent loading, which can help reduce the cold latency of full-text search.
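The effect can be reproduced in miniature with a std-only sketch. It is not the benchmark behind the number above: `load_file` is a hypothetical stand-in that sleeps to simulate IO latency, and scoped OS threads are used in place of async tasks.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Names of the three simulated index files (placeholders for this sketch).
static FILES: [&str; 3] = ["tokens", "invert_list", "docs"];

/// Stand-in for opening and loading one index file; the sleep simulates IO latency.
fn load_file(name: &str, latency: Duration) -> String {
    thread::sleep(latency);
    format!("{name} loaded")
}

/// Loads the files one after another and reports the elapsed wall time.
fn load_sequential(latency: Duration) -> (Vec<String>, Duration) {
    let start = Instant::now();
    let out: Vec<String> = FILES
        .iter()
        .copied()
        .map(|name| load_file(name, latency))
        .collect();
    (out, start.elapsed())
}

/// Loads the files on three scoped threads at once and reports the elapsed wall time.
fn load_concurrent(latency: Duration) -> (Vec<String>, Duration) {
    let start = Instant::now();
    let out: Vec<String> = thread::scope(|s| {
        let handles: Vec<_> = FILES
            .iter()
            .copied()
            .map(|name| s.spawn(move || load_file(name, latency)))
            .collect();
        // Join in spawn order so the results line up with `FILES`.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    (out, start.elapsed())
}
```

Comparing the two durations for the same `latency` shows the sequential version paying for all three waits while the concurrent version overlaps them. The real gain depends on how much of each load is IO wait versus CPU work.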