demux #1150
Codecov Report
@@            Coverage Diff             @@
##             main    #1150      +/-   ##
==========================================
+ Coverage   93.87%   93.98%   +0.10%
==========================================
  Files         203      204       +1
  Lines       34010    34534     +524
==========================================
+ Hits        31928    32456     +528
+ Misses       2082     2078       -4
Here are some benchmarks:
Merge
Demux into 4 segments, evenly distributed.
Demux into 8 segments, evenly distributed.
Demux into 16 segments.
The time is not linear in the number of segments. With 16 segments, duplicated data has an adverse effect: the combined size of the 16 segments is larger than that of a single segment. That is not yet the case with 4 segments.
I am not sure a flamegraph and benchmark with the fst dictionary make any sense: demux is a feature that only targets Quickwit.
Here are the numbers with the sstable:
Merge
Demux into 4 segments
Demux into 8 segments
Demux into 16 segments
One hotspot is is_deleted, at 10% of the time, which can likely be improved a lot by having a smarter iterator than:
/// Returns an iterator that will iterate over the alive document ids
pub fn doc_ids_alive(&self) -> impl Iterator<Item = DocId> + '_ {
    (0u32..self.max_doc).filter(move |doc| !self.is_deleted(*doc))
}
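One possible shape for such a smarter iterator is sketched below. It is a standalone illustration, not tantivy's code: the AliveBitSet type here, its Vec<u64> layout (a set bit meaning "alive"), and the from_deleted and doc_ids_alive_naive helpers are all assumptions made for the example. The idea is to walk the bitset one 64-bit word at a time and jump straight to each set bit with trailing_zeros, instead of calling is_deleted once per doc id.

```rust
/// Hypothetical stand-in for a delete bitset (not tantivy's type).
/// Assumption: bit `i` is set when doc `i` is alive.
pub struct AliveBitSet {
    words: Vec<u64>,
    max_doc: u32,
}

impl AliveBitSet {
    /// Builds a bitset where every doc in `0..max_doc` is alive except `deleted`.
    pub fn from_deleted(max_doc: u32, deleted: &[u32]) -> Self {
        let num_words = (max_doc as usize + 63) / 64;
        let mut words = vec![u64::MAX; num_words];
        // Clear the padding bits past `max_doc` in the last word.
        let rem = max_doc % 64;
        if rem != 0 {
            if let Some(last) = words.last_mut() {
                *last = (1u64 << rem) - 1;
            }
        }
        for &doc in deleted {
            assert!(doc < max_doc);
            words[(doc / 64) as usize] &= !(1u64 << (doc % 64));
        }
        AliveBitSet { words, max_doc }
    }

    pub fn is_deleted(&self, doc: u32) -> bool {
        self.words[(doc / 64) as usize] & (1u64 << (doc % 64)) == 0
    }

    /// Baseline, same shape as the snippet above: one `is_deleted` call per doc id.
    pub fn doc_ids_alive_naive(&self) -> impl Iterator<Item = u32> + '_ {
        (0u32..self.max_doc).filter(move |doc| !self.is_deleted(*doc))
    }

    /// Word-at-a-time variant: all-zero words yield nothing immediately,
    /// and each alive doc costs one `trailing_zeros` plus one bit clear.
    pub fn doc_ids_alive(&self) -> impl Iterator<Item = u32> + '_ {
        self.words.iter().enumerate().flat_map(|(word_idx, &word)| {
            let base = word_idx as u32 * 64;
            let mut remaining = word;
            std::iter::from_fn(move || {
                if remaining == 0 {
                    return None;
                }
                let bit = remaining.trailing_zeros();
                remaining &= remaining - 1; // clear the lowest set bit
                Some(base + bit)
            })
        })
    }
}
```

Both iterators return the same doc ids in increasing order. The word-based one should help most when large runs of documents are deleted, which is the situation demux creates; how much it helps in practice would need to be measured.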
I like the idea of having a simple and inefficient implementation of demux, at least for the moment. Thanks for the benchmarks too. The PR does not seem clean enough for review yet; please do one quick pass over it. Also, can you put this in tantivy-quickwit instead of tantivy? (The delete bitset work should be in tantivy, however.)
See comments.
Add support for a demux operation (reordering data from a list of segments into a new list of segments) by leveraging custom delete bitsets during merging.
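To make the idea concrete, here is a toy model of demux built on per-output delete bitsets. It does not use tantivy's API: Segment, DemuxMapping, merge_with_alive, and demux are hypothetical names, segments are reduced to lists of strings, and bitsets are plain Vec<bool>. Each output segment comes from one merge over all input segments, with every document that belongs to another output marked as deleted.

```rust
/// Hypothetical model: a segment is just an ordered list of documents.
pub type Segment = Vec<String>;

/// Hypothetical mapping: `targets[seg][doc]` is the ordinal of the output
/// segment that doc `doc` of input segment `seg` should end up in.
pub struct DemuxMapping {
    pub targets: Vec<Vec<usize>>,
}

impl DemuxMapping {
    /// One alive bitset per input segment for a given output:
    /// a doc is alive only if it is mapped to `output_ord`.
    pub fn alive_for_output(&self, output_ord: usize) -> Vec<Vec<bool>> {
        self.targets
            .iter()
            .map(|seg| seg.iter().map(|&target| target == output_ord).collect())
            .collect()
    }
}

/// Stand-in for a merge that honours a custom alive bitset per input segment.
pub fn merge_with_alive(inputs: &[Segment], alive: &[Vec<bool>]) -> Segment {
    let mut merged = Segment::new();
    for (segment, alive_docs) in inputs.iter().zip(alive) {
        for (doc, &is_alive) in segment.iter().zip(alive_docs) {
            if is_alive {
                merged.push(doc.clone());
            }
        }
    }
    merged
}

/// Demux as `num_outputs` merges, each over every input segment.
pub fn demux(inputs: &[Segment], mapping: &DemuxMapping, num_outputs: usize) -> Vec<Segment> {
    (0..num_outputs)
        .map(|ord| merge_with_alive(inputs, &mapping.alive_for_output(ord)))
        .collect()
}
```

In this model every output requires a full pass over all inputs, so the work grows with the number of output segments. That matches the simple, deliberately inefficient approach discussed in the conversation.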