Honor after value for skipping documents even if queue is not full for PagingFieldCollector #12334

Merged 1 commit on May 31, 2023
@@ -91,8 +91,8 @@ public abstract class NumericLeafComparator implements LeafFieldComparator {
// if skipping functionality should be enabled on this segment
private final boolean enableSkipping;
private final int maxDoc;
private final byte[] minValueAsBytes;
private final byte[] maxValueAsBytes;
private byte[] minValueAsBytes;
private byte[] maxValueAsBytes;

private DocIdSetIterator competitiveIterator;
private long iteratorCost = -1;
@@ -128,16 +128,10 @@ public NumericLeafComparator(LeafReaderContext context) throws IOException {
}
this.enableSkipping = true; // skipping is enabled when points are available
this.maxDoc = context.reader().maxDoc();
this.maxValueAsBytes =
reverse == false ? new byte[bytesCount] : topValueSet ? new byte[bytesCount] : null;
this.minValueAsBytes =
reverse ? new byte[bytesCount] : topValueSet ? new byte[bytesCount] : null;
this.competitiveIterator = DocIdSetIterator.all(maxDoc);
} else {
this.enableSkipping = false;
this.maxDoc = 0;
this.maxValueAsBytes = null;
this.minValueAsBytes = null;
}
}

@@ -191,7 +185,9 @@ public void setHitsThresholdReached() throws IOException {
// update its iterator to include possibly only docs that are "stronger" than the current bottom
// entry
private void updateCompetitiveIterator() throws IOException {
if (enableSkipping == false || hitsThresholdReached == false || queueFull == false) return;
if (enableSkipping == false
|| hitsThresholdReached == false
|| (queueFull == false && topValueSet == false)) return;
// if some documents have missing points, check that missing values prohibits optimization
if ((pointValues.getDocCount() < maxDoc) && isMissingValueCompetitive()) {
return; // we can't filter out documents, as documents with missing values are competitive
@@ -204,13 +200,21 @@ private void updateCompetitiveIterator() throws IOException {
return;
}
if (reverse == false) {
encodeBottom(maxValueAsBytes);
if (queueFull) { // bottom is available only when queue is full
maxValueAsBytes = maxValueAsBytes == null ? new byte[bytesCount] : maxValueAsBytes;
Contributor

Do we need the lazy initialization? I thought topValueSet would already be set before the NumericLeafComparator gets constructed. Maybe I'm misunderstanding that?

Contributor Author

We don't know upfront at construction time (where the initialization is currently done) whether we will need both maxValueAsBytes and minValueAsBytes. Consider the case where no competitive hit has been collected in the queue yet (so there is no bottom), but there is an after value, so the topValue is set.
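
For context, a minimal sketch of the scenario being described: a paged search where searchAfter supplies the previous page's last hit, so the comparator's top value is set even on segments whose queue never fills. The field name "price", the page size, and the query are placeholders, not part of the PR.

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopFieldDocs;

static TopDocs secondPage(IndexSearcher searcher, Query query) throws IOException {
  Sort sort = new Sort(new SortField("price", SortField.Type.LONG));
  TopFieldDocs firstPage = searcher.search(query, 10, sort);
  if (firstPage.scoreDocs.length == 0) {
    return firstPage;
  }
  // The last hit of the first page becomes the "after" value. PagingFieldCollector
  // hands it to the comparators as the top value (topValueSet == true), even on
  // segments where the priority queue (the "bottom") never fills.
  ScoreDoc after = firstPage.scoreDocs[firstPage.scoreDocs.length - 1];
  return searcher.searchAfter(after, query, 10, sort);
}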

Contributor

Apologies in advance if I'm misunderstanding, but as the code is currently written, we also don't know if we'll ever need these arrays. If the queue never fills, we could unnecessarily have allocated one of them. I think we still have enough information upfront though to eagerly allocate these like we do today? Is it just a question of being eager vs. lazy with these?

Contributor Author

we also don't know if we'll ever need these arrays. If the queue never fills, we could unnecessarily have allocated one of them.

If queueFull is false, we only get past this condition || (queueFull == false && topValueSet == false)) return; when it's an after query, where topValueSet is set to true. We don't allocate unnecessarily here, i.e. we initialize the min/max values only if we are about to call encodeTop or encodeBottom on them.

I gave that explanation with respect to the code in the current PR, but if you were talking about the existing code, then yes, we allocate unnecessarily when queueFull is always false.

I think we still have enough information upfront though to eagerly allocate these like we do today? Is it just a question of being eager vs. lazy with these?

There is a problem if we follow the same (eager) approach; consider this code:

} else {
  if (queueFull) { // bottom is available only when queue is full
    minValueAsBytes = minValueAsBytes == null ? new byte[bytesCount] : minValueAsBytes;
    encodeBottom(minValueAsBytes);
  }
  if (topValueSet) {
    maxValueAsBytes = maxValueAsBytes == null ? new byte[bytesCount] : maxValueAsBytes;
    encodeTop(maxValueAsBytes);
  }
}

If queueFull is always false and topValueSet is true, minValueAsBytes would end up as [0, 0, 0, 0, 0, 0, 0, 0, 0] instead of null. That would result in an incorrect minValueAsBytes in further calculations.
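
To make the "null has meaning" point concrete, here is a hedged, simplified sketch of the convention the competitive-iterator checks rely on (not the actual Lucene visitor code): a null bound means "no constraint on that side", so an array that was allocated but never written by encodeBottom/encodeTop would be mistaken for a real bound.

import java.util.Arrays;

static boolean isCompetitive(byte[] packedValue, byte[] minValueAsBytes, byte[] maxValueAsBytes) {
  // A null bound is treated as unbounded on that side.
  if (minValueAsBytes != null && Arrays.compareUnsigned(packedValue, minValueAsBytes) < 0) {
    return false; // below the lower bound
  }
  if (maxValueAsBytes != null && Arrays.compareUnsigned(packedValue, maxValueAsBytes) > 0) {
    return false; // above the upper bound
  }
  return true;
}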

Contributor Author

Does that clarify?

Contributor

Oh, I see. Yes, that helps clarify. Thanks! The difference I missed is that we never start updating the competitive iterator until the queue fills in the current code, so it doesn't matter that these byte arrays are initialized as zeros, but now it does matter because null has meaning in the case of the competitive iterator update. OK got it. Thanks for walking me through it :)

encodeBottom(maxValueAsBytes);
}
if (topValueSet) {
minValueAsBytes = minValueAsBytes == null ? new byte[bytesCount] : minValueAsBytes;
encodeTop(minValueAsBytes);
}
} else {
encodeBottom(minValueAsBytes);
if (queueFull) { // bottom is available only when queue is full
minValueAsBytes = minValueAsBytes == null ? new byte[bytesCount] : minValueAsBytes;
encodeBottom(minValueAsBytes);
}
if (topValueSet) {
maxValueAsBytes = maxValueAsBytes == null ? new byte[bytesCount] : maxValueAsBytes;
encodeTop(maxValueAsBytes);
}
}
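
For readability, the bound-encoding logic as it reads after this change, assembled from the hunks above with the inline discussion removed:

if (reverse == false) {
  if (queueFull) { // bottom is available only when queue is full
    maxValueAsBytes = maxValueAsBytes == null ? new byte[bytesCount] : maxValueAsBytes;
    encodeBottom(maxValueAsBytes);
  }
  if (topValueSet) {
    minValueAsBytes = minValueAsBytes == null ? new byte[bytesCount] : minValueAsBytes;
    encodeTop(minValueAsBytes);
  }
} else {
  if (queueFull) { // bottom is available only when queue is full
    minValueAsBytes = minValueAsBytes == null ? new byte[bytesCount] : minValueAsBytes;
    encodeBottom(minValueAsBytes);
  }
  if (topValueSet) {
    maxValueAsBytes = maxValueAsBytes == null ? new byte[bytesCount] : maxValueAsBytes;
    encodeTop(maxValueAsBytes);
  }
}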