-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support hits to log count #789
Support hits to log count #789
Conversation
hits = | ||
getHitsFromOffset( | ||
hits, | ||
searchContext.getStartHit(), | ||
Math.max(searchContext.getTopHits(), searchContext.getHitsToLog())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is exactly what you want here. How is hit logging expected to function when startHit
is not 0? Note that topHits
is applied before the startHit
adjustment (unfortunately). If startHit=10, topHits=20, hitsToLog=10, would you expect to log the first 10 hits or the second 10 hits? For the latter, you would need to add startHit
to hitsToLog
before taking the max.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this explanation! I added some unit tests to make sure the code is working as expected now.
int collectHits = request.getTopHits(); | ||
if (request.hasLoggingHits()) { | ||
collectHits = request.getLoggingHits().getHitsToLog(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if hitsToLog
is less than topHits
? This might be better as a max
operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ I missed adding Max
operation here
@@ -271,14 +271,29 @@ public SearchResponse handle(IndexState indexState, SearchRequest searchRequest) | |||
|
|||
long t0 = System.nanoTime(); | |||
|
|||
hits = getHitsFromOffset(hits, searchContext.getStartHit(), searchContext.getTopHits()); | |||
if (searchContext.getFetchTasks().getHitsLoggerFetchTask() != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't too sure if I should create a new method or just modify getHitsFromOffset
to require hitsToLog as another parameter. I decided to create a new method to make it more clear that the logic is a bit different when there are hits to log. Also, modifying the getHitsFromOffset
would need to update its call in other two classes: InnerHitFetchTask
and TopHitsCollectorManager
.
I don't have a strong opinion on which approach to choose so let me know if you prefer one over the other or if you have other suggestions.
public void testHitsLoggerResponseSizeReductionWithHitsToLogGreaterThanHitsCount() { | ||
SearchRequest request = | ||
SearchRequest.newBuilder() | ||
.setTopHits(10) | ||
.setStartHit(5) | ||
.setIndexName(DEFAULT_TEST_INDEX) | ||
.addRetrieveFields("doc_id") | ||
.setQuery( | ||
Query.newBuilder() | ||
.setTermQuery( | ||
TermQuery.newBuilder() | ||
.setField("vendor_name") | ||
.setTextValue("vendor") | ||
.build()) | ||
.build()) | ||
.setLoggingHits( | ||
LoggingHits.newBuilder().setName("custom_logger").setHitsToLog(6).build()) | ||
.build(); | ||
SearchResponse response = getGrpcServer().getBlockingStub().search(request); | ||
String expectedLogMessage = | ||
"LOGGED doc_id: 4, doc_id: 5, doc_id: 6, doc_id: 7, doc_id: 8, doc_id: 9, "; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really the logging behavior you want? The response hits start with 5, but the logging hits start with 4. Should this be logging docs 5-11?
If hitsToLog
is expected to be relative to startHit
, it will need to have startHit
added when being compared to topHits
(since topHits is currently defined as startHit + hits_to_retrieve).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ I got confused with my own testing set up 🫣 The test above should actually output doc_id: 5, doc_id: 6, doc_id: 7, doc_id: 8, doc_id: 9
because we only have a total of 10 documents. I think I fixed the issue and added more tests. I also changed the doc_id to start from 1 instead of 0. Hopefully the logic and tests make more sense.
This PR introduces a new feature in NRTSearch for HitsLoggerPlugin that allows for logging additional hits beyond what is returned to the client.
Testing