Support hits to log count #789

Merged
merged 11 commits into main from luanafragoso_logging_hits_reduce_response_size
Jan 6, 2025

Conversation

fragosoluana
Member

This PR introduces a new feature in NRTSearch for HitsLoggerPlugin that allows for logging additional hits beyond what is returned to the client.

  • NRTSearch can now log more hits than it returns, so the client can log a broader set of data without enlarging the search response.
  • Users can configure the number of hits to be returned separately from the number of hits to be logged.
  • Existing configurations continue to work as before unless they are explicitly updated to set hitsToLog in the LoggingHits message.

Testing

  • Unit tests have been added to verify the response size reduction behavior.
  • Examples have been added to demonstrate how to set up and use the new feature.

Comment on lines 275 to 279
hits =
    getHitsFromOffset(
        hits,
        searchContext.getStartHit(),
        Math.max(searchContext.getTopHits(), searchContext.getHitsToLog()));
Contributor

I'm not sure if this is exactly what you want here. How is hit logging expected to function when startHit is not 0? Note that topHits is applied before the startHit adjustment (unfortunately). If startHit=10, topHits=20, hitsToLog=10, would you expect to log the first 10 hits or the second 10 hits? For the latter, you would need to add startHit to hitsToLog before taking the max.
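The two readings of hitsToLog in this question can be sketched as index ranges. This is only an illustration under the assumption stated in the comment (topHits is an absolute cutoff that already includes startHit); the class and method names are hypothetical and are not NRTSearch code.

```java
/** Hypothetical helper for illustration only; not part of NRTSearch. */
final class LogWindowSketch {

  /** Reading 1: hitsToLog counts from the first hit, ignoring startHit. Returns [from, to). */
  static int[] absoluteWindow(int hitsToLog) {
    return new int[] {0, hitsToLog};
  }

  /** Reading 2: hitsToLog counts from startHit. Returns [from, to). */
  static int[] relativeWindow(int startHit, int hitsToLog) {
    return new int[] {startHit, startHit + hitsToLog};
  }

  /** Cutoff needed for reading 2: startHit is added to hitsToLog before taking the max. */
  static int relativeCutoff(int startHit, int topHits, int hitsToLog) {
    return Math.max(topHits, startHit + hitsToLog);
  }

  public static void main(String[] args) {
    // The values from the comment: startHit=10, topHits=20, hitsToLog=10.
    int[] first = absoluteWindow(10);
    int[] second = relativeWindow(10, 10);
    System.out.println("absolute: [" + first[0] + ", " + first[1] + ")");
    System.out.println("relative: [" + second[0] + ", " + second[1] + ")");
    System.out.println("relative cutoff: " + relativeCutoff(10, 20, 10));
  }
}
```

With these values the relative cutoff equals topHits, so the two readings differ only in which ten hits are logged. Once startHit + hitsToLog exceeds topHits, the relative reading also needs more hits than topHits alone would keep.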

Member Author

Thanks a lot for this explanation! I added some unit tests to make sure the code is working as expected now.

Comment on lines 71 to 74
int collectHits = request.getTopHits();
if (request.hasLoggingHits()) {
  collectHits = request.getLoggingHits().getHitsToLog();
}
Contributor

What if hitsToLog is less than topHits? This might be better as a max operation.
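A minimal sketch of the suggested max operation, using plain ints in place of the request object. The class and method names are hypothetical, and the sketch does not show how the merged code handles this.

```java
/** Hypothetical helper for illustration only; not part of NRTSearch. */
final class CollectHitsSketch {

  /**
   * Number of hits to collect. Without logging this is just topHits. With logging it is the
   * larger of the two values, so a small hitsToLog can never shrink the response below topHits.
   */
  static int collectHits(int topHits, boolean hasLoggingHits, int hitsToLog) {
    return hasLoggingHits ? Math.max(topHits, hitsToLog) : topHits;
  }

  public static void main(String[] args) {
    System.out.println(collectHits(20, true, 5));   // hitsToLog below topHits
    System.out.println(collectHits(10, true, 30));  // hitsToLog above topHits
    System.out.println(collectHits(10, false, 30)); // logging not requested
  }
}
```

With plain assignment, as in the snippet under review, the first call would collect only 5 hits even though the client asked for 20.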

Member Author

++ I missed adding the max operation here.

@@ -271,14 +271,29 @@ public SearchResponse handle(IndexState indexState, SearchRequest searchRequest)

long t0 = System.nanoTime();

hits = getHitsFromOffset(hits, searchContext.getStartHit(), searchContext.getTopHits());
if (searchContext.getFetchTasks().getHitsLoggerFetchTask() != null) {
Member Author

I wasn't sure whether to create a new method or to modify getHitsFromOffset to take hitsToLog as another parameter. I decided to create a new method to make it clearer that the logic is a bit different when there are hits to log. Also, modifying getHitsFromOffset would require updating its calls in two other classes: InnerHitFetchTask and TopHitsCollectorManager.

I don't have a strong opinion on which approach to choose, so let me know if you prefer one over the other or if you have other suggestions.

Comment on lines 273 to 293
public void testHitsLoggerResponseSizeReductionWithHitsToLogGreaterThanHitsCount() {
  SearchRequest request =
      SearchRequest.newBuilder()
          .setTopHits(10)
          .setStartHit(5)
          .setIndexName(DEFAULT_TEST_INDEX)
          .addRetrieveFields("doc_id")
          .setQuery(
              Query.newBuilder()
                  .setTermQuery(
                      TermQuery.newBuilder()
                          .setField("vendor_name")
                          .setTextValue("vendor")
                          .build())
                  .build())
          .setLoggingHits(
              LoggingHits.newBuilder().setName("custom_logger").setHitsToLog(6).build())
          .build();
  SearchResponse response = getGrpcServer().getBlockingStub().search(request);
  String expectedLogMessage =
      "LOGGED doc_id: 4, doc_id: 5, doc_id: 6, doc_id: 7, doc_id: 8, doc_id: 9, ";
Contributor

Is this really the logging behavior you want? The response hits start with 5, but the logging hits start with 4. Should this be logging docs 5-11?

If hitsToLog is expected to be relative to startHit, it will need to have startHit added when being compared to topHits (since topHits is currently defined as startHit + hits_to_retrieve).

Member Author

++ I got confused by my own test setup 🫣 The test above should actually output doc_id: 5, doc_id: 6, doc_id: 7, doc_id: 8, doc_id: 9 because we only have a total of 10 documents. I think I fixed the issue and added more tests. I also changed the doc_id to start from 1 instead of 0. Hopefully the logic and tests make more sense now.
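The corrected expectation can be checked with a small worked example. It assumes the 10 matching documents are ranked in doc_id order with ids 0 through 9 (the numbering before the switch to 1-based ids) and that hitsToLog is counted from startHit. The helper is hypothetical and only mimics the slicing; it is not the NRTSearch implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of NRTSearch. */
final class LoggedHitsExample {

  /** Returns the doc ids that would be logged: up to hitsToLog hits starting at startHit. */
  static List<Integer> loggedDocIds(List<Integer> rankedDocIds, int startHit, int hitsToLog) {
    // Clamp both ends to the number of hits that actually exist.
    int from = Math.min(startHit, rankedDocIds.size());
    int to = Math.min(from + hitsToLog, rankedDocIds.size());
    return new ArrayList<>(rankedDocIds.subList(from, to));
  }

  public static void main(String[] args) {
    List<Integer> ranked = new ArrayList<>();
    for (int id = 0; id < 10; id++) {
      ranked.add(id);
    }
    // startHit=5 and hitsToLog=6 as in the test request; only 5 hits remain after the offset.
    System.out.println(loggedDocIds(ranked, 5, 6)); // prints [5, 6, 7, 8, 9]
  }
}
```

Six hits are requested for logging, but only five exist past the offset, which is why the log stops at doc_id 9.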

@fragosoluana fragosoluana merged commit 3472026 into main Jan 6, 2025
1 check passed
@fragosoluana fragosoluana deleted the luanafragoso_logging_hits_reduce_response_size branch January 6, 2025 22:50

3 participants