
fvh adaptor to highlight the top boost phrase only #799

Merged: waziqi89 merged 3 commits into Yelp:v0.x from u/waziqi/RP-12281-hl on Jan 17, 2025

waziqi89 commented Jan 2, 2025

RP-12281
Add a top_boost_only toggle that highlights only the phrase(s) with the highest boost found in each fragment. This is implemented by adding an adaptor to the FragmentBuilder.

The test case is self-explanatory: only the best match is highlighted, including every occurrence of it.

waziqi89 requested a review from taoyyu on January 2, 2025 21:58
waziqi89 commented Jan 2, 2025

Example. Given these phrase boosts:

delicious: 4
margarita pizza: 3
marinara pizza: 3
yummy: 2

the fragment is highlighted as:

The <em>margarita pizza</em> and the <em>marinara pizza</em> in this pizzeria are yummy and inexpensive.

  • If the top-boosted phrase (delicious) is not found, fall back to the next highest boost.
  • Tied matches (margarita pizza and marinara pizza) are all highlighted.
  • Lower-boosted matches (yummy) are not highlighted, but they still count toward the scoring order.
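To make these rules concrete, here is a small Java sketch that applies them to the example using plain collections. It is illustrative only: TopBoostOnlyRules, highlighted, and the map/set inputs are hypothetical names, not the PR's code, which works on per-fragment sub-infos inside the fvh FragmentBuilder adaptor. For readability it uses two passes: first it finds the highest boost among phrases actually present, then it keeps every present phrase at that boost.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the top_boost_only rules; not the PR's implementation.
class TopBoostOnlyRules {

    /** Returns the phrases that would be highlighted in one fragment, in query order. */
    static List<String> highlighted(Map<String, Float> queryBoosts, Set<String> foundInFragment) {
        // Pass 1: highest boost among phrases that actually occur in this fragment.
        // Absent phrases (e.g. "delicious") are ignored, which produces the fallback.
        float top = Float.NEGATIVE_INFINITY;
        for (Map.Entry<String, Float> e : queryBoosts.entrySet()) {
            if (foundInFragment.contains(e.getKey())) {
                top = Math.max(top, e.getValue());
            }
        }
        // Pass 2: every found phrase at that boost is kept, so ties are all highlighted;
        // lower-boosted phrases (e.g. "yummy") are left unhighlighted.
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Float> e : queryBoosts.entrySet()) {
            if (foundInFragment.contains(e.getKey()) && e.getValue() == top) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("delicious", 4f);
        boosts.put("margarita pizza", 3f);
        boosts.put("marinara pizza", 3f);
        boosts.put("yummy", 2f);
        // The example sentence contains every phrase except "delicious".
        Set<String> found = Set.of("margarita pizza", "marinara pizza", "yummy");
        System.out.println(highlighted(boosts, found)); // [margarita pizza, marinara pizza]
    }
}
```

With a fragment that does contain the top phrase, for example highlighted(boosts, Set.of("delicious", "yummy")), the sketch returns only [delicious].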

waziqi89 changed the title from "fvh adaptor to highlight the top phrase once only." to "fvh adaptor to highlight the top boost phrase only" on Jan 9, 2025
waziqi89 force-pushed the u/waziqi/RP-12281-hl branch from 26c0db4 to 575baa1 on January 9, 2025 16:36
waziqi89 marked this pull request as ready for review on January 9, 2025 16:38
waziqi89 requested a review from sarthakn7 on January 9, 2025 18:20

sarthakn7 left a comment


Can you please update the highlighting doc, and also send the PR to the main branch first, before backporting it to v0? This ensures that the proto field numbers are correct.

    buffer, index, values, s, fragInfo.getEndOffset(), modifiedStartOffset);
int srcIndex = 0;
double topBoostValue =
    fragInfo.getSubInfos().stream().map(SubInfo::getBoost).max(Float::compare).orElse(0f);
Contributor

Please use a for loop instead of a stream to avoid extra object creation. This code runs per document, so it can be called thousands of times per second, and it needs to stay fast.
You could also store the subInfos with the highest boost alongside topBoostValue; then you won't need to iterate over all of the subInfos a second time.
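The suggestion above can be sketched as one plain loop that tracks the highest boost and collects the sub-infos carrying it, so the highlighting step only visits those. This is a hedged sketch: SubInfo here is a hypothetical stand-in exposing getBoost() (plus a text label for printing), not the actual class used in the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a single-pass "max boost + collect" scan; SubInfo is a hypothetical stand-in.
class TopBoostScan {

    static final class SubInfo {
        private final String text;
        private final float boost;

        SubInfo(String text, float boost) {
            this.text = text;
            this.boost = boost;
        }

        float getBoost() { return boost; }
        String getText() { return text; }
    }

    /** Returns the sub-infos sharing the highest boost, in their original order. */
    static List<SubInfo> topBoostSubInfos(List<SubInfo> subInfos) {
        float topBoost = Float.NEGATIVE_INFINITY;
        List<SubInfo> top = new ArrayList<>();
        for (SubInfo s : subInfos) {
            float boost = s.getBoost(); // primitive float: no boxing, unlike map(SubInfo::getBoost)
            if (boost > topBoost) {
                topBoost = boost;
                top.clear();            // a higher boost invalidates earlier candidates
                top.add(s);
            } else if (boost == topBoost) {
                top.add(s);             // ties are kept, so all of them get highlighted
            }
        }
        return top;
    }

    public static void main(String[] args) {
        List<SubInfo> subInfos = List.of(
            new SubInfo("margarita pizza", 3f),
            new SubInfo("marinara pizza", 3f),
            new SubInfo("yummy", 2f));
        for (SubInfo s : topBoostSubInfos(subInfos)) {
            System.out.println(s.getText()); // margarita pizza, then marinara pizza
        }
    }
}
```

Compared with stream().map(SubInfo::getBoost).max(Float::compare), this avoids the stream pipeline, the boxed Float values, and the Optional. Because the scan only sees the sub-infos of the fragment being built, each fragment gets its own top boost.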

Contributor Author

This is called once per fragment to be created. We have to calculate topBoostValue for each fragment, because not every fragment contains all of the desired phrases.

Comment on lines +68 to +70
if (subInfo.getBoost() < topBoostValue) {
    continue;
}
Contributor

Does the modifiedStartOffset still track the correct offset even if some of the subInfos are skipped?

Contributor Author

Yes.
This method uses a pointer to concatenate the highlighted parts with the text between them. Skipped phrases are simply left unhighlighted, so the result is still a complete fragment.
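The pointer-based concatenation described in this reply can be sketched as follows. It assumes matches are sorted by start offset and do not overlap; under that assumption a skipped match never advances the cursor, so its text is copied verbatim with the next gap and the fragment stays complete. Match, matchOf, and assemble are hypothetical names; the PR's own method works with its own offsets (such as modifiedStartOffset) instead.

```java
import java.util.List;

// Hypothetical sketch of cursor-based fragment assembly; not the PR's implementation.
class CursorFragmentAssembly {

    static final class Match {
        final int start;   // inclusive start offset in the text
        final int end;     // exclusive end offset
        final float boost;

        Match(int start, int end, float boost) {
            this.start = start;
            this.end = end;
            this.boost = boost;
        }
    }

    // Hypothetical helper: locates the first occurrence of a phrase in the text.
    static Match matchOf(String text, String phrase, float boost) {
        int start = text.indexOf(phrase);
        return new Match(start, start + phrase.length(), boost);
    }

    // Assumes matches are sorted by start offset and do not overlap.
    static String assemble(String text, List<Match> matches, float topBoost) {
        StringBuilder out = new StringBuilder();
        int cursor = 0; // text before this offset has already been copied
        for (Match m : matches) {
            if (m.boost < topBoost) {
                continue; // cursor stays put: this span is copied later as plain text
            }
            out.append(text, cursor, m.start)  // gap before the match
               .append("<em>")
               .append(text, m.start, m.end)   // the highlighted phrase
               .append("</em>");
            cursor = m.end;
        }
        out.append(text, cursor, text.length()); // tail, including any skipped matches
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "The margarita pizza and the marinara pizza in this pizzeria are yummy and inexpensive.";
        List<Match> matches = List.of(
            matchOf(text, "margarita pizza", 3f),
            matchOf(text, "marinara pizza", 3f),
            matchOf(text, "yummy", 2f));
        // The <em>margarita pizza</em> and the <em>marinara pizza</em> in this pizzeria are yummy and inexpensive.
        System.out.println(assemble(text, matches, 3f));
    }
}
```

Stripping the <em> tags from the output gives back the original sentence unchanged, which is the property the question about modifiedStartOffset is checking.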

waziqi89 merged commit fa307ee into Yelp:v0.x on Jan 17, 2025
1 check passed

2 participants