Allow customizing the ranking rules #128
Conversation
This reverts commit 1f28737.
I like the general approach of extracting the rankers into proper classes and having separate tests, so I'm 👍 on those! I'm not quite sure about the `json_decode()` part. I don't think it can be as fast as exploding because there's a lot more going on when decoding JSON, so I'm eager to learn about your performance results.
Right, my intuition says SQLite's JSON functions are quite optimized, but as always, the only way to know is by properly benchmarking. Maybe I was just hoping to make this as simple as possible :)
Yeah, they sure are, but you are using PHP's `json_decode()`.
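For illustration, a rough micro-benchmark of the two decoding approaches being discussed; the delimiter and payload shape here are assumptions for the sake of the example, not Loupe's actual format:

```php
<?php
// Rough comparison of decoding a delimited string vs. JSON for a list of
// integer positions. The ';' delimiter and payload are made up for illustration.
$delimited = implode(';', range(1, 50));
$json = json_encode(range(1, 50));

$start = hrtime(true);
for ($i = 0; $i < 100000; $i++) {
    $fromString = array_map('intval', explode(';', $delimited));
}
printf("explode:     %.1f ms\n", (hrtime(true) - $start) / 1e6);

$start = hrtime(true);
for ($i = 0; $i < 100000; $i++) {
    $fromJson = json_decode($json);
}
printf("json_decode: %.1f ms\n", (hrtime(true) - $start) / 1e6);
```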
Just benchmarked this branch against `develop`. I did find something else: apparently SQLite will call the `loupe_relevance` function twice, once for creating the value itself and once for sorting. I guess this could be optimized away by marking it as deterministic. Not sure how that works in detail, though...
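For reference, registering a user-defined SQLite function as deterministic via PDO looks roughly like this. This is only a sketch: Loupe's actual wiring of `loupe_relevance` may differ, and the callback below is a placeholder, not the real relevance logic.

```php
<?php
// Sketch: register a custom SQLite function and mark it deterministic, so the
// engine may reuse results for identical arguments within one statement.
$pdo = new PDO('sqlite::memory:');

$pdo->sqliteCreateFunction(
    'loupe_relevance',
    function (string $data): float {
        // ... compute a relevance score from the passed data (placeholder) ...
        return 1.0;
    },
    1,
    PDO::SQLITE_DETERMINISTIC // PHP 7.1.4+, requires SQLite >= 3.8.3
);
```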
@Toflar So these are the benchmarks, with 50 runs per branch and outliers removed. Posted them above as well. This is with the full database of 32K movies. About as close as it gets, I would say. Maybe a 1% decrease in performance?
(Just a side note: I looked a bit into avoiding the duplicate work in the relevance function, both by marking it deterministic and by pulling the call into its own CTE, but neither made a measurable difference.)
It's indeed called twice. But how can getting rid of those duplicate calls have no effect? :D
I think in the first case it's because SQLite makes no promises about deterministic functions; it's more of an optional hint that it may or may not honor. In the second case it's probably because adding another CTE avoids the duplicate function calls but makes the query more complex to parse or run? A lot of guessing here on my end 🤠 It'd be interesting to see how much time is spent in the relevance function relative to total query time.
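One way to measure that, sketched under the assumption that the function is registered through PDO; `$computeRelevance` is a hypothetical stand-in for the real scoring callback:

```php
<?php
// Sketch: wrap the relevance callback with a timer to see how much of the
// total query time is spent inside the UDF.
$pdo = new PDO('sqlite::memory:');
$computeRelevance = fn (...$args): float => 1.0; // stand-in for the real scoring logic

$totalNs = 0;
$pdo->sqliteCreateFunction('loupe_relevance', function (...$args) use (&$totalNs, $computeRelevance) {
    $start = hrtime(true);
    $result = $computeRelevance(...$args);
    $totalNs += hrtime(true) - $start;

    return $result;
}, -1);

// ... run the search query against $pdo here ...
printf("Time spent in loupe_relevance: %.2f ms\n", $totalNs / 1e6);
```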
That depends a lot on the total number of matches. If a query matches many documents, the relevance has to be calculated for all of them; if it only matches a few, you won't see any noticeable difference. So if you're checking the deterministic case, I'm pretty sure it should help, but you have to test with a query that matches many documents 🤔
This reverts commit 0647317.
I've reverted back to simple string concatenation and implode/explode for the function parameters to keep things simple for now. JSON might be just as fast, but it muddies the picture somewhat here.

I've optimized a few loops and switched to passing all the arrays by reference. Queries are now about 0.5% faster for simple queries (not sure why), and about 0.75% slower for more complex queries (probably the added static method calls). I think that's about as good as it gets without doing further optimization elsewhere, e.g. in PR #130.

Ready for review :)
Thank you, @daun!!
Tried my hand at implementing a `withRankingRules` config that mirrors Meilisearch's ranking rules customization.

The default is `['words', 'proximity', 'attribute']`, in that order, same as before. The weight of each rule is implicit and derived from a decay factor. I've had very similar results to the previous setup with 1.0 → 0.7 → 0.49 → etc., so that's what I went with for now.

Switched to SQLite's native JSON functions in a few places for passing data to the custom relevance function; it's a bit more ergonomic and about as fast as the current implementation, even with JSON. Update: reverted back to string concatenation to keep things simple.

Usage
To ignore the proximity rule and prefer attribute weighting:
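A sketch of what that could look like, assuming the `Loupe\Loupe\Configuration` fluent API and the `withRankingRules()` method added in this PR (the exact signature may differ):

```php
<?php
// Sketch (assumed API shape): drop the 'proximity' rule so that only term
// matches ('words') and attribute weighting influence the relevance score.
// The default is ['words', 'proximity', 'attribute'], with weights decaying
// by a fixed factor per rule: 1.0, 0.7, 0.49, ...
use Loupe\Loupe\Configuration;

$configuration = Configuration::create()
    ->withRankingRules(['words', 'attribute']);
```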
Benchmarks
Compared against the `develop` branch, simple queries are about 0.5% faster and more complex queries about 0.75% slower. I could live with that cost for the added benefit of being able to control relevance ranking.

Closes #125