Allow to customize the ranking rules #128

daun · 2024-12-03T19:55:49Z

Tried my hand at implementing a withRankingRules config that mirrors Meilisearch ranking rules customization.

The default is: ['words', 'proximity', 'attribute'], in that order. Same as before. The weight of each rule is implicit and derived from a factor. I've had very similar results to the previous setup with 1.0 → 0.7 → 0.49 → etc. So that's what I went with for now.

~~Switched to SQLite native json functions in a few places for passing data to the custom relevance function, it's a bit more ergonomic. About as fast as the current implementation, even with json.~~ Reverted back to string concatenation to keep things simple.

Usage

To ignore the proximity rule and prefer attribute weighing:

$configuration = Configuration::create()
    ->withSearchableAttributes(['title', 'content'])
    ->withRankingRules([
        'attribute',
        'words',
    ]);

Benchmarks

Queries are about 0.5% faster for simple queries, and about 0.75% slower for more complex queries. I could live with that cost for the added benefit of being able to control relevance ranking.

query	`develop` branch	this pull request
amakin dkywalker	164.1 ms ± 1.8 ms	165.2 ms ± 2.0 ms
famly dadd	676.4 ms ± 41.6 ms	679.9 ms ± 8.7 ms
once twice thrice	261.9 ms ± 2.2 ms	264.3 ms ± 8.1 ms
lighting	195.1 ms ± 5.0 ms	194.1 ms ± 1.8 ms

Closes #125

Signed-off-by: Philipp Daun <post@philippdaun.net>

This reverts commit 1f28737. Signed-off-by: Philipp Daun <post@philippdaun.net>

Signed-off-by: Philipp Daun <post@philippdaun.net>

Toflar

I like the general approach of extracting the rankers into proper classes and having separate tests, so I'm 👍 on those! I'm not quite sure about the json_decode() part. I don't think it can be as fast as exploding because there's a lot more going on when decoding JSON so I'm eager to learn about your performance results.

src/Internal/Search/Sorting/Relevance.php

src/Internal/Search/Ranking/AbstractRanker.php

daun · 2024-12-04T16:32:02Z

I like the general approach of extracting the rankers into proper classes and having separate tests, so I'm 👍 on those! I'm not quite sure about the json_decode() part. I don't think it can be as fast as exploding because there's a lot more going on when decoding JSON so I'm eager to learn about your performance results.

Right, my intuition says sqlite's json functions are quite optimized, but as always, the only way to know is by properly benchmarking. Maybe I was just hoping to make this as simple as possible :)

Toflar · 2024-12-04T16:47:03Z

Yeah, they sure are but you are using PHP's json_decode() which by nature cannot be as quick as explode() :)

daun · 2024-12-04T16:57:23Z

Just benchmarked this branch against develop, it's as fast or faster. So I'd say we shouldn't worry about the json part. I'll add the benchmark results to the pr in a minute, just need to properly format them somehow.

I did find some other thing — apparently sqlite will call the loupe_relevance function twice, once for creating the value itself and once for sorting. I guess this could be optimized away by marking it as deterministic. Not sure how that works in detail, though...

daun · 2024-12-04T17:09:37Z

@Toflar So these are the benchmarks. With 50 runs per branch, and outliers removed. Posted them above as well. This is with the full database of 32K movies. About as close as it gets, I would say. Maybe a 1% decrease in performance?

query	`develop` branch	this pull request
amakin dkywalker	164.1 ms ± 1.8 ms	165.2 ms ± 2.0 ms
famly dadd	676.4 ms ± 41.6 ms	677.3 ms ± 11.3 ms
once twice thrice	260.7 ms ± 3.1 ms	265.2 ms ± 10.1 ms
lighting	198.0 ms ± 4.2 ms	196.8 ms ± 2.4 ms

daun · 2024-12-04T20:24:25Z

(Just a side-note that I looked a bit into avoiding duplicate work in the loupe_relevance function by a) marking the sqlite custom function as deterministic, as well as offloading it into a materialized CTE. Both had no effect or made things worse.)

Co-authored-by: Yanick Witschi <yanick.witschi@terminal42.ch>

Toflar · 2024-12-05T09:29:15Z

It's indeed called twice. But how can it have no effect to get rid of those duplicate calls? :D

daun · 2024-12-05T15:05:27Z

It's indeed called twice. But how can it have no effect to get rid of those duplicate calls? :D

I think in the first case it's because sqlite makes no promises about deterministic functions and it's more of an optional thing for it to honor or not.

In the second case it's probably because adding another CTE avoids duplicate function calls but makes the query a lot more complex to parse or run? A lot of guessing here on my end 🤠

It'd be interesting to see how much time is spent in the relevance function in relation to total query time.

Toflar · 2024-12-05T15:13:49Z

It'd be interesting to see how much time is spent in the relevance function in relation to total query time.

That depends a lot on the total matches. If you have a query that matches many documents, for all of them the relevance has to be calculated. If you only have a few, then you won't see any noticeable difference. So if you're checking the deterministic case, I am pretty sure it should help but you have to test with a query that matches many documents 🤔

Signed-off-by: Philipp Daun <post@philippdaun.net>

This reverts commit 0647317. Signed-off-by: Philipp Daun <post@philippdaun.net>

daun · 2024-12-07T09:54:54Z

I've reverted back to simple string concatenation and implode/explode for the function parameters to keep things simple for now. JSON might be just as fast, but it muddies the picture somewhat here.

I've optimized a few loops and switched to passing all the arrays by reference. Queries are now about 0.5% faster for simple queries (not sure why), and about 0.75% slower for more complex queries (probably the added static method calls). I think that's about as good as it gets without doing further optimization elsewhere, e.g. in PR #130.

Ready for review :)

src/Internal/Search/Ranking/AbstractRanker.php

Toflar · 2024-12-09T16:34:20Z

Thank you, @daun!!

daun added 20 commits December 3, 2024 16:59

Get and set ranking rule config

850a07e

Signed-off-by: Philipp Daun <post@philippdaun.net>

Define ranking rules order factor

52d514f

Signed-off-by: Philipp Daun <post@philippdaun.net>

Create abstract ranker class

175c07c

Signed-off-by: Philipp Daun <post@philippdaun.net>

Define default ranking rules

Loading
Loading status checks…

3d57aa2

Signed-off-by: Philipp Daun <post@philippdaun.net>

Add missing rankers

443904e

Signed-off-by: Philipp Daun <post@philippdaun.net>

Refactor ranking

bf6322c

Signed-off-by: Philipp Daun <post@philippdaun.net>

Fix weight test

22cb2f0

Signed-off-by: Philipp Daun <post@philippdaun.net>

Fix ranking weights

cfeff27

Signed-off-by: Philipp Daun <post@philippdaun.net>

Adjust tests to match new weights

ba7c837

Signed-off-by: Philipp Daun <post@philippdaun.net>

Refactor rankers

c17ad03

Signed-off-by: Philipp Daun <post@philippdaun.net>

Fix static rankers

784f107

Signed-off-by: Philipp Daun <post@philippdaun.net>

Add tests for custom ranking rule order

231bc52

Signed-off-by: Philipp Daun <post@philippdaun.net>

Fix style issues

fb46818

Signed-off-by: Philipp Daun <post@philippdaun.net>

Reorder args

8f91f49

Signed-off-by: Philipp Daun <post@philippdaun.net>

Add test for missing results

bb99b89

Signed-off-by: Philipp Daun <post@philippdaun.net>

Simplify custom function attribute encoding

1f28737

Signed-off-by: Philipp Daun <post@philippdaun.net>

Revert "Simplify custom function attribute encoding"

af43140

This reverts commit 1f28737. Signed-off-by: Philipp Daun <post@philippdaun.net>

Remove test

19affde

Signed-off-by: Philipp Daun <post@philippdaun.net>

Fix error on php 8.2 and below

8fd5c93

Signed-off-by: Philipp Daun <post@philippdaun.net>

Simplify json decode calls

a8c481c

Signed-off-by: Philipp Daun <post@philippdaun.net>

Toflar reviewed Dec 4, 2024

View reviewed changes

src/Internal/Search/Sorting/Relevance.php Outdated Show resolved Hide resolved

src/Internal/Search/Sorting/Relevance.php Outdated Show resolved Hide resolved

src/Internal/Search/Sorting/Relevance.php Outdated Show resolved Hide resolved

Toflar reviewed Dec 4, 2024

View reviewed changes

src/Internal/Search/Ranking/AbstractRanker.php Outdated Show resolved Hide resolved

Merge branch 'develop' into feat/ranking-rules

7783cca

daun and others added 2 commits December 4, 2024 22:10

Update src/Internal/Search/Sorting/Relevance.php

4b14436

Co-authored-by: Yanick Witschi <yanick.witschi@terminal42.ch>

Update src/Internal/Search/Sorting/Relevance.php

a86a1a6

Co-authored-by: Yanick Witschi <yanick.witschi@terminal42.ch>

Update src/Internal/Search/Sorting/Relevance.php

da2c935

Co-authored-by: Yanick Witschi <yanick.witschi@terminal42.ch>

daun added 10 commits December 6, 2024 21:52

Merge branch 'develop' into feat/ranking-rules

9b6520e

Remove unused static cache

e4ad1c4

Signed-off-by: Philipp Daun <post@philippdaun.net>

Revert back to explode

4a8bc57

Signed-off-by: Philipp Daun <post@philippdaun.net>

Pass ranker params by reference

d1f2b1d

Signed-off-by: Philipp Daun <post@philippdaun.net>

Simplify word count calculation

02050e4

Signed-off-by: Philipp Daun <post@philippdaun.net>

Fix style issues

0647317

Signed-off-by: Philipp Daun <post@philippdaun.net>

Fix static cache assignment

9cd5e95

Signed-off-by: Philipp Daun <post@philippdaun.net>

Use loop for total weight calculation

bbeca7b

Signed-off-by: Philipp Daun <post@philippdaun.net>

Reorder calls

02d717f

Signed-off-by: Philipp Daun <post@philippdaun.net>

Revert benchmark changes

27c53f8

This reverts commit 0647317. Signed-off-by: Philipp Daun <post@philippdaun.net>

daun marked this pull request as ready for review December 7, 2024 09:54

Toflar reviewed Dec 9, 2024

View reviewed changes

src/Internal/Search/Ranking/AbstractRanker.php Show resolved Hide resolved

Toflar mentioned this pull request Dec 9, 2024

Check the ranking rule abstraction #135

Closed

Toflar approved these changes Dec 9, 2024

View reviewed changes

Toflar changed the title ~~Customize ranking rules~~ Allow to customize the ranking rules Dec 9, 2024

Toflar merged commit ba31c05 into loupe-php:develop Dec 9, 2024
18 checks passed

daun deleted the feat/ranking-rules branch December 9, 2024 21:05

daun mentioned this pull request Dec 9, 2024

Customize order of ranking rules #125

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow to customize the ranking rules #128

Allow to customize the ranking rules #128

daun commented Dec 3, 2024 •

edited

Loading

Toflar left a comment

daun commented Dec 4, 2024

Toflar commented Dec 4, 2024

daun commented Dec 4, 2024

daun commented Dec 4, 2024

daun commented Dec 4, 2024

Toflar commented Dec 5, 2024

daun commented Dec 5, 2024

Toflar commented Dec 5, 2024

daun commented Dec 7, 2024

Toflar commented Dec 9, 2024

Allow to customize the ranking rules #128

Allow to customize the ranking rules #128

Conversation

daun commented Dec 3, 2024 • edited Loading

Usage

Benchmarks

Toflar left a comment

Choose a reason for hiding this comment

daun commented Dec 4, 2024

Toflar commented Dec 4, 2024

daun commented Dec 4, 2024

daun commented Dec 4, 2024

daun commented Dec 4, 2024

Toflar commented Dec 5, 2024

daun commented Dec 5, 2024

Toflar commented Dec 5, 2024

daun commented Dec 7, 2024

Toflar commented Dec 9, 2024

daun commented Dec 3, 2024 •

edited

Loading