ranking: add more knobs
In addition to controlling how much of a role the "page frequency"
plays in ranking pages, let's add more ways to modify the way pages are
ranked.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
dscho committed Jan 6, 2024
1 parent 36492af commit cb0ffa7
Showing 4 changed files with 60 additions and 31 deletions.
20 changes: 20 additions & 0 deletions docs/content/docs/api.md
@@ -251,6 +251,26 @@ const search = await pagefind.search("term", {
```
{{< /diffcode >}}
It is also possible to control how much the site-wide frequency of a given term is taken into account (by default, terms that appear less often have a higher weight):
{{< diffcode >}}
```js
const search = await pagefind.search("term", {
+ ranking: { siteFrequency: 0.0 }
});
```
{{< /diffcode >}}
Another knob to control the ranking is `wordDistance`, which tells Pagefind how much weight to give to the difference in length between the matched word and the search term:
{{< diffcode >}}
```js
const search = await pagefind.search("term", {
+ ranking: { wordDistance: 0.3 }
});
```
{{< /diffcode >}}
## Re-initializing the search API
In some cases you might need to re-initialize Pagefind. For example, if you dynamically change the language of the page without reloading, Pagefind will need to be re-initialized to reflect this language change.
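A minimal Rust sketch of how the `siteFrequency` knob enters the score, assuming it mirrors the `(word_frequency.max(0.5).ln() * (*ranking).site_frequency).exp()` term in this commit's `calculate_word_score`. The helper name `site_frequency_factor` is hypothetical. Because `exp(ln(x) * w)` is `x` raised to the power `w`, a knob value of `0.0` turns the factor into a neutral `1.0` and `1.0` leaves it unchanged; `wordDistance` is applied to the word-length bonus in the same way.

```rust
/// Hypothetical helper mirroring the site-frequency term of `calculate_word_score`.
/// `word_frequency` is floored at 0.5 before taking the logarithm, so `ln`
/// never sees zero or a negative number.
fn site_frequency_factor(word_frequency: f32, site_frequency: f32) -> f32 {
    // exp(ln(x) * w) equals x.powf(w) for positive x.
    (word_frequency.max(0.5).ln() * site_frequency).exp()
}

fn main() {
    // 2.0 stands in for a comparatively rare term (search.rs describes
    // word_frequency as a roughly 0 -> 2 scale).
    let word_frequency = 2.0_f32;
    for site_frequency in [0.0_f32, 0.5, 1.0] {
        let factor = site_frequency_factor(word_frequency, site_frequency);
        // Cross-check against the equivalent power form.
        let via_powf = word_frequency.max(0.5).powf(site_frequency);
        println!("siteFrequency {site_frequency}: factor {factor:.4} (powf: {via_powf:.4})");
    }
}
```

With `siteFrequency: 0.0`, as in the documentation example, the factor is exactly 1.0 for every term, so how rare a term is across the site stops influencing its rank.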
67 changes: 36 additions & 31 deletions pagefind_web/src/search.rs
@@ -38,50 +38,55 @@ pub struct BalancedWordScore {
#[derive(Debug, Clone)]
#[wasm_bindgen]
pub struct RankingWeights {
pub word_distance: f32,
pub site_frequency: f32,
pub page_frequency: f32,
}

#[wasm_bindgen]
impl RankingWeights {
#[wasm_bindgen(constructor)]
pub fn new(
word_distance: f32,
site_frequency: f32,
page_frequency: f32,
) -> RankingWeights {
RankingWeights {
word_distance,
site_frequency,
page_frequency,
}
}
}

impl From<VerboseWordLocation> for BalancedWordScore {
fn from(
VerboseWordLocation {
weight,
length_differential,
word_frequency,
word_location,
}: VerboseWordLocation,
) -> Self {
let word_length_bonus = if length_differential > 0 {
(2.0 / length_differential as f32).max(0.2)
} else {
3.0
};

// Starting with the raw user-supplied (or derived) weight of the word,
// we take it to the power of two to make the weight scale non-linear.
// We then multiply it with word_length_bonus, which should be a roughly 0 -> 3 scale of how close
// this word was in length to the target word.
// That result is then multiplied by the word frequency, which is again a roughly 0 -> 2 scale
// of how unique this word is in the entire site. (tf-idf ish)
let balanced_score =
((weight as f32).powi(2) * word_length_bonus) * word_frequency.max(0.5);

Self {
weight,
balanced_score,
word_location,
}
fn calculate_word_score(
VerboseWordLocation {
weight,
length_differential,
word_frequency,
word_location,
}: VerboseWordLocation,
ranking: &RankingWeights,
) -> BalancedWordScore {
let word_length_bonus = ((if length_differential > 0 {
(2.0 / length_differential as f32).max(0.2)
} else {
3.0
}).ln() * (*ranking).word_distance).exp();

// Starting with the raw user-supplied (or derived) weight of the word,
// we take it to the power of two to make the weight scale non-linear.
// We then multiply it with word_length_bonus, which should be a roughly 0 -> 3 scale of how close
// this word was in length to the target word.
// That result is then multiplied by the word frequency, which is again a roughly 0 -> 2 scale
// of how unique this word is in the entire site. (tf-idf ish)
let balanced_score =
((weight as f32).powi(2) * word_length_bonus) * (word_frequency.max(0.5).ln() * (*ranking).site_frequency).exp();

BalancedWordScore {
weight,
balanced_score,
word_location,
}
}

@@ -321,11 +326,11 @@ impl SearchIndex {
working_word.length_differential = next_word.length_differential;
}
} else {
unique_word_locations.push(working_word.into());
unique_word_locations.push(calculate_word_score(working_word, ranking));
working_word = next_word;
}
}
unique_word_locations.push(working_word.into());
unique_word_locations.push(calculate_word_score(working_word, ranking));
}

let page = &self.pages[page_index];
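The new `calculate_word_score` can be exercised in isolation with the sketch below. `WordMatch`, `RankingWeights` (trimmed to two fields) and `balanced_score` are hypothetical stand-ins for the types and function in `search.rs`, and the `u8` field types are assumed for illustration; the arithmetic follows the diff, including its `exp(ln(x) * w)` form. It compares an exact-length match with a match whose length differs by four characters, for several `wordDistance` values.

```rust
/// Hypothetical, trimmed stand-in for `RankingWeights` in search.rs.
struct RankingWeights {
    word_distance: f32,
    site_frequency: f32,
}

/// Hypothetical stand-in for `VerboseWordLocation` (word_location omitted).
struct WordMatch {
    weight: u8,              // raw (user-supplied or derived) word weight
    length_differential: u8, // length difference between matched word and term
    word_frequency: f32,     // site-wide rarity, roughly 0 -> 2
}

fn balanced_score(m: &WordMatch, ranking: &RankingWeights) -> f32 {
    // 3.0 for an exact-length match, otherwise 2 / diff floored at 0.2.
    let raw_length_bonus = if m.length_differential > 0 {
        (2.0 / m.length_differential as f32).max(0.2)
    } else {
        3.0
    };
    // Scale each factor by its knob, as the diff does: exp(ln(x) * weight).
    let word_length_bonus = (raw_length_bonus.ln() * ranking.word_distance).exp();
    let frequency_factor = (m.word_frequency.max(0.5).ln() * ranking.site_frequency).exp();
    (m.weight as f32).powi(2) * word_length_bonus * frequency_factor
}

fn main() {
    let exact = WordMatch { weight: 1, length_differential: 0, word_frequency: 1.0 };
    let loose = WordMatch { weight: 1, length_differential: 4, word_frequency: 1.0 };
    for word_distance in [1.0_f32, 0.3, 0.0] {
        let ranking = RankingWeights { word_distance, site_frequency: 1.0 };
        println!(
            "wordDistance {word_distance}: exact {:.3}, loose {:.3}",
            balanced_score(&exact, &ranking),
            balanced_score(&loose, &ranking),
        );
    }
}
```

At `wordDistance` 1.0 the exact match scores about six times higher than the loose one (length bonus 3.0 vs 0.5); at 0.0 both bonuses collapse to 1.0 and the two matches tie.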
2 changes: 2 additions & 0 deletions pagefind_web_js/lib/coupled_search.ts
@@ -441,6 +441,8 @@ class PagefindInstance {
}

let ranking = new this.backend.RankingWeights(
options.ranking?.wordDistance ?? 1.0,
options.ranking?.siteFrequency ?? 1.0,
options.ranking?.pageFrequency ?? 1.0,
)
// pointer may have updated from the loadChunk calls
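Because the TypeScript wrapper falls back to `1.0` for every knob (`?? 1.0`), searches that pass no `ranking` options should score individual words the same as before this commit, up to floating-point rounding, since `exp(ln(x) * 1.0)` is just `x`. The sketch below checks that claim by comparing the removed `From<VerboseWordLocation>` formula with the new one; `old_score` and `new_score` are hypothetical names, and the `u8` parameter types are assumed.

```rust
/// Word score as computed by the removed `From<VerboseWordLocation>` impl.
fn old_score(weight: u8, length_differential: u8, word_frequency: f32) -> f32 {
    let word_length_bonus = if length_differential > 0 {
        (2.0 / length_differential as f32).max(0.2)
    } else {
        3.0
    };
    ((weight as f32).powi(2) * word_length_bonus) * word_frequency.max(0.5)
}

/// Word score as computed by the new `calculate_word_score`, with explicit knobs.
fn new_score(
    weight: u8,
    length_differential: u8,
    word_frequency: f32,
    word_distance: f32,
    site_frequency: f32,
) -> f32 {
    let raw_length_bonus = if length_differential > 0 {
        (2.0 / length_differential as f32).max(0.2)
    } else {
        3.0
    };
    let word_length_bonus = (raw_length_bonus.ln() * word_distance).exp();
    ((weight as f32).powi(2) * word_length_bonus)
        * (word_frequency.max(0.5).ln() * site_frequency).exp()
}

fn main() {
    // Sweep a grid of inputs using the wrapper's defaults (1.0 for each knob).
    let mut max_rel_err = 0.0_f32;
    for weight in 1..=10u8 {
        for diff in 0..=12u8 {
            for freq in [0.1_f32, 0.5, 1.0, 1.7, 2.0] {
                let a = old_score(weight, diff, freq); // always > 0 here
                let b = new_score(weight, diff, freq, 1.0, 1.0);
                max_rel_err = max_rel_err.max(((a - b) / a).abs());
            }
        }
    }
    println!("max relative difference: {max_rel_err:e}");
}
```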
2 changes: 2 additions & 0 deletions pagefind_web_js/types/index.d.ts
@@ -46,6 +46,8 @@ declare global {
sort?: Object,
/** Fine-grained ranking weights (range: 0.0 - 1.0) */
ranking?: {
wordDistance?: Number,
siteFrequency?: Number,
pageFrequency?: Number,
},
}