amend! Add an option to turn off density-weighting
Add an option to stop scoring shorter pages higher

When searching, Pagefind applies a heuristic that often works quite well
to boost pages with a higher density, i.e. a higher number of hits
divided by the number of words on the page. This is called "density
weighting".

In some instances, though, it is desirable to rank by the number of
hits directly, without dividing by the number of words on the page.
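
To make the difference concrete, here is a minimal sketch using the two
test pages from this commit. The helpers `countWords`, `countHits` and
`density` are hypothetical names, and splitting on whitespace is an
assumption; Pagefind's own tokenizer and scoring are more involved.

```javascript
// Split text into words on whitespace (an assumption; not Pagefind's tokenizer).
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Count the words that match the search term exactly, ignoring case.
function countHits(text, term) {
  const needle = term.toLowerCase();
  return text
    .trim()
    .split(/\s+/)
    .filter((w) => w.toLowerCase() === needle).length;
}

// Density: number of hits divided by the number of words on the page.
function density(text, term) {
  const words = countWords(text);
  return words === 0 ? 0 : countHits(text, term) / words;
}

const singleWord = "word";
const threeWords = "I have a word and a word and another word";

for (const page of [singleWord, threeWords]) {
  console.log({
    hits: countHits(page, "word"),
    words: countWords(page),
    density: density(page, "word"),
  });
}
```

By density, the one-word page (1 hit in 1 word) outranks the longer page
(3 hits in 10 words); by raw hit count, the order is reversed.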

Let's support this via a new search option, `ranking`, which for now
contains a single field, `pageFrequency`, that specifies how strongly
"denser pages" should be favored. Setting it to `0.0` turns density
weighting off.
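
One way to picture a single tunable weight is the sketch below. The
formula `hits / words^pageFrequency`, the default of `1.0`, and the
`score` and `rank` helpers are all assumptions made for illustration;
they are not taken from Pagefind's implementation. Only the option shape
`{ ranking: { pageFrequency } }` comes from the tests in this commit.

```javascript
// Hypothetical score: an exponent of 1.0 yields the plain density,
// an exponent of 0.0 yields the raw hit count.
function score(page, pageFrequency) {
  return page.hits / Math.pow(page.words, pageFrequency);
}

// Sort a copy of the pages by descending score.
// The default pageFrequency of 1.0 is an assumption for this sketch.
function rank(pages, ranking = { pageFrequency: 1.0 }) {
  const weight = ranking.pageFrequency;
  return [...pages].sort((a, b) => score(b, weight) - score(a, weight));
}

// Hit and word counts for the two test pages, when searching for "word".
const pages = [
  { file: "single-word.html", hits: 1, words: 1 },
  { file: "three-words.html", hits: 3, words: 10 },
];

console.log(rank(pages).map((p) => p.file));
console.log(rank(pages, { pageFrequency: 0.0 }).map((p) => p.file));
```

With the assumed default, `single-word.html` sorts first; with
`pageFrequency: 0.0`, `three-words.html` does, which mirrors the
"1, 3" and "3, 1" expectations in the scenario.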

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
dscho committed Jan 6, 2024
1 parent cb0ffa7 commit e56f662
Showing 2 changed files with 43 additions and 28 deletions.
43 changes: 43 additions & 0 deletions pagefind/features/scoring.feature
@@ -54,6 +54,49 @@ Feature: Result Scoring
Then The selector "[data-count]" should contain "2 result(s)"
Then The selector "[data-result]" should contain "/dog/, /cat/"

Scenario: Ranking can be configured to stop favoring pages with fewer words
Given I have a "public/index.html" file with the body:
"""
<ul>
<li data-result>
</ul>
"""
Given I have a "public/single-word.html" file with the body:
"""
<p>word</p>
"""
Given I have a "public/three-words.html" file with the body:
"""
<p>I have a word and a word and another word</p>
"""
When I run my program
Then I should see "Running Pagefind" in stdout
When I serve the "public" directory
When I load "/"
When I evaluate:
"""
async function() {
let pagefind = await import("/pagefind/pagefind.js");
let search = await pagefind.search(`word`);
document.querySelector('[data-result]').innerText = search.results.map(r => r.words.length).join(', ');
}
"""
Then There should be no logs
# With density weighting, single-word should be the first hit, otherwise three-words
Then The selector "[data-result]" should contain "1, 3"
When I evaluate:
"""
async function() {
let pagefind = await import("/pagefind/pagefind.js");
let search = await pagefind.search(`word`, { ranking: { pageFrequency: 0.0 } });
document.querySelector('[data-result]').innerText = search.results.map(r => r.words.length).join(', ');
}
"""
Then There should be no logs
Then The selector "[data-result]" should contain "3, 1"

@skip
Scenario: Search terms in close proximity rank higher in results
When I evaluate:
28 changes: 0 additions & 28 deletions pagefind/features/weighting.feature
@@ -224,31 +224,3 @@ Feature: Word Weighting
Then There should be no logs
# Treat the bal value here as a snapshot — update the expected value as needed
Then The selector "p" should contain "weight:1/bal:82.28572/loc:4"

Scenario: Density weighting can be turned off
Given I have a "public/single-word.html" file with the body:
"""
<p>word</p>
"""
Given I have a "public/three-words.html" file with the body:
"""
<p>I have a word and a word and another word</p>
"""
When I run my program
Then I should see "Running Pagefind" in stdout
When I serve the "public" directory
When I load "/"
When I evaluate:
"""
async function() {
let pagefind = await import("/pagefind/pagefind.js");
let search = await pagefind.search(`word`);
let search2 = await pagefind.search(`word`, { ranking: { pageFrequency: 0.0 } });
let counts = [search, search2].map(s => s.results.map(r => r.words.length));
document.querySelector('p').innerText = JSON.stringify(counts);
}
"""
Then There should be no logs
# With density weighting, single-word should be the first hit, otherwise three-words
Then The selector "p" should contain "[[1,3],[3,1]]"
