Easier full-text search possibility #4505

Alkarex · 2022-08-16T09:25:32Z

Contributes to #1331
Avoid concats in searches to make text indexes easier to build

Manual example with PostgreSQL:

CREATE extension pg_trgm;
CREATE index gin_trgm_index_title ON freshrss_entry USING gin(title gin_trgm_ops);
CREATE index gin_trgm_index_content ON freshrss_entry USING gin(content gin_trgm_ops);

Contributes to FreshRSS#1331 Avoid concats in searches to make text indexes easier to build

Alkarex · 2022-08-16T09:26:22Z

Quick example of SELECT LIKE performance gain:

freshrss=# explain analyse select title from freshrss_freshrss_entry where content like '%ambulance%';
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Seq Scan on freshrss_freshrss_entry  (cost=0.00..1904.16 rows=1 width=48) (actual time=54.970..405.760 rows=34 loops=1)
   Filter: (content ~~ '%ambulance%'::text)
   Rows Removed by Filter: 12808
 Planning Time: 0.562 ms
 Execution Time: 405.782 ms
(5 rows)

freshrss=# create extension pg_trgm;
CREATE EXTENSION

freshrss=# create index gin_trgm_index_content on freshrss_freshrss_entry using gin(content gin_trgm_ops);
CREATE INDEX

freshrss=# explain analyse select title from freshrss_freshrss_entry where content like '%ambulance%';
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on freshrss_freshrss_entry  (cost=88.01..92.02 rows=1 width=48) (actual time=0.659..16.860 rows=34 loops=1)
   Recheck Cond: (content ~~ '%ambulance%'::text)
   Rows Removed by Index Recheck: 84
   Heap Blocks: exact=105
   ->  Bitmap Index Scan on gin_trgm_index_content  (cost=0.00..88.01 rows=1 width=0) (actual time=0.333..0.334 rows=118 loops=1)
         Index Cond: (content ~~ '%ambulance%'::text)
 Planning Time: 0.676 ms
 Execution Time: 16.898 ms
(8 rows)

freshrss=# explain analyse select title from freshrss_freshrss_entry where title ilike '%ambulance%' OR content ilike '%ambulance%';
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on freshrss_freshrss_entry  (cost=520.02..527.84 rows=2 width=48) (actual time=1.378..60.193 rows=39 loops=1)
   Recheck Cond: (((title)::text ~~* '%ambulance%'::text) OR (content ~~* '%ambulance%'::text))
   Rows Removed by Index Recheck: 79
   Heap Blocks: exact=105
   ->  BitmapOr  (cost=520.02..520.02 rows=2 width=0) (actual time=0.553..0.555 rows=0 loops=1)
         ->  Bitmap Index Scan on gin_trgm_index_title  (cost=0.00..72.01 rows=1 width=0) (actual time=0.069..0.069 rows=8 loops=1)
               Index Cond: ((title)::text ~~* '%ambulance%'::text)
         ->  Bitmap Index Scan on gin_trgm_index_content  (cost=0.00..448.01 rows=1 width=0) (actual time=0.484..0.484 rows=118 loops=1)
               Index Cond: (content ~~* '%ambulance%'::text)
 Planning Time: 1.907 ms
 Execution Time: 60.224 ms
(11 rows)

Alkarex · 2022-08-16T11:36:29Z

From my quick tests, an index on a concatenation of e.title || e.content queried by (e.title || e.content ILIKE '%ambulance%') yields a similar performance than an index on e.title and another index on e.content queried by (e.title ILIKE '%ambulance%' OR e.content ILIKE '%ambulance%'), but having two simple indexes is more versatile.

Alkarex · 2022-08-18T10:06:11Z

For now it is only documented and the desired indexes must be created manually.
Depending on the feedback, if there is a demand, we could either have some by default, or make an option to create some automatically.

Alkarex · 2022-08-18T10:23:53Z

Documentation deployed: https://freshrss.github.io/FreshRSS/en/admins/DatabaseConfig.html

Alkarex · 2022-08-18T22:00:49Z

Quick test on an entry table of 62MB on disk, the indexes created on title and content sum up to 58MB.
So it looks like it more or less doubles the size on disk to have those indexes.

Alwaysin · 2022-08-25T16:00:58Z

Out of curiosity, is MySQL more or less as fast already, or is it "recommended" to convert to PostgreSQL to benefit from these optimizations?
Couldn't the same indexes be applied to MySQL?

Frenzie · 2022-08-25T16:32:59Z

They all support various kinds of indexes but I think trigram indexes are unique to PostgreSQL. (But don't take my word for it; I might be outdated or misremembering.)

Alkarex · 2022-08-25T17:09:39Z

So far, FreshRSS only supports text indexes in PostgreSQL

Alkarex · 2022-08-25T18:01:07Z

P.S. The reason is that PostgreSQL offers (also) indexes that work out-of-the-box with a standard SQL search using LIKE, while MySQL / SQLite only support indexes that require a custom SQL syntax (as far as I can see).
This could be added to FreshRSS but requires overriding the text search part for each database type, with and without index...

#fix FreshRSS#4535 Collation regression from FreshRSS#4505

* Fix MariaDB case sensitivity #fix #4535 Collation regression from #4505 * faster * Minor syntax * Minor parentheses

Easier full-text search possibility

9bbf69b

Contributes to FreshRSS#1331 Avoid concats in searches to make text indexes easier to build

Alkarex added the Search 🔍 label Aug 16, 2022

Alkarex added this to the 1.20.0 milestone Aug 16, 2022

Fix tests

3c2d940

Documentation

a09a347

Alkarex merged commit 4f111c5 into FreshRSS:edge Aug 18, 2022

Alkarex deleted the easier-fulltext-search branch August 18, 2022 10:06

This was referenced Aug 18, 2022

PostgreSQL full text index #1331

Open

SQL : Considérer l'indexation du texte complet #302

Open

Alwaysin mentioned this pull request Aug 25, 2022

MariaDB case-sensitive regression #4535

Closed

Alkarex added a commit to Alkarex/FreshRSS that referenced this pull request Aug 25, 2022

Fix MariaDB case sensitivity

4fd8f24

#fix FreshRSS#4535 Collation regression from FreshRSS#4505

Alkarex mentioned this pull request Aug 25, 2022

Fix MariaDB case sensitivity #4537

Merged

Alkarex added a commit that referenced this pull request Aug 25, 2022

Fix MariaDB case sensitivity (#4537)

03f5a42

* Fix MariaDB case sensitivity #fix #4535 Collation regression from #4505 * faster * Minor syntax * Minor parentheses

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Easier full-text search possibility #4505

Easier full-text search possibility #4505

Alkarex commented Aug 16, 2022

Alkarex commented Aug 16, 2022 •

edited

Loading

Alkarex commented Aug 16, 2022

Alkarex commented Aug 18, 2022

Alkarex commented Aug 18, 2022

Alkarex commented Aug 18, 2022

Alwaysin commented Aug 25, 2022

Frenzie commented Aug 25, 2022

Alkarex commented Aug 25, 2022

Alkarex commented Aug 25, 2022 •

edited

Loading

Easier full-text search possibility #4505

Easier full-text search possibility #4505

Conversation

Alkarex commented Aug 16, 2022

Alkarex commented Aug 16, 2022 • edited Loading

Alkarex commented Aug 16, 2022

Alkarex commented Aug 18, 2022

Alkarex commented Aug 18, 2022

Alkarex commented Aug 18, 2022

Alwaysin commented Aug 25, 2022

Frenzie commented Aug 25, 2022

Alkarex commented Aug 25, 2022

Alkarex commented Aug 25, 2022 • edited Loading

Alkarex commented Aug 16, 2022 •

edited

Loading

Alkarex commented Aug 25, 2022 •

edited

Loading