Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Easier full-text search possibility #4505

Merged
merged 3 commits into from
Aug 18, 2022

Conversation

Alkarex
Copy link
Member

@Alkarex Alkarex commented Aug 16, 2022

Contributes to #1331
Avoid concats in searches to make text indexes easier to build

#302 (comment)

Manual example with PostgreSQL:

CREATE extension pg_trgm;
CREATE index gin_trgm_index_title ON freshrss_entry USING gin(title gin_trgm_ops);
CREATE index gin_trgm_index_content ON freshrss_entry USING gin(content gin_trgm_ops);

Contributes to FreshRSS#1331
Avoid concats in searches to make text indexes easier to build
@Alkarex Alkarex added this to the 1.20.0 milestone Aug 16, 2022
@Alkarex
Copy link
Member Author

Alkarex commented Aug 16, 2022

Quick example of SELECT LIKE performance gain:

freshrss=# explain analyse select title from freshrss_freshrss_entry where content like '%ambulance%';
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Seq Scan on freshrss_freshrss_entry  (cost=0.00..1904.16 rows=1 width=48) (actual time=54.970..405.760 rows=34 loops=1)
   Filter: (content ~~ '%ambulance%'::text)
   Rows Removed by Filter: 12808
 Planning Time: 0.562 ms
 Execution Time: 405.782 ms
(5 rows)

freshrss=# create extension pg_trgm;
CREATE EXTENSION

freshrss=# create index gin_trgm_index_content on freshrss_freshrss_entry using gin(content gin_trgm_ops);
CREATE INDEX

freshrss=# explain analyse select title from freshrss_freshrss_entry where content like '%ambulance%';
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on freshrss_freshrss_entry  (cost=88.01..92.02 rows=1 width=48) (actual time=0.659..16.860 rows=34 loops=1)
   Recheck Cond: (content ~~ '%ambulance%'::text)
   Rows Removed by Index Recheck: 84
   Heap Blocks: exact=105
   ->  Bitmap Index Scan on gin_trgm_index_content  (cost=0.00..88.01 rows=1 width=0) (actual time=0.333..0.334 rows=118 loops=1)
         Index Cond: (content ~~ '%ambulance%'::text)
 Planning Time: 0.676 ms
 Execution Time: 16.898 ms
(8 rows)

freshrss=# explain analyse select title from freshrss_freshrss_entry where title ilike '%ambulance%' OR content ilike '%ambulance%';
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on freshrss_freshrss_entry  (cost=520.02..527.84 rows=2 width=48) (actual time=1.378..60.193 rows=39 loops=1)
   Recheck Cond: (((title)::text ~~* '%ambulance%'::text) OR (content ~~* '%ambulance%'::text))
   Rows Removed by Index Recheck: 79
   Heap Blocks: exact=105
   ->  BitmapOr  (cost=520.02..520.02 rows=2 width=0) (actual time=0.553..0.555 rows=0 loops=1)
         ->  Bitmap Index Scan on gin_trgm_index_title  (cost=0.00..72.01 rows=1 width=0) (actual time=0.069..0.069 rows=8 loops=1)
               Index Cond: ((title)::text ~~* '%ambulance%'::text)
         ->  Bitmap Index Scan on gin_trgm_index_content  (cost=0.00..448.01 rows=1 width=0) (actual time=0.484..0.484 rows=118 loops=1)
               Index Cond: (content ~~* '%ambulance%'::text)
 Planning Time: 1.907 ms
 Execution Time: 60.224 ms
(11 rows)

@Alkarex
Copy link
Member Author

Alkarex commented Aug 16, 2022

From my quick tests, an index on a concatenation of e.title || e.content queried by (e.title || e.content ILIKE '%ambulance%') yields a similar performance than an index on e.title and another index on e.content queried by (e.title ILIKE '%ambulance%' OR e.content ILIKE '%ambulance%'), but having two simple indexes is more versatile.

@Alkarex
Copy link
Member Author

Alkarex commented Aug 18, 2022

For now it is only documented and the desired indexes must be created manually.
Depending on the feedback, if there is a demand, we could either have some by default, or make an option to create some automatically.

@Alkarex Alkarex merged commit 4f111c5 into FreshRSS:edge Aug 18, 2022
@Alkarex Alkarex deleted the easier-fulltext-search branch August 18, 2022 10:06
@Alkarex
Copy link
Member Author

Alkarex commented Aug 18, 2022

@Alkarex
Copy link
Member Author

Alkarex commented Aug 18, 2022

Quick test on an entry table of 62MB on disk, the indexes created on title and content sum up to 58MB.
So it looks like it more or less doubles the size on disk to have those indexes.

@Alwaysin
Copy link
Contributor

Out of curiosity, is MySQL more or less as fast already, or is it "recommended" to convert to PostgreSQL to benefit from these optimizations?
Couldn't the same indexes be applied to MySQL?

@Frenzie
Copy link
Member

Frenzie commented Aug 25, 2022

They all support various kinds of indexes but I think trigram indexes are unique to PostgreSQL. (But don't take my word for it; I might be outdated or misremembering.)

@Alkarex
Copy link
Member Author

Alkarex commented Aug 25, 2022

So far, FreshRSS only supports text indexes in PostgreSQL

@Alkarex
Copy link
Member Author

Alkarex commented Aug 25, 2022

P.S. The reason is that PostgreSQL offers (also) indexes that work out-of-the-box with a standard SQL search using LIKE, while MySQL / SQLite only support indexes that require a custom SQL syntax (as far as I can see).
This could be added to FreshRSS but requires overriding the text search part for each database type, with and without index...

Alkarex added a commit to Alkarex/FreshRSS that referenced this pull request Aug 25, 2022
Alkarex added a commit that referenced this pull request Aug 25, 2022
* Fix MariaDB case sensitivity
#fix #4535
Collation regression from #4505

* faster

* Minor syntax

* Minor parentheses
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants