Search rank subquery generates unusable query plan under certain conditions #292
Just as a quick follow-up -- we have (temporarily?) removed it. I found this entry in the changelog:
We need the rank to be something we could order by. You're able to do this because you have full control and knowledge of the query you're building. Our users didn't have this level of control, because they would get the generated scope as-is.
What do your indexes look like?
In the example above, there is an index on `project_id`.
If I understand correctly, you're saying that without the subquery, pg_search would automatically append the rank to the `SELECT` list?
That was the case before 1.0.0, when there wasn't yet a subquery. Nowadays, the subquery allows us not to have the rank in the `SELECT` list. So that's why we have the subquery.
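To make the trade-off concrete, here is a rough sketch of the two query shapes being discussed. The table, column, and search configuration (`responses`, `content`, `english`) are assumptions for illustration, not pg_search's exact generated SQL:

```sql
-- Pre-1.0 shape (sketch): the rank expression lives in the SELECT list,
-- so it comes back to the caller alongside the model's columns.
SELECT responses.*,
       ts_rank(to_tsvector('english', responses.content),
               to_tsquery('english', 'foo')) AS pg_search_rank
FROM responses
WHERE to_tsvector('english', responses.content) @@ to_tsquery('english', 'foo')
ORDER BY pg_search_rank DESC;

-- Post-1.0 shape (sketch): the rank is computed inside a subquery and
-- joined back, keeping it out of the outer SELECT list.
SELECT responses.*
FROM responses
INNER JOIN (
  SELECT responses.id AS pg_search_id,
         ts_rank(to_tsvector('english', responses.content),
                 to_tsquery('english', 'foo')) AS rank
  FROM responses
  WHERE to_tsvector('english', responses.content) @@ to_tsquery('english', 'foo')
) AS pg_search ON responses.id = pg_search.pg_search_id
ORDER BY pg_search.rank DESC;
```

The second shape keeps the caller's result set clean, but it forces the planner to materialize and join the subquery, which is the behavior this thread is investigating.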
Sorry, fixed my comment. (Edits in italics.) I think we're on the same page now. I'll investigate a bit.
The main question is, why won't PostgreSQL use the GIN index instead of doing the sequential scan in your original query? I'm also wondering whether it matters if you use a GIN vs. a GiST index, and also what happens if you add a compound index on both project_id and sequential_id. At this point it's not so much a pg_search question as it is a PostgreSQL query optimizer question. It might be worthwhile to read over Slow Query Questions and see if the larger PostgreSQL community has any advice. Maybe there's a better way for us to construct the query.
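The experiments suggested in that comment could be set up with statements along these lines. The column name `content` and the `english` configuration are assumptions; the expression index only helps if it matches the expression used in the query exactly:

```sql
-- GIN expression index matching the query's tsvector expression:
CREATE INDEX idx_responses_content_gin
  ON responses USING gin (to_tsvector('english', content));

-- GiST alternative (generally faster to update, slower to search):
CREATE INDEX idx_responses_content_gist
  ON responses USING gist (to_tsvector('english', content));

-- Compound btree index on the two filter columns mentioned above:
CREATE INDEX idx_responses_project_seq
  ON responses (project_id, sequential_id);
```

Running `EXPLAIN ANALYZE` on the search query before and after each index would show whether the planner picks it up.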
Just wanted to say that we have had this issue as well since we updated from 0.7.6 to 1.0.5. Our autocomplete query went from 87 ms to 2011 ms. I have to say that we don't use a GIN/GiST index, because we also use the unaccent module, but we're planning to do that differently. I'll report back if that makes an improvement.
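For anyone else combining unaccent with an index: `unaccent()` is not marked `IMMUTABLE`, so it can't be used directly in an index expression. A common workaround is an immutable SQL wrapper; this is a sketch with assumed table/column names, and the query must use the same wrapped expression for the index to apply:

```sql
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Immutable wrapper so unaccent can appear in an index expression:
CREATE OR REPLACE FUNCTION immutable_unaccent(text)
RETURNS text AS $$
  SELECT public.unaccent('public.unaccent', $1)
$$ LANGUAGE sql IMMUTABLE;

CREATE INDEX idx_responses_content_unaccent
  ON responses USING gin (to_tsvector('simple', immutable_unaccent(content)));
```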
For me, the subquery triggers a scan of all partitions, which kills performance. Is there a way to add a `WHERE` clause to the subquery?
The entire search_contents index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index. A sequential scan is often cheaper on small tables, too; you could play with the random_page_cost setting to test this. Then, for every row the subquery returns (50 rows in the example), it runs an Index Scan on responses. You can see that, as the number of search results increases, so will the loops on responses. Adding a constraint on project_id in the subquery helps. However, large result sets can still hurt; limiting and paging the subquery could help. The downside of the subquery is that it will always have to do some kind of Hash Join / Nested Loop / Merge Join operation. It appears the current subquery design does need addressing.
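The tuning ideas above can be tried directly in psql. This is a sketch: the table/column names, the `1610` project id, and the page size are all assumptions carried over from the examples in this thread:

```sql
-- Nudge the planner away from sequential scans for this session only
-- (a lower value tells it random index reads are cheap, e.g. on SSDs):
SET random_page_cost = 1.1;

-- Constrain and page the subquery so fewer rows reach the join:
SELECT responses.*
FROM responses
INNER JOIN (
  SELECT responses.id AS pg_search_id,
         ts_rank(to_tsvector('english', responses.content),
                 to_tsquery('english', 'foo')) AS rank
  FROM responses
  WHERE to_tsvector('english', responses.content) @@ to_tsquery('english', 'foo')
    AND responses.project_id = 1610   -- constraint pushed into the subquery
  ORDER BY rank DESC
  LIMIT 50                            -- page the subquery itself
) AS pg_search ON responses.id = pg_search.pg_search_id
ORDER BY pg_search.rank DESC;
```

Comparing `EXPLAIN ANALYZE` output for the constrained and unconstrained forms should show the loop count on responses dropping with the result-set size.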
This sounds like a misdiagnosis. You can order by something without selecting it. I can replace our
Similar to @CarlosEspejo's comment, the inability to filter in the subquery renders performance unusable for our use case. We have lots of records, but our interface has a date-range filter to restrict to recent records, so we can't have the subquery executing the full-text search against all records. I added a simple workaround in a fork to allow a block for subquery chaining, but I'm not confident that would be the best overall way to go about it: dwillett@c6418b5 e.g.
For what it's worth, I have the same issue and will probably use the same workaround as @dwillett. It would be great to see a way to filter within the subquery (using an
Nearly identical to the patch provided by @dwillett; I've updated it to apply to the latest version of pg_search and added a spec plus some documentation.
I'm experiencing the same issue and suspect this might be a common challenge with this gem, especially for multi-tenant applications or any setup that relies on partitioning and aims to limit queries to specific partitions. Thanks for all the work on this gem; I think it's great!
I've been troubleshooting an odd issue that came to the fore today. Our `responses` table has 300,000+ entries, but they are always filtered by a `project_id`, which is an indexed column. We started intermittently seeing immensely long query times -- e.g. hundreds of seconds.

Here's an `EXPLAIN ANALYZE`:

Here's what the generated query looks like:

However, when adding `WHERE project_id = 1610` to the subquery, the performance problem goes away completely:

Any idea what's going on, and how we might solve this for all users of the gem?
cc @nertzy @amarshall