
Search rank subquery generates unusable query plan under certain conditions #292

Open

ajb opened this issue Mar 23, 2016 · 16 comments

ajb (Contributor) commented Mar 23, 2016

I've been troubleshooting an odd issue that came to the fore today. Our responses table has 300,000+ entries, but they are always filtered by a project_id, which is an indexed column.

We started intermittently seeing immensely-long query times -- e.g. hundreds of seconds.

Here's an EXPLAIN ANALYZE:

[screenshot: EXPLAIN ANALYZE output]

Here's what the generated query looks like:

SELECT responses.sequential_id FROM "responses"
INNER JOIN (
  SELECT "responses"."id" AS pg_search_id,
  (
    ts_rank(
      ("responses"."search_contents"),
      (to_tsquery('simple', ''' ' || 'laura' || ' ''')), 0
    )
  ) AS rank
  FROM "responses" WHERE (
    (
      ("responses"."search_contents") @@
      (
        to_tsquery('simple', ''' ' || 'laura' || ' ''')
      )
    )
  )
) AS pg_search_e30b3910ffd943c99f7edd ON "responses"."id" = pg_search_e30b3910ffd943c99f7edd.pg_search_id
WHERE project_id = 1610
LIMIT 20;

However, when adding WHERE project_id = 1610 to the subquery, the performance problem goes away completely:

[screenshot: EXPLAIN ANALYZE of the modified query]
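For reference, here is a sketch of the hand-modified query described above (not output from pg_search itself; the subquery alias is shortened for readability) with the project_id filter duplicated inside the subquery:

```sql
-- Hand-edited version of the generated query: the project_id filter is
-- repeated inside the subquery so the planner can narrow by project_id
-- before (or while) evaluating the full-text match.
SELECT responses.sequential_id FROM "responses"
INNER JOIN (
  SELECT "responses"."id" AS pg_search_id,
         ts_rank(("responses"."search_contents"),
                 (to_tsquery('simple', ''' ' || 'laura' || ' ''')), 0) AS rank
  FROM "responses"
  WHERE ("responses"."search_contents") @@
        (to_tsquery('simple', ''' ' || 'laura' || ' '''))
    AND "responses"."project_id" = 1610   -- added filter
) AS pg_search_sub ON "responses"."id" = pg_search_sub.pg_search_id
WHERE project_id = 1610
LIMIT 20;
```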

Any idea what's going on, and how we might solve this for all users of the gem?

cc @nertzy @amarshall

ajb (Contributor, Author) commented Mar 28, 2016

Just as a quick followup -- we have (temporarily?) removed pg_search, and re-implemented the parts we needed directly inside of our app, while replacing the subquery with a normal WHERE clause. This has indeed solved our slow query problem.

I found this entry in the changelog:

Support more ActiveRecord::Relation methods, such as #pluck and #select by moving search-related operations to subquery.

But pluck and select still seem to work without the subquery. Maybe I'm not testing properly?

nertzy (Collaborator) commented Mar 28, 2016

We need the rank to be something we can ORDER BY. Before we had the subquery, the only way I knew of to make the rank available to the ORDER BY clause was to add it to the SELECT clause. But when I did that, Active Record would no longer automatically include all fields, meaning we would either need to add * to the SELECT or add the columns we wanted manually. We chose to add * to the SELECT by default, which meant that pg_search users could not narrow the SELECT below that.

You're able to do this because you have full control and knowledge of the query you're building. Our users didn't have this level of control because they would get the * added to their queries whether they wanted it or not. And if we didn't automatically include *, then the most common use case wouldn't work at all.

nertzy (Collaborator) commented Mar 28, 2016

What do your indexes look like?

ajb (Contributor, Author) commented Mar 29, 2016

In the example above, there is an index on responses.sequential_id, responses.project_id, and a GIN index on responses.search_contents.

We chose to add * to the SELECT by default, which meant that pg_search users could not limit the SELECT to less than that.

If I understand correctly, you're saying that without the subquery, pg_search would automatically append select('responses.*') to my query, which would then prevent me from adding pluck('responses.sequential_id') to the end? This is not the behavior I'm seeing, but if you confirm that my understanding is correct, I'll do some more testing.

nertzy (Collaborator) commented Mar 29, 2016

That was the case before 1.0.0, when there wasn't yet a subquery.

Nowadays the subquery lets us keep the rank out of the outer SELECT, which preserves Active Record's expected default behavior. It used to be that calling pluck removed the rank from the query and thus generated invalid SQL.

So that's why we have the subquery.
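To illustrate the failure mode described above (this is an assumed sketch of the pre-1.0 query shape, not the exact SQL the gem produced): when the rank lives only in the SELECT list and pluck rewrites that list, the ORDER BY is left referring to an alias that no longer exists.

```sql
-- Pre-subquery shape (sketch): rank is a SELECT-list alias used by ORDER BY.
SELECT "responses".*,
       ts_rank("responses"."search_contents",
               to_tsquery('simple', 'laura')) AS pg_search_rank
FROM "responses"
WHERE "responses"."search_contents" @@ to_tsquery('simple', 'laura')
ORDER BY pg_search_rank DESC;

-- After pluck replaces the SELECT list, the alias disappears and the
-- ORDER BY reference breaks:
SELECT "responses"."sequential_id"
FROM "responses"
WHERE "responses"."search_contents" @@ to_tsquery('simple', 'laura')
ORDER BY pg_search_rank DESC;  -- ERROR: column "pg_search_rank" does not exist
```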

ajb (Contributor, Author) commented Mar 29, 2016

Sorry, fixed my comment. (Edits in italics.)

I think we're on the same page now. I'll investigate a bit.

nertzy (Collaborator) commented Mar 29, 2016

The main question is, why won't PostgreSQL use the GIN index instead of doing the sequential scan in your original query?

I'm also wondering whether it matters if you use a GIN vs. a GiST index, and what happens if you add a compound index on both project_id and sequential_id.

At this point it's not so much a pg_search question as it is a PostgreSQL query optimizer question. It might be worthwhile to read over Slow Query Questions and see if the larger PostgreSQL community has any advice. Maybe there's a better way for us to construct the query.
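For concreteness, the kinds of indexes under discussion could look like the following (illustrative DDL only; the index names are made up, and the GiST line is the alternative being asked about):

```sql
-- Full-text index on the tsvector column (GIN); swap in the GiST form
-- to compare planner behavior between the two index types.
CREATE INDEX idx_responses_search_contents
  ON responses USING gin (search_contents);
-- CREATE INDEX idx_responses_search_contents_gist
--   ON responses USING gist (search_contents);

-- Compound btree index covering the outer filter plus the selected column.
CREATE INDEX idx_responses_project_sequential
  ON responses (project_id, sequential_id);
```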

ajb (Contributor, Author) commented Mar 30, 2016

This occurs with both GIN and GiST indexes. There is also a compound index on project_id/sequential_id. Here's the complete list:

[screenshot: full list of indexes on responses]

ppostma (Contributor) commented May 27, 2016

Just wanted to say that we have had this issue as well since we updated from 0.7.6 to 1.0.5. Our autocomplete query went from 87 ms to 2011 ms. I have to say that we don't use a GIN/GiST index because we also use the unaccent module, but we're planning to do that differently. I'll report back if that makes an improvement.

CarlosEspejo commented

For me the subquery triggers a scan of all partitions which kills the performance. Is there a way to add a where clause to the subquery?

jjohnson commented

The entire search_contents index would have to be scanned, so in most cases the planner prefers a sequential table scan over using the index. A sequential scan is often cheaper on small tables, too. You could play with the random_page_cost setting to test this. Then, for every row the subquery returns (50 rows in the example), it runs an Index Scan on responses. As the number of search results increases, so will the loops on responses.

Adding a project_id constraint to the subquery helps. However, large result sets can still hurt. Limiting and paging the subquery could help.

The downside of the subquery is that it will always have to do some kind of Hash Join / Nested Loop / Merge Join operation. The current subquery design does appear to need addressing.
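The suggestions above can be sketched like this (hand-written; the cost setting and limits are placeholders to experiment with, not recommendations):

```sql
-- Experiment: lower random_page_cost so index scans look cheaper to the
-- planner (session-local; tune for your storage before changing globally).
SET random_page_cost = 1.1;

-- Constrain and page the subquery so the full-text match touches fewer
-- rows before the join back to responses.
SELECT responses.sequential_id FROM responses
INNER JOIN (
  SELECT id AS pg_search_id,
         ts_rank(search_contents, to_tsquery('simple', 'laura'), 0) AS rank
  FROM responses
  WHERE search_contents @@ to_tsquery('simple', 'laura')
    AND project_id = 1610   -- constraint pushed into the subquery
  ORDER BY rank DESC        -- order before limiting so paging is stable
  LIMIT 100                 -- cap the rows joined back out
) AS pg_search_sub ON responses.id = pg_search_sub.pg_search_id
WHERE project_id = 1610
LIMIT 20;
```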

pjungwir commented Jul 6, 2018

We need the rank to be something we could ORDER by. But before we had the subquery, the only way I knew of to make the rank available to the ORDER clause was to add it to the SELECT clause.

This sounds like a misdiagnosis. You can order by something without selecting it.

I can replace our pg_search_scope declaration with this and it works fine:

scope :search, ->(term) {
  where(<<-EOQ, term: term).
    users.searchable @@ to_tsquery('simple', ''' ' || :term || ' ''' || ':*')
  EOQ
  order(<<-EOQ)
    ts_rank(users.searchable,
            to_tsquery('simple', ''' ' || '#{connection.quote_string(term)}' || ' ''' || ':*'), 0) DESC,
    users.id ASC
  EOQ
}

dwillett commented

Similar to @CarlosEspejo's comment, the inability to filter within the subquery makes performance unworkable for our use case. We have lots of records, but our interface has a date-range filter to restrict to recent records, so we can't have the subquery executing the full-text search against all records.

I added a simple workaround in a fork to allow a block for subquery chaining, but I'm not confident that's the best overall approach: dwillett@c6418b5

e.g.

Foo.search_by_title 'bar' do |subquery_relation|
  subquery_relation.where('date(created_at) > ?', 7.days.ago)
end

ezekg commented Aug 5, 2019

@dwillett your patch worked great for me. Queries on large accounts went from a few seconds to < 10ms. Open to getting this merged upstream, @nertzy?

Edit: another solution, the one I actually ended up going with, is to not unscope the model's scope before adding the pg_search-specific scopes.

catwell commented Mar 8, 2022

For what it's worth, I have the same issue and will probably use the same workaround as @dwillett. It would be great to see a way to filter within the subquery (using an additional_attributes) merged.

wmoxam added a commit to wmoxam/pg_search that referenced this issue Sep 27, 2024:

Nearly identical to the patch provided by @dwillett; I've updated it to apply to the latest version of pg_search and added a spec plus some documentation.

wmoxam mentioned this issue Sep 27, 2024
BenKanouse commented

I’m experiencing the same issue and suspect this might be a common challenge with this gem, especially for multi-tenant applications or any setup that relies on partitioning and aims to limit queries to specific partitions.

Thanks for all the work on this gem, I think it’s great!
