Slow API requests for /api/srch/profiles?query= #4670

rexagod · 2019-01-19T13:31:41Z

References #3147. Hinders #239. I think this should be a top-priority issue: as more users are added and the API is used more heavily over time, both will hurt response time.

Note: This only happens for /api/srch/profiles?query= (7.8s) as of now, but it can happen with other queries too, such as /api/srch/tags?query= (2.26s), as their data grows in the near future.

We can consider implementing better data structures and algorithms for storing and searching the data.


spelgubbe · 2019-02-05T12:03:03Z

I can't find what the ExecuteSearch function does... I assume some kind of SQL query is built somewhere in the code. Maybe it is slow because the LIKE statement is used like this
select * from table where username like "%query%" (then you have to iterate through the whole table)

Edit:
Ok I found it, in search_service.rb#L220. If someone is searching for a specific user, I don't think there's any reason to search for %user%. Rather search for user% and the SQL query should go much faster. Also there needs to be a lower limit to how short the query can be, I don't think there's any reason to allow someone to search for a name shorter than 3 letters.
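
A minimal sketch of the two suggestions above (prefix-only matching and a minimum query length), in plain Ruby. The helper name `prefix_like_pattern` and the constant `MIN_QUERY_LENGTH` are made up for illustration and are not part of plots2. It also assumes backslash is the LIKE escape character, which is MySQL's default.

```ruby
# Hypothetical helper, not part of plots2.
MIN_QUERY_LENGTH = 3 # assumed lower limit, as suggested above

# Returns a prefix-only LIKE pattern ("query%"), or nil if the query is too short.
def prefix_like_pattern(query)
  q = query.to_s.strip
  return nil if q.length < MIN_QUERY_LENGTH

  # Escape LIKE wildcards and the escape character so input is matched literally.
  escaped = q.gsub(/[\\%_]/) { |ch| "\\#{ch}" }
  "#{escaped}%"
end
```

The pattern could then be bound with a placeholder, the same way the existing query binds its pattern, and the query skipped entirely when the helper returns nil. Whether a prefix match is actually faster depends on the column having an index the database can use for it.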

Also, it seems that the SQL statement for searching for users doesn't have a limit applied to it until after the query is done, so instead of the SQL query stopping after collecting a certain amount of matches, it keeps going (?) until the whole table has been searched. search_service.rb#L225. Performing a query like api/srch/profiles?query=b takes 12 seconds for me, I don't think that would be possible if there was a limit statement at the end of the SQL query.
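
To make the reasoning about the limit concrete, here is a plain-Ruby analogy, not how the database actually executes the query: an array stands in for the table, and a counter tracks how many rows get examined. The method names are invented for this illustration.

```ruby
# Limit applied after the full scan: every row is examined first.
def scan_then_limit(rows, prefix, limit)
  examined = 0
  matches = rows.select { |r| examined += 1; r.start_with?(prefix) }.first(limit)
  [matches, examined]
end

# Limit applied during the scan: stops as soon as enough matches are found.
def scan_with_limit(rows, prefix, limit)
  examined = 0
  matches = rows.lazy.select { |r| examined += 1; r.start_with?(prefix) }.first(limit)
  [matches, examined]
end

rows = (1..1000).map { |i| "user#{i}" }
_, full = scan_then_limit(rows, "user1", 10)
_, early = scan_with_limit(rows, "user1", 10)
puts "examined #{full} rows without early exit, #{early} with it"
```

Both versions return the same matches; only the amount of work differs. A real database may still have to scan everything if the query also sorts the results.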

jywarren · 2019-02-05T16:53:47Z

We just cached this: #4763 (comment) but we should still optimize, thanks for the research!

jywarren · 2019-02-05T16:56:58Z

plots2/app/services/search_service.rb

Lines 223 to 225 in a68a07d

    
User.where('rusers.status = 1')
    .joins(:user_tags)\
    .where('user_tags.value LIKE ?', '%' + query + '%')\

Hmm, could you try out a couple of these options to see what's faster?

#4561 (comment)

I think it may be worthwhile trying to auto-generate a LOT of users in your local copy, maybe using the Rails console, something like:

1000.times do
  User.create({...}).save
end

For examples of this, check out how we generate seed data here: https://github.com/publiclab/plots2/tree/master/db/seeds.rb

Then you can watch on the log output to see how long it takes to run these queries, when hitting a URL like this:

/api/srch/profiles?query=sidha&sort_by=recent&field=username
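
Besides reading the log, a query can also be timed directly in the Rails console with Ruby's standard `Benchmark` module. This is only a sketch: `best_time_ms` is a made-up helper name, and the block passed to it is whatever code you want to measure.

```ruby
require 'benchmark'

# Runs the given block `runs` times and returns the fastest run in milliseconds.
def best_time_ms(runs: 3)
  Array.new(runs) { Benchmark.realtime { yield } * 1000 }.min
end

# Placeholder workload; in the console this would be the query under test,
# forced to load with something like `.to_a`.
puts best_time_ms { (1..100_000).sum }
```

Taking the best of a few runs reduces noise from the first, cold execution.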

Does that make sense? Thanks for your help!

jywarren · 2019-02-05T16:57:26Z

@milaaraujo, also note that @spelgubbe is working on this part of the problem too 👍

jywarren · 2019-02-05T17:22:08Z

Also see notes in #3147 for other possible optimizations!

spelgubbe · 2019-02-05T17:24:34Z

I could try to set up a testing environment... I have only 1 proposed change right now. Either it makes no difference at all or it makes a huge difference. (when searching for profiles)

https://github.com/spelgubbe/plots2/commit/c7202e7364d78466bb46f12b61c168469d403fc8

jywarren · 2019-02-05T17:47:50Z

Oh awesome - well, if you want to open a pull request, we can also test it on our unstable testing server, which has a full copy of the database.


jywarren · 2019-02-05T19:28:11Z

OK, reporting back on speeds: caching with a 2-day expiry drops the profile API response from 2-4 seconds to ~60-700ms, which is nice. But I still think optimizing the query would help with the initial 2-4 second response, so I encourage @spelgubbe to keep going!
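
For anyone unfamiliar with expiry-based caching, here is a small plain-Ruby sketch of the idea. It is not the code from #4763; the `TtlCache` class and its injectable clock are invented for illustration. In a Rails app this role is usually played by `Rails.cache.fetch` with an `expires_in` option.

```ruby
# Illustrative in-memory cache whose entries expire after a fixed time.
class TtlCache
  def initialize(ttl_seconds, clock: -> { Time.now.to_f })
    @ttl = ttl_seconds
    @clock = clock # injectable so expiry can be tested without sleeping
    @store = {}
  end

  # Returns the cached value for `key` if it is still fresh;
  # otherwise runs the block, stores its result, and returns it.
  def fetch(key)
    now = @clock.call
    entry = @store[key]
    return entry[:value] if entry && now - entry[:at] < @ttl

    value = yield
    @store[key] = { value: value, at: now }
    value
  end
end
```

The first request for a given query still pays the full cost, which is why the underlying query is worth optimizing as well.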

ALSO, I want to illustrate how the atWho integration could be tuned using different techniques:

(noting that debounce has been added in #4904!)

a debounce strategy, as in #3472 (Explore request timing optimizations for typeahead search and username/tag autocompletion), with an initial attempt in #3172 (initial debounced typeahead with debug statements)
not sending requests until at least 3 characters are typed (see #3472); no longer needed now that we have debounce in #4904 (Added debounce for typeahead search optimization)
local caching of responses (see atwho docs?)
responding more quickly to an empty query param, so we don't get a 500 error, which may be expensive
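
The last point could look roughly like the sketch below, in plain Ruby. `guarded_search` is a hypothetical wrapper, and `search` stands for whatever callable runs the real query; neither is taken from plots2.

```ruby
# Hypothetical guard: answer empty queries immediately, without running a search.
def guarded_search(query, search)
  q = query.to_s.strip
  return [] if q.empty? # nothing to look up, so skip the database entirely

  search.call(q)
end
```

Returning an empty result set for an empty query avoids both the error and the cost of running a query that cannot match anything useful.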

Analysis of response timing

And a longer interactive session here:

You can see that when typing @warren 3 times, the later requests show much lower load times: the initial ones take 2-4 seconds and the later ones 68-631ms, although one took 2.4s.

spelgubbe · 2019-02-05T23:01:56Z

I set up a test environment at cloud9. I added 20000 fake users using

# Create 10,000 users with random 8-letter usernames and 14-letter email prefixes.
10000.times do
  name = ('a'..'z').to_a.shuffle[0, 8].join
  email = ('a'..'z').to_a.shuffle[0, 14].join

  # create! raises an exception if validation fails.
  testuser = User.create!(
    username: name,
    email: email + '@example.com',
    status: 1,
    openid_identifier: nil,
    password: 'password',
    password_confirmation: 'password'
  )
  testuser.role = 'basic'
  testuser.save
end

It seems that neither the SQL statement nor where in the code the .limit() function is called affects the load time of the query at all. Something else is taking time, or my test environment is just weird.

The query takes no time at all, but the page takes 1.8 sec to load.

Edit: I will add more users to see if that changes anything.

jywarren · 2019-02-06T21:39:19Z

hmm so what are you trying out -- want to open a PR so we can all take a look? Thanks!


spelgubbe · 2019-02-08T09:55:22Z


I'm not able to reproduce the issues in my test environment.

jywarren · 2019-02-08T15:17:34Z

That's ok - if you open a PR we can also push that branch to unstable to test on a real full database. If you'd like, we can make a conditional to try 2 different queries based on a parameter, like `params[:optimize]` and can then try it on unstable with https://unstable.publiclab.org/...?optimize=true vs false -- does that make sense? And test the response time that way.
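
A rough sketch of such a conditional, in plain Ruby. The method name `like_pattern_for` is invented here, `params` is treated as a plain hash, and the "optimized" branch uses the prefix match suggested earlier in this thread purely as an example of a second query to compare.

```ruby
# Hypothetical switch between two query strategies based on ?optimize=true.
def like_pattern_for(query, params)
  if params[:optimize].to_s == 'true'
    "#{query}%"  # candidate: prefix match
  else
    "%#{query}%" # current behaviour: substring match
  end
end

puts like_pattern_for('sidha', { optimize: 'true' })
puts like_pattern_for('sidha', {})
```

Hitting the same URL with and without the parameter then gives a direct timing comparison on the same database.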


jywarren · 2019-03-11T20:29:49Z

debounce is now implemented for a range of instances of this, such as https://stable.publiclab.org/post "related notes", comment #tagname and @username autocompletion with at.js, and for the navbar search typeahead -- all in #4904

Now we should address optimizing the underlying queries, as we are seeing them cause site slowness in unexpected ways here:

#5015

jywarren · 2019-05-16T22:11:58Z

Closing to move to #5015!

rexagod added the bug, high-priority, help wanted, and brainstorm labels Jan 19, 2019

milaaraujo mentioned this issue Jan 19, 2019: API Optimization #4561 (Open, 4 tasks)

rexagod mentioned this issue Jan 21, 2019: Added At.js support (removed Horsey and Banksy) publiclab/PublicLab.Editor#239 (Merged, 5 tasks)

jywarren mentioned this issue Feb 5, 2019: Optimization needed for /api/srch/profiles API query #3147 (Closed)

jywarren mentioned this issue Feb 5, 2019: Caching on heavily-used API calls: srch/api/profiles #4763 (Merged)

jywarren added this to the Search improvements milestone Feb 5, 2019

jywarren mentioned this issue Feb 5, 2019: Weekly Community Check-In #8 - 'Brainstorming for ideas and guiding new comers' #4757 (Closed)

jywarren mentioned this issue Feb 8, 2019: Typeahead on [srch/notes] paralyzes app #4661 (Closed)

jywarren mentioned this issue Feb 20, 2019: Set a typeahead delay to prevent overloading server #4841 (Merged, 5 tasks)

jywarren closed this as completed May 16, 2019
