Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow API requests for /api/srch/profiles?query= #4670

Closed
rexagod opened this issue Jan 19, 2019 · 14 comments
Closed

Slow API requests for /api/srch/profiles?query= #4670

rexagod opened this issue Jan 19, 2019 · 14 comments
Labels
brainstorm Issues that need discussion and requirements need to be elucidated bug the issue is regarding one of our programs which faces problems when a certain task is executed help wanted requires help by anyone willing to contribute high-priority

Comments

@rexagod
Copy link
Member

rexagod commented Jan 19, 2019

References #3147. Hinders # 239. I think this should be a top priority issue, since gradually more users will be added and with time, the API will be used more than ever, both can prove fatal to the response time.

Note: This only happens for /api/srch/profiles?query= (7.8s) as of now, but can happen with other queries too, such as /api/srch/tags?query=(2.26s) when they grow in number in near future.

api

We can consider implementing better Data Structures and algorithms for storing and searching the data.

@rexagod rexagod added bug the issue is regarding one of our programs which faces problems when a certain task is executed high-priority help wanted requires help by anyone willing to contribute brainstorm Issues that need discussion and requirements need to be elucidated labels Jan 19, 2019
@milaaraujo milaaraujo mentioned this issue Jan 19, 2019
4 tasks
@spelgubbe
Copy link

spelgubbe commented Feb 5, 2019

I can't find what the ExecuteSearch function does... I assume some kind of SQL query is built somewhere in the code. Maybe it is slow because the LIKE statement is used like this
select * from table where username like "%query%" (then you have to iterate through the whole table)

Edit:
Ok I found it, in search_service.rb#L220. If someone is searching for a specific user, I don't think there's any reason to search for %user%. Rather search for user% and the SQL query should go much faster. Also there needs to be a lower limit to how short the query can be, I don't think there's any reason to allow someone to search for a name shorter than 3 letters.

Also, it seems that the SQL statement for searching for users doesn't have a limit applied to it until after the query is done, so instead of the SQL query stopping after collecting a certain amount of matches, it keeps going (?) until the whole table has been searched. search_service.rb#L225. Performing a query like api/srch/profiles?query=b takes 12 seconds for me, I don't think that would be possible if there was a limit statement at the end of the SQL query.

@jywarren
Copy link
Member

jywarren commented Feb 5, 2019

We just cached this: #4763 (comment) but we should still optimize, thanks for the research!

@jywarren
Copy link
Member

jywarren commented Feb 5, 2019

User.where('rusers.status = 1')
.joins(:user_tags)\
.where('user_tags.value LIKE ?', '%' + query + '%')\

Hmm, could you try out a couple of these options to see what's faster?

#4561 (comment)

I think it may be worthwhile trying to auto-generate a LOT of users in your local copy, maybe using the Rails console, something like:

1000.times do
  User.create({...}).save
end

For examples of this, check out how we generate seed data here: https://github.com/publiclab/plots2/tree/master/db/seeds.rb

Then you can watch on the log output to see how long it takes to run these queries, when hitting a URL like this:

/api/srch/profiles?query=sidha&sort_by=recent&field=username

Does that make sense? Thanks for your help!

@jywarren
Copy link
Member

jywarren commented Feb 5, 2019

@milaaraujo also see @spelgubbe's working on this part of the problem too 👍

@jywarren
Copy link
Member

jywarren commented Feb 5, 2019

Also see notes in #3147 for other possible optimizations!

@spelgubbe
Copy link

I could try to set up a testing environment... I have only 1 proposed change right now. Either it makes no difference at all or it makes a huge difference. (when searching for profiles)

https://github.com/spelgubbe/plots2/commit/c7202e7364d78466bb46f12b61c168469d403fc8

@jywarren
Copy link
Member

jywarren commented Feb 5, 2019 via email

@jywarren
Copy link
Member

jywarren commented Feb 5, 2019

OK, report back on speeds is that caching with 2 day expiry drops us from 2-4 seconds on profile API response to ~60-700ms, which is nice. But I still think optimization of the query would help in the initial 2-4 second response, so I encourage @spelgubbe to keep going!

ALSO, i want to illustrate how atWho integration could be tuned using different techniques:

(noting that debounce has been added in #4904 !)

  1. a debounce strategy as in Explore request timing optimizations for typeahead search and username/tag autocompletion #3472 and initial attempt in initial debounced typeahead with debug statements #3172
  2. not sending requests until at least 3 characters are typed (see Explore request timing optimizations for typeahead search and username/tag autocompletion #3472) (no longer needed now we have debounce in Added debounce for typeahead search optimization #4904)
  3. local caching of responses (see atwho docs?)
  4. responding more instantly to an empty query param so we don't get a 500 error which may be expensive

Analysis of response timing

image

And a longer interactive session here:

image

You can see how typing @warren 3 times, the later ones show much much lower load times; the initial ones are at 2-4 seconds, the later at 68-631ms, although one was 2.4s

@spelgubbe
Copy link

spelgubbe commented Feb 5, 2019

I set up a test environment at cloud9. I added 20000 fake users using

10000.times do
    name = ('a'..'z').to_a.shuffle[0,8].join
    email = ('a'..'z').to_a.shuffle[0,14].join
    
    testuser = User.create! "username" => name,
      "email" =>  email + "@example.com",
      "status" => 1,
      "openid_identifier" => nil,
      "password" => "password",
      "password_confirmation" => "password"
    testuser.role = "basic"
    testuser.save()
end

It seems that the SQL statement or where in the code you use the .limit() function doesn't affect the load time of the query at all. Something else is taking time, or my test environment is just weird.

example query

Query takes no time at all but page takes 1.8sec to load.

Edit: I will add more users to see if that changes anything.

@jywarren
Copy link
Member

jywarren commented Feb 6, 2019 via email

@spelgubbe
Copy link

hmm so what are you trying out -- want to open a PR so we can all take a look? Thanks!

On Tue, Feb 5, 2019 at 6:02 PM Jakob K @.> wrote: I set up a test environment at cloud9. I added 10000 fake users using 10000.times do name = ('a'..'z').to_a.shuffle[0,8].join email = ('a'..'z').to_a.shuffle[0,14].join testuser = User.create! "username" => name, "email" => email + @.", "status" => 1, "openid_identifier" => nil, "password" => "password", "password_confirmation" => "password" testuser.role = "basic" testuser.save()end It seems that the SQL statement or where in the code you use the .limit() function doesn't affect the load time of the query at all. Something else is taking time, or my test environment is just weird. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#4670 (comment)>, or mute the thread https://github.com/notifications/unsubscribe-auth/AABfJ280T5LbFuZ_4bNlE8uMohYp0Po-ks5vKg1lgaJpZM4aJI7p .

I'm not able to reproduce the issues in my test environment.

@jywarren
Copy link
Member

jywarren commented Feb 8, 2019 via email

@jywarren
Copy link
Member

jywarren commented Mar 11, 2019

debounce is now implemented for a range of instances of this, such as https://stable.publiclab.org/post "related notes", comment #tagname and @username autocompletion with at.js, and for the navbar search typeahead -- all in #4904

Now we should address optimizing the underlying queries, as we are seeing them cause site slowness in unexpected ways here:

#5015

@jywarren
Copy link
Member

Closing to move to #5015!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
brainstorm Issues that need discussion and requirements need to be elucidated bug the issue is regarding one of our programs which faces problems when a certain task is executed help wanted requires help by anyone willing to contribute high-priority
Projects
None yet
Development

No branches or pull requests

3 participants