Slow API requests for /api/srch/profiles?query= #4670
Comments
I can't find what the ExecuteSearch function does... I assume some kind of SQL query is built somewhere in the code. Maybe it is slow because of the way the LIKE statement is used. Edit: Also, it seems that the SQL statement for searching users has no limit applied to it until after the query has finished, so instead of stopping once it has collected a certain number of matches, the query keeps going (?) until the whole table has been searched. See search_service.rb#L225. A query like api/srch/profiles?query=b takes 12 seconds for me; I don't think that would be possible if there were a LIMIT clause at the end of the SQL query. |
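To make the limit point above concrete, here is a minimal plain-Ruby sketch, not the plots2 code. `USERNAMES`, `search_then_limit`, and `search_with_early_limit` are hypothetical names, and an in-memory array stands in for the users table. It only counts how many rows are examined when the limit is applied after filtering versus while scanning.

```ruby
# Hypothetical in-memory stand-in for the users table; not the plots2 schema.
USERNAMES = (1..100_000).map { |i| "user#{i}" }

# Filter every row first, then keep the first `limit` matches
# (analogous to applying the limit after the query has finished).
def search_then_limit(rows, term, limit)
  examined = 0
  matches = rows.select do |name|
    examined += 1
    name.include?(term)
  end
  [matches.first(limit), examined]
end

# Stop scanning as soon as `limit` matches have been found
# (analogous to a LIMIT clause the database can act on while scanning).
def search_with_early_limit(rows, term, limit)
  examined = 0
  matches = rows.lazy.select do |name|
    examined += 1
    name.include?(term)
  end.first(limit)
  [matches, examined]
end

late_matches, late_examined = search_then_limit(USERNAMES, "9", 10)
early_matches, early_examined = search_with_early_limit(USERNAMES, "9", 10)

puts "limit applied afterwards:     #{late_examined} rows examined"
puts "limit applied while scanning: #{early_examined} rows examined"
```

This is only an analogy. In a real database, an ORDER BY clause or a leading-wildcard LIKE pattern can still force a scan of the whole table even when a LIMIT is present, so the actual gain would need to be measured.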
We just cached this: #4763 (comment) but we should still optimize, thanks for the research! |
plots2/app/services/search_service.rb Lines 223 to 225 in a68a07d
Hmm, could you try out a couple of these options to see what's faster? I think it may be worthwhile to auto-generate a LOT of users in your local copy, maybe using the Rails console, something like:

    1000.times do
      User.create({...}).save
    end

For examples of this, check out how we generate seed data here: https://github.com/publiclab/plots2/tree/master/db/seeds.rb Then you can watch the log output to see how long it takes to run these queries, when hitting a URL like this:
Does that make sense? Thanks for your help! |
@milaaraujo also see @spelgubbe's working on this part of the problem too 👍 |
Also see notes in #3147 for other possible optimizations! |
I could try to set up a testing environment... I have only 1 proposed change right now. Either it makes no difference at all or it makes a huge difference. (when searching for profiles) https://github.com/spelgubbe/plots2/commit/c7202e7364d78466bb46f12b61c168469d403fc8 |
Oh awesome - well, if you want to open a pull request, we can also test it
on our unstable testing server, which has a full copy of the database.
|
OK, reporting back on speeds: caching with a 2-day expiry drops the profile API response from 2-4 seconds to ~60-700ms, which is nice. But I still think optimizing the query would help with the initial 2-4 second response, so I encourage @spelgubbe to keep going! ALSO, i want to illustrate how (noting that debounce has been added in #4904 !)
Analysis of response timing. And a longer interactive session here: You can see how typing |
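As a rough illustration of why a cache with an expiry only helps after the first request, here is a small plain-Ruby sketch. `ExpiringCache` and `slow_search` are hypothetical names made up for this example; in a Rails app, `Rails.cache.fetch(key, expires_in: 2.days) { ... }` plays a similar role. This is not a description of how the caching was actually implemented in the linked PR.

```ruby
# Hypothetical minimal time-based cache, for illustration only.
class ExpiringCache
  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock # injectable so expiry can be exercised without waiting
    @store = {}
  end

  # Return the cached value for key, or run the block and cache its result.
  def fetch(key)
    entry = @store[key]
    now = @clock.call
    return entry[:value] if entry && now - entry[:stored_at] < @ttl

    value = yield
    @store[key] = { value: value, stored_at: now }
    value
  end
end

TWO_DAYS = 2 * 24 * 60 * 60
cache = ExpiringCache.new(TWO_DAYS)

slow_calls = 0
slow_search = lambda do |query|
  slow_calls += 1
  ["#{query}_result"] # stand-in for the expensive profile search
end

# Only the first fetch pays for the slow search; the second is served from cache.
2.times { cache.fetch("profiles:b") { slow_search.call("b") } }
puts "expensive search ran #{slow_calls} time(s)"
```

The first request for each distinct query string still runs the slow search, which is why optimizing the underlying query remains worthwhile.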
I set up a test environment at cloud9. I added 20000 fake users using:

    10000.times do
      # Random lowercase strings for the username and the local part of the email
      name = ('a'..'z').to_a.shuffle[0,8].join
      email = ('a'..'z').to_a.shuffle[0,14].join
      testuser = User.create! "username" => name,
                              "email" => email + "@example.com",
                              "status" => 1,
                              "openid_identifier" => nil,
                              "password" => "password",
                              "password_confirmation" => "password"
      testuser.role = "basic"
      testuser.save
    end

It seems that neither the SQL statement nor the place in the code where the .limit() function is called affects the load time of the query at all. Something else is taking time, or my test environment is just weird. The query takes no time at all, but the page takes 1.8 seconds to load. Edit: I will add more users to see if that changes anything. |
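One way to narrow down where the 1.8 seconds goes is to time each phase of the request separately. The sketch below uses Ruby's standard `Benchmark` module. `run_query` and `build_response` are hypothetical stand-ins (with artificial sleeps) for the database query and for everything that happens afterwards; they are not functions from plots2.

```ruby
require "benchmark"

# Hypothetical stand-in for the database query.
def run_query
  sleep 0.01
  %w[alice bob]
end

# Hypothetical stand-in for post-processing and serialization.
def build_response(rows)
  sleep 0.05
  rows.map { |name| { username: name } }
end

timings = {}
rows = nil
response = nil

# Benchmark.realtime returns the elapsed wall-clock time of the block in seconds.
timings[:query] = Benchmark.realtime { rows = run_query }
timings[:serialize] = Benchmark.realtime { response = build_response(rows) }

timings.each do |phase, seconds|
  puts format("%-10s %.1f ms", phase, seconds * 1000)
end
```

If the query phase is fast and another phase dominates, the slowness lies outside the SQL statement, which would match the observation above.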
hmm so what are you trying out -- want to open a PR so we can all take a
look? Thanks!
|
I'm not able to reproduce the issues in my test environment. |
That's ok - if you open a PR we can also push that branch to unstable to
test on a real full database. If you'd like, we can make a conditional to
try 2 different queries based on a parameter, like `params[:optimize]` and
can then try it on unstable with
https://unstable.publiclab.org/...?optimize=true vs false -- does that make
sense? And test the response time that way.
|
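The `params[:optimize]` idea suggested above could look roughly like the following plain-Ruby sketch. Everything here other than `params[:optimize]` is hypothetical: `USERS`, `baseline_search`, `optimized_search`, and `search_profiles` are made-up names, and the two strategies are in-memory stand-ins for the two SQL queries being compared.

```ruby
# Hypothetical in-memory stand-in for the users table.
USERS = %w[alice bob barbara bill carol].freeze

# Stand-in for the current query: filter everything, then limit.
def baseline_search(term, limit)
  USERS.select { |name| name.include?(term) }.first(limit)
end

# Stand-in for the candidate query: stop once enough matches are found.
def optimized_search(term, limit)
  USERS.lazy.select { |name| name.include?(term) }.first(limit)
end

def search_profiles(params, limit: 10)
  # Query-string values arrive as strings, so compare against "true".
  if params[:optimize] == "true"
    optimized_search(params[:query], limit)
  else
    baseline_search(params[:query], limit)
  end
end

puts search_profiles({ query: "b", optimize: "true" }).inspect
puts search_profiles({ query: "b", optimize: "false" }).inspect
```

Both branches should return the same results, so the only thing left to compare when hitting the two URLs is the response time.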
Debounce is now implemented for a range of instances of this, such as https://stable.publiclab.org/post "related notes", comment. Now we should address optimizing the underlying queries, as we are seeing them cause site slowness in unexpected ways here: |
Closing to move to #5015! |
References #3147. Hinders # 239. I think this should be a top-priority issue: more users will gradually be added and, over time, the API will be used more than ever, and both can prove fatal to the response time.
Note: This only happens for /api/srch/profiles?query= (7.8s) as of now, but it can happen with other queries too, such as /api/srch/tags?query= (2.26s), when they grow in number in the near future.
We can consider implementing better data structures and algorithms for storing and searching the data.
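As one possible example of a "better data structure", here is a sketch of a sorted index with binary search for prefix lookups. `PrefixIndex` is a hypothetical class written for this illustration. Note the assumption: it only matches names that start with the query, which is narrower than a substring search, so it would change the search semantics if adopted as-is.

```ruby
# Hypothetical sorted-array index for prefix search; illustration only.
class PrefixIndex
  def initialize(names)
    @sorted = names.map(&:downcase).uniq.sort
  end

  # Return up to `limit` names that start with `prefix`.
  def search(prefix, limit: 10)
    prefix = prefix.downcase
    # Binary search for the first name that sorts at or after the prefix.
    index = @sorted.bsearch_index { |name| name >= prefix }
    results = []
    return results if index.nil?

    # All names sharing the prefix are contiguous in sorted order,
    # so walk forward until the prefix or the limit runs out.
    while index < @sorted.size && results.size < limit && @sorted[index].start_with?(prefix)
      results << @sorted[index]
      index += 1
    end
    results
  end
end

index = PrefixIndex.new(%w[bob Barbara alice bill carol])
puts index.search("b").inspect
puts index.search("b", limit: 2).inspect
```

The lookup costs a binary search plus one step per returned match, instead of a pass over every name. The database analogue is typically an index on the column combined with a prefix pattern, since a pattern with a leading wildcard usually cannot use such an index.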