Error: Timeout exceeded on Silences page #873

Closed
damomurf opened this issue Jun 18, 2017 · 17 comments

@damomurf

damomurf commented Jun 18, 2017

As part of dealing with #844 I've switched to v0.7.1, and almost every time I open the "Silences" page, the selected Pending, Active or Expired tab shows the message "Error: Timeout exceeded".

I've enabled debug but can't see anything relevant in the Alertmanager logs.

@mxinden
Member

mxinden commented Jun 19, 2017

@damomurf Have you tried deleting your browser cache? Does the browser's JS console or network tab say anything? Do you run Alertmanager behind a reverse proxy?

@damomurf
Author

damomurf commented Jun 19, 2017

@mxinden No reverse proxy in play - direct to port 9093 for Alertmanager running as an Ubuntu service. The JS console shows nothing and the network info says that the request to silences?silenced=false returns a 200 but shows no content.

If I request the full silences?silenced=false URL using curl from a terminal, it seems to return the silence data with no issue, but we do have a reasonable amount of silence data, so it takes a few seconds to return.

I also tried in a clean browser, after deleting cache and cookie history, but same issue. There seems to be a problem in that the silences request actually returns a 200, but both Chrome and Safari are erroring on the returned data (and refuse to display it).

A successful silences request seems to return 126.88 KB of data. A failing one only returns 35.68 KB.
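
To compare a failing and a successful request outside the browser, curl's `--write-out` variables can report the status code, body size and total transfer time in one line. The following is a minimal sketch: the `localhost:9093` address and the `/api/v1/silences` path are assumptions (the thread only shows `silences?silenced=false`), and `measure_silences` and `exceeds_timeout` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers; base URL and API path are assumptions, adjust to your setup.

# Print "<http status> <bytes downloaded> <seconds>" for the silences request.
measure_silences() {
    base_url="${1:-http://localhost:9093}"
    curl --silent --output /dev/null \
         --write-out '%{http_code} %{size_download} %{time_total}\n' \
         "${base_url}/api/v1/silences?silenced=false"
}

# Succeed (exit 0) when a measured duration is longer than a client timeout.
# Both arguments are seconds and may be fractional.
exceeds_timeout() {
    awk -v took="$1" -v limit="$2" 'BEGIN { exit !(took > limit) }'
}
```

Running `measure_silences` from the same network as the browser and passing the reported time to `exceeds_timeout` together with the client's timeout gives a quick indication of whether the request can finish in time at all.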

@mxinden
Member

mxinden commented Jun 19, 2017

@damomurf Can you tell us roughly how many silences you have (an order of magnitude is enough), so that we can try to reproduce it?

@damomurf
Author

damomurf commented Jun 19, 2017

There are 234 silences in total, with 2 currently active. As I edited above, a successful silences API request seems to return 126.88 KB of data. A failing one only returns 35.68 KB.

@mxinden
Member

mxinden commented Jun 26, 2017

@damomurf I have created 10000 silences in an Alertmanager instance and the UI is still working just fine. I do remember seeing Error: Timeout exceeded once before, but I am having difficulty reproducing it. Can you try out a fresh Alertmanager instance without any silences and see if you are still facing this issue?

I am guessing you are running on a Mac. Both @stuartnelson3 and @w0rm do so too. Has one of you ever seen this issue?

@w0rm
Member

w0rm commented Jun 26, 2017

The fix should be rather easy. I don't think we need to restrict the timeout in the UI (https://github.com/prometheus/alertmanager/blob/master/ui/app/src/Utils/Api.elm#L84). We should really set it to Nothing, because limiting it provides no real benefit and may cause problems on slow networks.

@mxinden
Member

mxinden commented Jun 26, 2017

@w0rm Cool, thanks for the hint!

@damomurf Can you try this out in your setup and if it works create a PR? If not, I can do it as well. Either one is fine with me.

@pieterdejaeghere

fyi: this issue does indeed pop up on slow (or just global >200ms) connections. It was reproducible on our network with 100% probability during a sudden latency hike on an intercontinental link.

@w0rm
Member

w0rm commented Jun 26, 2017

@mxinden if you need a way to reproduce this, just throttle the network in the Chrome DevTools.
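
Outside the browser, a similar effect can be approximated with curl's `--max-time`, which aborts the transfer after a fixed number of seconds, much like a client-side request timeout. This is only a sketch and not the UI's actual code path; `fetch_with_cap` and `describe_curl_exit` are hypothetical names, and any URL passed in is an assumption about your setup.

```shell
#!/bin/sh
# Fetch a URL but give up after a fixed number of seconds, then report
# how curl finished. Function names here are illustrative only.

# Map the curl exit codes relevant to this issue to a short description.
describe_curl_exit() {
    case "$1" in
        0)  echo "complete" ;;
        18) echo "partial transfer" ;;
        28) echo "timed out" ;;
        *)  echo "other error ($1)" ;;
    esac
}

# fetch_with_cap URL SECONDS: print bytes received and how the request ended.
fetch_with_cap() {
    bytes=$(curl --silent --max-time "$2" --output /dev/null \
                 --write-out '%{size_download}' "$1")
    status=$?
    echo "${bytes:-0} bytes, $(describe_curl_exit "$status")"
}
```

On a high-latency link, calling `fetch_with_cap` on the silences URL with a cap of 1 second would be expected to report a timeout with only part of the body received, while a more generous cap should let the same request complete.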

@damomurf
Author

Aha. Of course - I'm in Australia hitting AlertManager running in the US with standard latency ~250+ms or so, so this makes a whole lot of sense. I'll try a custom build with the change above and see how I go.

@damomurf
Author

damomurf commented Jun 27, 2017

I've just made the recommended change on a build from master but the problem still exists:

+++ b/ui/app/src/Utils/Api.elm
@@ -81,7 +81,7 @@ request method headers url body decoder =
         , url = url
         , body = body
         , expect = Http.expectJson decoder
-        , timeout = Just defaultTimeout
+        , timeout = Nothing
         , withCredentials = False
         }

I've tried cleaning up anything I possibly could in case the .elm stuff is being cached somewhere, but no go.

I even tried altering defaultTimeout to 100 seconds instead of 1, but it almost seems like it doesn't have any effect.

@w0rm
Member

w0rm commented Jun 27, 2017

@damomurf have you done all steps: 1. Compiled the Elm code 2. Generated a bindata.go (make assets) 3. Compiled alertmanager?

@damomurf
Author

@w0rm - I simply invoked "make build" as documented in the repo README.md. Is there another step to ensure the compilation (and importantly force re-compilation) of the Elm code?

@mxinden
Member

mxinden commented Jun 27, 2017

@damomurf Yes there is. Sorry, we are not doing a very good job documenting it.

cd ui/app
make script.js
cd ../..
make assets
make build

I will improve this with #792 hopefully this week. This requires you to have Elm and UglifyJS installed.
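
The steps above could be wrapped in a small script that fails early when a prerequisite is missing. This is a sketch only: the binary names `elm-make` and `uglifyjs` are assumptions about how Elm and UglifyJS are installed on your machine, and `missing_tools` and `rebuild_with_ui` are hypothetical helpers.

```shell
#!/bin/sh
# Print the name of every listed tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Rebuild the UI and then Alertmanager; run from the repository root.
rebuild_with_ui() {
    missing=$(missing_tools elm-make uglifyjs make)   # assumed binary names
    if [ -n "$missing" ]; then
        echo "missing prerequisites: $missing" >&2
        return 1
    fi
    # Same commands as listed above; the subshell keeps the caller's
    # working directory unchanged.
    (cd ui/app && make script.js) &&
        make assets &&
        make build
}
```

Chaining the commands with `&&` stops at the first failing step, so a failed Elm compile cannot silently be followed by a build that embeds stale UI assets.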

rul pushed a commit to rul/alertmanager that referenced this issue Jun 27, 2017
@rul
Contributor

rul commented Jun 27, 2017

This bug hit me too. I've tried compiling it with the fix in place and I can confirm it works. I'm not making a PR as the fix implies modifying a pseudo-binary file (bindata.go). I think it's better if a trusted collaborator of the project does that instead of a random Internet guy. :-)

This bug makes alertmanager ui unusable if the user is far from the server, so it would be really nice if we can have 0.7.2 with this fix.

Thanks!

@mxinden
Member

mxinden commented Jun 27, 2017

@rul Regarding "random Internet guy": generating bindata.go is deterministic, so our CI checks on every commit whether the code and the committed binary match. Feel free to open a PR; otherwise I will do so tomorrow.
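
A check of that kind can be as simple as regenerating the file and comparing it byte for byte with the committed copy. The sketch below is not the project's actual CI configuration; `generated_matches` is a hypothetical helper, and the files passed to it are placeholders for the committed and the freshly generated bindata.go.

```shell
#!/bin/sh
# generated_matches COMMITTED REGENERATED
# Succeed only when the two files are byte-for-byte identical, which is
# what a deterministic generator should produce for unchanged inputs.
generated_matches() {
    if cmp -s "$1" "$2"; then
        echo "ok: $1 matches regenerated output"
    else
        echo "mismatch: $1 differs from $2" >&2
        return 1
    fi
}
```

A CI job might copy the committed file aside, run `make assets`, and then call `generated_matches` on the two copies, failing the build on a mismatch.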

@rul @damomurf Thanks for all the feedback. Very much appreciated.

rul pushed a commit to rul/alertmanager that referenced this issue Jun 27, 2017
@w0rm
Member

w0rm commented Jun 27, 2017

Fixed in #890

@w0rm w0rm closed this as completed Jun 27, 2017