-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue 159: Gorouter emits way too many metrics #275
Conversation
cloudfoundry#159 The metrics latency, latency.component and route_lookup_time are sent for each request and cause significant load on gorouter and loggregator. In our measurements the CPU load was reduced by 40% in both gorouter and doppler VMs. This is a huge gain for large scale. Note that the latency metrics is still available at the varz endpoint.
cloudfoundry#159 Switching off these events reduces the CPU load on gorouter and doppler VMS significantly. These events are of type "timer".
cloudfoundry#159 Switching off these events reduces the CPU load on gorouter and doppler VMS significantly. These events are of type "timer". Note that they should not be switched off when the app-autoscaler is used.
We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story. The labels on this github issue will be updated when the story is started. |
@stefanlay Thank you so much! 👏 👏 🥳 👏 👏 🎈 👏 👏 💯 👏 👏 🥇 👏 👏 ✨ 👏 👏 🎉 👏 👏 🍰 👏 👏 🍨 👏 👏 🐱 👏 👏 I will try to review this and test for myself sometime next week. |
Hi @ameowlia, I'd like to gently ask if you may find time to look into this. This change would help us very much running our big CF foundations where we in the meantime have > 30k Requests/s, and this number is increasing. |
bump this. I will follow-up on any questions from @ameowlia |
Hi all, Apologizes for the long time to review 😞 I just tested and everything worked as expected. @stefanlay can you make a PR to routing-release to add these config vars? Thanks, Amelia |
Many thanks, @ameowlia. Here is the PR for the routong-release: cloudfoundry/routing-release#198 |
A short explanation of the proposed change:
This change addresses Gorouter emits way too many metrics (metric frequency proportional to request frequency) #159.
On large scale landscapes the loggregator stack is a limiting factor for further growth. A big part of the loggregator load is due to the per-request envelopes sent by the gorouter. The envelopes are the following: The metrics
latency
,latency .<component>
androute_lookup_time
, two envelopes of typetimer
, and the access log.This change makes sending the metrics and the two envelopes of type
timer
configurable.An explanation of the use cases your change solves
In our load tests we measured a decrease of CPU load on gorouter and doppler of about 40% without the metrics and of about 60% without metrics and timer
Instructions to functionally test the behavior change using operator interfaces (BOSH manifest, logs, curl, and metrics)
Switch off sending the metrics and/or the timer envelopes in the manifest:
per_request_metrics_reporting: false
send_http_start_stop_server_event: false
send_http_start_stop_client_event: false
Please note that send_http_start_stop_client_event should not be set to false when the app autoscaler is used.
Expected result after the change
Neither the metrics nor the timer events are sent. This can be seen with
cf tail <appname>
.The gorouter and doppler CPU load decreases.
Current result before the change
The settings have no effect
I have viewed signed and have submitted the Contributor License Agreement
I have made this pull request to the
main
branchI have run all the unit tests using
scripts/run-unit-tests-in-docker
from routing-release.