Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue 159: Gorouter emits way too many metrics #275

Merged
merged 3 commits into from
Mar 2, 2021

Conversation

stefanlay
Copy link
Member

  • A short explanation of the proposed change:
    This change addresses Gorouter emits way too many metrics (metric frequency proportional to request frequency) #159.
    On large scale landscapes the loggregator stack is a limiting factor for further growth. A big part of the loggregator load is due to the per-request envelopes sent by the gorouter. The envelopes are the following: The metrics latency, latency .<component> and route_lookup_time , two envelopes of type timer, and the access log.
    This change makes sending the metrics and the two envelopes of type timer configurable.

  • An explanation of the use cases your change solves
    In our load tests we measured a decrease of CPU load on gorouter and doppler of about 40% without the metrics and of about 60% without metrics and timer

  • Instructions to functionally test the behavior change using operator interfaces (BOSH manifest, logs, curl, and metrics)
    Switch off sending the metrics and/or the timer envelopes in the manifest:
    per_request_metrics_reporting: false
    send_http_start_stop_server_event: false
    send_http_start_stop_client_event: false
    Please note that send_http_start_stop_client_event should not be set to false when the app autoscaler is used.

  • Expected result after the change
    Neither the metrics nor the timer events are sent. This can be seen with cf tail <appname>.
    The gorouter and doppler CPU load decreases.

  • Current result before the change
    The settings have no effect

  • I have viewed signed and have submitted the Contributor License Agreement

  • I have made this pull request to the main branch

  • I have run all the unit tests using scripts/run-unit-tests-in-docker from routing-release.

cloudfoundry#159

The metrics latency, latency.component and route_lookup_time are
sent for each request and cause significant load on gorouter and
loggregator. In our measurements the CPU load was reduced by 40%
in both gorouter and doppler VMs. This is a huge gain for large
scale.

Note that the latency metrics is still available at the varz
endpoint.
cloudfoundry#159

Switching off these events reduces the CPU load on gorouter and
doppler VMS significantly.

These events are of type "timer".
cloudfoundry#159

Switching off these events reduces the CPU load on gorouter and
doppler VMS significantly.

These events are of type "timer".

Note that they should not be switched off when the app-autoscaler
is used.
@cf-gitbot
Copy link

We have created an issue in Pivotal Tracker to manage this. Unfortunately, the Pivotal Tracker project is private so you may be unable to view the contents of the story.

The labels on this github issue will be updated when the story is started.

@ameowlia
Copy link
Member

@stefanlay Thank you so much! 👏 👏 🥳 👏 👏 🎈 👏 👏 💯 👏 👏 🥇 👏 👏 ✨ 👏 👏 🎉 👏 👏 🍰 👏 👏 🍨 👏 👏 🐱 👏 👏

I will try to review this and test for myself sometime next week.

@stefanlay
Copy link
Member Author

Hi @ameowlia, I'd like to gently ask if you may find time to look into this. This change would help us very much running our big CF foundations where we in the meantime have > 30k Requests/s, and this number is increasing.

@stefanlay stefanlay changed the title Metrics159 Fix issue 159: Gorouter emits way too many metrics Feb 18, 2021
@domdom82
Copy link
Contributor

domdom82 commented Mar 2, 2021

bump this. I will follow-up on any questions from @ameowlia

@ameowlia
Copy link
Member

ameowlia commented Mar 2, 2021

Hi all,

Apologizes for the long time to review 😞 I just tested and everything worked as expected.

@stefanlay can you make a PR to routing-release to add these config vars?

Thanks, Amelia

@stefanlay
Copy link
Member Author

Many thanks, @ameowlia. Here is the PR for the routong-release: cloudfoundry/routing-release#198

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants