This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
Add metrics to track how the rate limiter is affecting requests (sleep/reject) #13534
Merged: MadLittleMods merged 6 commits into develop from madlittlemods/track-metrics-from-rate-limiter, Aug 17, 2022
Changes from 3 commits
Commits (6):
325cadc Add metrics to track how the rate limiter is affecting requests (MadLittleMods)
3267318 Add changelog (MadLittleMods)
5679bb2 Fix lints (MadLittleMods)
149ac1d Remove unbounded host from labels (MadLittleMods)
75ca101 Merge branch 'develop' into madlittlemods/track-metrics-from-rate-lim… (MadLittleMods)
2e0e5cc Merge branch 'develop' into madlittlemods/track-metrics-from-rate-lim… (MadLittleMods)
@@ -0,0 +1 @@
+Add metrics to track how the rate limiter is affecting requests (sleep/reject).
@@ -18,6 +18,8 @@
 import typing
 from typing import Any, DefaultDict, Iterator, List, Set

+from prometheus_client.core import Counter
+
 from twisted.internet import defer

 from synapse.api.errors import LimitExceededError
@@ -35,6 +37,11 @@
 logger = logging.getLogger(__name__)


+rate_limit_sleep_counter = Counter("synapse_rate_limit_sleep", "", ["host"])
+
+rate_limit_reject_counter = Counter("synapse_rate_limit_reject", "", ["host"])
+
+
 class FederationRateLimiter:
     def __init__(self, clock: Clock, config: FederationRatelimitSettings):
         def new_limiter() -> "_PerHostRatelimiter":
@@ -59,7 +66,7 @@ def ratelimit(self, host: str) -> "_GeneratorContextManager[defer.Deferred[None]
         Returns:
             context manager which returns a deferred.
         """
-        return self.ratelimiters[host].ratelimit()
+        return self.ratelimiters[host].ratelimit(host)
Review comment (inline on the line above, partly lost in extraction): Better way to pass the [...] There doesn't seem to be a good way to make [...]


 class _PerHostRatelimiter:
@@ -94,12 +101,14 @@ def __init__(self, clock: Clock, config: FederationRatelimitSettings):
         self.request_times: List[int] = []

     @contextlib.contextmanager
-    def ratelimit(self) -> "Iterator[defer.Deferred[None]]":
+    def ratelimit(self, host: str) -> "Iterator[defer.Deferred[None]]":
         # `contextlib.contextmanager` takes a generator and turns it into a
         # context manager. The generator should only yield once with a value
         # to be returned by manager.
         # Exceptions will be reraised at the yield.

+        self.host = host
+
         request_id = object()
         ret = self._on_enter(request_id)
         try:
@@ -119,6 +128,7 @@ def _on_enter(self, request_id: object) -> "defer.Deferred[None]":
         # sleeping or in the ready queue).
         queue_size = len(self.ready_request_queue) + len(self.sleeping_requests)
Review comment (inline on the line above, partly lost in extraction): Would it be good to track [...]
         if queue_size > self.reject_limit:
+            rate_limit_reject_counter.labels(self.host).inc()
             raise LimitExceededError(
                 retry_after_ms=int(self.window_size / self.sleep_limit)
             )
@@ -146,6 +156,7 @@ def queue_request() -> "defer.Deferred[None]":

         if len(self.request_times) > self.sleep_limit:
             logger.debug("Ratelimiter: sleeping request for %f sec", self.sleep_sec)
+            rate_limit_sleep_counter.labels(self.host).inc()
             ret_defer = run_in_background(self.clock.sleep, self.sleep_sec)

             self.sleeping_requests.add(request_id)
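
For readers skimming the hunks, the two instrumentation points can be modelled in isolation. The following is a toy, synchronous sketch, not the Synapse implementation: `ToyLimiter`, `Rejected`, the `events` tally and the `queued` field are hypothetical stand-ins, and only the two comparisons (`queue_size > self.reject_limit` and `len(self.request_times) > self.sleep_limit`) are taken from the diff. Window pruning and the real queueing are deliberately left out.

```python
import collections
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-in for the two Prometheus counters added in the diff.
events = collections.Counter()


class Rejected(Exception):
    """Stand-in for LimitExceededError."""


@dataclass
class ToyLimiter:
    sleep_limit: int  # requests recorded before new ones are delayed
    reject_limit: int  # queued requests tolerated before new ones are refused
    queued: int = 0  # stand-in for len(ready_request_queue) + len(sleeping_requests)
    request_times: List[float] = field(default_factory=list)

    def on_enter(self, now: float) -> str:
        # Mirrors `queue_size > self.reject_limit`: count, then refuse.
        if self.queued > self.reject_limit:
            events["reject"] += 1
            raise Rejected()

        self.request_times.append(now)

        # Mirrors `len(self.request_times) > self.sleep_limit`: count, then delay.
        if len(self.request_times) > self.sleep_limit:
            events["sleep"] += 1
            self.queued += 1  # a sleeping request occupies the queue
            return "sleep"
        return "ok"
```

With `ToyLimiter(sleep_limit=2, reject_limit=1)`, the first two calls to `on_enter` pass straight through, the next two are counted as sleeps, and a fifth call is counted as a reject and raises `Rejected`.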
Review comment: I'd vote for not adding "host" here, given that it is an unbounded variable. If we add some logging for when we're sleeping hosts, then the metric would allow us to see that we've triggered it, and the logging would show which hosts are affected?

Reply: Makes sense 👍. I was focused on what was in the doc: metrics per host. Currently, the logs here are mostly debug, so we'll have to turn them on when we want to see them.

Affected host count per time period might be interesting to see, so we can differentiate one really noisy homeserver from a general ratelimit tuning problem across the federation. I guess we would add a gauge for this. I can follow up in another PR for this -> #13541
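
The idea discussed in this thread (host-independent totals, the host name sent to the log rather than to a metric label, and a count of affected hosts per time period) can be sketched roughly as below. This is an illustration only, not the code merged in this PR or in #13541: `RateLimitStats`, its attributes and the `now` parameter are hypothetical, and plain integers stand in for the Prometheus counters and gauge so the sketch runs with the standard library alone.

```python
import logging
import time
from typing import Callable, Dict

logger = logging.getLogger(__name__)


class RateLimitStats:
    """Hypothetical helper: unlabelled totals plus a windowed count of hosts."""

    def __init__(self, window_sec: float, now: Callable[[], float] = time.monotonic):
        self.window_sec = window_sec
        self._now = now
        self.sleep_total = 0  # would be an unlabelled counter
        self.reject_total = 0  # would be an unlabelled counter
        self._last_limited: Dict[str, float] = {}  # host -> time it was last limited

    def record(self, host: str, action: str) -> None:
        if action == "sleep":
            self.sleep_total += 1
        elif action == "reject":
            self.reject_total += 1
        else:
            raise ValueError(f"unknown action: {action!r}")
        # The host goes to the log, not into a metric label, so the number of
        # time series does not grow with the number of remote hosts.
        logger.debug("Ratelimiter: %s for request from %s", action, host)
        self._last_limited[host] = self._now()

    def affected_hosts(self) -> int:
        """Distinct hosts limited within the last `window_sec` (a gauge value)."""
        cutoff = self._now() - self.window_sec
        # Forget hosts that have not been limited recently.
        self._last_limited = {
            h: t for h, t in self._last_limited.items() if t >= cutoff
        }
        return len(self._last_limited)
```

A high `sleep_total` with `affected_hosts()` close to 1 would point at a single noisy homeserver, while a high value for both would suggest a tuning problem across the federation. Note that the per-host dictionary in this sketch still grows with the number of hosts limited inside one window.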