Export jemalloc stats to prometheus when used #9882
looks generally good. Can you make it less exception-swallowy?
synapse/metrics/__init__.py
Outdated
    try:
        _setup_jemalloc_stats()
    except Exception:
        logger.info("Failed to setup collector to record jemalloc stats.")
we should probably give a clue about what the error was?
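One way to keep the failure non-fatal while still recording its cause is to hand the exception to the logger. This is a minimal sketch, not the PR's code: `setup_stats_safely` is a hypothetical wrapper, and `_setup_jemalloc_stats` here is a stand-in that always fails so the logging path is exercised.

```python
import logging

logger = logging.getLogger(__name__)


def _setup_jemalloc_stats() -> None:
    # Stand-in for the real setup function; always fails for demonstration.
    raise OSError("libjemalloc not found")


def setup_stats_safely() -> bool:
    """Hypothetical wrapper: returns True on success, False on failure."""
    try:
        _setup_jemalloc_stats()
    except Exception:
        # exc_info=True attaches the exception and traceback to the log
        # record, so the message says what actually went wrong.
        logger.warning("Failed to set up jemalloc stats collector", exc_info=True)
        return False
    return True
```

Passing `exc_info=True` (or formatting the exception into the message) keeps startup going but leaves a trail for whoever has to debug it.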
synapse/metrics/__init__.py
Outdated
    regex = re.compile(r"/\S+/libjemalloc.*$")

    jemalloc_path = None
    with open(f"/proc/{pid}/maps") as f:
this is linux-specific, so you might want to consider making it degrade nicely for BSD, windows, etc.
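A rough sketch of what degrading nicely could look like: return `None` when the platform has no `/proc/<pid>/maps` or the file cannot be read, and let the caller skip the metrics. The function name `find_jemalloc_path` and its `maps_path` parameter are assumptions made for illustration; only the regex comes from the snippet under review.

```python
import re
import sys
from typing import Optional

# Same pattern as the snippet under review.
_JEMALLOC_RE = re.compile(r"/\S+/libjemalloc.*$")


def find_jemalloc_path(maps_path: str = "/proc/self/maps") -> Optional[str]:
    """Return the path of a mapped libjemalloc, or None if it can't be found."""
    if not sys.platform.startswith("linux"):
        # /proc/<pid>/maps is a Linux interface; skip quietly elsewhere.
        return None
    try:
        with open(maps_path) as f:
            for line in f:
                match = _JEMALLOC_RE.search(line.strip())
                if match:
                    return match.group()
    except OSError:
        # /proc may be unmounted or unreadable (e.g. in some sandboxes).
        return None
    return None
```

Callers can then treat `None` as "jemalloc stats unavailable" rather than as an error.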
synapse/metrics/__init__.py
Outdated
    @@ -597,6 +600,163 @@ def f(*args, **kwargs):
        return f


    def _setup_jemalloc_stats():
can you stick all this in a different file rather than in `__init__`?
of course, if you were really keen, you'd make a separate python package that can be reused by other jemalloc users...
synapse/metrics/__init__.py
Outdated
    except Exception:
        pass
...?
synapse/metrics/__init__.py
Outdated
    except Exception:
        # There was an error fetching the value, skip.
        continue
plsno
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
    @@ -115,6 +116,7 @@ def start_reactor(

    def run():
        logger.info("Running")
        setup_jemalloc_stats()
I changed it to get called here, mainly so that we try to set it up after we've got logging set up.
synapse/metrics/jemalloc.py
Outdated
    except Exception:
        pass
I still feel like we shouldn't be completely dropping these exceptions. Why not log something at `warn`?
Oh, github obviously marked this as outdated. Will fix.
synapse/metrics/jemalloc.py
Outdated
    except Exception as e:
        logger.info("Failed to setup collector to record jemalloc stats: %s", e)
when do you expect this to be hit?
This can mainly happen if we pick the wrong jemalloc library, though that shouldn't really happen if we're looking in `/proc/self/maps`.
Synapse 1.34.0 (2021-05-17)
===========================

This release deprecates the `room_invite_state_types` configuration setting. See the [upgrade notes](https://github.com/matrix-org/synapse/blob/release-v1.34.0/UPGRADE.rst#upgrading-to-v1340) for instructions on updating your configuration file to use the new `room_prejoin_state` setting.

This release also deprecates the `POST /_synapse/admin/v1/rooms/<room_id>/delete` admin API route. Server administrators are encouraged to update their scripts to use the new `DELETE /_synapse/admin/v1/rooms/<room_id>` route instead.

No significant changes since v1.34.0rc1.

Synapse 1.34.0rc1 (2021-05-12)
==============================

Features
--------

- Add experimental option to track memory usage of the caches. ([\matrix-org#9881](matrix-org#9881))
- Add support for `DELETE /_synapse/admin/v1/rooms/<room_id>`. ([\matrix-org#9889](matrix-org#9889))
- Add limits to how often Synapse will GC, ensuring that large servers do not end up GC thrashing if `gc_thresholds` has not been correctly set. ([\matrix-org#9902](matrix-org#9902))
- Improve performance of sending events for worker-based deployments using Redis. ([\matrix-org#9905](matrix-org#9905), [\matrix-org#9950](matrix-org#9950), [\matrix-org#9951](matrix-org#9951))
- Improve performance after joining a large room when presence is enabled. ([\matrix-org#9910](matrix-org#9910), [\matrix-org#9916](matrix-org#9916))
- Support stable identifiers for [MSC1772](matrix-org/matrix-spec-proposals#1772) Spaces. `m.space.child` events will now be taken into account when populating the experimental spaces summary response. Please see [the upgrade notes](https://github.com/matrix-org/synapse/blob/release-v1.34.0/UPGRADE.rst#upgrading-to-v1340) if you have customised `room_invite_state_types` in your configuration. ([\matrix-org#9915](matrix-org#9915), [\matrix-org#9966](matrix-org#9966))
- Improve performance of backfilling in large rooms. ([\matrix-org#9935](matrix-org#9935))
- Add a config option to allow you to prevent device display names from being shared over federation. Contributed by @aaronraimist. ([\matrix-org#9945](matrix-org#9945))
- Update support for [MSC2946](matrix-org/matrix-spec-proposals#2946): Spaces Summary. ([\matrix-org#9947](matrix-org#9947), [\matrix-org#9954](matrix-org#9954))

Bugfixes
--------

- Fix a bug introduced in v1.32.0 where the associated connection was improperly logged for SQL logging statements. ([\matrix-org#9895](matrix-org#9895))
- Correct the type hint for the `user_may_create_room_alias` method of spam checkers. It is provided a `RoomAlias`, not a `str`. ([\matrix-org#9896](matrix-org#9896))
- Fix bug where user directory could get out of sync if room visibility and membership changed in quick succession. ([\matrix-org#9910](matrix-org#9910))
- Include the `origin_server_ts` property in the experimental [MSC2946](matrix-org/matrix-spec-proposals#2946) support to allow clients to properly sort rooms. ([\matrix-org#9928](matrix-org#9928))
- Fix bugs introduced in v1.23.0 which made the PostgreSQL port script fail when run with a newly-created SQLite database. ([\matrix-org#9930](matrix-org#9930))
- Fix a bug introduced in Synapse 1.29.0 which caused `m.room_key_request` to-device messages sent from one user to another to be dropped. ([\matrix-org#9961](matrix-org#9961), [\matrix-org#9965](matrix-org#9965))
- Fix a bug introduced in v1.27.0 preventing users and appservices exempt from ratelimiting from creating rooms with many invitees. ([\matrix-org#9968](matrix-org#9968))

Updates to the Docker image
---------------------------

- Add `startup_delay` to docker healthcheck to reduce waiting time for coming online and update the documentation with extra options. Contributed by @maquis196. ([\matrix-org#9913](matrix-org#9913))

Improved Documentation
----------------------

- Add `port` argument to the Postgres database sample config section. ([\matrix-org#9911](matrix-org#9911))

Deprecations and Removals
-------------------------

- Mark as deprecated `POST /_synapse/admin/v1/rooms/<room_id>/delete`. ([\matrix-org#9889](matrix-org#9889))

Internal Changes
----------------

- Reduce the length of Synapse's access tokens. ([\matrix-org#5588](matrix-org#5588))
- Export jemalloc stats to Prometheus if it is being used. ([\matrix-org#9882](matrix-org#9882))
- Add type hints to presence handler. ([\matrix-org#9885](matrix-org#9885))
- Reduce memory usage of the LRU caches. ([\matrix-org#9886](matrix-org#9886))
- Add type hints to the `synapse.handlers` module. ([\matrix-org#9896](matrix-org#9896))
- Time response time for external cache requests. ([\matrix-org#9904](matrix-org#9904))
- Minor fixes to the `make_full_schema.sh` script. ([\matrix-org#9931](matrix-org#9931))
- Move database schema files into a common directory. ([\matrix-org#9932](matrix-org#9932))
- Add debug logging for lost/delayed to-device messages. ([\matrix-org#9959](matrix-org#9959))
This is useful to see how much memory Synapse is actually using, as opposed to the memory reserved from the OS.
We do this by loading the jemalloc library and calling `mallctl` to fetch the various stats we care about, such as `stats.allocated`.

Note: By default Python has a small-object allocator that sits on top of jemalloc, so unless that is disabled via `PYTHONMALLOC=malloc`, the `allocated` stats won't be accurate.
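The approach described above can be sketched with `ctypes`. This is an illustration under stated assumptions, not the code from this PR: `read_jemalloc_stat` and its parameters are hypothetical names, the library is assumed to export an unprefixed `mallctl` symbol (some builds prefix it), and the statistic is assumed to be a `size_t`, as `stats.allocated` is.

```python
import ctypes
from typing import Optional


def read_jemalloc_stat(lib_path: str, name: str = "stats.allocated") -> Optional[int]:
    """Read a size_t statistic via mallctl, or return None if unavailable."""
    try:
        jemalloc = ctypes.CDLL(lib_path)
        mallctl = jemalloc.mallctl
    except (OSError, AttributeError):
        # Library missing, or it doesn't export an unprefixed `mallctl`.
        return None

    # int mallctl(const char *name, void *oldp, size_t *oldlenp,
    #             void *newp, size_t newlen);
    mallctl.argtypes = [
        ctypes.c_char_p,
        ctypes.c_void_p,
        ctypes.POINTER(ctypes.c_size_t),
        ctypes.c_void_p,
        ctypes.c_size_t,
    ]
    mallctl.restype = ctypes.c_int

    # jemalloc caches its statistics; writing to "epoch" refreshes them.
    epoch = ctypes.c_uint64(1)
    if mallctl(b"epoch", None, None, ctypes.byref(epoch), ctypes.sizeof(epoch)) != 0:
        return None

    # Read the requested value into a size_t out-parameter.
    value = ctypes.c_size_t(0)
    length = ctypes.c_size_t(ctypes.sizeof(value))
    if mallctl(name.encode("ascii"), ctypes.byref(value), ctypes.byref(length), None, 0) != 0:
        return None
    return value.value
```

The path passed as `lib_path` would typically be the one found in `/proc/self/maps`, so that the stats come from the allocator the process is actually using. A `None` result simply means the stat could not be read.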