-
Notifications
You must be signed in to change notification settings - Fork 235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sliding Sync: Add more tracing #17514
Sliding Sync: Add more tracing #17514
Conversation
next_pos=SlidingSyncStreamToken(to_token, connection_position), | ||
lists=lists, | ||
rooms=rooms, | ||
extensions=extensions, | ||
) | ||
|
||
# Make it easy to find traces for syncs that aren't empty | ||
set_tag(SynapseTags.RESULT_PREFIX + "result", bool(sliding_sync_result)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good name for this tag? I'm trying to describe empty
but would rather avoid the negated condition for people to use empty=false
if there is a better name. full
doesn't describe this properly.
Willing to use empty
with a nudge though.
set_tag( | ||
SynapseTags.FUNC_ARG_PREFIX + "required_state_other_count", | ||
num_others, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just log these to the spans (using log_kv
), tags are more meant to be searchable things that are limited in scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the comparison to the prometheus metrics where we do have to worry about cardinality of values (be mindful of limiting the number possible values).
But we already don't care about this for tracing spans given we set the MXID in authenticated_entity
, the full http.url
with specific tokens, request_id
with unique values, etc.
I can update it if this is something we are going to start caring about. There are lots of other places to update where we tag counts as well.
I tried finding some references in the OpenTracing/OpenTelemetry docs but I didn't really see anything conclusive for tracing span tags/attributes:
References:
- https://medium.com/@emmanuel.courreges/jaeger-tracing-and-opentelemetry-usage-guidelines-cdc1fcf48415
- https://opentelemetry.io/docs/concepts/signals/traces/#when-to-use-span-events-versus-span-attributes
- https://github.com/opentracing/specification/blob/master/specification.md#note-required-formats-for-injection-and-extraction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, fair enough. Though I think it probably would be easier better to not tag these counts? Then its just a single log_kv({ ... })
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly easier given we currently set required_state_wildcard
/required_state_wildcard_event_type
in other blocks. But we can always pull the necessary booleans/counts out further and use log_kv({ ... })
if you prefer.
Thanks for the review @erikjohnston 🐋 |
# Synapse 1.114.0 (2024-09-02) This release enables support for [MSC4186](matrix-org/matrix-spec-proposals#4186) — Simplified Sliding Sync. This allows using the upcoming releases of the Element X mobile apps without having to run a Sliding Sync Proxy. ### Features - Enable native sliding sync support ([MSC3575](matrix-org/matrix-spec-proposals#3575) and [MSC4186](matrix-org/matrix-spec-proposals#4186)) by default. ([\#17648](element-hq/synapse#17648)) # Synapse 1.114.0rc3 (2024-08-30) ### Bugfixes - Fix regression in v1.114.0rc2 that caused workers to fail to start. ([\#17626](element-hq/synapse#17626)) # Synapse 1.114.0rc2 (2024-08-30) ### Features - Improve cross-signing upload when using [MSC3861](matrix-org/matrix-spec-proposals#3861) to use a custom UIA flow stage, with web fallback support. ([\#17509](element-hq/synapse#17509)) - Make `hash_password` script accept password input from stdin. ([\#17608](element-hq/synapse#17608)) ### Bugfixes - Fix hierarchy returning 403 when room is accessible through federation. Contributed by Krishan (@kfiven). ([\#17194](element-hq/synapse#17194)) - Fix content-length on federation `/thumbnail` responses. ([\#17532](element-hq/synapse#17532)) - Fix authenticated media responses using a wrong limit when following redirects over federation. ([\#17543](element-hq/synapse#17543)) ### Internal Changes - MSC3861: load the issuer and account management URLs from OIDC discovery. ([\#17407](element-hq/synapse#17407)) - Refactor sliding sync class into multiple files. ([\#17595](element-hq/synapse#17595)) - Store sliding sync per-connection state in the database. ([\#17599](element-hq/synapse#17599)) - Make the sliding sync `PerConnectionState` class immutable. ([\#17600](element-hq/synapse#17600)) - Add support to `@tag_args` for standalone functions. ([\#17604](element-hq/synapse#17604)) - Speed up incremental syncs in sliding sync by adding some more caching. ([\#17606](element-hq/synapse#17606)) - Always return the user's own read receipts in sliding sync. ([\#17617](element-hq/synapse#17617)) - Replace `isort` and `black` with `ruff`. ([\#17620](element-hq/synapse#17620)) - Refactor sliding sync code to move room list logic out into a separate class. ([\#17622](element-hq/synapse#17622)) ### Updates to locked dependencies * Bump attrs from 23.2.0 to 24.2.0. ([\#17609](element-hq/synapse#17609)) * Bump cryptography from 42.0.8 to 43.0.0. ([\#17584](element-hq/synapse#17584)) * Bump phonenumbers from 8.13.43 to 8.13.44. ([\#17610](element-hq/synapse#17610)) * Bump pygithub from 2.3.0 to 2.4.0. ([\#17612](element-hq/synapse#17612)) * Bump pyyaml from 6.0.1 to 6.0.2. ([\#17611](element-hq/synapse#17611)) * Bump sentry-sdk from 2.12.0 to 2.13.0. ([\#17585](element-hq/synapse#17585)) * Bump serde from 1.0.206 to 1.0.208. ([\#17581](element-hq/synapse#17581)) * Bump serde from 1.0.208 to 1.0.209. ([\#17613](element-hq/synapse#17613)) * Bump serde_json from 1.0.124 to 1.0.125. ([\#17582](element-hq/synapse#17582)) * Bump serde_json from 1.0.125 to 1.0.127. ([\#17614](element-hq/synapse#17614)) * Bump types-jsonschema from 4.23.0.20240712 to 4.23.0.20240813. ([\#17583](element-hq/synapse#17583)) * Bump types-setuptools from 71.1.0.20240726 to 71.1.0.20240818. ([\#17586](element-hq/synapse#17586)) # Synapse 1.114.0rc1 (2024-08-20) ### Features - Add a flag to `/versions`, `org.matrix.simplified_msc3575`, to indicate whether experimental sliding sync support has been enabled. ([\#17571](element-hq/synapse#17571)) - Handle changes in `timeline_limit` in experimental sliding sync. ([\#17579](element-hq/synapse#17579)) - Correctly track read receipts that should be sent down in experimental sliding sync. ([\#17575](element-hq/synapse#17575), [\#17589](element-hq/synapse#17589), [\#17592](element-hq/synapse#17592)) ### Bugfixes - Start handlers for new media endpoints when media resource configured. ([\#17483](element-hq/synapse#17483)) - Fix timeline ordering (using `stream_ordering` instead of topological ordering) in experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17510](element-hq/synapse#17510)) - Fix experimental sliding sync implementation to remember any updates in rooms that were not sent down immediately. ([\#17535](element-hq/synapse#17535)) - Better exclude partially stated rooms if we must await full state in experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17538](element-hq/synapse#17538)) - Handle lower-case http headers in `_Mulitpart_Parser_Protocol`. ([\#17545](element-hq/synapse#17545)) - Fix fetching federation signing keys from servers that omit `old_verify_keys`. Contributed by @tulir @ Beeper. ([\#17568](element-hq/synapse#17568)) - Fix bug where we would respond with an error when a remote server asked for media that had a length of 0, using the new multipart federation media endpoint. ([\#17570](element-hq/synapse#17570)) ### Improved Documentation - Clarify default behaviour of the [`auto_accept_invites.worker_to_run_on`](https://element-hq.github.io/synapse/develop/usage/configuration/config_documentation.html#auto-accept-invites) option. ([\#17515](element-hq/synapse#17515)) - Improve docstrings for profile methods. ([\#17559](element-hq/synapse#17559)) ### Internal Changes - Add more tracing to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17514](element-hq/synapse#17514)) - Fixup comment in sliding sync implementation. ([\#17531](element-hq/synapse#17531)) - Replace override of deprecated method `HTTPAdapter.get_connection` with `get_connection_with_tls_context`. ([\#17536](element-hq/synapse#17536)) - Fix performance of device lists in `/key/changes` and sliding sync. ([\#17537](element-hq/synapse#17537), [\#17548](element-hq/synapse#17548)) - Bump setuptools from 67.6.0 to 72.1.0. ([\#17542](element-hq/synapse#17542)) - Add a utility function for generating random event IDs. ([\#17557](element-hq/synapse#17557)) - Speed up responding to media requests. ([\#17558](element-hq/synapse#17558), [\#17561](element-hq/synapse#17561), [\#17564](element-hq/synapse#17564), [\#17566](element-hq/synapse#17566), [\#17567](element-hq/synapse#17567), [\#17569](element-hq/synapse#17569)) - Test github token before running release script steps. ([\#17562](element-hq/synapse#17562)) - Reduce log spam of multipart files. ([\#17563](element-hq/synapse#17563)) - Refactor per-connection state in experimental sliding sync handler. ([\#17574](element-hq/synapse#17574)) - Add histogram metrics for sliding sync processing time. ([\#17593](element-hq/synapse#17593)) ### Updates to locked dependencies * Bump bytes from 1.6.1 to 1.7.1. ([\#17526](element-hq/synapse#17526)) * Bump lxml from 5.2.2 to 5.3.0. ([\#17550](element-hq/synapse#17550)) * Bump phonenumbers from 8.13.42 to 8.13.43. ([\#17551](element-hq/synapse#17551)) * Bump regex from 1.10.5 to 1.10.6. ([\#17527](element-hq/synapse#17527)) * Bump sentry-sdk from 2.10.0 to 2.12.0. ([\#17553](element-hq/synapse#17553)) * Bump serde from 1.0.204 to 1.0.206. ([\#17556](element-hq/synapse#17556)) * Bump serde_json from 1.0.122 to 1.0.124. ([\#17555](element-hq/synapse#17555)) * Bump sigstore/cosign-installer from 3.5.0 to 3.6.0. ([\#17549](element-hq/synapse#17549)) * Bump types-pyyaml from 6.0.12.20240311 to 6.0.12.20240808. ([\#17552](element-hq/synapse#17552)) * Bump types-requests from 2.31.0.20240406 to 2.32.0.20240712. ([\#17524](element-hq/synapse#17524)) # Synapse 1.113.0 (2024-08-13) No significant changes since 1.113.0rc1. # Synapse 1.113.0rc1 (2024-08-06) ### Features - Track which rooms have been sent to clients in the experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17447](element-hq/synapse#17447)) - Add Account Data extension support to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17477](element-hq/synapse#17477)) - Add receipts extension support to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17489](element-hq/synapse#17489)) - Add typing notification extension support to experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint. ([\#17505](element-hq/synapse#17505)) ### Bugfixes - Update experimental [MSC3575](matrix-org/matrix-spec-proposals#3575) Sliding Sync `/sync` endpoint to handle invite/knock rooms when filtering. ([\#17450](element-hq/synapse#17450)) - Fix a bug introduced in v1.110.0 which caused `/keys/query` to return incomplete results, leading to high network activity and CPU usage on Matrix clients. ([\#17499](element-hq/synapse#17499)) ### Improved Documentation - Update the [`allowed_local_3pids`](https://element-hq.github.io/synapse/v1.112/usage/configuration/config_documentation.html#allowed_local_3pids) config option's msisdn address to a working example. ([\#17476](element-hq/synapse#17476)) ### Internal Changes - Change sliding sync to use their own token format in preparation for storing per-connection state. ([\#17452](element-hq/synapse#17452)) - Ensure we don't send down negative `bump_stamp` in experimental sliding sync endpoint. ([\#17478](element-hq/synapse#17478)) - Do not send down empty room entries down experimental sliding sync endpoint. ([\#17479](element-hq/synapse#17479)) - Refactor Sliding Sync tests to better utilize the `SlidingSyncBase`. ([\#17481](element-hq/synapse#17481), [\#17482](element-hq/synapse#17482)) - Add some opentracing tags and logging to the experimental sliding sync implementation. ([\#17501](element-hq/synapse#17501)) - Split and move Sliding Sync tests so we have some more sane test file sizes. ([\#17504](element-hq/synapse#17504)) - Update the `limited` field description in the Sliding Sync response to accurately describe what it actually represents. ([\#17507](element-hq/synapse#17507)) - Easier to understand `timeline` assertions in Sliding Sync tests. ([\#17511](element-hq/synapse#17511)) - Reset the sliding sync connection if we don't recognize the per-connection state position. ([\#17529](element-hq/synapse#17529)) ### Updates to locked dependencies * Bump bcrypt from 4.1.3 to 4.2.0. ([\#17495](element-hq/synapse#17495)) * Bump black from 24.4.2 to 24.8.0. ([\#17522](element-hq/synapse#17522)) * Bump phonenumbers from 8.13.39 to 8.13.42. ([\#17521](element-hq/synapse#17521)) * Bump ruff from 0.5.4 to 0.5.5. ([\#17494](element-hq/synapse#17494)) * Bump serde_json from 1.0.120 to 1.0.121. ([\#17493](element-hq/synapse#17493)) * Bump serde_json from 1.0.121 to 1.0.122. ([\#17525](element-hq/synapse#17525)) * Bump towncrier from 23.11.0 to 24.7.1. ([\#17523](element-hq/synapse#17523)) * Bump types-pyopenssl from 24.1.0.20240425 to 24.1.0.20240722. ([\#17496](element-hq/synapse#17496)) * Bump types-setuptools from 70.1.0.20240627 to 71.1.0.20240726. ([\#17497](element-hq/synapse#17497))
Add more tracing. Spawning from looking at a couple traces and wanting a little more info.
Follow-up to github.com//pull/17501
The changes in this PR allow you to find slow Sliding Sync traces ignoring the
wait_for_events
time. In Jaeger, you can now filter for thecurrent_sync_for_user
operation withRESULT.result=true
indicating that it actually returned non-empty results.If you want to find traces for your own user, you can use
RESULT.result=true ARG.sync_config.user="@madlittlemods:matrix.org"
Pull Request Checklist
EventStore
toEventWorkerStore
.".code blocks
.(run the linters)