Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Be able to correlate timeouts in reverse-proxy layer in front of Synapse (tag specific request headers) #13786

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/13786.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add `opentracing.request_headers_to_tag` config to specify which request headers extract and tag traces with in order to correlate timeouts in reverse-proxy layers in front of Synapse with traces.
8 changes: 8 additions & 0 deletions docs/usage/configuration/config_documentation.md
Original file line number Diff line number Diff line change
Expand Up @@ -3612,6 +3612,11 @@ Sub-options include:
* `jaeger_config`: Jaeger can be configured to sample traces at different rates.
All configuration options provided by Jaeger can be set here. Jaeger's configuration is
mostly related to trace sampling which is documented [here](https://www.jaegertracing.io/docs/latest/sampling/).
* `request_headers_to_tag`: A list of headers to extract from the request and
add to to the top-level servlet tracing span as tags. Useful when you're using
a reverse proxy service like Cloudflare to protect your Synapse instance in
order to correlate and match up requests that timed out at the Cloudflare
layer to the Synapse traces.
Comment on lines +3615 to +3619
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to throw a wrench in these plans, but if other reverse proxies do something similar; I wonder if it would make more sense to have a header which could be the source for SynapseRequest.request_id (instead of generating a new one):

def get_request_id(self) -> str:
return "%s-%i" % (self.get_method(), self.request_seq)

Would that be clearer instead of having two separate IDs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Seems reasonable to me and a little cleaner. I'll create a new PR with this solution ⏩

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this would make it harder to track things in the logs or anything, but seems like it would be a lot cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried this out in #13801

Losing the simplicity of the incrementing integer to reference in logs as you scroll kinda sucks though. I'm leaning back to this option I think 🤔

Copy link
Member

@clokep clokep Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Losing the simplicity of the incrementing integer to reference in logs as you scroll kinda sucks though. I'm leaning back to this option I think 🤔

Is that really used though? I usually just copy and paste for grep. Once the number gets beyond a few digits it is useless anyway IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not the one having to log dive and debug big Synapse instances so I don't know. In tests, we will still have the same sequential request ID's since this config won't be defined. We're not changing any default either so it's basically opt-in.

If anyone has a strong desire for the old way, we can always revert as well. If you're happy with #13801, we can move forward with that ⏩

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like we should ask folks real quick then!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


Example configuration:
```yaml
Expand All @@ -3629,6 +3634,9 @@ opentracing:
param: 1
logging:
false

request_headers_to_tag:
- "cf-ray"
```
---
## Workers ##
Expand Down
15 changes: 15 additions & 0 deletions synapse/api/auth.py
Original file line number Diff line number Diff line change
Expand Up @@ -173,6 +173,21 @@ async def get_user_by_req(
parent_span.set_tag("device_id", requester.device_id)
if requester.app_service is not None:
parent_span.set_tag("appservice_id", requester.app_service.id)

# Tag any headers that we need to extract from the request. This
# is useful to specify any headers that a reverse-proxy in front
# of Synapse may be sending to correlate and match up something
# in that layer to a Synapse trace. ex. when Cloudflare times
# out it gives a `cf-ray` header which we can also tag here to
# find the trace.
for header_name in self.hs.config.tracing.request_headers_to_tag:
headers = request.requestHeaders.getRawHeaders(header_name)
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
if headers is not None:
parent_span.set_tag(
SynapseTags.REQUEST_HEADER_PREFIX + header_name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really familiar with opentracing tags, is request_header.cf-ray an OK thing to do? Or do we need to normalize that to something like request_header.cf_ray or whatever?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tag can be whatever. And since it's the name of a header, I think it's probably easiest to follow with the actual header name. Similar to what we do with function arguments but those happen to conform to the Python snake casing already.

These are the OpenTelemetry docs but here are some naming conventions, https://opentelemetry.io/docs/reference/specification/common/attribute-naming/

str(headers[0]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the call to str? If header_name is a string, then headers[0] will be a string.

See:

https://github.com/twisted/twisted/blob/7c00738a0c24032070ce92304b4c8b887666c0fc/src/twisted/web/http_headers.py#L256-L258

)
Comment on lines +177 to +189
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested that this works locally but I'm not sure if the cf-ray header that we want to use on matrix.org is actually a thing.

Based on the following documentation and that we see it in our haproxy logs already, I assume this will be viable,

Add the CF-Ray header to your origin web server logs to match requests proxied to Cloudflare to requests in your server logs.


return requester

@cancellable
Expand Down
17 changes: 17 additions & 0 deletions synapse/config/tracer.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,13 @@ def read_config(self, config: JsonDict, **kwargs: Any) -> None:

self.force_tracing_for_users: Set[str] = set()

# A list of headers to extract from the request and add to to the
# top-level servlet tracing span as tags. Useful when you're using a
# reverse proxy service like Cloudflare to protect your Synapse instance
# in order to correlate and match up requests that timed out at the
# Cloudflare layer to the Synapse traces.
self.request_headers_to_tag = []

if not self.opentracer_enabled:
return

Expand All @@ -62,3 +69,13 @@ def read_config(self, config: JsonDict, **kwargs: Any) -> None:
("opentracing", "force_tracing_for_users", f"index {i}"),
)
self.force_tracing_for_users.add(u)

request_headers_to_tag = opentracing_config.get(
"request_headers_to_tag",
[],
)
if not isinstance(request_headers_to_tag, list):
raise ConfigError(
"Expected a list", ("opentracing", "request_headers_to_tag")
)
self.request_headers_to_tag = request_headers_to_tag
4 changes: 4 additions & 0 deletions synapse/logging/opentracing.py
Original file line number Diff line number Diff line change
Expand Up @@ -303,6 +303,10 @@ class SynapseTags:
# incoming HTTP request ID (as written in the logs)
REQUEST_ID = "request_id"

# Tag a header from the incoming request. The name of the header should be
# appended to this prefix.
REQUEST_HEADER_PREFIX = "request_header."

# HTTP request tag (used to distinguish full vs incremental syncs, etc)
REQUEST_TAG = "request_tag"

Expand Down