Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

E2E logging coverage #158

Closed
karelz opened this issue May 12, 2020 · 3 comments
Closed

E2E logging coverage #158

karelz opened this issue May 12, 2020 · 3 comments

Comments

@karelz
Copy link
Member

karelz commented May 12, 2020

Should cover:

  • YARP code
  • ASP.NET Core components
  • dotnet/runtime components
@karelz karelz added Type: Idea This issue is a high-level idea for discussion. Type: Enhancement New feature or request and removed Type: Idea This issue is a high-level idea for discussion. labels May 12, 2020
@karelz karelz added this to the 1.0.0 milestone May 12, 2020
@karelz
Copy link
Member Author

karelz commented Sep 17, 2020

We need design of the list.
@davidni can you please post the required list?

@karelz karelz modified the milestones: 1.0.0, 1.0.0-preview6 Sep 17, 2020
@davidni
Copy link
Contributor

davidni commented Sep 17, 2020

Here's the timing diagram of one duplex proxy request and the things we keep track of:

image

Raw timing markers we collect for each request

Measure Description
t1_aspNetCoreRouting Marks just after ASP .NET Endpoint Routing has executed the routing logic to pick a matching route for an incoming request.
t2_pickDestination Marks just after Island Gateway has selected one endpoint of the target service to route to.
t3_buildTargetUrl Marks just after Island Gateway has computed the final target uri to proxy a request to.
t4_reqHeadersStart Marks just before Island Gateway issues the outbound HTTPs call to the target endpoint. In code, this occurs just before we call HttpMessageInvoker.SendAsync.
t5_reqBodyStart Marks the start of the stream-copy operation copying bytes from the client's request body to the backend.
t6a1_reqBodyFirstRead Marks when the first call to ReadAsync completes on the incoming request stream. This measures when Island Gateway received the first byte in the request body from the client.
t6a2_reqBodyEnd Marks when the stream-copy operation on the request path (i.e. client to Island Gateway to backend) completed.
t6b1_respHeadersReceived Marks when Island Gateway received response headers from the backend service
t6b2_respBodyStart Marks the start of the stream-copy operation copying bytes from the backend's response body to the client.
t6b3_respBodyFirstRead Marks when the first call to ReadAsync completes on the backend's response stream. This measures when Island Gateway received the first byte in the response body from the backend.
t6b4_respBodyEnd Marks when the stream-copy operation on the response path (i.e. backend to Island Gateway to client) completed.
t6b5_respEnd Marks when Island Gateway has completed its response to the client. In HTTP2 requests, this includes sending END_STREAM to the client.

Aggregates we collect for each request

Measure Description
reqBytes Total number of bytes proxied on the request path (i.e. from client to Island Gateway to backend).
reqOps Total IO ReadAsync operations performed on the request path (i.e. from client to Island Gateway to backend).
reqReadTime Total time spent in ReadAsync calls to receive request body bytes from the client.
reqWriteTime Total time spent in WriteAsync calls to send request body bytes to the backend.
respBytes Total number of bytes proxied on the response path (i.e. from backend to Island Gateway to client).
respOps Total IO ReadAsync operations performed on the response path (i.e. from backend to Island Gateway to client).
respReadTime Total time spent in ReadAsync calls to receive response body bytes from the backend.
respWriteTime Total time spent in WriteAsync calls to send response body bytes to the client.

Notable measures (computed using the above raw values)

The following computations are useful in investigations, and have special meanings that should be well understood:

  • Request overhead: t4. Measures how long Island Gateway took before even attempting to call the backend service.

  • Response overhead: t6b2 - t6b1. Measures how long Island Gateway took between receiving the backend's response and starting to send it back to the client.

  • Backend time to respond with headers: t6b1 - t4. Measures how long a backend service took, from Island Gateway's perspective, to receive request headers and produce a meaningful HTTP response. This includes the time to establish a connection to the backend service when a new connection was needed.

  • Request stream duration: t6a2 - t5. Measures how long the request body stream was open for. Long latencies here may indicate the client is slow to send bytes, or the backend is slow to accept request bytes. Use reqReadTime and reqWriteTime to identify which one was to blame.

  • Response stream duration: t6a2 - t5. Measures how long the response body stream was open for. Long latencies here may indicate the backend is slow to send bytes, or the client is slow to accept request bytes. Use respReadTime and respWriteTime to identify which one was to blame.

  • After-request stream duration: max_of(t6a2, t6b4) - t6a2. Measures how long the response stream delayed the proxy completion after the request stream was done. If the response stream finished first, this is zero. This may be useful in figuring out which component is causing E2E proxy operations to take longer, and high values indicate that the response stream is the culprit.

  • After-response stream duration: max_of(t6a2, t6b4) - t6b4. Measures how long the request stream delayed the proxy completion after the response stream was done. If the request stream finished first, this is zero. This may be useful in figuring out which component is causing E2E proxy operations to take longer, and high values indicate that the request stream is the culprit.

@MihaZupan
Copy link
Member

The events and counters discussed in this issue have been added by #501.

Closing this issue in favour of #61, #660 for tracking any remaining diagnostics that should be added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants