Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace context propagation docs and enable distributed traces by default #1557

Merged
merged 7 commits into from
Jan 22, 2025

Conversation

grcevski
Copy link
Contributor

@grcevski grcevski commented Jan 21, 2025

Add documentation for the new context propagation support and enable it by default.

Copy link

codecov bot commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 42.85714% with 4 lines in your changes missing coverage. Please review.

Project coverage is 71.13%. Comparing base (1dad6ff) to head (c6ed7e6).
Report is 7 commits behind head on main.

Files with missing lines Patch % Lines
pkg/internal/ebpf/tctracer/tctracer.go 20.00% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1557      +/-   ##
==========================================
- Coverage   71.17%   71.13%   -0.04%     
==========================================
  Files         197      197              
  Lines       19668    19671       +3     
==========================================
- Hits        13998    13993       -5     
- Misses       4993     4997       +4     
- Partials      677      681       +4     
Flag Coverage Δ
integration-test 53.32% <42.85%> (+0.62%) ⬆️
k8s-integration-test 53.45% <42.85%> (-0.09%) ⬇️
oats-test 34.43% <28.57%> (+1.71%) ⬆️
unittests 47.30% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@rafaelroquetto rafaelroquetto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Will leave it for the @grafsean to approve


Beyla will read any incoming trace context header values, track the Go program execution flow and propagate the trace context by automatically adding the `traceparent` field in outgoing HTTP/gRPC requests. If an application already adds the `taceparent` field in outgoing requests, Beyla will use that value for tracing instead its own generated trace context. If Beyla cannot find an incoming `traceparent` context value, it will generate one according to the W3C specification.
Beyla will read any incoming trace context header values, track the program execution flow and propagate the trace context by automatically adding the `traceparent` field in outgoing HTTP/gRPC requests. If an application already adds the `taceparent` field in outgoing requests, Beyla will use that value for tracing instead its own generated trace context. If Beyla cannot find an incoming `traceparent` context value, it will generate one according to the W3C specification.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo taceparent


For TLS encrypted traffic (HTTPS), Beyla is unable to inject the trace information in the outgoing HTTP headers and instead it injects the information
at TCP/IP packet level. Because of this limitation, Beyla is only able to send the trace information to other Beyla instrumented services. L7 proxies
and load balancers will also disrupt the TCP/IP context propagation, becuase the original packets are discarded and replayed downstream.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: becuase

@grcevski
Copy link
Contributor Author

Looks good to me. Will leave it for the @grafsean to approve

I made some code changes to fix things, do you mind looking over the code changes? I found a bug that our TC code was overriding a manually set TP in an outgoing request. NodeClient tests for this.

| ---------------------------- | -------------------------------------- | ------- | ------- |
| `enable_context_propagation` | `BEYLA_BPF_ENABLE_CONTEXT_PROPAGATION` | boolean | (true) |

Enables injecting of the `Traceparent` header value for outgoing HTTP requests, allowing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to link somewhere here to docs/sources/distributed-traces.md?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done!

Copy link
Contributor

@marctc marctc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we forgetting to mention limitations with Java? or we already overcame those?

@marctc marctc requested a review from grafsean January 22, 2025 09:59
Copy link
Contributor

@grafsean grafsean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the docs. I'll copy edit when I do a pass over these files.

@grcevski
Copy link
Contributor Author

Aren't we forgetting to mention limitations with Java? or we already overcame those?

I think we need to separately document those in limitations.md, since they are not related to the ability to write outgoing header values, but more related to internal thread handoff.

@@ -134,6 +137,11 @@ static __always_inline void http_get_or_create_trace_info(http_connection_metada
if (meta) {
u32 type = trace_type_from_meta(meta);
set_trace_info_for_connection(conn, type, tp_p);
// If the user code setup traceparent manually, don't interfere and add
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Non-rhetorical question) why does this matter? For (outbound) client calls, wouldn't we want to update the outgoing_trace_map (done in server_or_client_trace()) so that the tc egress program can refer back to it and write the traceparent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think we shouldn't mess with the user application code. For example, imagine a user has done this in their code:

    const options = {
        headers: {
            "traceparent": ...
        }
    }
    
    http.get(url, options, (res) => {
    ... do something
    });

They expect that their downstream service will get this traceparent, I don't know why would they do this, but they can. If we don't check for this condition and setup the outgoing trace map, then Beyla will overwrite the span_id of the outgoing request. The trace_id component will be respected, but the span_id will get generated and updated. This span_id manipulation happens because we use the TCP seq/ack for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the same time, this is the user sending the traceparent on their own, all the TC code would do in this case is just duplicate the header, which will have no effect.

@@ -138,11 +138,16 @@ func (p *Tracer) startTC(ctx context.Context) {
return
}

if p.cfg.EBPF.UseTCForL7CP {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if instead of trying to enforce this at the level where the tracers are created, we should enforce this at the config validation level, not letting it propagate....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's true, we'll need to change the tests to reflect that, because we now have to explicitly disable ContextPropagation if TCL7 is used. Do you think it's worth the effort since we don't actually support this option?

@marctc
Copy link
Contributor

marctc commented Jan 22, 2025

Aren't we forgetting to mention limitations with Java? or we already overcame those?

I think we need to separately document those in limitations.md, since they are not related to the ability to write outgoing header values, but more related to internal thread handoff.

ah sorry, I thought this is the same document

@grcevski grcevski merged commit fa8b83a into grafana:main Jan 22, 2025
17 checks passed
@grcevski grcevski deleted the update_tracing_docs branch January 22, 2025 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants