Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[processor/traces] Action/sampling on resources attributes based on span attributes #20294

Closed
yehaotian opened this issue Mar 23, 2023 · 21 comments
Labels
enhancement New feature or request processor/tailsampling Tail sampling processor

Comments

@yehaotian
Copy link

yehaotian commented Mar 23, 2023

Component(s)

traces related processors

Is your feature request related to a problem? Please describe.

Usecase:
We have some debug span attributes and would like to 100% sample it in tailsamplingprocessor policy. At this moment tail sampling processor does not support policy on span attributes but only on resource and record string_attribute.

Currently we do not have status code on span, so that the workaround is using spanprocessor to set ERROR status when match the debug span attribute and leverage status_code policy for 100% sampling.
However, we have a new case need to send the trace(debug trace) contains the debug span attribute to a different destination instead sending all ERROR cases, how can we use the current attribute/filter/span/resources/tailsampling processors to achieve that?

Note: The debug span attribute only exist in the root span, not all spans in the trace.

Describe the solution you'd like

If the current processors cannot make it happen, several solutions can be applied:

  1. Have spanprocessor able to set resource and record attributes instead of just status code.
  2. Have resourcesprocessor able to insert/set resource and record attributes when match span attributes.
  3. Have attributesprocessor able to insert/set resource and record attributes when match span attributes.
  4. Have tailsamplingprocessor policy on span attributes (unlikely due to performance?)
  5. ...

Describe alternatives you've considered

No response

Additional context

No response

@yehaotian yehaotian added enhancement New feature or request needs triage New item requiring triage labels Mar 23, 2023
@yehaotian
Copy link
Author

cc: @TylerHelmuth

@TylerHelmuth TylerHelmuth added processor/tailsampling Tail sampling processor and removed needs triage New item requiring triage labels Mar 23, 2023
@github-actions
Copy link
Contributor

Pinging code owners for processor/tailsampling: @jpkrohling. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@TylerHelmuth
Copy link
Member

@jpkrohling I'm not familiar with the tailsamplingprocessor config, but when I hear "I wish I had access to this piece of telemetry" I think of OTTL. Is there a place in tailsamplingprocessor where we could utilize OTTL conditions?

@jpkrohling
Copy link
Member

That's an interesting thought. Perhaps we could do a brainstorming session, looking at the current policies and seeing which ones could be replaced by an "OTTL policy".

@yehaotian
Copy link
Author

Hi @jpkrohling @TylerHelmuth any updates on this? Are we expecting OTTL policy for tail sampling processor soon?

@TylerHelmuth
Copy link
Member

I think OTTL can help here, but I won't be able to work on it soon.

@jpkrohling
Copy link
Member

Same. I'm happy to review your PR, @yehaotian, in case you want to send one, but I don't have time to work on it myself.

@jiekun
Copy link
Member

jiekun commented Apr 11, 2023

@yehaotian CMIIW, the expected configuration should be something like this?

processors:
  tail_sampling:
    ...
    policies:
      [
          {
            name: ottl_policy_1,
            type: ottl_policy,
            query: sample(attributes["your.custom.attributes"] = "/health")  # or something else meets OTTL standard
          },

I feel like we could replace status_code / string_attribute / numeric_attribute / boolean_attribute / span_count / trace_state with OTTL or let them co-exist for a certain time period first, to get more feedback from users.

@jpkrohling I'm pretty interested in implementing this. May I have your suggestion, like:

  1. should we further discuss the OTTL way to provide user convenience? (Not sure if it was discussed at the SIG meeting)
  2. should I have a detailed tech design or just create a pull request with codes and related descriptions?

@TylerHelmuth
Copy link
Member

Couple quick suggestions:

  • replace query with statement.
  • Drop the Invocation (sample) and right the statement as if it were an OTTL condition. We don't have a condition-only parser yet for OTTL ([pkg/ottl] expose a parser explicitly for parsing conditions #13545) but the condition can be appended to a noop invocation during startup. This is the pattern the filterprocessor and countconnector use.

Hopefully the tailsamplingprocesor would be able to take advantage of internal/filter/filterottl to do the matching.

@jpkrohling I haven't read through the tailsamplingprocessor code myself, but if internal/filter/filterottl slots in nicely to some boolean check somewhere then I think this is a relatively straightforward approach.

@jpkrohling
Copy link
Member

some boolean check somewhere then I think this is a relatively straightforward approach

Yes, it boils down to that. We have more states than just booleans, but a policy can decide to return only a "sample" or "not sampled" if it wants.

@jiekun
Copy link
Member

jiekun commented May 30, 2023

@yehaotian Please check the OTTL condition policy and see if it mets your requirement :P

@TylerHelmuth
Copy link
Member

Closing this for now as the OTTL condition policy will allow conditions based any telemetry field. Please ping me if you think it should be reopened.

@yehaotian
Copy link
Author

yehaotian commented May 30, 2023

@TylerHelmuth Thanks for the change!!
I have following policy and it seems not working:

        {
          name: debug-policy,
          type: ottl_condition,
          ottl_condition: {
            error_mode: ignore,
            span: [
              "attributes[\"debug-id\"] != nil",
            ]
          }
        },

btw no error messages.

@yehaotian
Copy link
Author

Debug log:

{"level":"debug","ts":1685482015.4306045,"caller":"sampling/ottl.go:59","msg":"Evaluating with OTTL conditions filter","kind":"processor","name":"tail_sampling","pipeline":"traces","traceID":"061c3f37e4c300ec350e3f9d9bd75e67"}
{"level":"debug","ts":1685482015.4307067,"caller":"tailsamplingprocessor@v0.78.0/processor.go:199","msg":"Sampling policy evaluation completed","kind":"processor","name":"tail_sampling","pipeline":"traces","batch.len":1,"sampled":0,"notSampled":4,"droppedPriorToEvaluation":0,"policyEvaluationErrors":0}

The span does have debug-id attribute

@TylerHelmuth
Copy link
Member

@yehaotian is debug-id definitely an attribute and not a resource attribute?

@yehaotian
Copy link
Author

Yes, this is something can be added in the business logic for debug.
Also it is listed in Attribute not Resource category

@TylerHelmuth
Copy link
Member

@yehaotian Out of curiosity what happens if you do == instead? I want to make sure we didn't accidentally inverse the condition.

@jiekun
Copy link
Member

jiekun commented May 31, 2023

@TylerHelmuth @yehaotian I believe this should work. test case as below:
https://github.com/jiekun/opentelemetry-collector-contrib/blob/1a42bceb9a1cf10bc11e427628a4042843514847/processor/tailsamplingprocessor/internal/sampling/ottl_test.go#L66

		{
			"OTTL conditions inverse match(!=) span attributes 2",
			[]string{"attributes[\"attr_k_1\"] != \"attr_v_1\""},  // this is my ottl condition
			[]string{},
			[]spanWithAttributes{{SpanAttributes: map[string]string{"attr_k_1": "attr_v_2"}}},  // span attributes
			false,
			Sampled,  // it should be sampled
		},

I will check it again today with " != nil" condition.

@yehaotian
Copy link
Author

After enabling debug logging exporter, I notice sometimes the attributes fail to be reported to otel collector which seems super wired to me.
FYI, we are using Jaeger opentracing instrumentation with otel collector Jaeger receiver, perhaps there are issues during schema transformation

@TylerHelmuth
Copy link
Member

For our testing purposes we need to reduce variability. Can you use the attributeprocessor or transformprocessor to add the attribute to the telemetry before the tailsamplingprocessor to ensure that it is always present?

@jiekun
Copy link
Member

jiekun commented Jun 1, 2023

After enabling debug logging exporter, I notice sometimes the attributes fail to be reported to otel collector which seems super wired to me. FYI, we are using Jaeger opentracing instrumentation with otel collector Jaeger receiver, perhaps there are issues during schema transformation

Thanks for clarifying this. I checked the != nil condition with unit test as well, which works correctly. I am putting those test cases here FYI. Feel free to provide more info / feedback.

		{
			Desc:                "OTTL conditions 1",
			SpanConditions:      []string{"attributes[\"attr_k_1\"] == \"attr_v_1\""},
			SpanEventConditions: []string{},
			Spans:               []spanWithAttributes{{SpanAttributes: map[string]string{"attr_k_1": "attr_v_1"}}},
			WantErr:             false,
			Decision:            Sampled,
		},
		{
			Desc:                "OTTL conditions 2",
			SpanConditions:      []string{"attributes[\"attr_k_1\"] != \"attr_v_1\""},
			SpanEventConditions: []string{},
			Spans:               []spanWithAttributes{{SpanAttributes: map[string]string{"attr_k_1": "attr_v_1"}}},
			WantErr:             false,
			Decision:            NotSampled,
		},
		{
			Desc:                "OTTL conditions 3",
			SpanConditions:      []string{"attributes[\"attr_k_1\"] != nil"},
			SpanEventConditions: []string{},
			Spans:               []spanWithAttributes{{SpanAttributes: map[string]string{"attr_k_1": "attr_v_1"}}},
			WantErr:             false,
			Decision:            Sampled,
		},

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request processor/tailsampling Tail sampling processor
Projects
None yet
Development

No branches or pull requests

4 participants