Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[exporter/datadog]: update resource naming and span naming #1861

Merged

Conversation

ericmustin
Copy link
Contributor

@ericmustin ericmustin commented Dec 18, 2020

Description:

This PR improves datadog trace translation in a few areas

  1. Datadog Resource naming has improved to add more rich names for messaging spans (such a google pub sub) and grpc spans. GRPC spans will have resources sorted by grpc.path if possible, or <rpc.method> <rpc.service>, and messaging spans will have resources sorted by <messaging.operation> <messaging.destination>.

  2. Datadog span operation names will be shortened to only display <instrumentation_library.name>.<span_kind>. previously they were <instrumentation_library.name>.span_kind_<span_kind>, which was not helpful extra info and was not displayed well in UI.

  3. Datadog span operation names will coerce - to _. Previously the - was being coerced to _ in some places in UI but not others, leading to an unusual experience, so just forcing it to _ regardless will resolve this issue.

Link to tracking Issue:
n/a

Testing:

Updated Unit Tests

@ericmustin ericmustin requested a review from a team December 18, 2020 16:09
@codecov
Copy link

codecov bot commented Dec 18, 2020

Codecov Report

Merging #1861 (d457e6b) into master (330f787) will increase coverage by 0.00%.
The diff coverage is 87.50%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1861   +/-   ##
=======================================
  Coverage   90.19%   90.19%           
=======================================
  Files         385      385           
  Lines       19063    19074   +11     
=======================================
+ Hits        17193    17204   +11     
- Misses       1398     1399    +1     
+ Partials      472      471    -1     
Flag Coverage Δ
integration 69.88% <ø> (ø)
unit 88.95% <87.50%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
exporter/datadogexporter/utils/trace_helpers.go 0.00% <0.00%> (ø)
exporter/datadogexporter/translate_traces.go 83.56% <100.00%> (+0.80%) ⬆️
processor/groupbytraceprocessor/event.go 95.96% <0.00%> (-0.81%) ⬇️
receiver/k8sclusterreceiver/watcher.go 97.64% <0.00%> (+2.35%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 330f787...d457e6b. Read the comment docs.

@ericmustin
Copy link
Contributor Author

cc @mx-psi and @KSerrania here for review from datadog

Copy link
Contributor

@KSerrania KSerrania left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, we should fix the two lines where codecov complains.

@@ -380,33 +382,47 @@ func getDatadogSpanName(s pdata.Span, datadogTags map[string]string) string {
// The spec has changed over time and, depending on the original exporter, IL Name could represented a few different ways
// so we try to account for all permutations
if ilnOtlp, okOtlp := datadogTags[tracetranslator.TagInstrumentationName]; okOtlp {
return utils.NormalizeSpanName(fmt.Sprintf("%s.%s", ilnOtlp, s.Kind()))
return utils.NormalizeSpanName(fmt.Sprintf("%s.%s", ilnOtlp, strings.TrimPrefix(s.Kind().String(), "SPAN_KIND_")))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: strings.TrimPrefix(<a string>, "SPAN_KIND_") is being used in a lot of places, maybe it would make sense to move this to a utility function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense will do

return fmt.Sprintf("%s %s", msgOperation, destination)
}

return msgOperation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a test case where messaging.operation is set but not messaging.destination to test this code path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 yup will add test

@@ -80,7 +80,8 @@ func NormalizeSpanName(tag string) string {
case buf.Len() == 0:
continue
// handle valid characters that can't start the string.
case unicode.IsDigit(c) || c == '.' || c == '-':
// '-' creates issues in the UI so we skip it
case unicode.IsDigit(c) || c == '.':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably add a tag with a digit or . in a test case to test this branch, now that we don't handle - here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 yup will add test

@mhamrah
Copy link
Contributor

mhamrah commented Dec 18, 2020

Looking forward to this PR. Here are traces we send for messaging and grpc, as a use case:

Messaging
image

gRPC
image

@ericmustin
Copy link
Contributor Author

@KSerrania @mx-psi mind giving another look? @james-bebbington would be keen to get this in before next release

Copy link
Contributor

@KSerrania KSerrania left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We may want to add another test for NormalizeSpanKind, since codecov complains about it (though I'm not sure why, I thought it was already run as part of other tests).

@ericmustin
Copy link
Contributor Author

@james-bebbington any feedback here? just pinging as it's been a week, we'd be keen to get any feedback /get this merged before next release

Copy link
Member

@mx-psi mx-psi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I second Kylian's message about the normalization test

@cemo
Copy link

cemo commented Dec 28, 2020

Hi all,

I have opened another issue for naming. For us it is completely not readable and this PR is not helping so much. I suggest to switch completely to conform conventions.

#1909

Thanks

@tigrannajaryan
Copy link
Member

@ericmustin please fix the merge conflict.

@ericmustin
Copy link
Contributor Author

@tigrannajaryan updated, thanks

@tigrannajaryan tigrannajaryan merged commit 0e17f0e into open-telemetry:master Jan 4, 2021
dyladan referenced this pull request in dynatrace-oss-contrib/opentelemetry-collector-contrib Jan 29, 2021
**Description:** 
Adds a connection state reporter to the Jaeger exporter sending state changes to the log and as a metric.

**Link to tracking Issue:** Closes #1861, as it's the best we can do at this exporter level. On a more general level, the collector could provide more verbose info about the underlying gRPC connections. See #2237 for more information.

**Testing:** unit and manual tests.

**Documentation:** None.

Here are the complete logs for an OpenTelemetry Collector that got started without a Jaeger backend available. Note that the config.yaml contains two exporters, one (`jaeger`) with the `insecure: true` option, and a second one (`jaeger/2`) without this option. A Jaeger all-in-one is then started, without TLS support. This ends with the first exporter eventually transitioning to "READY", but the second won't.
```
2020-12-02T11:17:26.116+0100	INFO	service/service.go:409	Starting OpenTelemetry Collector...	{"Version": "latest", "GitHash": "<NOT PROPERLY GENERATED>", "NumCPU": 12}
2020-12-02T11:17:26.116+0100	INFO	service/service.go:253	Setting up own telemetry...
2020-12-02T11:17:26.130+0100	INFO	service/telemetry.go:101	Serving Prometheus metrics	{"address": "localhost:8888", "level": 0, "service.instance.id": "b4f3bc7c-2593-48e5-9ef9-8b585bbcf4fc"}
2020-12-02T11:17:26.131+0100	INFO	service/service.go:290	Loading configuration...
2020-12-02T11:17:26.131+0100	INFO	service/service.go:301	Applying configuration...
2020-12-02T11:17:26.131+0100	INFO	service/service.go:322	Starting extensions...
2020-12-02T11:17:26.132+0100	INFO	builder/exporters_builder.go:306	Exporter is enabled.	{"component_kind": "exporter", "exporter": "jaeger"}
2020-12-02T11:17:26.132+0100	INFO	builder/exporters_builder.go:306	Exporter is enabled.	{"component_kind": "exporter", "exporter": "jaeger/2"}
2020-12-02T11:17:26.132+0100	INFO	service/service.go:337	Starting exporters...
2020-12-02T11:17:26.132+0100	INFO	builder/exporters_builder.go:92	Exporter is starting...	{"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger"}
2020-12-02T11:17:26.132+0100	INFO	jaegerexporter/exporter.go:183	State of the connection with the Jaeger Collector backend	{"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger", "state": "CONNECTING"}
2020-12-02T11:17:26.132+0100	INFO	builder/exporters_builder.go:97	Exporter started.	{"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger"}
2020-12-02T11:17:26.132+0100	INFO	builder/exporters_builder.go:92	Exporter is starting...	{"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger/2"}
2020-12-02T11:17:26.132+0100	INFO	jaegerexporter/exporter.go:183	State of the connection with the Jaeger Collector backend	{"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger/2", "state": "CONNECTING"}
2020-12-02T11:17:26.132+0100	INFO	builder/exporters_builder.go:97	Exporter started.	{"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger/2"}
2020-12-02T11:17:26.132+0100	INFO	builder/pipelines_builder.go:207	Pipeline is enabled.	{"pipeline_name": "traces", "pipeline_datatype": "traces"}
2020-12-02T11:17:26.132+0100	INFO	service/service.go:350	Starting processors...
2020-12-02T11:17:26.132+0100	INFO	builder/pipelines_builder.go:51	Pipeline is starting...	{"pipeline_name": "traces", "pipeline_datatype": "traces"}
2020-12-02T11:17:26.132+0100	INFO	builder/pipelines_builder.go:61	Pipeline is started.	{"pipeline_name": "traces", "pipeline_datatype": "traces"}
2020-12-02T11:17:26.132+0100	INFO	builder/receivers_builder.go:235	Receiver is enabled.	{"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp", "datatype": "traces"}
2020-12-02T11:17:26.132+0100	INFO	service/service.go:362	Starting receivers...
2020-12-02T11:17:26.132+0100	INFO	builder/receivers_builder.go:70	Receiver is starting...	{"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"}
2020-12-02T11:17:26.132+0100	INFO	otlpreceiver/otlp.go:93	Starting GRPC server on endpoint 0.0.0.0:4317	{"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"}
2020-12-02T11:17:26.132+0100	INFO	otlpreceiver/otlp.go:130	Setting up a second GRPC listener on legacy endpoint 0.0.0.0:55680	{"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"}
2020-12-02T11:17:26.132+0100	INFO	otlpreceiver/otlp.go:93	Starting GRPC server on endpoint 0.0.0.0:55680	{"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"}
2020-12-02T11:17:26.132+0100	INFO	builder/receivers_builder.go:75	Receiver started.	{"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"}
2020-12-02T11:17:26.132+0100	INFO	service/service.go:265	Everything is ready. Begin running and processing data.
2020-12-02T11:17:27.132+0100	INFO	jaegerexporter/exporter.go:183	State of the connection with the Jaeger Collector backend	{"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger/2", "state": "TRANSIENT_FAILURE"}
2020-12-02T11:17:28.132+0100	INFO	jaegerexporter/exporter.go:183	State of the connection with the Jaeger Collector backend	{"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger", "state": "TRANSIENT_FAILURE"}
2020-12-02T11:17:43.132+0100	INFO	jaegerexporter/exporter.go:183	State of the connection with the Jaeger Collector backend	{"component_kind": "exporter", "component_type": "jaeger", "component_name": "jaeger", "state": "READY"}
```

And here are the metrics for the final state:

```
# HELP otelcol_jaegerexporter_conn_state Last connection state: 0 = Idle, 1 = Connecting, 2 = Ready, 3 = TransientFailure, 4 = Shutdown
# TYPE otelcol_jaegerexporter_conn_state gauge
otelcol_jaegerexporter_conn_state{exporter_name="jaeger",service_instance_id="341f6179-0c34-4ad1-b2e5-19b2bed4c9d1"} 2
otelcol_jaegerexporter_conn_state{exporter_name="jaeger/2",service_instance_id="341f6179-0c34-4ad1-b2e5-19b2bed4c9d1"} 3
```

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants