-
Notifications
You must be signed in to change notification settings - Fork 486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade OpenTelemetry Collector dependency #3858
Conversation
d9aace2
to
7e13bbe
Compare
045b0c0
to
7202edc
Compare
@ptodev should this PR be in draft since it's still a WIP? |
Yes... Apologies. I didn't know about this feature back when I created it 😄 I converted it to Draft now, although it's almost time to open it for reviews. |
if filename, exists := lokiEntry.Labels["filename"]; exists { | ||
filenameStr := string(filename) | ||
// The `promtailreceiver` from the opentelemetry-collector-contrib | ||
// repo adds these two labels based on these "semantic conventions | ||
// for log media". | ||
// https://opentelemetry.io/docs/reference/specification/logs/semantic_conventions/media/ | ||
// We're keeping them as well, but we're also adding the `filename` | ||
// attribute so that it can be used from the | ||
// `loki.attribute.labels` hint for when the opposite OTel -> Loki | ||
// transformation happens. | ||
lr.Attributes().PutStr("log.file.path", filenameStr) | ||
lr.Attributes().PutStr("log.file.name", path.Base(filenameStr)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The promtailreceiver
does no longer exist, and the lokireceiver
replacement does not follow the same logic (maybe because it's considered more of a network-based source?), and these conventions are experimental anyway.
I'd recommend we remove this to keep us more in-line with upstream and adding a comment like this
// TODO: We should consult with upstream to add the `log.file.path` and `log.file.name` attributes once the semantic conventions for log media are mature
// https://opentelemetry.io/docs/reference/specification/logs/semantic_conventions/media/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hesitant to do this in this PR because this would be a breaking change. Would you mind if we do this separately, on another PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll make the change separately. If you'd put a TODO(@tpaschalis)
comment I'd appreciate it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ptodev Can you add the TODO comment here prior to merging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, just added it 👍
@rfratto I start a local Tempo using the instructions in the Tempo repo. I make a few minor modifications to make this run with a locally running Agent - my config file is docker-compose copy.yaml.txt Then I start an Agent with one of these configs: I can't run two Agents side by side, so I firstly test an "old" Agent, stop it, then start a "new" one. These are the things I look for:
A few gaps in my testing:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple small suggestions for docs
@rfratto I am referring to functionality similar to the one in the batch processor. There, for example the I'll ask the Otel team if such work is already being planned, and in general if they think the
|
@ptodev I see. I'm in favor of always using OTel SDK metrics, and waiting for the upstream components to switch off of OpenCensus. I think I misunderstood in my earlier comment how many metrics were affected (i.e., not literally all of them). The metrics I'm particularly interested in making sure we keep are the high-level ones for exports/receivers/processors, which we should still have since those are all OTel SDK-compatible now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM minus the final set of comments, let's get this merged!
cloud.google.com/go/pubsub v1.28.0 | ||
contrib.go.opencensus.io/exporter/prometheus v0.4.2 | ||
cloud.google.com/go/pubsub v1.30.0 | ||
contrib.go.opencensus.io/exporter/prometheus v0.4.2 // indirect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Bump on this conversation)
return fmt.Errorf("error while setting up the remote sampling source: %w", err) | ||
} | ||
conn, err := grpc.Dial(jrse.cfg.Source.Remote.Endpoint, opts...) | ||
conn, err := jrse.cfg.Source.Remote.ToClientConn(context.Background(), host, jrse.telemetry) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This package is a fork from upstream, and upstream also passes the context here, so we should too IMO.
go.mod
Outdated
require ( | ||
go.opentelemetry.io/collector/component v0.79.0 | ||
go.opentelemetry.io/collector/confmap v0.79.0 | ||
go.opentelemetry.io/collector/consumer v0.79.0 | ||
go.opentelemetry.io/collector/exporter v0.79.0 | ||
go.opentelemetry.io/collector/featuregate v1.0.0-rcv0012 | ||
go.opentelemetry.io/collector/processor/batchprocessor v0.79.0 | ||
go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.79.0 | ||
go.opentelemetry.io/collector/receiver v0.79.0 | ||
go.opentelemetry.io/collector/receiver/otlpreceiver v0.79.0 | ||
) | ||
|
||
require ( | ||
github.com/alecthomas/participle/v2 v2.0.0 // indirect | ||
github.com/gobwas/glob v0.2.3 // indirect | ||
github.com/google/s2a-go v0.1.4 // indirect | ||
github.com/iancoleman/strcase v0.2.0 // indirect | ||
github.com/open-telemetry/opentelemetry-collector-contrib/internal/filter v0.79.0 // indirect | ||
github.com/open-telemetry/opentelemetry-collector-contrib/pkg/ottl v0.79.0 // indirect | ||
github.com/shoenig/go-m1cpu v0.1.6 // indirect | ||
github.com/tilinna/clock v1.1.0 // indirect | ||
go.opentelemetry.io/contrib/propagators/b3 v1.17.0 // indirect | ||
go.opentelemetry.io/otel/bridge/opencensus v0.39.0 // indirect | ||
google.golang.org/genproto/googleapis/api v0.0.0-20230530153820-e85fd2cbaebc // indirect | ||
google.golang.org/genproto/googleapis/rpc v0.0.0-20230530153820-e85fd2cbaebc // indirect | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Bump on this conversation)
go.opentelemetry.io/collector => github.com/ptodev/opentelemetry-collector v0.0.0-20230608105454-0c02b84d03d4 | ||
go.opentelemetry.io/collector/exporter/otlpexporter => github.com/ptodev/opentelemetry-collector/exporter/otlpexporter v0.0.0-20230608105454-0c02b84d03d4 | ||
go.opentelemetry.io/collector/exporter/otlphttpexporter => github.com/ptodev/opentelemetry-collector/exporter/otlphttpexporter v0.0.0-20230608105454-0c02b84d03d4 | ||
go.opentelemetry.io/collector/featuregate => github.com/ptodev/opentelemetry-collector/featuregate v0.0.0-20230608105454-0c02b84d03d4 | ||
go.opentelemetry.io/collector/pdata => github.com/ptodev/opentelemetry-collector/pdata v0.0.0-20230608105454-0c02b84d03d4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(bump on this conversation; your PR is approved so it can be merged and we can switch to using the Grafana org's fork)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc input is fine as-is.
There are some issues which need fixing, but they will be fixed in a follow-up PR. The most important issue revolves around OpenCensus metrics coming from the collector. Static mode allows multiple tracing configs to be created for the same agent. In those configs could be components with the same names. Unfortunately, OpenCensus metrics are registered globally and it is very hard to isolate them so that each metric is specific to a component instance. Apparently the current code on the Agent is buggy since actually if there is more than 1 traces_config, the OpenCensus metrics from the Collector components will display inaccurate values. Also, my comment that we need to contribute Otel instrumentation in order to see OpenCensus metrics was wrong. Apparently the way to see OpenCensus metrics is though a line like this. However, due to the issue mentioned above, we are not going to incorporate OpenCensus metrics. If there is a need to incorporate the metrics which go missing due to this upgrade, we will contribute Otel instrumentation to the relevant Collector components. |
Upgrade Otel Collector dependency
Upgrade Otel Collector dependency
WIP PR so that we can track the changes going into the feature branch, and whether the CI completes successfully.
Fixes #3660
Notes to the reviewer
telemetry.useOtelForInternalMetrics
feature gate and not all Otel components support native Otel metrics. There are also new metrics which will be present on the Agent's/metrics
endpoint which were not previously available. The new and missing metrics are listed here:pkg/flow/tracing/tracing.go
we updatego.opentelemetry.io/otel/semconv
from version 1.12.0 to 1.17.0. I believe this should be fine due to backwards compatibility guarantees. For more information on what this package contains, see Otel's wiki page about Semantic Conventions.filename
field. I did not port it over. We can recommend that people use local.file instead. Thus we only expose thescheme
andtoken
attributes.otelcol.exporter.loki
now also adds instrumentation scope like this:"instrumentation_scope":{"name":"name","version":"version"}}
[pdata] Deprecate pcommon.Map.Sort() open-telemetry/opentelemetry-collector#6989
Things to do before merging
/metrics
endpoint. IMO this is a nice to have, and not a must?