Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Evaluating how to support tracing and OpenTelemetry #776

Closed
stebet opened this issue Mar 25, 2020 · 32 comments · Fixed by #1261
Closed

Evaluating how to support tracing and OpenTelemetry #776

stebet opened this issue Mar 25, 2020 · 32 comments · Fixed by #1261
Assignees
Milestone

Comments

@stebet
Copy link
Contributor

stebet commented Mar 25, 2020

This is something that I have been wondering about for some time on what would be the best approach for observability.

Since debugging a library like this can be hard, I wonder what we'd want to do to make this easier.

I'd really like to suggest using System.Diagnostics.DiagnosticSource to provide parity with a lot of other .NET libraries and frameworks (both Core and Framework).

Using this library it becomes a lot easier to plug in logging frameworks and profiling tools to monitor and measure operations within the library. It also has the added benefit to have next to no overhead when no-one is listening for the events, since it is exposing ETW events (and in .NET Core, can expose event counters).

@lukebakken lukebakken added this to the 7.0.0 milestone Mar 25, 2020
@lukebakken lukebakken self-assigned this Mar 25, 2020
@vhatsura
Copy link

vhatsura commented May 6, 2020

I also suggest looking into the new Tracing API on Activity which should be part of .NET 5 release.

@stebet
Copy link
Contributor Author

stebet commented May 6, 2020

Yeah, that looks good. A good first step would be to add basic Activity support, ETW and event counters. I'm planning to look at that soon-ish. This would be a logical next step after that.

@jaycdave88
Copy link

Hi @stebet, just wanted to follow-up on your previous note and see if you had any sorta update or a product roadmap you could share around Activity support?

@michaelklishin
Copy link
Member

@jaycdave88 right now the plan is to investigate System.Diagnostics.DiagnosticSource. This may or may not happen by 7.0 but can go into a 7.x release. We need to spike it first. If there are popular alternatives in the community, we would consider them. Let us know ;)

@CodeBlanch
Copy link

@michaelklishin Depending on the timeline, definitely check out the new ActivitySource API. Very similar to DiagnosticSource, but it supports sampling and you can write tags/attributes on the events instead of writing raw objects (which people usually need to consume reflectively). It releases with .NET 5 (~November), but you don't need .NET 5 to use it. The NuGet package targets all actively supported .NET versions.

The main benefit to using ActivitySource is it means instrumentation libraries like OpenTelemetry will work without needing to build an adapter or anything.

Also happy to help with this if you are open to it.

@michaelklishin
Copy link
Member

If ActivitySource will support all then/currently supported versions of .NET, we should consider it. A proof-of-concept PR would be appreciated (I assume there are preview releases of some kind that developers can try?)

@stebet
Copy link
Contributor Author

stebet commented Sep 21, 2020

ActivitySource should be the correct one to look at. I spiked this myself a few months back, time to revisit :)

What would people consider good activities/counters to start with?

Here are a few rough ideas:

  • Bytes sent
  • Bytes received
  • Frames deserialized
  • Frames serialized
  • Total Connections opened
  • Connections currently open
  • Consumers active
  • Channels Open
  • Total channels opened
  • ???

@cijothomas
Copy link

If ActivitySource will support all then/currently supported versions of .NET, we should consider it. A proof-of-concept PR would be appreciated (I assume there are preview releases of some kind that developers can try?)

ActivitySource/Activity are part of the nuget package https://www.nuget.org/packages/System.Diagnostics.DiagnosticSource/5.0.0-rc.1.20451.14. It can be used right now with preview releases.
An example on using ActivitySource is available here in OpenTelemetry repo.
https://github.com/open-telemetry/opentelemetry-dotnet/blob/master/examples/Console/TestConsoleExporter.cs#L41-L94

@snakorse
Copy link

ActivitySource should be the correct one to look at. I spiked this myself a few months back, time to revisit :)

What would people consider good activities/counters to start with?

Here are a few rough ideas:

  • Bytes sent
  • Bytes received
  • Frames deserialized
  • Frames serialized
  • Total Connections opened
  • Connections currently open
  • Consumers active
  • Channels Open
  • Total channels opened
  • ???

I think something likeFrames serializing should be considered, the telemetry instruments may want to inject tracing header into the frame before it serialized to send.

@cijothomas
Copy link

Just to clarify - ActivitySource/Activity API is for tracing , and not for metrics. Things like BytesSent, etc are metric, which require a metric API. .NET has EventCounter API to support this already, but it is a very limited API. There is a new Metric API (Opentelemetry compatible), coming with .NET 6 (Nov 2021), which will support all the old versions of the .NET itself. (open-telemetry/opentelemetry-dotnet#1501)

@stebet
Copy link
Contributor Author

stebet commented Nov 20, 2020

@cijothomas Yep. I'm aware of the differences. We're looking to integrate both the new dotnet counters (for metrics) as well as support for Activity Tracing. Tracing will come later since the OpenTelemetry is currently being worked on but I'm watching it closely :)

@cijothomas
Copy link

@stebet Thanks!
To instrument, you won't need OpenTelemetry dependency, right? You can use 5.0.0 of DiagnosticSource.
(Also OpenTelemetry will release 1.0.0 in <2 weeks)

Please also check the Metric plans in .NET 6 as well.

@stebet
Copy link
Contributor Author

stebet commented Nov 20, 2020

@stebet Thanks!
To instrument, you won't need OpenTelemetry dependency, right? You can use 5.0.0 of DiagnosticSource.
(Also OpenTelemetry will release 1.0.0 in <2 weeks)

Please also check the Metric plans in .NET 6 as well.

Thanks for the info :)

@jbogard
Copy link

jbogard commented Nov 30, 2020

@bollhals
Copy link
Contributor

bollhals commented Mar 8, 2021

ActivitySource should be the correct one to look at. I spiked this myself a few months back, time to revisit :)

What would people consider good activities/counters to start with?

Here are a few rough ideas:

  • Bytes sent
  • Bytes received
  • Frames deserialized
  • Frames serialized
  • Total Connections opened
  • Connections currently open
  • Consumers active
  • Channels Open
  • Total channels opened
  • ???

I wanted to play a bit with it and found that the PollingCounters are only NS2.1 / .net core 3.0.
I guess that means we can't do it for NS.2.0 targets or am I missing something?

@stebet
Copy link
Contributor Author

stebet commented Mar 9, 2021

Let's make a distinction between activities and counters since they aren't the same thing (although related).

There are Event counters in NS2.0 targets, but it's a lot less capable. It'd be fine by me if we only had counters for NS2.1+ and skip them for NS2.0, which is on the way out eventually anyway.

Activities can work for both targets.

@bollhals
Copy link
Contributor

bollhals commented Mar 9, 2021

Shall I create a new one for the counters, so we don't mix them up in here?

There are Event counters in NS2.0 targets, but it's a lot less capable.

Are there? I wasn't able to find any useable classes?

It'd be fine by me if we only had counters for NS2.1+ and skip them for NS2.0, which is on the way out eventually anyway.

Works for me 👍

@tthiery
Copy link

tthiery commented Jun 13, 2021

When using OpenTelemetry for tracing across services, the span context needs to be channeled over the wire. Will this issue contain the encoding (while publishing) and decoding (while receiving) from and to the .NET Activity framework be included here or is this expected to be done somewhere else? To my limited insights, I do not see a hookup infrastructure where a third party could include a middleware into the bare RabbitMQ.Client when sending/receiving messages.

I am currently building a middleware layer on top of RabbitMQ and wonder, whether I should write a "default" middleware doing the OpenTelemetry decoding.

The code how this OpenTelemetry span context propagation works without support from the RabbitMQ.Client is described in this article. The code fragments in the article also show how it works transparently for HTTP.

@rwkarg
Copy link

rwkarg commented Jun 15, 2021

We similarly have a wrapper around RabbitMQ.Client we've been using for a few years (that exposes consumers/exchanges as ISource/TargetBlock<AmqpMessage>) which we instrumented for OpenTracing and are looking at moving to OpenTelemetry/Activity.

We currently extract/inject trace context into message headers on consume/publish but it would be nice if the base client library handled that as well as the Activity creation.

Where I'm uncertain is how to handle the length of the Activity (we only use Consumer based flows, but manual Get would likely need to be different?). There is the period from Prefetch to the Consumer implementation finishing the event/callback. The "processing" of the message could either be inside that span, resulting in an ack/nack before returning from the Consumer event/callback or, as in our framework, the AmqpMessage could be placed in a client side queue (ex. BufferBlock) and processed potentially after the Consumer event/callback is completed (ex. by an attached ActionBlock). (We stash the Channel in the AmqpMessage headers so we can ack/nack the message when processing is completed).

I suppose that the client side queue could be modeled in OpenTelemetry as another Consumer-type span with a parent as the Consumer event/callback span that would be completed as soon as the event/callback span has placed the message in the in-process queue.

@davidfowl
Copy link

Is this issue well understood? Are there any design blockers getting this through?

@michaelklishin
Copy link
Member

@davidfowl it has been over two years since it was last discussed. So no, it is not well defined.

@davidfowl
Copy link

I've been following this so far https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/examples/MicroserviceExample/Utils/Messaging

It's an example on the open telemetry dotnet repo that uses RabbitMQ.Client and adds the appropriate activities to the producer and consumer.

If we start with this as the proposal, what issues would we run into?

@pellared
Copy link

I think it may be worth to share with you the OpenTelemetry semantic conventions (they are experimental, but there is nothing better) for telemetry produced for RabbitMQ:

@michaelklishin
Copy link
Member

That question can only be answered by trying it but this looks like an example, not something that can or should be taken straight into production.

This example has some curious decisions around how it uses this client. Instead of inheriting from/extending EventingBasicConsumer it uses a "helper", channel opening and queue declaration are lumped together, and so on.

I assume that everything around how they use activities is reasonable. The question for library maintains use how do you integrate that in a way that's not too restrictive and does not push OpenTelemetry down everyone's throat.

@michaelklishin
Copy link
Member

I'd look for clients for other popular data services (PostgreSQL, Redis, ElasticSearch, etcd, et cetera) to learn from, these examples only demonstrate some key OpenTelemetry API elements (in C#).

@michaelklishin
Copy link
Member

@pellared that's helpful, thanks. Now someone who needs this feature should be able to come up with an API update proposal for this client, which won't be trivial. Team RabbitMQ likely won't get to work on this any time soon but this is open source software.

@michaelklishin michaelklishin changed the title Use System.Diagnostics.DiagnosticSource for tracing and telemetry Evaluating how to support tracing and OpenTelemetry Oct 14, 2023
@davidfowl
Copy link

This can be contributed, it’s extremely important to get our core ecosystem libraries instrumented. Let’s get on the same page about how to accomplish this.

The way .NET integrates with open telemetry allows library authors to build support without direct dependencies on otel. Even though the semantic conventions are not final, you want to track and use them. This is what we are doing in .NET and making changes as they change .

@davidfowl
Copy link

davidfowl commented Oct 14, 2023

The other alternative would be to build it an instrumentation library on top, but, that’s not ideal and you’d need the appropriate hooks in order to do it from the outside of the library.

This would be a point in time solution. This isn’t the way we want libraries to integrate with our telemetry systems.

@lukebakken lukebakken self-assigned this Oct 14, 2023
@stebet
Copy link
Contributor Author

stebet commented Oct 18, 2023

This is already in progress for the most part here: #1261

Had some comments from @lmolkova as well that I need to address. I'll take a look at this again soon. Would love more feedback as well.

Some attributes need to be updated to be in line with the standards (it's been awhile since I last synced), but this is working for the most part. CC: @davidfowl @michaelklishin

@stebet
Copy link
Contributor Author

stebet commented Oct 18, 2023

Also, implementing it like this via. ActivitySource shouldn't require any API changes. A good PR after the initial implentation would be to add OTel Metrics/Logs support as well.

@davidfowl
Copy link

That looks promising, is there anything blocking this change?

@stebet
Copy link
Contributor Author

stebet commented Oct 19, 2023

That looks promising, is there anything blocking this change?

Mostly cleanup and responding to comments. I'll get to it today :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.