Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(otel): add opentelemety utility functions #272

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

AndrewWinterman
Copy link
Contributor

@AndrewWinterman AndrewWinterman commented Jun 28, 2024

This PR extracts opentelemetry utility functions from my private project
and adds them to this project without calling them. It partially resolves #43

I'd like a broader discussion about whether these should be
automatically called by the library where possible, or if they should
simply be provided to clients to use if they so wish.

I did my best to follow OpenTelemetry semantic conventions as described
here
https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans/,
but they are at times ambiguous for rabbitmq-- e.g. is the destination
for a message the Queue or the Consumer Tag the message was delivered to.

Given the channel based approaches of this library, it is impossible for
the library to know the full execution of a consumer. Unless
autoack=false, we cannot actually know when to end the span associated
with a delivery, so at least in the consumer case, it's probably best to
allow the client to manage spans for themselves.

We can manage spans on the producer side, and at the very least
extract span identifiers to include on published headers automatically,
and provide utilities for pulling them back out again.

My intention with putting this PR up is to move the conversation
forward. Because the PR only provides private methods (if I left
members public please call them out), it can be safely merged while
these questions are worked out.

AndrewWinterman and others added 2 commits June 28, 2024 15:11
This PR extracts opentelemetry utility functions from my private project
and adds them to this project without calling them. It resolves rabbitmq#43

I'd like a broader discussion about whether these should be
automatically called by the library where possible, or if they should
simply be provided to clients to use if they so wish.

I did my best to follow OpenTelemetry semantic conventions as described
here
https://opentelemetry.io/docs/specs/semconv/messaging/messaging-spans/,
but they are at times ambiguous for rabbitmq-- e.g. is the destination
for a message the Queue or the Consumer Tag the message was delivered to.

Given the channel based approaches of this library, it is impossible for
the library to know the full execution of a consumer. Unless
autoack=false, we cannot actually know when to end the span associated
with a delivery, so at least in the consumer case, it's probably best to
allow the client to manage spans for themselves.

We *can* manage spans on the producer side, and at the very least
extract span identifiers to include on published headers automatically,
and provide utilities for pulling them back out again.

My intention with putting this PR up is to move the conversation
forward. Because the PR *only* provides private methods (if I left
members public please call them out), it can be safely merged while
these questions are worked out.
@AndrewWinterman AndrewWinterman marked this pull request as draft June 28, 2024 23:05
@AndrewWinterman
Copy link
Contributor Author

I converted this to a draft. it's getting dinged for unused methods, which was intentional, so maybe best to leave it in draft unmerged until we resolve some of those design questions

@lukebakken lukebakken requested review from Zerpet and lukebakken July 1, 2024 16:45
@lukebakken lukebakken self-assigned this Jul 1, 2024
@Zerpet
Copy link
Contributor

Zerpet commented Jul 2, 2024

Hey, thank you for taking the time to contribute to this library. I'll respond in-line to the topics in the OP.

I'd like a broader discussion about whether these should be automatically called by the library where possible, or if they should simply be provided to clients to use if they so wish.

Given that this project is a library, it's a great opportunity to provide automatic native instrumentation. That means we should automatically create spans, and inject/extract context where it makes sense.

they are at times ambiguous for rabbitmq-- e.g. is the destination for a message the Queue or the Consumer Tag the message was delivered to.

I agree that some semantics are ambiguous for RabbitMQ. I would advocate to adhere to the specific conventions for RabbitMQ described in this link, and do our best with ambiguities not covered (and document them!).

Given the channel based approaches of this library, it is impossible for
the library to know the full execution of a consumer. Unless
autoack=false, we cannot actually know when to end the span associated
with a delivery, so at least in the consumer case, it's probably best to
allow the client to manage spans for themselves.

Given the "subscription" workflow of RabbitMQ (polling/pulling is highly discouraged), I think we can record the attributes of the subscription (and add more if necessary), and inject those into the spans upon receiving messages, and just before forwarding them into the Go channel. This idea needs validation, but I would prefer this over leaving the consumption instrumentation to the users.

We can manage spans on the producer side, and at the very least
extract span identifiers to include on published headers automatically,
and provide utilities for pulling them back out again.

I agree 👍 This is a sensible idea.

@lukebakken
Copy link
Contributor

@AndrewWinterman
Copy link
Contributor Author

I'll wire up what I can. I still think there's some open questions on the consumer side.

Hey, thank you for taking the time to contribute to this library. I'll respond in-line to the topics in the OP.

I'd like a broader discussion about whether these should be automatically called by the library where possible, or if they should simply be provided to clients to use if they so wish.

Given that this project is a library, it's a great opportunity to provide automatic native instrumentation. That means we should automatically create spans, and inject/extract context where it makes sense.

okay, sounds good.

they are at times ambiguous for rabbitmq-- e.g. is the destination for a message the Queue or the Consumer Tag the message was delivered to.

I agree that some semantics are ambiguous for RabbitMQ. I would advocate to adhere to the specific conventions for RabbitMQ described in this link, and do our best with ambiguities not covered (and document them!).

Given the channel based approaches of this library, it is impossible for
the library to know the full execution of a consumer. Unless
autoack=false, we cannot actually know when to end the span associated
with a delivery, so at least in the consumer case, it's probably best to
allow the client to manage spans for themselves.

Given the "subscription" workflow of RabbitMQ (polling/pulling is highly discouraged), I think we can record the attributes of the subscription (and add more if necessary), and inject those into the spans upon receiving messages, and just before forwarding them into the Go channel. This idea needs validation, but I would prefer this over leaving the consumption instrumentation to the users.

The problem is that we need a way to transport the spans, and then an idiomatic way for clients to consume them downstream, so if you've called

	deliveries, err := c.channel.Consume(
		queue.Name, // name
		c.tag,      // consumerTag,
		*autoAck,   // autoAck
		false,      // exclusive
		false,      // noLocal
		false,      // noWait
		nil,        // arguments
	)

we now need to embed a span or a context (go authors say not to embed contexts, so span?) in the delivery. Is that amenable?

Each span is probably best treated as a new root span, and as implemented in my draft, gets a link to the publication that created it.

We can manage spans on the producer side, and at the very least
extract span identifiers to include on published headers automatically,
and provide utilities for pulling them back out again.

I agree 👍 This is a sensible idea.

@AndrewWinterman
Copy link
Contributor Author

I suppose another way to go would be to provide a delivery#Context method that returns a newly constructed context. I think that's roughly equivalent to how http works.

@AndrewWinterman
Copy link
Contributor Author

For what it's worth, I suggest looking into how OTel was added to the .NET client. There is a LOT of discussion here:

I admit I didnt go through this. I'm sure it's informative but it's not super accessbile to me (yet?). I haven't written any dotnet...

@@ -1492,7 +1492,7 @@ func (ch *Channel) Publish(exchange, key string, mandatory, immediate bool, msg
/*
PublishWithContext sends a Publishing from the client to an exchange on the server.

NOTE: this function is equivalent to [Channel.Publish]. Context is not honoured.
NOTE: Context termination is not honoured.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're now using the context for span propagation in-process.

delivery.go Outdated
// the appropraite headers set. See [context-propagation] for more details
//
// [context-propagation]: https://opentelemetry.io/docs/concepts/context-propagation/
func (d *Delivery) Span(ctx context.Context, options ...trace.SpanStartOption) (context.Context, trace.Span) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an okay route-- clients can use their own context + span to indicate boundaries of a batch, and then get child spans for each delivery, with each span linked to the publication.

I also provide access to the Link for a delivery in case they want to combine multiple links into one span for their batch (that's what I would prefer in my use case, but that's because I'm defining a naturally batching consumer).

The tradeoff is that without storing additional state, we're relying on the client to tell us the context when we go to ack nack, which could lead to errors.

If we instead store the span on the delivery, we can close it when we ack, after inserting a child settle span for the ack itself. This has the implication that every consumer needs to settle the delivery even if their in autoack mode in order for them to see spans in their telemetry info (spans are usually not sent until they are closed). In autoack mode the settle method would just close the span, with no further implications at the wire level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we instead store the span on the delivery, we can close it when we ack, after inserting a child settle span for the ack itself

I prefer this approach, TBH. Relying on the user would make this implementation brittle. I'm ok with having a limitation handling autoack, because that's really not a recommended practice. autoack is synonym of YOLO I don't care about my data, just GO!

delivery.go Outdated
}
}

func (d *Delivery) Settle(ctx context.Context, response DeliveryResponse, multiple, requeue bool) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel deeply ambivalent about this approach, but the alternative would seem to be providing (Ack,Nack,Rject)Ctx methods. TBH that's probably better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that (Ack,Nack,Reject)Ctx methods are probably a better alternative, and settle the delivery automagically in those functions.

}

// extractSpanFromReturn creates a span for a returned message
func extractSpanFromReturn(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh I haven't wired the return up yet. Probably gets a similar treatment to Delivery, if that works.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rabbitmq semconv specs does not mention how message returns should be instrumented, maybe we should open an issue in https://github.com/open-telemetry/semantic-conventions/issues asking for clarification.

@wzy9607
Copy link

wzy9607 commented Aug 31, 2024

I don't think we should add opentelemetry spport directly to the amqp091-go package. It's better to implement the instrumentation in a sub package or separate repo (e.g. goredis instrumentation redisotel and gorm instrumentation go-gorm/opentelemetry), so that users who want to use amqp091-go without otel will not be forced to indirectly depends on otel modules.
I have made a draft version of instrumentation in https://github.com/wzy9607/amqp091otel, though I also havent's find a simple way to handle autoack=true.

However, I do think, to support instrumentation, some utilities should be added to the amqp091-go package. for example,

  1. Mechanism to add middleware to Publish and Consume, which made it easier to instrument them and for users to use the instrumentation package. For reference, go-redis's Hook mechanism and how redisotel utilize it.
  2. Standard methods for consumer to start a context.Context from Delivery and mark the end of it, so the instrumentation package can properly start and end a consumer process Span.
  3. Method to obtain some channel property and server infos, which will make it easier for the instrumentation package to add those attributes, e.g. messaging.client.id, server.address.

@Zerpet
Copy link
Contributor

Zerpet commented Sep 13, 2024

Given the channel based approaches of this library, it is impossible for
the library to know the full execution of a consumer. Unless
autoack=false, we cannot actually know when to end the span associated
with a delivery, so at least in the consumer case, it's probably best to
allow the client to manage spans for themselves.

Given the "subscription" workflow of RabbitMQ (polling/pulling is highly discouraged), I think we can record the attributes of the subscription (and add more if necessary), and inject those into the spans upon receiving messages, and just before forwarding them into the Go channel. This idea needs validation, but I would prefer this over leaving the consumption instrumentation to the users.

The problem is that we need a way to transport the spans, and then an idiomatic way for clients to consume them downstream, so if you've called

	deliveries, err := c.channel.Consume(
		queue.Name, // name
		c.tag,      // consumerTag,
		*autoAck,   // autoAck
		false,      // exclusive
		false,      // noLocal
		false,      // noWait
		nil,        // arguments
	)

we now need to embed a span or a context (go authors say not to embed contexts, so span?) in the delivery. Is that amenable?

I think this is acceptable. We can inject Span, or the necessary attributes to build a span, in the message header or properties.

Each span is probably best treated as a new root span, and as implemented in my draft, gets a link to the publication that created it.

I like this idea 👍 It's probably easier to reason about using links between spans, than creating a sub-span from a "publish" span. Specially if we consider the use case where a consumer may reject a message and re-queue it.

@Zerpet
Copy link
Contributor

Zerpet commented Sep 13, 2024

I don't think we should add opentelemetry spport directly to the amqp091-go package. It's better to implement the instrumentation in a sub package or separate repo (e.g. goredis instrumentation redisotel and gorm instrumentation go-gorm/opentelemetry), so that users who want to use amqp091-go without otel will not be forced to indirectly depends on otel modules. I have made a draft version of instrumentation in https://github.com/wzy9607/amqp091otel, though I also havent's find a simple way to handle autoack=true.

I'm ok with having the open telemetry bits in a different package. At the same time, the library should provide automatic instrumentation. I think it's ok to "force" consumers of the library to "depend" on OTEL modules is acceptable, because the API libraries are non-functional/no-op calls without the OTEL SDK. It will be the user's decision to import OTEL SDK to make the API calls functional.

However, I do think, to support instrumentation, some utilities should be added to the amqp091-go package. for example,

  1. Mechanism to add middleware to Publish and Consume, which made it easier to instrument them and for users to use the instrumentation package. For reference, go-redis's Hook mechanism and how redisotel utilize it.
  2. Standard methods for consumer to start a context.Context from Delivery and mark the end of it, so the instrumentation package can properly start and end a consumer process Span.
  3. Method to obtain some channel property and server infos, which will make it easier for the instrumentation package to add those attributes, e.g. messaging.client.id, server.address.

Those suggestions are nice-to-have, but I'm not sure I understand why those utilities are necessary in order to support OTEL.

Copy link
Contributor

@Zerpet Zerpet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments in the discussions, and above in the main thread.

@wzy9607
Copy link

wzy9607 commented Sep 15, 2024

I don't think we should add opentelemetry spport directly to the amqp091-go package. It's better to implement the instrumentation in a sub package or separate repo (e.g. goredis instrumentation redisotel and gorm instrumentation go-gorm/opentelemetry), so that users who want to use amqp091-go without otel will not be forced to indirectly depends on otel modules. I have made a draft version of instrumentation in https://github.com/wzy9607/amqp091otel, though I also havent's find a simple way to handle autoack=true.

I'm ok with having the open telemetry bits in a different package. At the same time, the library should provide automatic instrumentation. I think it's ok to "force" consumers of the library to "depend" on OTEL modules is acceptable, because the API libraries are non-functional/no-op calls without the OTEL SDK. It will be the user's decision to import OTEL SDK to make the API calls functional.

I'm fine with either approach.

  1. Mechanism to add middleware to Publish and Consume, which made it easier to instrument them and for users to use the instrumentation package. For reference, go-redis's Hook mechanism and how redisotel utilize it.
  2. Standard methods for consumer to start a context.Context from Delivery and mark the end of it, so the instrumentation package can properly start and end a consumer process Span.
  3. Method to obtain some channel property and server infos, which will make it easier for the instrumentation package to add those attributes, e.g. messaging.client.id, server.address.

Those suggestions are nice-to-have, but I'm not sure I understand why those utilities are necessary in order to support OTEL.

2 is the same thing as this pr's Delivery.Span. 1 and 3 are utilities to make it easier for users to use separate package instrumentation and aren't necessary.

}

func (d Delivery) Settle(ctx context.Context, response DeliveryResponse, multiple, requeue bool) error {
defer settleDelivery(ctx, &d, response, multiple, requeue)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on my understanding of messaging semconv, (“Settle” spans SHOULD be created for every manually or automatically triggered settlement operation. A single “Settle” span can account for a single message or for multiple messages (in case messages are passed for settling as batches). For each message it accounts for, the “Settle” span MAY link to the creation context of the message.)
the settle Span should start before calling Acknowledger.Ack() etc., and end right after Acknowledger.Ack() have returned.

Comment on lines +1594 to +1595
if err != nil {
errFn(err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errFn needs to be called regardless of there is an err or not, to properly ends the Span.

Suggested change
if err != nil {
errFn(err)
errFn(err)
if err != nil {

Maybe also rename errFn to endFn to make the intention clearer.

exchange, routinKey string,
immediate bool,
) (context.Context, Publishing, func(err error)) {
spanName := fmt.Sprintf("%s publish", routinKey)
Copy link

@wzy9607 wzy9607 Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specs recently changed to The span name SHOULD be {messaging.operation.name} {destination}.
And, to keep consistency with the example in the specs and .net implementation, messaging.destination.name attribute should be the exchange.
So, maybe:

	destinationName := exchange
	if len(destinationName) == 0 {
		destinationName = "amq.default"
	}
	spanName := "publish " + destinationName
...
	trace.WithAttributes(
		semconv.MessagingDestinationName(destinationName),

Comment on lines +188 to +189
semconv.MessagingMessageID(publishing.MessageId),
semconv.MessagingMessageConversationID(publishing.CorrelationId),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think messaging.message.conversation_id and messaging.message.id attrs should only be set if non empty, as in .net implementation

semconv.MessagingMessageID(publishing.MessageId),
semconv.MessagingMessageConversationID(publishing.CorrelationId),
semconv.MessagingSystemRabbitmq,
semconv.MessagingClientIDKey.String(publishing.AppId),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppId is not the rabbitmq client id, but a application specified per message header.
I think maybe config.Properties["connection_name"] could be used if set, see https://www.rabbitmq.com/docs/connections#client-provided-names.

@AndrewWinterman
Copy link
Contributor Author

Hi folks, I haven't had a chance to circle back on this. Is the consensus at this point that OTEL is an acceptable default choice?

@wzy9607 how do you feel about @Zerpet's pushback on moving to a middleware/hooks based approach.

@lukebakken any other feedback at this point?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Proposal: W3C Context propagation integration via OpenTelemetry
4 participants