Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Event Hubs] ETW fixes #43210

Merged
merged 1 commit into from
Apr 5, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@

- The `EventProcessorClient` will now create a unique span for each event emitted to the handler for processing. Previously a single span was created for all events in a batch. ([#31922](https://github.com/Azure/azure-sdk-for-net/issues/31922))

- Fixed a parameter ordering error in ETW 22 (EventBatchProcessingError) that caused structured data arguments to be ordered differently than the associated replacement token in the log message.

### Other Changes

## 5.11.1 (2024-03-05)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -94,16 +94,16 @@ public virtual void EventBatchProcessingComplete(string partitionId,
/// <param name="identifier">A unique name used to identify the event processor.</param>
/// <param name="eventHubName">The name of the Event Hub that the processor is associated with.</param>
/// <param name="consumerGroup">The name of the consumer group that the processor is associated with.</param>
/// <param name="operationId">An identifier for the processing operation, allowing its activities to be correlated.</param>
/// <param name="errorMessage">The message for the exception that occurred.</param>
/// <param name="operationId">An identifier for the processing operation, allowing its activities to be correlated.</param>
///
[Event(22, Level = EventLevel.Error, Message = "An exception occurred while processing events for partition '{0}' by processor instance with identifier '{1}' for Event Hub: {2} and Consumer Group: {3}. Operation Id: '{5}'; Error Message: '{4}'")]
[Event(22, Level = EventLevel.Error, Message = "An exception occurred while processing events for partition '{0}' by processor instance with identifier '{1}' for Event Hub: {2} and Consumer Group: {3}. Error Message: '{4}'; Operation Id: '{5}'")]
public virtual void EventBatchProcessingError(string partitionId,
string identifier,
string eventHubName,
string consumerGroup,
string operationId,
string errorMessage)
string errorMessage,
string operationId)
{
if (IsEnabled())
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1145,7 +1145,7 @@ protected override async Task OnProcessingEventBatchAsync(IEnumerable<EventData>
// This exception is not surfaced to the error handler or bubbled, as the entire batch must be
// processed or events will be lost. Preserve the exceptions, should any occur.

Logger.EventBatchProcessingError(partition.PartitionId, Identifier, EventHubName, ConsumerGroup, operation, ex.Message);
Logger.EventBatchProcessingError(partition.PartitionId, Identifier, EventHubName, ConsumerGroup, ex.Message, operation);

caughtExceptions ??= new List<Exception>();
caughtExceptions.Add(ex);
Expand Down Expand Up @@ -1174,7 +1174,7 @@ protected override async Task OnProcessingEventBatchAsync(IEnumerable<EventData>
// This exception was either not related to processing events or was the result of sending an empty batch to be
// processed. Since there would be no other caught exceptions, tread this like a single case.

Logger.EventBatchProcessingError(partition.PartitionId, Identifier, EventHubName, ConsumerGroup, operation, ex.Message);
Logger.EventBatchProcessingError(partition.PartitionId, Identifier, EventHubName, ConsumerGroup, ex.Message,operation);
throw;
}
finally
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1199,8 +1199,8 @@ public async Task EventProcessingLogsExceptions()
processorClient.Identifier,
processorClient.EventHubName,
processorClient.ConsumerGroup,
It.IsAny<string>(),
expectedException.Message),
expectedException.Message,
It.IsAny<string>()),
Times.Exactly(eventBatch.Length));

cancellationSource.Cancel();
Expand Down
2 changes: 2 additions & 0 deletions sdk/eventhub/Azure.Messaging.EventHubs/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@

### Bugs Fixed

- Fixed a warning for "too many parameters" that was written to logs when the processor attempted to log load balancing messages due to improper ETW attribute use.

### Other Changes

- It is now possible to set `byte[]` values as [application properties](https://docs.oasis-open.org/amqp/core/v1.0/os/amqp-core-messaging-v1.0-os.html#type-application-properties) in the `EventData.Properties` collection.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2111,15 +2111,15 @@ public virtual void EventProcessorLoadBalancingCycleComplete(string identifier,
/// <param name="durationSeconds">The total duration that load balancing took to complete, in seconds.</param>
/// <param name="loadBalancingIntervalSeconds">The interval, in seconds, that partition ownership is reserved for.</param>
///
[Event(103, Level = EventLevel.Warning, Message = "A load balancing cycle has taken too long to complete for the processor instance with identifier '{0}' for Event Hub: {1}. A slow cycle can cause stability issues with partition ownership. Consider investigating storage latency and thread pool health. Common causes are latency in storage operations and too many partitions owned. You may also want to consider increasing the 'PartitionOwnershipExpirationInterval' in the processor options. Cycle Duration: '{2:0.00}' seconds. Partition Ownership Duration: '{3:0.00}' seconds. {4}")]
[NonEvent]
public virtual void EventProcessorLoadBalancingCycleSlowWarning(string identifier,
string eventHubName,
double durationSeconds,
double loadBalancingIntervalSeconds)
{
if (IsEnabled())
{
WriteEvent(103, identifier ?? string.Empty, eventHubName ?? string.Empty, durationSeconds, loadBalancingIntervalSeconds, Resources.TroubleshootingGuideLink);
EventProcessorLoadBalancingCycleSlowWarningCore(identifier, eventHubName, durationSeconds, loadBalancingIntervalSeconds, Resources.TroubleshootingGuideLink);
}
}

Expand All @@ -2134,7 +2134,7 @@ public virtual void EventProcessorLoadBalancingCycleSlowWarning(string identifie
/// <param name="ownedPartitionCount">The number of partitions owned.</param>
/// <param name="maximumAdvisedCount">The maximum number of partitions that are advised for this processor instance.</param>
///
[Event(104, Level = EventLevel.Warning, Message = "The processor instance with identifier '{0}' for Event Hub: {1} owns a higher than recommended number of partitions for average workloads. Owning too many partitions may cause slow performance and stability issues. Consider monitoring performance and partition ownership stability to ensure that they meet expectations. If not, adding processors to the group may help. Total partition count: '{2}'. Owned partition count: '{3}'. Maximum recommended partitions owned: '{4}'. This warning is based on a general heuristic that will differ between applications. If you are not experiencing issues, this warning is safe to ignore. {5}")]
[NonEvent]
public virtual void EventProcessorHighPartitionOwnershipWarning(string identifier,
string eventHubName,
int totalPartitionCount,
Expand All @@ -2143,7 +2143,7 @@ public virtual void EventProcessorHighPartitionOwnershipWarning(string identifie
{
if (IsEnabled())
{
WriteEvent(104, identifier ?? string.Empty, eventHubName ?? string.Empty, totalPartitionCount, ownedPartitionCount, maximumAdvisedCount, Resources.TroubleshootingGuideLink);
EventProcessorHighPartitionOwnershipWarningCore(identifier, eventHubName, totalPartitionCount, ownedPartitionCount, maximumAdvisedCount, Resources.TroubleshootingGuideLink);
}
}

Expand Down Expand Up @@ -2682,15 +2682,15 @@ public virtual void BufferedProducerIdleComplete(string identifier,
/// <param name="loadBalancingIntervalSeconds">The interval, in seconds, that should pass between load balancing cycles.</param>
/// <param name="ownershipIntervalSeconds">The interval, in seconds, that partition ownership is reserved for..</param>
///
[Event(128, Level = EventLevel.Warning, Message = "The 'PartitionOwnershipExpirationInterval' and 'LoadBalancingUpdateInterval' are configured using intervals that may cause stability issues with partition ownership for the processor instance with identifier '{0}' for Event Hub: {1}. It is recommended that the 'PartitionOwnershipExpirationInterval' be at least 3 times greater than the 'LoadBalancingUpdateInterval' and very strongly advised that it should be no less than twice as long. When these intervals are too close together, ownership may expire before it is renewed during load balancing which will cause partitions to migrate. Consider adjusting the intervals in the processor options if you experience issues. Load Balancing Interval '{2:0.00}' seconds. Partition Ownership Interval '{3:0.00}' seconds. {4}")]
[NonEvent]
public virtual void ProcessorLoadBalancingIntervalsTooCloseWarning(string identifier,
string eventHubName,
double loadBalancingIntervalSeconds,
double ownershipIntervalSeconds)
{
if (IsEnabled())
{
WriteEvent(128, identifier ?? string.Empty, eventHubName ?? string.Empty, ownershipIntervalSeconds, loadBalancingIntervalSeconds, Resources.TroubleshootingGuideLink);
ProcessorLoadBalancingIntervalsTooCloseWarningCore(identifier, eventHubName, ownershipIntervalSeconds, loadBalancingIntervalSeconds, Resources.TroubleshootingGuideLink);
}
}

Expand Down Expand Up @@ -2769,6 +2769,71 @@ public virtual void EventProcessorFatalTaskError(string identifier,
///
public virtual string GetLogFormattedUtcNow() => DateTime.UtcNow.ToString("yyyy-mm-ddTHH:mm:ss.fffZ", CultureInfo.InvariantCulture);

/// <summary>
/// Indicates that an <see cref="EventProcessor{TPartition}" /> instance has a load balancing cycle that
/// ran slowly enough to be a concern.
/// </summary>
///
/// <param name="identifier">A unique name used to identify the event processor.</param>
/// <param name="eventHubName">The name of the Event Hub that the processor is associated with.</param>
/// <param name="durationSeconds">The total duration that load balancing took to complete, in seconds.</param>
/// <param name="loadBalancingIntervalSeconds">The interval, in seconds, that partition ownership is reserved for.</param>
/// <param name="troubleshootingGuideLink">A link to the Event Hubs troubleshooting guide.</param>
///
[Event(103, Level = EventLevel.Warning, Message = "A load balancing cycle has taken too long to complete for the processor instance with identifier '{0}' for Event Hub: {1}. A slow cycle can cause stability issues with partition ownership. Consider investigating storage latency and thread pool health. Common causes are latency in storage operations and too many partitions owned. You may also want to consider increasing the 'PartitionOwnershipExpirationInterval' in the processor options. Cycle Duration: '{2:0.00}' seconds. Partition Ownership Duration: '{3:0.00}' seconds. {4}")]
private void EventProcessorLoadBalancingCycleSlowWarningCore(string identifier,
string eventHubName,
double durationSeconds,
double loadBalancingIntervalSeconds,
string troubleshootingGuideLink)
{
WriteEvent(103, identifier ?? string.Empty, eventHubName ?? string.Empty, durationSeconds, loadBalancingIntervalSeconds, troubleshootingGuideLink);
}

/// <summary>
/// Indicates that an <see cref="EventProcessor{TPartition}" /> instance has taken responsibility for a number of
/// partitions that may impact performance and normal operation.
/// </summary>
///
/// <param name="identifier">A unique name used to identify the event processor.</param>
/// <param name="eventHubName">The name of the Event Hub that the processor is associated with.</param>
/// <param name="totalPartitionCount">The total number of partitions.</param>
/// <param name="ownedPartitionCount">The number of partitions owned.</param>
/// <param name="maximumAdvisedCount">The maximum number of partitions that are advised for this processor instance.</param>
/// <param name="troubleshootingGuideLink">A link to the Event Hubs troubleshooting guide.</param>
///
[Event(104, Level = EventLevel.Warning, Message = "The processor instance with identifier '{0}' for Event Hub: {1} owns a higher than recommended number of partitions for average workloads. Owning too many partitions may cause slow performance and stability issues. Consider monitoring performance and partition ownership stability to ensure that they meet expectations. If not, adding processors to the group may help. Total partition count: '{2}'. Owned partition count: '{3}'. Maximum recommended partitions owned: '{4}'. This warning is based on a general heuristic that will differ between applications. If you are not experiencing issues, this warning is safe to ignore. {5}")]
private void EventProcessorHighPartitionOwnershipWarningCore(string identifier,
string eventHubName,
int totalPartitionCount,
int ownedPartitionCount,
int maximumAdvisedCount,
string troubleshootingGuideLink)
{
WriteEvent(104, identifier ?? string.Empty, eventHubName ?? string.Empty, totalPartitionCount, ownedPartitionCount, maximumAdvisedCount, troubleshootingGuideLink);
}

/// <summary>
/// Indicates that an <see cref="EventProcessor{TPartition}" /> instance has a load balancing cycle that
/// ran slowly enough to be a concern.
/// </summary>
///
/// <param name="identifier">A unique name used to identify the event processor.</param>
/// <param name="eventHubName">The name of the Event Hub that the processor is associated with.</param>
/// <param name="loadBalancingIntervalSeconds">The interval, in seconds, that should pass between load balancing cycles.</param>
/// <param name="ownershipIntervalSeconds">The interval, in seconds, that partition ownership is reserved for.</param>
/// <param name="troubleshootingGuideLink">A link to the Event Hubs troubleshooting guide.</param>
///
[Event(128, Level = EventLevel.Warning, Message = "The 'PartitionOwnershipExpirationInterval' and 'LoadBalancingUpdateInterval' are configured using intervals that may cause stability issues with partition ownership for the processor instance with identifier '{0}' for Event Hub: {1}. It is recommended that the 'PartitionOwnershipExpirationInterval' be at least 3 times greater than the 'LoadBalancingUpdateInterval' and very strongly advised that it should be no less than twice as long. When these intervals are too close together, ownership may expire before it is renewed during load balancing which will cause partitions to migrate. Consider adjusting the intervals in the processor options if you experience issues. Load Balancing Interval '{2:0.00}' seconds. Partition Ownership Interval '{3:0.00}' seconds. {4}")]
private void ProcessorLoadBalancingIntervalsTooCloseWarningCore(string identifier,
string eventHubName,
double loadBalancingIntervalSeconds,
double ownershipIntervalSeconds,
string troubleshootingGuideLink)
{
WriteEvent(128, identifier ?? string.Empty, eventHubName ?? string.Empty, ownershipIntervalSeconds, loadBalancingIntervalSeconds, troubleshootingGuideLink);
}

/// <summary>
/// Indicates that the receiving of events has completed.
/// </summary>
Expand Down
Loading