Only emit AMQP channels downstream when they are active. #24582

conniey · 2021-10-06T19:57:19Z

Currently in AMQPChannelProcessor, when the CBS node is closed, it'll make a request upstream for a new instance. This instance is immediately emitted even if it's not active.
- In the case that the underlying connection is closed, the link will never be active.
- Consequently, there is a lot of back and forth when reconnection happens.
This should alleviate the mass of logs accumulated about retries and fix retry policy-related issues.

/cc @anuchandy

liukun-msft · 2022-03-02T06:49:47Z

Log Analysis

azure-amqp-core version: 2.4.0

We can define 5 phases for the closing logs logs.md:

Connection start to close
CBS node start to close
CBS node close timeout exception
CBS node constantly retry logs
ReactorExecutor closed

Call graph

Graph 1 - CBS node start to close and timeout exception

Graph 2 - CBS node constantly retry and ReactorExecutor closed

Issue 1: Two threads are created to close CBS node

Reason

We manually try to close an closed CBS node twice. and both got timeout.

First time:

...
//RequestResponseChannel
this.subscriptions = Disposables.composite(
    ...
    amqpConnection.getShutdownSignals().next().flatMap(signal -> {
        logger.verbose("Shutdown signal received.");
        return closeAsync();
    }).subscribe()
);

Second time:

//ReactorConnection
final Mono<Void> cbsCloseOperation;
if (cbsChannelProcessor != null) {
    cbsCloseOperation = cbsChannelProcessor.flatMap(channel -> channel.closeAsync());
} else {
    cbsCloseOperation = Mono.empty();
}

These two manually calls both encounter the timeout exceptions and create a new thread to resume.

Issue 2: CBS node close timeout exception

RequestResponseChannel sender and receiver links haven't sent the Link:Detach frame. And the onRemoveLinkClosed in link handlers haven't been triggered to emit complete closing signal to end the cbsClosingWithTimeout.

Why Link:Detach frame haven't been sent?

Because ReactorSession is closed ealier than RequestResponseChannel and Session:END frame will override the sending of Link:Detach frame. So when channel is ready to send Session:END and Link:Detach are not be scheduled to send, we will met this timeout issue.

When we use the debug mode, sometimes there is no timeout issue. This is because channel is ready until both Session:END and Link:Detach have been scheduled to send. In this case, channel will send Link:Detach first and then Session:END frame.

Issue 3: CBS node retry infinite loop which cause massive logs

Reason

When CBS node (RequestResponseChannel) is closed, it will emit a complete signal, and that signal is catch by connectionSubscription which will request a new upstream RequestResponseChannel.

But when new instance of RequestResponseChannel is created, it will be closed immediately since connection is shutdown. And that close action will emit a complete signal again and request upstream again.

This back and forth process cause massive log in console.

Code details
When CBS sender and receiver links are closed. It will emit a complete signal:

//RequestResponseChannel
private void onTerminalState() {
    ...
    endpointStates.emitComplete(((signalType, emitResult) -> onEmitSinkFailure(...)));
    ...
}

The complete signal is catched by connectionSubscription. Because channel status is not disposed currently and then goes to requestUpstream().

//AmqpChannelProcessor
connectionSubscription = endpointStatesFunction.apply(amqpChannel).subscribe(
        ...
        () -> {
                ...
                requestUpstream();
        });

Because CBS channel is requested from a repeat Flux inside createRequestResponseChannel(), so a RequestResponseChannel instance is created.

//ReactorConnection
protected AmqpChannelProcessor<RequestResponseChannel> createRequestResponseChannel(
    final Flux<RequestResponseChannel> createChannel = ...
         .map(reactorSession -> new RequestResponseChannel(...))
        .doOnNext(e -> {...})
        .repeat();

However, inside RequestResponseChannel constructor, it call closeAysc() and schedule link close tasks, which close current channel after it created.

...
//RequestResponseChannel
this.subscriptions = Disposables.composite(
    ...
    amqpConnection.getShutdownSignals().next().flatMap(signal -> {
        logger.verbose("Shutdown signal received.");
        return closeAsync();
    }).subscribe()
);

As new channel is closed, it will emit a complete signal to request new upstream, which goes to the beginning step.

conniey added Client This issue points to a problem in the data-plane of the library. Azure.Core azure-core pillar-reliability The issue is related to reliability, one of our core engineering pillars. (includes stress testing) amqp Label for tracking issues related to AMQP labels Oct 6, 2021

conniey added this to the [2021] November milestone Oct 6, 2021

conniey assigned anuchandy and conniey Oct 6, 2021

conniey mentioned this issue Oct 13, 2021

EmissionException: Spec. Rule 1.3 encountered when trying to recover receiver. #24762

Closed

conniey added Event Hubs Service Bus labels Jan 14, 2022

conniey removed this from the [2021] November milestone Jan 14, 2022

ramya-rao-a added this to the Backlog milestone Jan 14, 2022

liukun-msft self-assigned this Feb 21, 2022

liukun-msft mentioned this issue Mar 2, 2022

Stop emitting RequestResponseChannel after ReactorConnection is disposed #27400

Merged

liukun-msft modified the milestones: Backlog, [2022] April Mar 16, 2022

liukun-msft closed this as completed in #27400 Jun 9, 2022

liukun-msft mentioned this issue Jun 9, 2022

[AMQP] CBS node sometimes waits to close until timeout #29335

Closed

github-actions bot locked and limited conversation to collaborators Apr 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Only emit AMQP channels downstream when they are active. #24582

Only emit AMQP channels downstream when they are active. #24582

conniey commented Oct 6, 2021 •

edited

Loading

liukun-msft commented Mar 2, 2022 •

edited

Loading

Only emit AMQP channels downstream when they are active. #24582

Only emit AMQP channels downstream when they are active. #24582

Comments

conniey commented Oct 6, 2021 • edited Loading

liukun-msft commented Mar 2, 2022 • edited Loading

Log Analysis

Call graph

Issue 1: Two threads are created to close CBS node

Issue 2: CBS node close timeout exception

Issue 3: CBS node retry infinite loop which cause massive logs

conniey commented Oct 6, 2021 •

edited

Loading

liukun-msft commented Mar 2, 2022 •

edited

Loading