
[Event Hubs] Review retry logic against the guidelines #2661

Closed
ramya-rao-a opened this issue May 2, 2019 · 2 comments
Labels: Client (This issue points to a problem in the data-plane of the library.), Event Hubs

@ramya-rao-a (Contributor)

Refer to General Guidelines on Retry and investigate which parts of those apply to Event Hubs and how much can be made configurable

ramya-rao-a added the Client and Event Hubs labels on May 2, 2019
ramya-rao-a changed the title from "[Event Hubs] Investigate if retry logic should be configurable" to "[Event Hubs] Review retry logic against the guidelines" on May 5, 2019
@ramya-rao-a (Contributor, Author)

Review of the current state of things:

  • Retry logic is implemented in the amqp-common library and is used by both Event Hubs and Service Bus.
  • The Event Hubs and Service Bus libraries can configure the number of retry attempts and the delay between retries when using the retry logic from the common library, but these settings are not exposed to users today.
  • Below is a list of the main operations in Event Hubs and their current retry policies.
| Operation | Class | Retry policy (delay/timeout in seconds) |
| --- | --- | --- |
| AMQP connection creation when the client is created | ConnectionContext | None, and this is fine. No real connection to the service is made until the first request is sent, either to send/receive or to use any of the $management operations. |
| AMQP link creation | MessageSender, BatchingReceiver, StreamingReceiver, ManagementClient, CbsClient | No retries for link creation; the timeout is 60 seconds. For negotiating the claim, see below. |
| AMQP link recovery, which occurs when a retryable error occurs on the link or the connection | MessageSender, BatchingReceiver, StreamingReceiver | retryAttempts: 150, delayBetweenRetries: 15, timeoutForOperation: 60. When any of these links is recovered, claims are negotiated, which brings up the CBS link; this is why we don't attempt to recover the CBS link explicitly. Subsequent management operations create new $management links, which is why we don't attempt to recover those explicitly either. |
| NegotiateClaim | CbsClient | retryAttempts: 3, delayBetweenRetries: 15, timeoutForOperation: 10 |
| Operations on the $management link | ManagementClient | retryAttempts: 3, delayBetweenRetries: 15, timeoutForOperation: 10 |
| send & sendBatch | MessageSender | retryAttempts: 3, delayBetweenRetries: 5 + jitter, timeoutForOperation: 60 |
| Receive in a batch | BatchingReceiver | None. If the link goes down in the middle of receiving, the error is thrown to the user and the link is not recovered. |
| Receive in a stream | StreamingReceiver | None. If the link goes down in the middle of receiving due to a retryable error, the link is recovered and we resume receiving messages. |
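
For reference, below is a minimal sketch of a linear-delay retry helper along the lines of what the common library provides. The names RetryConfig, retry and isRetryable are illustrative assumptions, not the actual amqp-common API.

```ts
// Minimal sketch of a linear-delay retry helper like the one described
// above. `RetryConfig`, `retry`, and `isRetryable` are illustrative names,
// not the actual amqp-common API.
interface RetryConfig<T> {
  operation: () => Promise<T>;
  retryAttempts: number; // e.g. 3 for send, 150 for link recovery
  delayBetweenRetriesInSeconds: number; // e.g. 5 or 15
}

// Placeholder check: the real logic inspects the AMQP error condition.
function isRetryable(err: unknown): boolean {
  return (err as { retryable?: boolean })?.retryable === true;
}

async function retry<T>(config: RetryConfig<T>): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= config.retryAttempts; attempt++) {
    try {
      return await config.operation();
    } catch (err) {
      lastError = err;
      // Give up on non-retryable errors or once attempts are exhausted.
      if (!isRetryable(err) || attempt === config.retryAttempts) {
        break;
      }
      await new Promise<void>((resolve) =>
        setTimeout(resolve, config.delayBetweenRetriesInSeconds * 1000)
      );
    }
  }
  throw lastError;
}
```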

@ramya-rao-a (Contributor, Author) commented May 12, 2019

Conclusion:

All of the public-facing APIs below should take an optional argument containing a flat list of the retry-related configuration that we want to expose to users, i.e. retryAttempts, delayBetweenRetries and idleTimeoutInSeconds. If an API already has an optional parameter, provide overloads such that all existing calls keep working, with the new overload rolling all optional arguments into one options bag.

The default values for each retry option for each API will be reviewed during the actual implementation of these changes and are therefore not covered here.

  • send and sendBatch
    • Retry is most frequently needed here because rhea is not in a "sendable" state when it has too many messages queued up for sending. Retries also help with intermittent network problems, which is why this operation uses higher timeouts than the others.
    • TODO: Expose the retry options to the user and pass them to EventHubsSender

      ```ts
      send(data: EventData, partitionId?: string | number): Promise<Delivery>; // current api
      send(data: EventData, options: SendOptions): Promise<Delivery>; // suggested overload
      ```
  • receiveBatch
    • The errors that occur during this operation are mainly errors on the receiver link, the session, or the entire connection.
    • TODO: Update each of these to determine if the error is retryable. If yes, then set up the link and add credits again as per the retry options. Remember to reset the stream to the original position before this (see the sketch after this list).

      ```ts
      receiveBatch(
          partitionId: string | number,
          maxMessageCount: number,
          maxWaitTimeInSeconds?: number,
          options?: ReceiveOptions
      ): Promise<EventData[]>; // current api

      receiveBatch(
          partitionId: string | number,
          maxMessageCount: number,
          options?: ReceiveOptions
      ): Promise<EventData[]>; // suggested overload
      ```
  • receive (streaming receiver)
    • The errors that occur during this operation are mainly errors on the receiver link, the session, or the entire connection. We already have code in place to recreate the link if the error was retryable.
    • TODO: Expose the retry options to the user and pass them to EventHubsReceiver
    • No change in the api is needed as it already takes an options bag.
  • $management operations
    • The entire retry logic is already in place in the RequestResponseLink class.
    • TODO: Expose the retry options to the user and pass them along via the sendRequest call on the RequestResponseLink class
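
To make the receiveBatch TODO above concrete, here is a hypothetical sketch of that retry flow: check whether the error is retryable, reset the stream to the original position, set up the link and credits again, and wait before the next attempt. Every receiver member used below is an illustrative placeholder, not the SDK's actual API surface.

```ts
// Hypothetical sketch of the receiveBatch retry flow from the TODO above.
// All receiver methods here are illustrative placeholders, not real SDK APIs.
type EventData = { body: unknown };

interface BatchReceiverLike {
  isRetryable(err: unknown): boolean;
  resetToOriginalPosition(): void; // rewind the stream before re-setup
  recreateLinkAndAddCredit(credit: number): Promise<void>;
  receiveOnce(maxMessageCount: number, maxWaitTimeInSeconds: number): Promise<EventData[]>;
}

async function receiveBatchWithRetry(
  receiver: BatchReceiverLike,
  maxMessageCount: number,
  maxWaitTimeInSeconds: number,
  retryOptions: { retryAttempts: number; delayBetweenRetries: number } // delay in seconds
): Promise<EventData[]> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= retryOptions.retryAttempts; attempt++) {
    try {
      return await receiver.receiveOnce(maxMessageCount, maxWaitTimeInSeconds);
    } catch (err) {
      lastError = err;
      if (!receiver.isRetryable(err) || attempt === retryOptions.retryAttempts) {
        break;
      }
      // Reset the stream to the original position before setting up the
      // link and adding credits again, as noted in the TODO.
      receiver.resetToOriginalPosition();
      await receiver.recreateLinkAndAddCredit(maxMessageCount);
      await new Promise<void>((resolve) =>
        setTimeout(resolve, retryOptions.delayBetweenRetries * 1000)
      );
    }
  }
  throw lastError;
}
```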

Types

  • Introduce a new interface RequestOptions to hold the retry configuration. This will also be used to pass a cancellation token as part of another task.
  • The existing ReceiveOptions will extend RequestOptions.
  • SendOptions will extend RequestOptions, with an additional partitionId option.
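
A rough sketch of these proposed types, using the option names from the conclusion above; exact property names and defaults are assumptions to be settled during implementation:

```ts
// Sketch of the proposed types. Property names follow the conclusion above;
// defaults and any extra members are assumptions, not final API.
interface RequestOptions {
  retryAttempts?: number;
  delayBetweenRetries?: number;
  idleTimeoutInSeconds?: number;
  // A cancellation token will be added here as part of another task.
}

interface ReceiveOptions extends RequestOptions {
  // existing receive-related options stay here
}

interface SendOptions extends RequestOptions {
  partitionId?: string | number;
}
```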
