
Add max retry while creating streams in subscription #594

Closed
ajaaym wants to merge 10 commits into master from subscripton_rety

Conversation


@ajaaym commented May 6, 2019

Fixes #550 #318

  • Tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label May 6, 2019
@ajaaym ajaaym force-pushed the subscripton_rety branch from b1303ca to b82589a Compare May 6, 2019 16:02
@ajaaym ajaaym requested a review from callmehiphop May 6, 2019 16:03
codecov bot commented May 6, 2019

Codecov Report

Merging #594 into master will decrease coverage by 0.09%.
The diff coverage is 95.65%.


@@           Coverage Diff           @@
##           master   #594     +/-   ##
=======================================
- Coverage   98.09%    98%   -0.1%     
=======================================
  Files          13     13             
  Lines         841    853     +12     
  Branches      175    180      +5     
=======================================
+ Hits          825    836     +11     
  Misses          1      1             
- Partials       15     16      +1
Impacted Files           Coverage Δ
src/message-stream.ts    99% <95.65%> (-1%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

/*!
 * Time to reset retries counter.
 */
const RESET_RETRIES_COUNT_TIME_MS = 30000;
Contributor

Where does 30 seconds come from? We might want to get some input from @anguillanneuf here, but for some reason I seem to remember the PubSub team requesting that we retry for up to 5 minutes.

Author

@callmehiphop The idea is that if streams start failing, almost all active and newly created streams will fail within that timeout (30s by default), and we retry up to the maximum number of retries specified (3 by default). 10–30s is enough to determine whether there is a connection or authentication issue at startup, or a token-refresh / deadline-exceeded problem after the initial setup. Retrying for 5 minutes is too long for this.
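
A minimal sketch of the retry-window idea described above (the class and its fields are hypothetical; the constant and the defaults follow the values mentioned in this PR):

    // Sketch: retry stream creation while failures fall inside a short window,
    // and reset the retry budget once the window elapses without new errors.
    const RESET_RETRIES_COUNT_TIME_MS = 30000; // 30s window from this PR
    const MAX_RETRIES = 3;                     // default assumed in the discussion

    class RetryWindow {
      private retries = 0;
      private firstErrorTime = 0;

      /** Call on each stream failure; returns true if we should retry. */
      shouldRetry(now = Date.now()): boolean {
        if (
          this.firstErrorTime === 0 ||
          now - this.firstErrorTime > RESET_RETRIES_COUNT_TIME_MS
        ) {
          // No failures within the window: start a fresh window and budget.
          this.firstErrorTime = now;
          this.retries = 0;
        }
        return this.retries++ < MAX_RETRIES;
      }
    }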

@callmehiphop (Contributor) commented May 9, 2019

I can understand the need to modify our retry mechanisms to support unauthenticated requests, but this change makes me a little nervous. My concern is that we'd start seeing an increase in errors that kill the subscriber but would otherwise be retried within that 5-minute duration. I'd be interested to know how other clients handle retrying this specific RPC.

@sduskis any thoughts here?

Contributor

Adding @jadekler @jskeet for comments. I don't know if the 90 seconds found in GAPIC applies here.


We definitely need to have a more holistic look at this topic. A couple of notes:

  • I think an ideal state to be in is that the library creates no deadlines, instead opting to allow the user to specify a deadline. In some languages you can do this with contexts, but not all languages have contexts. We (client libs team, AC tools team) are investigating this topic later today.
  • The list of codes to retry below seems too broad. I've noticed this in several pubsub clients, and I've got an action item to chat with @sduskis about this. That said, my immediate feedback is that I'm a thumbs down on this PR adding UNAUTHENTICATED to the retry codes. It's a signal to the user that their config is borked and should not be retried.

@@ -301,8 +298,14 @@ export class MessageStream extends PassThrough {
*/
private _onEnd(stream: PullStream, status: StatusObject): void {
this._removeStream(stream);

if (RETRY_CODES.includes(status.code)) {
if (Date.now() - this._timeSinceFirstError > this._options.timeout!) {
Contributor

So to clarify, this will wait to see if the stream was open for more than 30 seconds (or whatever it was set to) and if so we consider it a success?

Author

It actually checks whether there are any errors on the stream; it keeps trying to create new streams for 30 seconds (or up to the max retries) and fails afterwards.

}
if (
RETRY_CODES.includes(status.code) &&
this._retries++ < this._options.maxRetries!
Contributor

It isn't really uncommon for all the streams to end around the same time. If that were to happen, I think we would likely blow past the default maxRetries pretty quickly and end up with too few streams open at any given time.

Author

_fillStreamPool actually fills up the stream pool, so it will create all the streams it needs to refill the pool even on the last retry. The scenario you mentioned can only happen if there is a single connected stream while every other old and newly created stream keeps failing, which seems unlikely. Even in that case, it will recreate all the remaining streams if the last non-failing stream fails.
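
A rough sketch of the refill behavior described here (the pool shape, maxStreams, and createStream below are illustrative placeholders, not the actual message-stream internals):

    // Sketch: refill the pool back to its configured size, however many
    // streams are currently missing, regardless of which retry we are on.
    interface StreamPool {
      maxStreams: number;
      streams: Set<unknown>;
      createStream(): Promise<unknown>;
    }

    async function fillStreamPool(pool: StreamPool): Promise<void> {
      const missing = pool.maxStreams - pool.streams.size;
      for (let i = 0; i < missing; i++) {
        pool.streams.add(await pool.createStream());
      }
    }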

Contributor

I don't think that is always true. I've seen scenarios where the ending of streams can be staggered ever so slightly and in some cases _fillStreamPool will need to be called multiple times.

Author

The chance of that scenario could be reduced by decreasing the timeout and increasing maxRetries. WDYT of adding a call to _fillStreamPool in the keepAliveHandler when there have been no stream failures within the specified timeout?

Contributor

I like the idea of improving our refill strategy, but I think putting it in the keep-alive handler is a little heavy-handed. Even just wrapping fillStreamPool in something like setImmediate might do the trick.
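
Presumably something along these lines, where the refill is deferred by one event-loop turn so that a burst of near-simultaneous closures is handled by a single refill (the callback shape is illustrative, not the library's API):

    // Sketch: coalesce a burst of stream closures into one deferred refill
    // instead of calling the refill routine once per closed stream.
    let refillScheduled = false;

    function scheduleRefill(fill: () => Promise<void>): void {
      if (refillScheduled) return;   // a refill is already queued for this turn
      refillScheduled = true;
      setImmediate(() => {
        refillScheduled = false;
        fill().catch(err => console.error('stream pool refill failed', err));
      });
    }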

Author

fillStreamPool is actually an async function, and it already gets executed in a separate event-loop turn from the one in which the stream failed.

Contributor

Sorry, the thought I was trying to convey is that we might want to slightly stagger the call to fill the pool so that more streams have a chance to close.

Author

@callmehiphop Added a fix to fill up the pool if a stream errors out after retries are exhausted and there are fewer active streams in the pool than expected. Also changed the retry-count logic so that multiple concurrent stream failures count as a single retry.
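
A hedged sketch of the "multiple failures count as a single retry" change (the burst window and variable names are illustrative, not the PR's exact code):

    // Sketch: several streams failing within the same short burst only
    // consume one retry from the budget.
    const BURST_WINDOW_MS = 1000; // illustrative burst window
    let lastCountedFailure = 0;
    let retries = 0;

    function onStreamFailure(now = Date.now()): void {
      if (now - lastCountedFailure > BURST_WINDOW_MS) {
        retries++;               // first failure of this burst: count it once
        lastCountedFailure = now;
      }
      // later failures in the same burst do not increment the counter
    }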

* @param {object} client The gRPC client to wait for.
* @returns {Promise}
*/
private async _waitForClientReady(client: ClientStub): Promise<void> {
Contributor

I know we talked about deleting this, so I was expecting it. Do we also need to enable channel pooling?

Author

That option is available by default:

    const pubsub = new PubSub({
      'grpc_gcp.apiConfig': {
        channelPool: {
          maxSize: 5,
        },
      },
    });

Contributor

Oh, this might be problematic. My understanding is that we're making the switch to the pure-JS version of gRPC soon, and it doesn't support grpc-gcp. @alexander-fenster can you confirm that?

@alexander-fenster (Contributor) commented May 7, 2019

That's right. If you need to use this grpc_gcp functionality you'll need to do two things:

  1. pass the C-core gRPC implementation as a parameter to the client constructor;
  2. set the grpc_gcp options in the Pub/Sub code, since they will be removed from gax.

I'm going to do the same for Spanner. If Pub/Sub is also affected, we can postpone its upgrade to gax v1.0.0 until this is fixed in Pub/Sub code (together with Spanner).

Contributor

Thanks @alexander-fenster! Is the plan to keep Spanner on the C version until gcp support finds its way to grpc-js?

Contributor

Yes. Also, we will remove all grpc-gcp code from gax (actually, already removed, just not yet released as v1.0.0) since it's not a good place for it (and an extra dependency on grpc), so libraries that want to use grpc-gcp should depend on it directly.

Contributor

Good to know. I think we're going to want to postpone the gax upgrade here unless @ajaaym can think of another fix that doesn't involve channel pooling.

Author

@callmehiphop WDYT of an option in PubSub's getClient to provide a new client vs. the cached one? Then, depending on maxStreams, we can create maxStreams/100 + 1 clients.
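
To illustrate the arithmetic (assuming the 100-concurrent-streams-per-channel limit mentioned later in this thread; the helper is hypothetical):

    // Sketch: how many gRPC clients are needed so that no single channel
    // carries more than ~100 concurrent streams.
    const STREAMS_PER_CLIENT = 100;

    function clientsNeeded(maxStreams: number): number {
      return Math.floor(maxStreams / STREAMS_PER_CLIENT) + 1;
    }

    // e.g. maxStreams = 250  ->  Math.floor(250 / 100) + 1 = 3 clients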

Contributor

I think that sounds good. Out of curiosity, have we tested against grpc-js to see if it is affected by this?

Author

I was playing around with that yesterday, and it looks like it's not that stable yet. Yes, there is a limit of 100 concurrent streams. Also, I don't see an option to set the flow-control limit, which would be needed for load-balancing ultra-slow consumers. When testing with 100 streams I was randomly getting the error below along with DEADLINE_EXCEEDED. I think it's better we hold off on the upgrade until we have this option and it has been well tested under different conditions.

Error [ERR_STREAM_WRITE_AFTER_END]: write after end
    at writeAfterEnd (_stream_writable.js:243:12)
    at ClientHttp2Stream.Writable.write (_stream_writable.js:291:5)
    at filterStack.sendMessage.then (/Users/ajaymovalia/work/node/testwithexpress2/node_modules/@grpc/grpc-js/build/src/call-stream.js:396:34)
Emitted 'error' event at:
    at onwriteError (_stream_writable.js:431:12)
    at onwrite (_stream_writable.js:456:5)
    at process._tickCallback (internal/process/next_tick.js:63:19)

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label May 13, 2019
@ajaaym ajaaym closed this May 14, 2019
feywind pushed a commit to feywind/nodejs-pubsub that referenced this pull request Nov 12, 2024
Labels
cla: yes This human has signed the Contributor License Agreement. 🚨 This issue needs some love.

Successfully merging this pull request may close these issues.

Many parallel subscriptions in one process => DEADLINE_EXCEEDED upon ack
9 participants