-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ReadTimeoutException between requests #640
Fix ReadTimeoutException between requests #640
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I have some minor improvements that would be nice additions for minimal cost. Thanks for spotting this and fixing it. I'm not sure why I doubled up on the connection close handler.
There is a side note that we might want to put a timeout on how stale a connection could be before we recycle it. We're using a guava CacheBuilder so adding a timeout isn't too challenging. We SHOULD already get some of that because of the timeouts that we do on source connections... however, if we're playing back at a slower or variable speed, or the target has some different parameters, that might be an issue.
I'd rather invest time into managing retries well and find out how prevalent that issue is before worrying about it.
...ure/trafficReplayer/src/main/java/org/opensearch/migrations/replay/ClientConnectionPool.java
Show resolved
Hide resolved
|
||
@Override | ||
public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) { | ||
socketContext.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add a line beforehand for socketContext.addTraceException(cause, true)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the socketContext NOT being closed when there was an exception? Does an exception skip unregistration?
I think it's safe to call close() twice, but can you check? If it isn't, we should have a check in this class (like nulling out the socketContext)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. Socket context would have been closed twice with this in cases an exception reaches ConnectionClosedListenerHandler. It doesn't occur in the timeout case since that handler is after this one.
It doesn't look like .close should be called multiple times, it seems that would double count metrics. I'll do a couple changes to fix this.
- remove the socketContext.close from exception caught
- change channelUnregistered -> channelInactive. Technically, unregistered can be called multiple times, but inactive is only called once on channel close
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our use, channel is only unregistered when it's closed, but there can be implementations where channel is unregistered from the event loop and re-registered again so inactive follows the contracts better
...c/test/java/org/opensearch/migrations/replay/datahandlers/NettyPacketToHttpConsumerTest.java
Outdated
Show resolved
Hide resolved
...c/test/java/org/opensearch/migrations/replay/datahandlers/NettyPacketToHttpConsumerTest.java
Outdated
Show resolved
Hide resolved
...c/test/java/org/opensearch/migrations/replay/datahandlers/NettyPacketToHttpConsumerTest.java
Show resolved
Hide resolved
...c/test/java/org/opensearch/migrations/replay/datahandlers/NettyPacketToHttpConsumerTest.java
Outdated
Show resolved
Hide resolved
8e37c54
to
201a7af
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #640 +/- ##
============================================
- Coverage 75.79% 75.75% -0.05%
Complexity 1543 1543
============================================
Files 168 168
Lines 6437 6446 +9
Branches 575 574 -1
============================================
+ Hits 4879 4883 +4
- Misses 1179 1183 +4
- Partials 379 380 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed all PR feedback. Thanks @gregschohn
Signed-off-by: Andre Kurait <akurait@amazon.com>
Signed-off-by: Andre Kurait <akurait@amazon.com>
Signed-off-by: Andre Kurait <akurait@amazon.com>
Signed-off-by: Andre Kurait <akurait@amazon.com>
201a7af
to
2ae2959
Compare
Description
Currently with the traffic replayer, in the NettyPacketToHttpConsumer, the ReadTimeoutHandler is coupled with the ConnectionClosedListenerHandler.
This means that we count the time in between request towards the ReadTimeout. This causes the channel to throw an exception while it is waiting to be re-used and may cause race conditions if the channel is reused exactly at the timeout interval.
This change places the ReadTimeoutHandler behind the handlers which remain between requests.
This change also fixes an issue where the BacksideHttpWatcherHandler was propagating exceptions through even though it shouldn't since it handles them through the callback finalization.
Issues Resolved
MIGRATIONS-1723
Is this a backport? If so, please add backport PR # and/or commits #
Testing
Ran existing and added new unit tests. Also ran E2E tests from #637 and verified no ReadTimeoutException in logs (they were occurring before this change)
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.