Assorted bugfixes and improvements #547
Signed-off-by: Greg Schohn <greg.schohn@gmail.com>
# Conflicts:
#	TrafficCapture/trafficReplayer/src/main/java/org/opensearch/migrations/replay/netty/BacksideHttpWatcherHandler.java
#	TrafficCapture/trafficReplayer/src/test/java/org/opensearch/migrations/replay/datahandlers/NettyPacketToHttpConsumerTest.java
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #547 +/- ##
============================================
+ Coverage 76.59% 76.64% +0.05%
- Complexity 1398 1414 +16
============================================
Files 155 155
Lines 5985 6033 +48
Branches 538 543 +5
============================================
+ Hits 4584 4624 +40
- Misses 1039 1044 +5
- Partials 362 365 +3
Signed-off-by: Greg Schohn <greg.schohn@gmail.com>
25dbe1a to 22bee2a
context.setEndpoint(message.uri());
context.setHttpVersion(message.protocolVersion().toString());
return fillMap(map, message.headers(), message.content());
try {
nit for the future (this is still an improvement as is): I'd like to surface something other than NullPointerException when message is null
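One way this could look, as a minimal sketch only: the names `NullMessageGuard`, `MissingHttpMessageException`, and `requireMessage` are hypothetical and do not exist in the PR. The idea is to fail with a descriptive exception that carries the request id rather than letting a bare NullPointerException escape.

```java
final class NullMessageGuard {
    // Hypothetical exception type used only for this illustration.
    static final class MissingHttpMessageException extends IllegalStateException {
        MissingHttpMessageException(String detail) {
            super(detail);
        }
    }

    // Returns the message unchanged, or throws a descriptive exception when it is null.
    static <T> T requireMessage(T message, String requestId) {
        if (message == null) {
            throw new MissingHttpMessageException(
                "No HTTP message was available for request " + requestId);
        }
        return message;
    }

    public static void main(String[] args) {
        System.out.println(requireMessage("GET / HTTP/1.1", "req-1"));
        try {
            requireMessage(null, "req-2");
        } catch (MissingHttpMessageException e) {
            System.out.println(e.getMessage());
        }
    }
}
```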
var reqCtx = rootContext.getTestConnectionRequestContext(1);
var nphc = new NettyPacketToHttpConsumer(clientConnectionPool
.buildConnectionReplaySession(reqCtx.getChannelKeyContext()), reqCtx);
//nphc.consumeBytes("\r\n{\"\": \"\"}\r\n".getBytes(StandardCharsets.UTF_8));
nit: remove comment
Signed-off-by: Greg Schohn <greg.schohn@gmail.com>
Description
Put some protections around time windows. The default lookahead timeout is now 300 seconds, and that value must be greater than the packet timeout value. If the lookahead were less than the packet timeout, we could be waiting for streams to expire without ever reaching the end of those streams, which would hold back the remaining work and result in a deadlock. If the values are configured incorrectly, a warning is printed for the user and exit is called.
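The check described above might be sketched roughly as follows. This is an illustration, not the replayer's actual code: the class and method names are hypothetical, and the 70-second packet timeout is an assumed value chosen only for the demo. The 300-second default comes from the description.

```java
import java.time.Duration;
import java.util.Optional;

final class TimeWindowCheck {
    // Default lookahead from the PR description.
    static final Duration DEFAULT_LOOKAHEAD = Duration.ofSeconds(300);

    // Returns a warning when the lookahead is not strictly greater than the packet timeout.
    static Optional<String> validate(Duration lookahead, Duration packetTimeout) {
        if (lookahead.compareTo(packetTimeout) <= 0) {
            return Optional.of("Lookahead window (" + lookahead.getSeconds()
                + "s) must be greater than the packet timeout ("
                + packetTimeout.getSeconds() + "s); otherwise the replayer could deadlock.");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Duration packetTimeout = Duration.ofSeconds(70); // assumed value for the demo
        validate(DEFAULT_LOOKAHEAD, packetTimeout).ifPresent(warning -> {
            System.err.println(warning);
            System.exit(1); // warn and exit on a bad configuration
        });
        System.out.println("Time windows are consistent");
    }
}
```

Returning the warning instead of exiting inside `validate` keeps the check easy to unit test; only the caller decides to terminate.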
Be more careful about when the trafficStreamLimiter should consider work to have been completed. Work is now marked as done once the response is returned by the target. That may put more memory pressure on the system by processing many tuples, some of which may be outstanding/buffered while more traffic is accumulated from the buffered stream. However, this allows work to continue, which also has a trickle-down effect: messages keep being drawn from the stream so that connections will be closed or expired, preventing deadlock.
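As a rough illustration of the idea (not the actual trafficStreamLimiter, whose API is not shown in this PR page), a limiter can hand its permit back as soon as the target's response future completes, rather than after later downstream processing. `WorkLimiterSketch` and its methods are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class WorkLimiterSketch {
    private final Semaphore permits;

    WorkLimiterSketch(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks until a permit is free, starts the request, and releases the permit
    // when the response future completes (normally or exceptionally).
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> sendRequest) {
        permits.acquireUninterruptibly();
        try {
            return sendRequest.get().whenComplete((response, error) -> permits.release());
        } catch (RuntimeException e) {
            permits.release(); // the request never started, so give the permit back
            throw e;
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        WorkLimiterSketch limiter = new WorkLimiterSketch(1);
        CompletableFuture<String> response = new CompletableFuture<>();
        limiter.submit(() -> response);
        System.out.println("in flight, permits left: " + limiter.availablePermits());
        response.complete("200 OK");
        System.out.println("response returned, permits left: " + limiter.availablePermits());
    }
}
```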
TrafficReplayer::waitForRemainingWork() now waits for all of the queued work to complete before checking the TrafficReplayer's maps of all remaining work. That gives the replayer's pending work a chance to finish gracefully after it was backed up by the concurrency/backpressure controls that keep the replayer from swamping the targets. Otherwise, the limiter would be terminated harshly and a number of requests would never be run.
BacksideHttpWatcherHandler is a bit safer. It now runs the callback even when the HTTP message wasn't received due to an error. The handler now extends SimpleChannelInboundHandler so that release is called automatically after channelRead0 returns.
Make the progress that's normally printed to stdout also print to a file that only contains those lines (progress.log). Enhance those lines to print a few more fields: the request's index (starting from 0 at the beginning of the process) and the request and response sizes, with those values shown for both the source and the target. Source/target values are now emitted next to each other, delimited by '/'. That makes for much less cognitive load when tailing the output.
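To make the source/target pairing concrete, here is a small sketch of how such a line could be assembled. The exact layout and field labels of the real progress.log lines are not shown in this PR page, so the format string and the name `ProgressLineSketch` are assumptions for illustration only.

```java
final class ProgressLineSketch {
    // Builds one progress line with source/target values side by side, delimited by '/'.
    static String format(long requestIndex,
                         long sourceRequestBytes, long targetRequestBytes,
                         long sourceResponseBytes, long targetResponseBytes) {
        return String.format("%d: request bytes %d/%d, response bytes %d/%d",
            requestIndex,
            sourceRequestBytes, targetRequestBytes,
            sourceResponseBytes, targetResponseBytes);
    }

    public static void main(String[] args) {
        // The first request of the process has index 0.
        System.out.println(format(0, 120, 118, 512, 530));
    }
}
```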
Dates on log directories shouldn't include "{UTC}" any more. I'm not positive that they're right on edge conditions, but it's better than what was there before.
Squash an obscure pair of Netty memory leaks that occurred when a header field wasn't present in the parsed HTTP message, which resulted in the message not being properly released.
Add the full requestId to a few more log messages. This required me to rotate some messages from one class to a calling class, but it should accomplish the same effect.
Tweak the behavior of the SimpleNettyHttpServer so that it can close connections when there was a failure.
Convert the NettyPacketToHttpConsumerTest to use the SimpleNettyHttpServer over the Sun-backed one. That's to better support the new test testThatPeerResetTriggersFinalizeFuture. It doesn't yet send a malformed request that will hang the server, but adjusting the body of the request to use 2 words rather than the one word "badrequest" would do so (yes, really, the HTTP state models for parsing can be really complex and specific). That same test, now that it's using the Netty server variant, no longer needs to worry about Date headers, which simplifies some of the logic... though it did require more careful handling to keep the headers in the correct order.
Allow sorting headers in HttpByteBufFormatter. Though nobody is using this today, in one incarnation of tests, it was being used & it seems that it could be a useful feature for other tests too.
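For tests that compare formatted requests, sorting headers by name gives a stable ordering. The sketch below shows the general technique on plain name/value pairs; it is not the HttpByteBufFormatter implementation, and `HeaderSortSketch` is a hypothetical name. The case-insensitive comparison is an assumption, chosen because HTTP header names are case-insensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class HeaderSortSketch {
    // Returns a copy of the headers sorted case-insensitively by name.
    // List.sort is stable, so repeated header names keep their relative order.
    static List<Map.Entry<String, String>> sortedByName(List<Map.Entry<String, String>> headers) {
        List<Map.Entry<String, String>> copy = new ArrayList<>(headers);
        copy.sort(Map.Entry.comparingByKey(String.CASE_INSENSITIVE_ORDER));
        return copy;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> headers = List.of(
            Map.entry("host", "localhost"),
            Map.entry("Content-Length", "3"),
            Map.entry("accept", "*/*"));
        for (Map.Entry<String, String> header : sortedByName(headers)) {
            System.out.println(header.getKey() + ": " + header.getValue());
        }
    }
}
```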
Explicitly disable memory leak checks for KafkaRestartingTrafficReplayerTest. That was implicit before.
Category: Operational Code Improvements
Why these changes are required? To make it easier to diagnose bugs
What is the old behavior before changes and new behavior after changes? It should be a little bit easier to find out where issues are happening in the system. Some errors shouldn't be likely any more.
Issues Resolved
https://opensearch.atlassian.net/browse/MIGRATIONS-1643
Is this a backport? If so, please add backport PR # and/or commits #
Testing
[Please provide details of testing done: unit testing, integration testing and manual testing]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.