Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assorted bugfixes and improvements #547

Merged
merged 4 commits into from
Apr 3, 2024

Conversation

gregschohn
Copy link
Collaborator

@gregschohn gregschohn commented Apr 2, 2024

Description

  • Put some protections around time windows. The default lookahead timeout is now 300 seconds. That value must be > the packet timeout value. If the lookahead was < the packet timeout, we could be waiting for streams to be expired but never get to the end of those streams, holding work remaining to be completed, resulting in a deadlock. Warnings are printed out to the user and exit is called if configured incorrectly.

  • Be more careful about when the trafficStreamLimiter should consider work to have been completed. Now that work is marked as done once the response is returned by the target. That may put more memory pressure on the system by processing many tuples, some of which may be outstatnding/buffered while more traffic is accumulated from the buffered stream. However, this allows work to continue which will also have a trickle-down effect to keep drawing messages from the stream so that connections will be closed or expired, preventing deadlock.

  • TrafficReplayer::waitForRemainingWork() now waits for all of the queued work to complete before checking the TrafficReplayer's maps of all remaining work. That gives the replayer's pending work a chance to gracefull finish after it was was backed up for concurrency/backpressure control to not swamp the targets. Otherwise, the limiter would be terminated harshly and a number of requests would never be run.

  • BacksideHttpWatcherHandler is a bit safer. Now it will run the callback in cases that the Http message wasn't received due to an error. The handler now extends SimpleChannelInboundHandler so that release will be called automatically after channelRead0 is called.

  • Make the progress that's normally printed to stdout also print to a file that only contains those lines (progress.log). Enhance those lines to print out a few more fields (the requests index, starting from 0 for the beginning of the process), the request and response sizes, and those values for both the source and target. Now source/target values are emitted next to each other, delimited by '/'. That makes for much less cognitive load when tailing the output.

  • Dates on log directories shouldn't include "{UTC}" any more. I'm not positive that they're right on edge conditions, but it's better than what was there before.

  • Squash an obscure pair of netty memory leaks when a header field wasn't present in the parsed HTTP message that would result in the message not being properly released.

  • Add the full requestId to a few more log messages. This required me to rotate some messages from one class to a calling class, but it should accomplish the same effect.

  • Tweak the behavior of the SimpleNettyHttpServer so that it can close connections when there was a failure.

  • Convert the NettyPacketToHttpConsumerTest to use the SimpleNettyHttpServer over the Sun backed one. That's to better support the new test testThatPeerResetTriggersFinalizeFuture. If doesn't send a malformed request in yet that will hang the server, but adjusting the body of the request to use 2 words rather than one word "badrequest" (yes, really, the HTTP state models for parsing can be really complex and specific).

  • That same test, now that it's using the Netty server variant, no longer needs to worry about Date headers, which simplifies some of the logic... though it did require more careful handling to keep the headers in the correct order.

  • Allow sorting headers in HttpByteBufFormatter. Though nobody is using this today, in one incarnation of tests, it was being used & it seems that it could be a useful feature for other tests too.

  • Explicitly disable memory leak checks for KafkaRestartingTrafficReplayerTest. That was implicit before.

  • Category: Operational Code Improvements

  • Why these changes are required? To make it easier to diagnose bugs

  • What is the old behavior before changes and new behavior after changes? It should be a little bit easier to find out where issues are happening in the system. Some errors shouldn't be likely any more.

Issues Resolved

https://opensearch.atlassian.net/browse/MIGRATIONS-1643

Is this a backport? If so, please add backport PR # and/or commits #

Testing

[Please provide details of testing done: unit testing, integration testing and manual testing]

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Greg Schohn <greg.schohn@gmail.com>
# Conflicts:
#	TrafficCapture/trafficReplayer/src/main/java/org/opensearch/migrations/replay/netty/BacksideHttpWatcherHandler.java
#	TrafficCapture/trafficReplayer/src/test/java/org/opensearch/migrations/replay/datahandlers/NettyPacketToHttpConsumerTest.java
Copy link

codecov bot commented Apr 3, 2024

Codecov Report

Attention: Patch coverage is 80.95238% with 20 lines in your changes are missing coverage. Please review.

Project coverage is 76.64%. Comparing base (5d05384) to head (5865ca3).
Report is 2 commits behind head on main.

Files Patch % Lines
.../opensearch/migrations/replay/TrafficReplayer.java 45.00% 9 Missing and 2 partials ⚠️
...ns/replay/traffic/source/TrafficStreamLimiter.java 62.50% 1 Missing and 2 partials ⚠️
...search/migrations/replay/HttpByteBufFormatter.java 83.33% 2 Missing ⚠️
...h/migrations/replay/ParsedHttpMessagesAsDicts.java 87.50% 1 Missing and 1 partial ⚠️
...replay/datahandlers/NettyPacketToHttpConsumer.java 90.90% 1 Missing ⚠️
...tions/replay/netty/BacksideHttpWatcherHandler.java 90.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #547      +/-   ##
============================================
+ Coverage     76.59%   76.64%   +0.05%     
- Complexity     1398     1414      +16     
============================================
  Files           155      155              
  Lines          5985     6033      +48     
  Branches        538      543       +5     
============================================
+ Hits           4584     4624      +40     
- Misses         1039     1044       +5     
- Partials        362      365       +3     
Flag Coverage Δ
unittests 76.64% <80.95%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@gregschohn gregschohn changed the base branch from main to api_poc April 3, 2024 01:36
@gregschohn gregschohn changed the base branch from api_poc to main April 3, 2024 01:36
@gregschohn gregschohn changed the title Fix hanging requests Improve the ability to... Apr 3, 2024
Signed-off-by: Greg Schohn <greg.schohn@gmail.com>
@gregschohn gregschohn force-pushed the FixHangingRequests branch from 25dbe1a to 22bee2a Compare April 3, 2024 03:58
@gregschohn gregschohn marked this pull request as ready for review April 3, 2024 04:00
@gregschohn gregschohn changed the title Improve the ability to... assorted bugfixes and improvements Apr 3, 2024
@gregschohn gregschohn changed the title assorted bugfixes and improvements Assorted bugfixes and improvements Apr 3, 2024
context.setEndpoint(message.uri());
context.setHttpVersion(message.protocolVersion().toString());
return fillMap(map, message.headers(), message.content());
try {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit for the future (this is still an improvement as is): I'd like to surface something other than NullPointerException when message is null

var reqCtx = rootContext.getTestConnectionRequestContext(1);
var nphc = new NettyPacketToHttpConsumer(clientConnectionPool
.buildConnectionReplaySession(reqCtx.getChannelKeyContext()), reqCtx);
//nphc.consumeBytes("\r\n{\"\": \"\"}\r\n".getBytes(StandardCharsets.UTF_8));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove comment

Signed-off-by: Greg Schohn <greg.schohn@gmail.com>
@gregschohn gregschohn merged commit d264a8c into opensearch-project:main Apr 3, 2024
7 checks passed
@gregschohn gregschohn deleted the FixHangingRequests branch April 11, 2024 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants