-
-
Notifications
You must be signed in to change notification settings - Fork 511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added error handling for context.Canceled in log reading code #2268
Added error handling for context.Canceled in log reading code #2268
Conversation
✅ Deploy Preview for testcontainers-go ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Hi @prateekdwivedi thanks for submitting this fix. I understand what you described so I think this PR could fix it. Could you please add a test reproducing what you explained in the |
Thank you so much for your review, @mdelapenya. I will add a test for this change and update you once it is done. |
Hey @mdelapenya, I followed your suggestion and added a test to check if this change works. Would you mind taking a look at this PR when you have a moment? Thank you! |
@mdelapenya : Sorry, I forgot to include the necessary import statements for the test case. This likely caused the test to fail during the build process. I have just pushed it a few minutes ago, and I believe it should work correctly this time. |
Could you run |
Sure, I ran |
@mdelapenya : I have added additional steps to the test to terminate the containers after the test is complete, addressing the issue raised in the |
Hey @prateekdwivedi I'm taking a deep look into it, as that pipeline is more or less failing in a consistent manner, although not always. I'll ping you once resolved |
Sure @mdelapenya , thank you so much for this update. |
Hey @mdelapenya, Would you happen to have any updates on the pipeline issue, by any chance? |
@prateekdwivedi I'm seeing errors that seem legit: e.g. |
@mdelapenya : Thanks for pointing that out! Based on the logs, it appears the lower timeout value for the test case was causing the failure. I've made the changes and hope it works now in the build process. |
@mdelapenya : I am not sure if the increase in timeout changes were reflected in the build process. When I committed the changes, the build process for Merge branch 'main' into context-canceled-logs had already started. Is it possible to restart the build process? I believe the two failures in the latest build process were caused by timeouts only. |
logconsumer_test.go
Outdated
|
||
// Context with cancellation functionality for handling interrupt (SIGINT) and | ||
// termination (SIGTERM) signals. | ||
ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you want to trap the signals here ? Could it be enough to have a cancelleable context with ctx, cancel := context.WithCancel(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JordanP :
Thank you for reviewing this PR. The reason for trapping the signal over there was to simulate an interrupt signal Ctrl+C
, as that was the main motivation behind creating this PR. The objective was to prevent system hangs when multiple containers with log consumers receive an interrupt signal. As, with the latest version ( v0.28.0 ), the system hangs if someone presses Ctrl + C
to stop multiple containers with log consumers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Could it be enough to have a cancelleable context with ctx, cancel := context.WithCancel(...) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JordanP : Thanks for your suggestion. Yes, I tried using context.WithCancel(...)
instead of signal.NotifyContext(...)
and it seems to have a similar effect in the test case. I have made the changes.
@prateekdwivedi thanks for addressing @JordanP 's changes, and thank you @JordanP for it 🙇 There are errors in the main pipeline that would need to be addressed: https://github.com/testcontainers/testcontainers-go/actions/runs/8184467740/job/22381550480?pr=2268. It seems the test is causing a timeout in the workflow, which is configured to be aborted in 30 min. Could you take a look? |
@mdelapenya : Sure, I have made some minor changes to the test case to ensure consistent results. |
@mdelapenya : Can you please re-run the build process now? This way, if there are still any issues, I can come up with a better solution over the weekend. |
Thanks @mdelapenya for running the build process. It seems the entire build process succeeded this time, except for one step. I am not sure if it was caused by the test case in this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for taking the time investigating this @prateekdwivedi. Great job! 💪
* main: (239 commits) Move the container and config tests into a test package (testcontainers#2242) Added error handling for context.Canceled in log reading code (testcontainers#2268) chore: updated docker compose version (testcontainers#2340) Add method for getting Weaviate's gRPC port (testcontainers#2339) chore: use withEnv in localstack module (testcontainers#2337) docs: fix wrong copy&paste (testcontainers#2338) fix: consul race on HTTP port (testcontainers#2336) chore(deps): bump mkdocs-material from 8.2.7 to 9.5.13 (testcontainers#2334) feat: add openfga module (testcontainers#2332) chore: retire dependabot (testcontainers#2325) chore: check that the new version is not empty (testcontainers#2331) chore: prepare for next minor development cycle (0.30.0) chore: use new version (v0.29.1) in modules and examples fix: incorrect version chore: prepare for next minor development cycle () chore: use new version (v0.29.0) in modules and examples generic.go: GenericContainer(): clearer error message (testcontainers#2327) chore: confirm support for new mongo images (testcontainers#2326) Add k3s WithManifest option (testcontainers#1920) chore(deps): bump google.golang.org/grpc in /modules/qdrant (testcontainers#2281) ...
What does this PR do?
This PR enhances the error handling in the log production loop of the existing code by adding support for context.Canceled error. The context.Canceled error is commonly used to signal the stoppage of containers. By incorporating the
errors.Is(err, context.Canceled)
check, the log production loop can now properly handle the context.Canceled error and continue execution as intended when the containers are stopped. This improvement ensures that the log processing is gracefully handled during container termination scenarios, enhancing the overall reliability and stability of the application.Why is it important?
When the Ctrl + C key combination is pressed, it leads to system hang and the display of the following error in the terminal:
Container log error: context canceled. Stopping log consumer: Headers out of sync
This error signifies that the log consumer within the container is encountering synchronization issues and is unable to effectively handle the cancellation signal. It occurs specifically when running a considerable number of containers simultaneously (count > 10). In our case, we utilize a sandbox environment where we execute this volume of containers and terminate them using Ctrl + C. Addressing this matter by implementing a minor modification will rectify the aforementioned issue.
Related issues
I couldn't find a specific issue related to this use case. If it is required, I can create an issue for it.
How to test this PR
This functionality can be tested by initiating multiple containers with a non-empty logConsumerCfg parameter (within testcontainers.ContainerRequest). Subsequently, these running containers can be halted by using the keyboard command Ctrl + C (context canceled). The anticipated behavior in this scenario is for all the containers to gracefully shutdown.