Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added error handling for context.Canceled in log reading code #2268

Merged
merged 18 commits into from
Mar 12, 2024

Conversation

prateekdwivedi
Copy link
Contributor

@prateekdwivedi prateekdwivedi commented Feb 26, 2024

What does this PR do?

This PR enhances the error handling in the log production loop of the existing code by adding support for context.Canceled error. The context.Canceled error is commonly used to signal the stoppage of containers. By incorporating the errors.Is(err, context.Canceled) check, the log production loop can now properly handle the context.Canceled error and continue execution as intended when the containers are stopped. This improvement ensures that the log processing is gracefully handled during container termination scenarios, enhancing the overall reliability and stability of the application.

Why is it important?

When the Ctrl + C key combination is pressed, it leads to system hang and the display of the following error in the terminal:

Container log error: context canceled. Stopping log consumer: Headers out of sync

This error signifies that the log consumer within the container is encountering synchronization issues and is unable to effectively handle the cancellation signal. It occurs specifically when running a considerable number of containers simultaneously (count > 10). In our case, we utilize a sandbox environment where we execute this volume of containers and terminate them using Ctrl + C. Addressing this matter by implementing a minor modification will rectify the aforementioned issue.

Related issues

I couldn't find a specific issue related to this use case. If it is required, I can create an issue for it.

How to test this PR

This functionality can be tested by initiating multiple containers with a non-empty logConsumerCfg parameter (within testcontainers.ContainerRequest). Subsequently, these running containers can be halted by using the keyboard command Ctrl + C (context canceled). The anticipated behavior in this scenario is for all the containers to gracefully shutdown.

Copy link

netlify bot commented Feb 26, 2024

Deploy Preview for testcontainers-go ready!

Name Link
🔨 Latest commit 59e3b3a
🔍 Latest deploy log https://app.netlify.com/sites/testcontainers-go/deploys/65eb4e8cb2bc360008de46f7
😎 Deploy Preview https://deploy-preview-2268--testcontainers-go.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@prateekdwivedi prateekdwivedi marked this pull request as ready for review February 26, 2024 19:51
@prateekdwivedi prateekdwivedi requested a review from a team as a code owner February 26, 2024 19:51
@mdelapenya
Copy link
Member

Hi @prateekdwivedi thanks for submitting this fix. I understand what you described so I think this PR could fix it. Could you please add a test reproducing what you explained in the how to test section, simulating the Ctrl+C hit with a context cancellation?

@prateekdwivedi
Copy link
Contributor Author

Hi @prateekdwivedi thanks for submitting this fix. I understand what you described so I think this PR could fix it. Could you please add a test reproducing what you explained in the how to test section, simulating the Ctrl+C hit with a context cancellation?

Thank you so much for your review, @mdelapenya. I will add a test for this change and update you once it is done.

@prateekdwivedi
Copy link
Contributor Author

Hi @prateekdwivedi thanks for submitting this fix. I understand what you described so I think this PR could fix it. Could you please add a test reproducing what you explained in the how to test section, simulating the Ctrl+C hit with a context cancellation?

Hey @mdelapenya, I followed your suggestion and added a test to check if this change works. Would you mind taking a look at this PR when you have a moment? Thank you!

@mdelapenya mdelapenya self-assigned this Feb 29, 2024
@mdelapenya mdelapenya added the bug An issue with the library label Feb 29, 2024
@prateekdwivedi
Copy link
Contributor Author

@mdelapenya : Sorry, I forgot to include the necessary import statements for the test case. This likely caused the test to fail during the build process. I have just pushed it a few minutes ago, and I believe it should work correctly this time.

@mdelapenya
Copy link
Member

Could you run make lint in the root dir? golangci-lint is complaining 🙏

@prateekdwivedi
Copy link
Contributor Author

Could you run make lint in the root dir? golangci-lint is complaining 🙏

Sure, I ran make lint in the root directory, which resulted in the deletion of an end line in the logconsumer_test file.

@prateekdwivedi
Copy link
Contributor Author

prateekdwivedi commented Mar 1, 2024

@mdelapenya : I have added additional steps to the test to terminate the containers after the test is complete, addressing the issue raised in the Test with reaper off step.

Reference: https://github.com/testcontainers/testcontainers-go/actions/runs/8102899760/job/22160066128?pr=2268

@mdelapenya
Copy link
Member

Hey @prateekdwivedi I'm taking a deep look into it, as that pipeline is more or less failing in a consistent manner, although not always. I'll ping you once resolved

@prateekdwivedi
Copy link
Contributor Author

Hey @prateekdwivedi I'm taking a deep look into it, as that pipeline is more or less failing in a consistent manner, although not always. I'll ping you once resolved

Sure @mdelapenya , thank you so much for this update.

@prateekdwivedi
Copy link
Contributor Author

Hey @prateekdwivedi I'm taking a deep look into it, as that pipeline is more or less failing in a consistent manner, although not always. I'll ping you once resolved

Sure @mdelapenya , thank you so much for this update.

Hey @mdelapenya, Would you happen to have any updates on the pipeline issue, by any chance?

@mdelapenya
Copy link
Member

@prateekdwivedi I'm seeing errors that seem legit: e.g. Test_MultiContainerLogConsumer_CancelledContext. Could you please take a look at https://github.com/testcontainers/testcontainers-go/actions/runs/8139342485/job/22242284212?pr=2268?

@prateekdwivedi
Copy link
Contributor Author

@prateekdwivedi I'm seeing errors that seem legit: e.g. Test_MultiContainerLogConsumer_CancelledContext. Could you please take a look at https://github.com/testcontainers/testcontainers-go/actions/runs/8139342485/job/22242284212?pr=2268?

@mdelapenya : Thanks for pointing that out! Based on the logs, it appears the lower timeout value for the test case was causing the failure. I've made the changes and hope it works now in the build process.

@prateekdwivedi
Copy link
Contributor Author

@prateekdwivedi I'm seeing errors that seem legit: e.g. Test_MultiContainerLogConsumer_CancelledContext. Could you please take a look at https://github.com/testcontainers/testcontainers-go/actions/runs/8139342485/job/22242284212?pr=2268?

@mdelapenya : I am not sure if the increase in timeout changes were reflected in the build process. When I committed the changes, the build process for Merge branch 'main' into context-canceled-logs had already started. Is it possible to restart the build process? I believe the two failures in the latest build process were caused by timeouts only.


// Context with cancellation functionality for handling interrupt (SIGINT) and
// termination (SIGTERM) signals.
ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to trap the signals here ? Could it be enough to have a cancelleable context with ctx, cancel := context.WithCancel(...) ?

Copy link
Contributor Author

@prateekdwivedi prateekdwivedi Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JordanP :
Thank you for reviewing this PR. The reason for trapping the signal over there was to simulate an interrupt signal Ctrl+C, as that was the main motivation behind creating this PR. The objective was to prevent system hangs when multiple containers with log consumers receive an interrupt signal. As, with the latest version ( v0.28.0 ), the system hangs if someone presses Ctrl + C to stop multiple containers with log consumers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Could it be enough to have a cancelleable context with ctx, cancel := context.WithCancel(...) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JordanP : Thanks for your suggestion. Yes, I tried using context.WithCancel(...) instead of signal.NotifyContext(...) and it seems to have a similar effect in the test case. I have made the changes.

@prateekdwivedi prateekdwivedi requested a review from JordanP March 7, 2024 07:49
@mdelapenya
Copy link
Member

@prateekdwivedi thanks for addressing @JordanP 's changes, and thank you @JordanP for it 🙇

There are errors in the main pipeline that would need to be addressed: https://github.com/testcontainers/testcontainers-go/actions/runs/8184467740/job/22381550480?pr=2268. It seems the test is causing a timeout in the workflow, which is configured to be aborted in 30 min. Could you take a look?

@prateekdwivedi
Copy link
Contributor Author

@prateekdwivedi thanks for addressing @JordanP 's changes, and thank you @JordanP for it 🙇

There are errors in the main pipeline that would need to be addressed: https://github.com/testcontainers/testcontainers-go/actions/runs/8184467740/job/22381550480?pr=2268. It seems the test is causing a timeout in the workflow, which is configured to be aborted in 30 min. Could you take a look?

@mdelapenya : Sure, I have made some minor changes to the test case to ensure consistent results.

@prateekdwivedi
Copy link
Contributor Author

@prateekdwivedi thanks for addressing @JordanP 's changes, and thank you @JordanP for it 🙇

There are errors in the main pipeline that would need to be addressed: https://github.com/testcontainers/testcontainers-go/actions/runs/8184467740/job/22381550480?pr=2268. It seems the test is causing a timeout in the workflow, which is configured to be aborted in 30 min. Could you take a look?

@mdelapenya : Can you please re-run the build process now? This way, if there are still any issues, I can come up with a better solution over the weekend.

@prateekdwivedi
Copy link
Contributor Author

@prateekdwivedi thanks for addressing @JordanP 's changes, and thank you @JordanP for it 🙇
There are errors in the main pipeline that would need to be addressed: https://github.com/testcontainers/testcontainers-go/actions/runs/8184467740/job/22381550480?pr=2268. It seems the test is causing a timeout in the workflow, which is configured to be aborted in 30 min. Could you take a look?

@mdelapenya : Can you please re-run the build process now? This way, if there are still any issues, I can come up with a better solution over the weekend.

Thanks @mdelapenya for running the build process. It seems the entire build process succeeded this time, except for one step. I am not sure if it was caused by the test case in this PR.

Copy link
Member

@mdelapenya mdelapenya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for taking the time investigating this @prateekdwivedi. Great job! 💪

@mdelapenya mdelapenya merged commit 6c41653 into testcontainers:main Mar 12, 2024
98 checks passed
mdelapenya added a commit to JJCinAZ/testcontainers-go that referenced this pull request Mar 13, 2024
* main: (239 commits)
  Move the container and config tests into a test package (testcontainers#2242)
  Added error handling for context.Canceled in log reading code (testcontainers#2268)
  chore: updated docker compose version (testcontainers#2340)
  Add method for getting Weaviate's gRPC port (testcontainers#2339)
  chore: use withEnv in localstack module (testcontainers#2337)
  docs: fix wrong copy&paste (testcontainers#2338)
  fix: consul race on HTTP port (testcontainers#2336)
  chore(deps): bump mkdocs-material from 8.2.7 to 9.5.13 (testcontainers#2334)
  feat: add openfga module (testcontainers#2332)
  chore: retire dependabot (testcontainers#2325)
  chore: check that the new version is not empty (testcontainers#2331)
  chore: prepare for next minor development cycle (0.30.0)
  chore: use new version (v0.29.1) in modules and examples
  fix: incorrect version
  chore: prepare for next minor development cycle ()
  chore: use new version (v0.29.0) in modules and examples
  generic.go: GenericContainer(): clearer error message (testcontainers#2327)
  chore: confirm support for new mongo images (testcontainers#2326)
  Add k3s WithManifest option (testcontainers#1920)
  chore(deps): bump google.golang.org/grpc in /modules/qdrant (testcontainers#2281)
  ...
@prateekdwivedi prateekdwivedi deleted the context-canceled-logs branch March 19, 2024 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An issue with the library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants