Some containers are not stopped during service update #1393
Comments
@gileri I was trying to reproduce this issue with a service that has desired count
Thanks,
Sorry, I formatted the code blocks incorrectly; they are now fixed and include:
I removed sensitive information from the ecs-agent logs, which can be downloaded here. The service name is
Hi @gileri, sorry for my late response. I investigated the logs and found the root cause: the task is started and then stopped immediately. The agent sets the container to be stopped, and then Docker sends a change event to the agent indicating that the container is running. The agent is supposed to stop the container again in this case, and this is handled by this goroutine. However, for some reason it only does so once. I will mark it as a bug. Thanks,
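The intended behavior described in the analysis above can be illustrated with a minimal Go sketch. It is not the ECS agent's code: the `Event` and `Reconciler` types and their methods are hypothetical, and the real Docker stop call is replaced by a counter. What it shows is that a stop should be re-issued for every "running" event that arrives while the container's desired status is stopped, not only for the first such event.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical, simplified Docker container event.
type Event struct {
	ContainerID string
	Status      string // "running" or "stopped", as reported by Docker
}

// Reconciler is a hypothetical model of the agent-side handler; it is NOT
// the ECS agent's real implementation.
type Reconciler struct {
	mu      sync.Mutex
	desired map[string]string // container ID -> desired status
	stops   map[string]int    // stop requests issued per container
}

func NewReconciler() *Reconciler {
	return &Reconciler{desired: map[string]string{}, stops: map[string]int{}}
}

// SetDesired records the status the agent wants a container to reach.
func (r *Reconciler) SetDesired(id, status string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.desired[id] = status
}

// Handle reacts to every event, not just the first: whenever Docker reports
// a container as running while its desired status is stopped, another stop
// is requested.
func (r *Reconciler) Handle(ev Event) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if ev.Status == "running" && r.desired[ev.ContainerID] == "stopped" {
		r.stops[ev.ContainerID]++ // stand-in for a real "docker stop" call
	}
}

// Run drains the event channel in a goroutine until it is closed.
func (r *Reconciler) Run(events <-chan Event) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		for ev := range events {
			r.Handle(ev)
		}
	}()
	return done
}

func main() {
	r := NewReconciler()
	r.SetDesired("example-container", "stopped")

	events := make(chan Event, 4)
	// Docker reports the container as running twice after the stop request.
	events <- Event{ContainerID: "example-container", Status: "running"}
	events <- Event{ContainerID: "example-container", Status: "running"}
	events <- Event{ContainerID: "example-container", Status: "stopped"}
	close(events)
	<-r.Run(events)

	fmt.Println("stop requests issued:", r.stops["example-container"]) // prints "stop requests issued: 2"
}
```

Keying the decision on the desired status, rather than on a "stop already sent" flag, is what makes it safe to handle repeated "running" events in this sketch.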
Thank you @haikuoliu for the analysis! I'm not familiar with Go or the ECS code, but I can certainly provide additional debug logs or tests.
I think the logs that you provided are enough; the bug is clearly visible there, and we will let you know when we fix it. I saw from the logs that the containers in your task get stopped too quickly, which triggers the bug where a container cannot be stopped. Avoiding this situation would be a mitigation. Thanks for bringing this to our attention!
Closing the issue; the fix is included in the latest release.
Summary
During a service update, some containers that should be stopped (and that ECS reports as stopped) keep running.
Description
We noticed that certain containers are not stopped during regular ECS deployments (new task definitions containing image changes).
To narrow down what could be failing, we reproduced the problem with a forced service update using:
aws ecs update-service --cluster <cluster> --service <service> --force-new-deployment
This service shouldn't allow concurrent containers anyway:
Expected Behavior
Containers that are part of a stopped task should always be stopped.
Observed Behavior
Certain containers are sometimes not stopped (in roughly 1 in 5 service updates) and survive subsequent service updates.
ECS nonetheless reports them as stopped:
I've gone through the ECS and Docker logs but could not pinpoint the source of this issue.
Environment Details
AMI: Amazon ECS-Optimized Amazon Linux AMI 2018.03.l
ECS agent version 1.17.3
Cluster of one EC2 instance; the issue also occurs on multi-instance clusters.
The ECS instance was rebooted, and
docker system prune --all
was run minutes before the described occurrence.
Docker info:
Supporting Log Snippets
I've collected all logs using https://github.com/awslabs/ecs-logs-collector, but anonymizing them is proving time-consuming. Please don't hesitate to ask for more details or logs.
There should be only df56811ff7a running (docker info):
ECS data for the task containing container 257e5551a323:
Container info for 257e5551a323:
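The mismatch described in this report (a container Docker still shows as running while ECS reports it as stopped) could be spotted with a small comparison like the Go sketch below. It is only an illustration under assumptions: `findStrays` is a hypothetical helper, and it expects already-parsed input, i.e. the IDs of running containers (for example from `docker ps -q`) and each container's ECS `lastStatus` keyed by Docker ID. Fetching that data is left out, and the container names in `main` are placeholders, not values from the logs.

```go
package main

import (
	"fmt"
	"sort"
)

// findStrays is a hypothetical helper: it returns the IDs of containers that
// Docker reports as running but whose ECS lastStatus is "STOPPED".
func findStrays(dockerRunning []string, ecsStatus map[string]string) []string {
	var strays []string
	for _, id := range dockerRunning {
		if ecsStatus[id] == "STOPPED" {
			strays = append(strays, id)
		}
	}
	sort.Strings(strays) // deterministic output order
	return strays
}

func main() {
	// Placeholder data: one container from the new task, one left over.
	running := []string{"new-task-container", "old-task-container"}
	ecs := map[string]string{
		"new-task-container": "RUNNING",
		"old-task-container": "STOPPED",
	}
	fmt.Println(findStrays(running, ecs)) // prints "[old-task-container]"
}
```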