-
Notifications
You must be signed in to change notification settings - Fork 619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the task status and the container KnownStatus when ECS_PULL_DEPENDENT_CONTAINERS_UPFRONT is enabled #2800
Conversation
1a4083e
to
6f41413
Compare
6f41413
to
41c293f
Compare
agent/engine/docker_task_engine.go
Outdated
@@ -840,6 +840,12 @@ func (engine *DockerTaskEngine) pullContainer(task *apitask.Task, container *api | |||
|
|||
} | |||
|
|||
// Add the container to pulled container state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add update the comment to note this is for cached images?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is updated to
func (engine *DockerTaskEngine) pullContainer(task *apitask.Task, container *apicontainer.Container) dockerapi.DockerContainerMetadata {
...
if engine.imagePullRequired(engine.cfg.ImagePullBehavior, container, task.Arn) {...}
// No pull image is required, the cached image will be used.
// Add the container that uses the cached image to the pulled container state.
dockerContainer := &apicontainer.DockerContainer{
Container: container,
}
engine.state.AddPulledContainer(dockerContainer, task)
...
}
Thank you!
@@ -1722,6 +1722,43 @@ func TestPullAndUpdateContainerReference(t *testing.T) { | |||
assert.Equal(t, dockerapi.DockerContainerMetadata{}, metadata, "expected empty metadata") | |||
} | |||
|
|||
// TestPullAndUpdateContainerReferenceWithCachedImage checks whether a container is added to task engine state when | |||
// the container is an essential container, the image is cached and DependentContainersPullUpfront is enabled. | |||
func TestPullAndUpdateContainerReferenceWithCachedImage(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a test case for the negative scenario? - image pull behavior is always
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The negative test case is added, named TestPullAndUpdateContainerReferenceFailedToGetImage
here
Thank you!
41c293f
to
5834062
Compare
agent/engine/docker_task_engine.go
Outdated
// Stop the task if the container is an essential container, | ||
// and the image is not available in both remote and local caches | ||
if container.IsEssential() { | ||
task.SetKnownStatus(apitaskstatus.TaskStopped) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd only change the desired status here as we don't know the task is stopped at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed task.SetKnownStatus(apitaskstatus.TaskStopped)
, and keep task.SetDesiredStatus(apitaskstatus.TaskStopped)
. Thank you!
} | ||
return dockerapi.DockerContainerMetadata{Error: metadata.Error} | ||
} | ||
seelog.Infof("Task engine [%s]: found cached image %s, use it directly for container %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so if engine.client.InspectImage(container.Image)
doesn't return an error we can be sure the image exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I tracing back with the function call, I find function InspectImage
, defined in agent/dockerclient/dockerapi/docker_client.go, returns image data received from the client.ImageInspectWithRaw() call.
func (dg *dockerGoClient) InspectImage(image string) (*types.ImageInspect, error) {
defer metrics.MetricsEngineGlobal.RecordDockerMetric("INSPECT_IMAGE")()
client, err := dg.sdkDockerClient()
if err != nil {
return nil, err
}
imageData, _, err := client.ImageInspectWithRaw(dg.context, image)
return &imageData, err
}
The ImageInspectWithRaws
, defined in agent/vendor/github.com/docker/docker/client/image_inspect.go, returns low-level information about an image based on the given image name or ID. If the image cannot be found through docker engine api, 404 error response will be returned. Details can be found here.
// ImageInspectWithRaw returns the image information and its raw representation.
func (cli *Client) ImageInspectWithRaw(ctx context.Context, imageID string) (types.ImageInspect, []byte, error) {
if imageID == "" {
return types.ImageInspect{}, nil, objectNotFoundError{object: "image", id: imageID}
}
serverResp, err := cli.get(ctx, "/images/"+imageID+"/json", nil, nil)
if err != nil {
return types.ImageInspect{}, nil, wrapResponseError(err, serverResp, "image", imageID)
}
defer ensureReaderClosed(serverResp)
body, err := ioutil.ReadAll(serverResp.body)
if err != nil {
return types.ImageInspect{}, nil, err
}
var response types.ImageInspect
rdr := bytes.NewReader(body)
err = json.NewDecoder(rdr).Decode(&response)
return response, body, err
}
We have implemented this to verify whether the image exists or not in func imagePullRequired when ImagePullBehavior
is prefer-cached
.
func (engine *DockerTaskEngine) imagePullRequired(imagePullBehavior config.ImagePullBehaviorType,
container *apicontainer.Container,
taskArn string) bool {
switch imagePullBehavior {
case config.ImagePullOnceBehavior:
...
case config.ImagePullPreferCachedBehavior:
// If the behavior is prefer cached, don't pull if we found cached image
// by inspecting the image.
_, err := engine.client.InspectImage(container.Image)
if err != nil {
return true
}
seelog.Infof("Task engine [%s]: found cached image %s, use it directly for container %s",
taskArn, container.Image, container.Name)
return false
default:
// Need to pull the image for always and default agent pull behavior
return true
}
}
Based on these info, I think it is ok to verify cached images through engine.client.InspectImage(container.Image)
. Please let me know if anything is still missing here. Thank you!
|
||
// TestPullAndUpdateContainerReferenceFailedToGetImage checks whether a container is added to task engine state when | ||
// the container is an essential container, ImagePullBehavior is set to always and DependentContainersPullUpfront is enabled. | ||
func TestPullAndUpdateContainerReferenceFailedToGetImage(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably use a test table as the code seems largely duplicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to the table driven test here. Thank you!
5834062
to
3068aec
Compare
@@ -1685,41 +1686,119 @@ func TestUpdateContainerReference(t *testing.T) { | |||
} | |||
|
|||
// TestPullAndUpdateContainerReference checks whether a container is added to task engine state when | |||
// pullSucceeded and DependentContainersPullUpfront is enabled. | |||
// Test # | Image availability | DependentContainersPullUpfront | ImagePullBehavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Although I think we removed the UTs for when DependentContainersPullUpfront is NOT enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DependentContainersPullUpfrontDisabledWithRemoteImage
is added to TestPullAndUpdateContainerReference. Thank you!
…ontainers and ECS_PULL_DEPENDENT_CONTAINERS_UPFRONT=true. The task will be stopped if no cached image can be found locally.
3068aec
to
a7f975f
Compare
Summary
This PR fixes the pending task status when
ECS_PULL_DEPENDENT_CONTAINERS_UPFRONT
is enabled, but an essential container encounteredCannotPullContainerError
while pulling images remotely.By configuring the Agent environment variable
ECS_PULL_DEPENDENT_CONTAINERS_UPFRONT=true
, Agent will start to pull images for containers with dependencies before the dependsOn condition has been satisfied; however, if any reason prevents Agent to pull images, these containers will neither transit to KnownStatus: PULLED, nor reach to KnownStatus: CREATED/STOPPED, making tasks remain in PENDING state.An example task definition and a workflow are provided as follows.
In this PR, Agent will search the image from local cached images when Agent failed to pull the image from remote, and
ECS_IMAGE_PULL_BEHAVIOR
is not specified toalways
. Once the image is found locally, Agent will add the container to the pulled container state; otherwise, set the desired status of task toSTOPPED
in order to stop the task. The expected task stopped reason, the container status reason and the last task status are shown as follows.essential
container -> ECS_IMAGE_PULL_BEHAVIOR=default -> failed to pull the image from remote -> Task stopped reason on ECS control plane:Task failed to start
-> container B's status reason on ECS control plane:CannotPullContainerError
: Error response from daemon: pull access denied for customer, repository does not exist or may require 'docker login': denied: requested access to the resource is denied -> The task last status:STOPPED
Implementation details
Add logic to search local cached images in
agent/engine/docker_task_engine.go
whenECS_PULL_DEPENDENT_CONTAINERS_UPFRONT=true
ECS_IMAGE_PULL_BEHAVIOR
is not `alwaysSet the desire status of the task to STOPPED when
findCachedImage
to true -> Add the container to the pulled container stateDesiredStatus
of the task toTaskStopped
-> emit task event with an error -> return container metadata with errorTesting
Manual testing
Case 1: When the image for container B is not available in both remote and local cached images
Result:
/task
endpoint returns task metadata with the container A only.Task failed to start
CannotPullContainerError: Error response from daemon: pull access denied for customer, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
STOPPED
curl ${ECS_CONTAINER_METADATA_URI_V4}/task
Case 2: When the image for container B is only available in local cached images
Result:
/task
endpoint returns task metadata with two containers, and the container B hasKnownStatus: PULLED
.curl ${ECS_CONTAINER_METADATA_URI_V4}/task
Case 3: When the image for container B is available in both remote and local cached images, and
ECS_IMAGE_PULL_BEHAVIOR=once/prefer-cached
Result:
/task
endpoint returns task metadata with two containers, and the container B hasKnownStatus: PULLED
.Case 4: When the image for container B is not available in remote, and
ECS_IMAGE_PULL_BEHAVIOR=always
Result:
/task
endpoint returns task metadata with the container A only.Task failed to start
CannotPullContainerError: Error response from daemon: pull access denied for customer, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
STOPPED
More test cases and task status when
ECS_PULL_DEPENDENT_CONTAINERS_UPFRONT
is enabledNew tests cover the changes: yes
Update the unit test
TestPullAndUpdateContainerReference
with more scenarios.Note: Agent behavior remains the same for non-essential containers when
ECS_PULL_DEPENDENT_CONTAINERS_UPFRONT
is enabled.Description for the changelog
Bug - Fixed a task status deadlock and pulled container state for cached images when ECS_PULL_DEPENDENT_CONTAINERS_UPFRONT is enabled
Licensing
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.