[BUG]: Docker Inspect throws exit code 125 due to not finding some layer IDs in final built image. #20189

Closed
5 of 7 tasks
webjoaoneto opened this issue Jul 22, 2024 · 22 comments
Labels
Area: Release Area:RM RM task team bug regression This used to work, but a change in the service/tasks broke it. triage

Comments

@webjoaoneto

New issue checklist

Task name

Docker@2

Task version

2.243.0

Issue Description

After the task updated to version 2.243.0, the Docker push step in our pipeline raises this error:

/usr/bin/*** inspect -f {{.RootFS.Layers}}
Error: no names or ids specified
##[error]Error: no names or ids specified
##[error]Unhandled: The process '/usr/bin/***' failed with exit code 125
##[error]Error: The process '/usr/bin/***' failed with exit code 125
    at ExecState._setResult (/opt/app-root/app/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1249:25)
    at ExecState.CheckComplete (/opt/app-root/app/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1232:18)
    at ChildProcess.<anonymous> (/opt/app-root/app/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1160:19)
    at ChildProcess.emit (node:events:513:28)
    at maybeClose (node:internal/child_process:1100:16)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:304:5)

The task pushes the docker image to the right place, but the pipeline crashes because the command
docker inspect -f {{.RootFS.Layers}} is run without the image name as its next argument.
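
A guard of the following kind would avoid the empty call. This is only a minimal sketch in Python, not the task's actual code (the stack traces show the real task runs on Node). `build_inspect_command` is a hypothetical helper, and the argument order is copied from the successful `docker inspect <id> -f {{.RootFS.Layers}}` log line that appears later in this thread.

```python
from typing import List, Optional


def build_inspect_command(image_ref: Optional[str]) -> Optional[List[str]]:
    """Build the argv for `docker inspect`, or return None if there is nothing to inspect.

    Hypothetical helper for illustration; not part of the Docker@2 task.
    """
    image_ref = (image_ref or "").strip()
    if not image_ref:
        # Running `docker inspect -f ...` without a name or ID is what
        # triggers the errors quoted above, so refuse to build the command.
        return None
    return ["docker", "inspect", image_ref, "-f", "{{.RootFS.Layers}}"]


print(build_inspect_command(""))
print(build_inspect_command("my-image:latest"))
```

A caller would skip the inspect step (and perhaps log a warning) whenever the helper returns `None`, instead of launching a command that is certain to fail.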

Fix:
We went back to version 2.240.2.

Environment type (Please select at least one environment where you face this issue)

  • Self-Hosted
  • Microsoft Hosted
  • VMSS Pool
  • Container

Azure DevOps Server type

Azure DevOps Server (Please specify exact version in the textbox below)

Azure DevOps Server Version (if applicable)

No response

Operating system

ubuntu

Relevant log output

createdAt:2024-07-07T23:32:36Z; layerSize:9.54MB; createdBy:RUN /bin/sh -c set -eux; 	apt-get update; 	apt-get install -y --no-install-recommends 		ca-certificates 		netbase 		tzdata 	; 	rm -rf /var/lib/apt/lists/* # buildkit; layerId:<missing>
createdAt:2024-07-07T23:32:36Z; layerSize:0B; createdBy:ENV LANG=C.UTF-8; layerId:<missing>
createdAt:2024-07-07T23:32:36Z; layerSize:0B; createdBy:ENV PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin; layerId:<missing>
createdAt:2024-07-02T01:25:02Z; layerSize:0B; createdBy:/bin/sh -c #(nop)  CMD ["bash"]; layerId:<missing>
createdAt:2024-07-02T01:25:02Z; layerSize:77.8MB; createdBy:/bin/sh -c #(nop) ADD file:b24689567a7c604de93e4ef1dc87c372514f692556744da43925c575b4f80df6 in / ; layerId:<missing>
/usr/bin/*** inspect -f {{.RootFS.Layers}}
Error: no names or ids specified
##[error]Error: no names or ids specified
##[error]Unhandled: The process '/usr/bin/***' failed with exit code 125
##[error]Error: The process '/usr/bin/***' failed with exit code 125
    at ExecState._setResult (/opt/app-root/app/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1249:25)
    at ExecState.CheckComplete (/opt/app-root/app/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1232:18)
    at ChildProcess.<anonymous> (/opt/app-root/app/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1160:19)
    at ChildProcess.emit (node:events:513:28)
    at maybeClose (node:internal/child_process:1100:16)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:304:5)
Finishing: Docker Push

Full task logs with system.debug enabled

 [REPLACE THIS WITH YOUR INFORMATION] 

Repro steps

No response

@github-actions github-actions bot added Area: Release triage Area: ABTT Akvelon Build Tasks Team area of work Task: Bash labels Jul 22, 2024
@flinox-testes

Exactly the same problem here... version 2.240.2 works.

@dcs-adam

Same issue here. Error was there on 2.240.2, but allowed the task to complete successfully. On 2.243.0, the image is pushed to the registry, but the pipeline fails.

@MarkKharitonov
Contributor

Same issue here as well.
Does anyone know if it is possible to invoke the previous version of the task?

@MarkKharitonov
Contributor

Fix:
We back to the version 2.240.2

@webjoaoneto - how did you go back to 2.240.2 ?

@chrislanzara

chrislanzara commented Jul 24, 2024

+1 on this. It's now halted our build pipelines for our projects.

We were on version 2.240.2 up to this afternoon (around midday - 12pm - UK time on the 24th July), then we seem to have gone to 2.243.0 and the stage fails now, but the image is pushed to the ACR. No changes to our docker config, azure pipeline config etc, this seems to be handled by the Docker@2 stage alone.

          - task: Docker@2
            displayName: Push Container to ACR
            continueOnError: false
            inputs:
              command: push
              repository: $(imageName)
              tags: $(tag)
              containerRegistry: dockerRegistryServiceConnection

I did see something like this a while ago (few months back), a very similar issue on the push command, but by the time I went back to it the issue had resolved itself and the stage was successful.

==============================================================================
Task         : Docker
Description  : Build or push Docker images, login or logout, start or stop containers, or run a Docker command
Version      : 2.243.0
Author       : Microsoft Corporation
Help         : https://aka.ms/azpipes-docker-tsg

...

/usr/bin/docker inspect -f {{.RootFS.Layers}}
"docker inspect" requires at least 1 argument.
See 'docker inspect --help'.

Usage:  docker inspect [OPTIONS] NAME|ID [NAME|ID...]

Return low-level information on Docker objects
##[error]"docker inspect" requires at least 1 argument.
##[error]See 'docker inspect --help'.
##[error]Usage:  docker inspect [OPTIONS] NAME|ID [NAME|ID...]
##[error]Return low-level information on Docker objects
##[error]Unhandled: The process '/usr/bin/docker' failed with exit code 1
##[error]Error: The process '/usr/bin/docker' failed with exit code 1
    at ExecState._setResult (/home/vsts/work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1249:25)
    at ExecState.CheckComplete (/home/vsts/work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1232:18)
    at ChildProcess.<anonymous> (/home/vsts/work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1160:19)
    at ChildProcess.emit (node:events:513:28)
    at maybeClose (node:internal/child_process:1100:16)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:304:5)
Finishing: Push Container to ACR

I think the docker inspect errors are a bit misleading here. Looking at our runs from earlier today, when 2.240.2 was used, we got the same errors about the docker inspect command but the stage was allowed to complete. Under 2.243.0 the same docker inspect errors appear, but then the Unhandled message appears and this causes the stage to fail.
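
One way to picture the difference between the two versions, as a minimal sketch in Python. `finish_step` and its `fail_on_error` flag are hypothetical and only model the behaviour described in this comment; they are not taken from the task or from azure-pipelines-task-lib.

```python
def finish_step(exit_code: int, fail_on_error: bool) -> str:
    """Decide the outcome of a step whose helper command returned `exit_code`."""
    if exit_code == 0:
        return "succeeded"
    message = f"The process failed with exit code {exit_code}"
    if fail_on_error:
        # Behaviour reported for 2.243.0: the failure is surfaced and the step fails.
        raise RuntimeError(message)
    # Behaviour reported for 2.240.2: the error is logged, the step still completes.
    print(f"##[error]{message}")
    return "succeeded with errors"


# The same failing helper command, handled in the two ways described.
print(finish_step(125, fail_on_error=False))
try:
    finish_step(125, fail_on_error=True)
except RuntimeError as err:
    print(f"step failed: {err}")
```

In both cases the underlying command fails in the same way; only the handling of that failure decides whether the stage goes red.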

This is from a run where the stage completed successfully:

==============================================================================
Task         : Docker
Description  : Build or push Docker images, login or logout, start or stop containers, or run a Docker command
Version      : 2.240.2
Author       : Microsoft Corporation
Help         : https://aka.ms/azpipes-docker-tsg

...


/usr/bin/docker inspect -f {{.RootFS.Layers}}
"docker inspect" requires at least 1 argument.
See 'docker inspect --help'.

Usage:  docker inspect [OPTIONS] NAME|ID [NAME|ID...]

Return low-level information on Docker objects
##[error]"docker inspect" requires at least 1 argument.
##[error]See 'docker inspect --help'.
##[error]Usage:  docker inspect [OPTIONS] NAME|ID [NAME|ID...]
##[error]Return low-level information on Docker objects
Finishing: Push Container to ACR


Oddly this is only failing for a pipeline building an Angular image. We run 2.243.0 when building a C# API and that runs fine (the underscores are my own to shorten the lines):

Starting: ACR Push
==============================================================================
Task         : Docker
Description  : Build or push Docker images, login or logout, start or stop containers, or run a Docker command
Version      : 2.243.0
Author       : Microsoft Corporation
Help         : https://aka.ms/azpipes-docker-tsg

...

createdAt:2024-07-23T05:24:***5Z; layerSize:74.8MB; createdBy:/bin/sh -c #(nop) ADD file:6c4730e7b**___ebfb56a602 in / ; layerId:<missing>
/usr/bin/docker inspect 02de***9c***4fb___7d6bef3acb5b7*** -f {{.RootFS.Layers}}
[sha256:e078***___4***e***f sha256:2ea3b5___***2f sha256:ad8af89334___07fc206***9fc sha256:855e5***907d3ec93___78a***324c sha256:58fa834ef___9657363ae sha256:698640980e___35b248c5e sha256:5e***ee***___5f***50***20359f sha256:a2980f6c44a___98722208da sha256:e45***75fc8___d44d0a8a7a6206 sha256:f04d0d2___dfe867df***]
Finishing: ACR Push


Anyone able to help or is there any way to force the stage to use the previous 2.240.2 version?

Thanks.

@MarkKharitonov
Contributor

OK, found it.
It is actually straightforward to use the older version, just use Docker@2.240.2

@lucasrcorreia

I had the same problem here and the solution was to force the previous minor version in the yaml, simply by changing the code from:

- task: Docker@2

to:

- task: Docker@2.240.2

@chrislanzara

OK, found it.

It is actually straightforward to use the older version, just use Docker@2.240.2

This worked! Thank you so much. One to remember for the future too...

@v-schhabra v-schhabra added Area:RM RM task team and removed Area: ABTT Akvelon Build Tasks Team area of work Task: Bash labels Jul 26, 2024
@v-schhabra
Contributor

v-schhabra commented Jul 26, 2024

Hi @lucasrcorreia @webjoaoneto @MarkKharitonov @chrislanzara
Could you please share the complete debug logs of the failed pipeline, with system.debug set to true?

@YodaDaCoda

@v-schhabra I'm encountering this same issue. I've attached a log from a build today that shows the error with System.Debug set to true per docs.

I've reverted to Docker@2.240.2 for now and I can confirm this allows the build to pass (though the error messages RE docker inspect are still present).

docker.log

@Bodewes

Bodewes commented Jul 31, 2024

I've the same problem. Pipelines that ran fine a week ago are now broken.

Changing the docker@2 task from

- task: Docker@2

to

- task: Docker@2.240.2

fixed it for now.

With command 'buildAndPush' the images are still pushed but the task fails with an error:

[error]Error: The process '/usr/bin/docker' failed with exit code 125
at ExecState._setResult (/azp/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1249:25)
at ExecState.CheckComplete (/azp/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1232:18)
at ChildProcess.<anonymous> (/azp/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1160:19)
at ChildProcess.emit (node:events:513:28)
at maybeClose (node:internal/child_process:1100:16)
at Process.ChildProcess._handle.onexit (node:internal/child_process:304:5)

@chrislanzara

Hi @v-schhabra

A debug log for the Push to ACR stage is attached.
I've had to remove the names from the log so you'll see REDACTED in place of our project/application name.

push to acr failing log.txt

Run using the "Enable system diagnostics" checkbox on a pipeline run.

HTH

@v-schhabra
Contributor

Hi @chrislanzara
Thanks for sharing the logs. We are investigating this issue and will try to fix it soon.

@philipp-durrer-jarowa

Same setup here (Azure DevOps agents on k8s using KEDA scaled jobs with podman), and the same issue has been appearing for the past few hours.

@Dom-Heal

We have the same issue with agents on VMSS with container jobs using docker

@v-schhabra
Contributor

v-schhabra commented Aug 1, 2024

Hi @chrislanzara @Dom-Heal @philipp-durrer-jarowa @Bodewes
Could someone please let us know why podman is being used and what you are trying to do with it?
Could you also use docker instead of podman and check whether the error still occurs?

@chrislanzara

Hi @v-schhabra,

Frankly, I'm not sure we are, or aware that we are.

Our pipeline references the Docker@2 stage only. I've included the build and dev deployment stage from our yaml file so you can see what we actually reference:


pool:
  vmImage: ubuntu-latest

stages:
  - stage: Build
    displayName: Build Dev
    variables:
      - group: ui-REDACTED-app
    jobs:
      - job: BuildContainer
        displayName: Build Container
        steps:
          - task: npmAuthenticate@0
            displayName: NPM Authentication
            inputs:
              workingFile: .npmrc
          - task: Docker@2
            displayName: Build Container
            continueOnError: false
            inputs:
              command: build
              Dockerfile: Dockerfile
              buildContext: .
              tags: $(tag)
              repository: $(imageName)
              containerRegistry: dockerRegistryServiceConnection
              arguments: '--build-arg BASEHREF=/ui/REDACTED/ --build-arg ENVIRONMENT=dev --build-arg KENDO_UI_LICENSE="$(KENDO_UI_LICENSE)" --build-arg NODE_OPTIONS=--max_old_space_size=16384'
          # Replace the line below with Docker@2.240.2, Docker@2 fails
          - task: Docker@2 
            displayName: Push Container to ACR
            continueOnError: false
            inputs:
              command: push
              repository: $(imageName)
              tags: $(tag)
              containerRegistry: dockerRegistryServiceConnection
      - job: StoreManifests
        displayName: Store K8s Manifests
        steps:
          - publish: k8s
            artifact: k8s
  - stage: DeployDev
    displayName: Deploy Dev
    condition: and(succeeded(), eq(variables.isDev, true))
    dependsOn: Build
    jobs:
      - deployment: DeployApp
        displayName: Deploy
        environment: dev.ui-dev
        strategy:
          runOnce:
            deploy:
              steps:
                - task: KubernetesManifest@0
                  displayName: AKS Create Registry Secret
                  inputs:
                    action: createSecret
                    secretType: dockerRegistry
                    secretName: REDACTED
                    dockerRegistryEndpoint: dockerRegistryServiceConnection
                  
                - task: KubernetesManifest@0
                  displayName: Deploy
                  inputs:
                    action: deploy
                    manifests: $(Pipeline.Workspace)/k8s/dev-deployment.yml
                    imagePullSecrets: |
                      REDACTED
                    containers: |
                      REDACTED.azurecr.io/$(imageName):$(tag)
  

We've had Docker@2 in our pipeline for quite a while now, and use it on C# APIs as well as Angular UX projects. I see that using the Docker@2 stages is still on the Microsoft docs website, for example

Our deployment target is an AKS instance running Kubernetes version 1.29.4.

We would have referenced the Microsoft Docs or the classic pipeline builder UI in DevOps when we originally set the pipelines up several years ago.

So unless Azure is doing something "under the covers", I'm not consciously aware that we are using podman, if we actually are.

If there is another way of doing it you want us to explore, I'm happy to help test, but can you offer any more specific instructions on what alternative you wish us to test please?

Thanks!

@Dom-Heal

Dom-Heal commented Aug 1, 2024

Hi @chrislanzara @Dom-Heal @philipp-durrer-jarowa @Bodewes Could someone please let us know why are we using podman? What we are trying to do using podman? And can we use docker instead of podman and check if still the error occurs?

Hi @v-schhabra - We are not using podman, we are only using docker and this problem exists. Our agents run on Azure VMSS and the jobs run within docker containers. The docker push step is running inside the "docker" container job.

https://learn.microsoft.com/en-us/azure/devops/pipelines/process/container-phases?view=azure-devops
https://learn.microsoft.com/en-us/azure/devops/pipelines/yaml-schema/jobs-job-container?view=azure-pipelines

Hope this helps

@v-schhabra
Contributor

Hi, we have started investigating this issue.
Previously, the task-lib didn't catch promise rejections; this was fixed in version 4.0.2.
The issue is in the Docker task itself, which doesn't work correctly when layerId is missing.
The task expects a layerId in order to proceed, but layerId is missing in the logs.
Docker's developers say this is expected behavior, so we probably need to change the task's logic to handle cases where it's missing.
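
To make the missing layerId case concrete, here is a minimal sketch in Python of the kind of handling described above. It assumes history lines in the `createdAt:...; layerSize:...; createdBy:...; layerId:...` shape seen in the logs in this thread. `first_known_layer_id` is a hypothetical helper, and this is not the fix that was actually shipped.

```python
import re
from typing import Iterable, Optional

MISSING = "<missing>"
# The layerId field is the last one on each history line in the logs above.
LAYER_ID = re.compile(r"layerId:(\S+)\s*$")


def first_known_layer_id(history_lines: Iterable[str]) -> Optional[str]:
    """Return the first layerId that is not '<missing>', or None if all are missing."""
    for line in history_lines:
        match = LAYER_ID.search(line)
        if match and match.group(1) != MISSING:
            return match.group(1)
    return None


# Two lines copied from the failing log: every layerId is '<missing>'.
lines = [
    "createdAt:2024-07-07T23:32:36Z; layerSize:0B; createdBy:ENV LANG=C.UTF-8; layerId:<missing>",
    'createdAt:2024-07-02T01:25:02Z; layerSize:0B; createdBy:/bin/sh -c #(nop)  CMD ["bash"]; layerId:<missing>',
]

layer_id = first_known_layer_id(lines)
if layer_id is None:
    print("No layer ID available; skipping docker inspect")
else:
    print(f"Would inspect {layer_id}")
```

Treating "no usable ID" as an explicit case, rather than passing an empty value on to docker inspect, is the design choice the sketch is meant to show.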

@v-schhabra v-schhabra changed the title [BUG]: Docker@2 Version 2.243.0 breaks docker push when using a node pool on kubernetes using podman [BUG]: Docker Inspect throws exit code 125 due to not finding some layer IDs in final built image. Aug 6, 2024
@merlynomsft merlynomsft added the regression This used to work, but a change in the service/tasks broke it. label Aug 8, 2024
@sergeykrulikovskiy

Hello,

Are there any updates on this issue?

@v-schhabra
Contributor

A fix for this issue has been created in #20397. We will update here once it is deployed to all the rings.

@v-schhabra
Contributor

The above fix introduced a regression, so we created a new fix for this issue:
#20516
