REST API is failing with errors when listing containers after being in an inconsistent state #15526

Closed
benoitf opened this issue Aug 29, 2022 · 14 comments · Fixed by #15757
Labels
HTTP API Bug is in RESTful API kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. podman-desktop pods

Comments

@benoitf
Contributor

benoitf commented Aug 29, 2022

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

After starting, stopping, and deleting containers, I'm now in an inconsistent state.
When listing containers through the REST API, I get the error: error getting container from store

curl --unix-socket /Users/benoitf/.local/share/containers/podman/machine/podman-machine-default/podman.sock "http:/v1.41/containers/json?all=true"
{"cause":"container not known","message":"error getting container from store \"82b69ee0d0c46770aa7843332fc40a6e109d8a734bf5471411a12bb1efdfd2f1\": container not known","response":500}

Steps to reproduce the issue:

I don't have exact steps to reproduce it, but it happened while starting, stopping, and deleting containers and pods.

Note: I'm using a UI that sends multiple events at the same time, so the start/stop/delete actions occur concurrently.

Describe the results you received:
Error

Describe the results you expected:
No error

Additional information you deem important (e.g. issue happens only occasionally):

While the REST API is failing with this error, podman container ps -a still works:

$ podman container ps -a
CONTAINER ID  IMAGE                                    COMMAND     CREATED     STATUS         PORTS       NAMES
82b69ee0d0c4  localhost/podman-pause:4.2.0-1660228937              3 days ago  Removing                   6af38dfe23d8-infra
ba85fdf27813  docker.io/library/mariadb:10             mariadbd    3 days ago  Up 3 days ago              mariadb

If I try to inspect the infra container, I get:

$ podman container inspect 82b69ee0d0c4
Error: error getting container from store "82b69ee0d0c46770aa7843332fc40a6e109d8a734bf5471411a12bb1efdfd2f1": container not known

Output of podman version:

(paste your output here)

Output of podman info:

(paste your output here)

Package info (e.g. output of rpm -q podman or apt list podman):

(paste your output here)

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)

Yes/No

Additional environment details (AWS, VirtualBox, physical, etc.):

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 29, 2022
@benoitf
Contributor Author

benoitf commented Aug 29, 2022

Here is a small script to reproduce the issue on macOS

Create 10 pods
Remove containers (without force)
Remove pods (without force)

At the end, my Podman machine is no longer working:

$ podman pod ls
POD ID        NAME        STATUS      CREATED        INFRA ID      # OF CONTAINERS
9f23485e890c  apps-10     Error       3 minutes ago  b4bd995ced93  2
bbf8b24c9e8d  apps-9      Degraded    3 minutes ago  7d41d7241dc0  2
1ded17911434  apps-8      Degraded    3 minutes ago  c0b65da446f2  2
f17257ff14b9  apps-7      Degraded    3 minutes ago  40e66b4075bc  2
e24df32659c9  apps-6      Degraded    3 minutes ago  54dc373ac387  2
754ce615e62f  apps-5      Degraded    3 minutes ago  05fead96e1fb  2
c8aedbd82332  apps-4      Degraded    3 minutes ago  26e328c5fe65  2
69cfdaf4ec3c  apps-3      Degraded    3 minutes ago  1c324f056286  2
14256d96bb35  apps-2      Degraded    3 minutes ago  8ec46163d487  2
5fdab12b52a5  apps-1      Degraded    3 minutes ago  638aa7515ea1  2
$ podman pod rm bbf8b24c9e8d
Error: error freeing lock for container 7d41d7241dc077ef9391ca8bf1658921543ad143f3086dd4fef4e961dbf522bf: no such file or directory
$ podman ps -a
CONTAINER ID  IMAGE                                    COMMAND     CREATED        STATUS            PORTS       NAMES
638aa7515ea1  localhost/podman-pause:4.2.0-1660228937              4 minutes ago  Removing                      5fdab12b52a5-infra
8ec46163d487  localhost/podman-pause:4.2.0-1660228937              4 minutes ago  Removing                      14256d96bb35-infra
240c931822f8  docker.io/library/mariadb:10             mariadbd    4 minutes ago  Up 4 minutes ago              mariadb2
1c324f056286  localhost/podman-pause:4.2.0-1660228937              4 minutes ago  Removing                      69cfdaf4ec3c-infra
97e0a49fc072  docker.io/library/mariadb:10             mariadbd    4 minutes ago  Up 4 minutes ago              mariadb3
26e328c5fe65  localhost/podman-pause:4.2.0-1660228937              4 minutes ago  Removing                      c8aedbd82332-infra
f6bc901555c6  docker.io/library/mariadb:10             mariadbd    4 minutes ago  Up 4 minutes ago              mariadb4
05fead96e1fb  localhost/podman-pause:4.2.0-1660228937              4 minutes ago  Removing                      754ce615e62f-infra
c3ea227919ed  docker.io/library/mariadb:10             mariadbd    4 minutes ago  Up 4 minutes ago              mariadb5
54dc373ac387  localhost/podman-pause:4.2.0-1660228937              4 minutes ago  Removing                      e24df32659c9-infra
a4f787757474  docker.io/library/mariadb:10             mariadbd    4 minutes ago  Up 4 minutes ago              mariadb6
40e66b4075bc  localhost/podman-pause:4.2.0-1660228937              4 minutes ago  Removing                      f17257ff14b9-infra
7742263ebfd0  docker.io/library/mariadb:10             mariadbd    4 minutes ago  Up 4 minutes ago              mariadb7
c0b65da446f2  localhost/podman-pause:4.2.0-1660228937              4 minutes ago  Removing                      1ded17911434-infra
7cc87b50b6c8  docker.io/library/mariadb:10             mariadbd    4 minutes ago  Up 4 minutes ago              mariadb8
7d41d7241dc0  localhost/podman-pause:4.2.0-1660228937              4 minutes ago  Removing                      bbf8b24c9e8d-infra
b4bd995ced93  localhost/podman-pause:4.2.0-1660228937              4 minutes ago  Removing                      9f23485e890c-infra
118b13b4fb3d  docker.io/library/mariadb:10             mariadbd    4 minutes ago  Removing                      mariadb10
$ podman inspect 638aa7515ea1
Error: error getting container from store "638aa7515ea15cd1b987df24dcdbb0699a6b3f6da0be5842c4d48ce3a119e810": container not known

Here is the script

#!/bin/bash
for i in {1..10}
do
  podman run --name "mariadb${i}" --pod "new:apps-${i}" -e MYSQL_RANDOM_ROOT_PASSWORD=yes -d docker.io/library/mariadb:10
done

# try to remove all containers
all_containers=$(podman ps -a -q)
for containerId in $all_containers
do
  podman rm "${containerId}"
done

# remove all pods
all_pods=$(podman pod ls -q)
for podId in $all_pods
do
  podman pod rm "${podId}"
done


# now, list all containers calling REST API
echo "Call REST API"
curl --unix-socket "$HOME/.local/share/containers/podman/machine/podman-machine-default/podman.sock" "http:/v1.41/containers/json?all=true"

At the end, the REST API call should fail with a container not known error when listing all the containers.

@mheon
Member

mheon commented Aug 29, 2022

We've handled this race condition already in CLI podman ps, but evidently the fix did not make it into the API. Should not be difficult, just need to ignore errors for containers that do not exist and exclude them from the output.
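
The approach described above (skip containers that no longer exist instead of failing the whole listing) could look roughly like the following. This is only a minimal Python sketch of the idea, not Podman's actual Go code: `ContainerNotKnown`, `list_containers`, the `lookup` callback, and the sample `store` are all hypothetical names introduced for illustration.

```python
class ContainerNotKnown(Exception):
    """Hypothetical error: the ID was listed but the container has since been removed."""


def list_containers(ids, lookup):
    """Return details for every container in `ids` that still exists.

    `lookup` is an assumed callback that returns a container's details, or
    raises ContainerNotKnown if the container vanished after `ids` was read.
    """
    results = []
    for cid in ids:
        try:
            results.append(lookup(cid))
        except ContainerNotKnown:
            # Removed concurrently: leave it out instead of failing the whole call.
            continue
    return results


# Illustrative data: "82b6" was removed between reading the ID list and the lookup.
store = {"ba85": {"Id": "ba85", "Names": ["mariadb"]}}


def lookup(cid):
    try:
        return store[cid]
    except KeyError:
        raise ContainerNotKnown(cid)


print(list_containers(["82b6", "ba85"], lookup))
```

Only the "no longer exists" error is swallowed in this sketch; any other exception still propagates, so unrelated failures would remain visible to the caller.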

@mheon mheon added Good First Issue This issue would be a good issue for a first time contributor to undertake. HTTP API Bug is in RESTful API labels Aug 29, 2022
@benoitf
Contributor Author

benoitf commented Aug 29, 2022

> We've handled this race condition already in CLI podman ps, but evidently the fix did not make it into the API.

I can also reproduce it with a single pod by running the commands sequentially in a shell.

$ podman run --name mariadb --pod new:apps -e MYSQL_RANDOM_ROOT_PASSWORD=yes -d mariadb:10
$ podman ps
CONTAINER ID  IMAGE                                    COMMAND     CREATED        STATUS            PORTS       NAMES
785f98902bd4  localhost/podman-pause:4.2.0-1660228937              3 seconds ago  Up 4 seconds ago              979f1ab2c5fe-infra
558ddb906df1  docker.io/library/mariadb:10             mariadbd    3 seconds ago  Up 4 seconds ago              mariadb
$ podman pod ps
POD ID        NAME        STATUS      CREATED         INFRA ID      # OF CONTAINERS
979f1ab2c5fe  apps        Running     28 seconds ago  785f98902bd4  2
$ podman pod rm 979f1ab2c5fe
Error: cannot remove container 558ddb906df1aafe541b2f8180e3204bdd2bd20f76de8200d94ec4763eb76d26 as it is running - running or paused containers cannot be removed without force: container state improper
$ podman pod rm -f 979f1ab2c5fe
Error: error freeing lock for container 785f98902bd480bb8d4fd08593a802f58f0cabd80cae7d0a56aa442a3728f601: no such file or directory

Now, everything is broken

@benoitf benoitf added the pods label Aug 29, 2022
@mheon mheon removed the Good First Issue This issue would be a good issue for a first time contributor to undertake. label Aug 29, 2022
@mheon
Member

mheon commented Aug 29, 2022

Removing good first issue and self-assigning; this seems very serious.

@edsantiago
Member

Looks like #15367

@mheon
Member

mheon commented Aug 29, 2022

Probably unrelated @edsantiago - no pods involved there.

Remote podman pod rm -f is removing the infra container, but not any other containers...

@mheon
Member

mheon commented Aug 29, 2022

It's removing the infra container despite dependencies on it being present. Serious bug, possibly present in non-remote Podman.

@mheon
Member

mheon commented Aug 29, 2022

Alright, identified the cause. It's 384c235

Container removal is unordered, and the normal checks that prevent dependency containers and the infra container from being removed until the pod is removed are not enforced, because we are attempting to remove the pod.

Solution here is probably not fun. Going to need to restructure pod removal to work in a graph-traversal fashion.

@krystalcode

@mheon in reference to #15740, is there something I can do to remove the pod, or do I have to wait for the release that contains the fix?

@benoitf
Contributor Author

benoitf commented Sep 12, 2022

My only workaround is to call podman system reset twice (losing everything) and then restart Podman.

@mheon
Member

mheon commented Sep 12, 2022

It is possible that a podman system renumber after a podman pod rm may bring things back into a sane state, but I have not personally verified this.

@krystalcode

podman system renumber worked in my case, thanks.

@mheon
Member

mheon commented Sep 13, 2022

#15757 should fix, but testing would be appreciated.

mheon added a commit to mheon/libpod that referenced this issue Sep 14, 2022
Originally, during pod removal, we locked every container in the
pod at once, did a number of validity checks to ensure everything
was safe, and then removed all the containers in the pod.

A deadlock was recently discovered with this approach. In brief,
we cannot lock the entire pod (or much more than a single
container at a time) without causing a deadlock. As such, we
converted to an approach where we just looped over each container
in the pod, removing them individually. Unfortunately, this
removed a lot of the validity checking of the earlier approach,
allowing for a lot of unintended bad things. Infra containers
could be removed while containers in the pod still depended on
them, for example.

There's no easy way to do validity checks while in a simple loop,
so I implemented a version of our graph-traversal logic that
currently handles pod start. This version acts in the reverse
order of startup: startup starts from containers which depend on
nothing and moves outwards, while removal acts on containers which
have nothing depend on them and moves inwards. By doing graph
traversal, we can guarantee that nothing is removed while
something that depends on it still exists - so the infra
container should be the last thing in a pod that is removed, for
example.

In the (unlikely) case that a graph of the pod's containers
cannot be built (most likely impossible without database editing)
the old method of pod removal has been retained to ensure that
even misbehaving pods can be forcibly evicted from the state.

I'm fairly confident that this resolves the problem, but there
are a lot of assumptions around dependency structure built into
the original pod removal code and I am not 100% sure I have
captured all of them.

Fixes containers#15526

Signed-off-by: Matthew Heon <matthew.heon@pm.me>
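
The removal order described in the commit message above (start with containers that nothing depends on, move inwards, remove the infra container last, and fall back to the old unordered approach if no graph can be built) can be illustrated with a small sketch. This is a hedged Python illustration, not the libpod implementation: `removal_order`, `plan_pod_removal`, and the sample pod layout are hypothetical, and the sketch assumes every dependency named in `deps` is itself a key of `deps`.

```python
def removal_order(deps):
    """Return container names in an order that is safe for removal.

    `deps` maps each container name to the set of names it depends on.
    A container is emitted only once nothing still present depends on it.
    Raises ValueError if the dependencies contain a cycle.
    """
    # Count how many containers still depend on each container.
    dependents = {name: 0 for name in deps}
    for needed in deps.values():
        for dep in needed:
            dependents[dep] += 1

    # Start from containers that nothing depends on.
    ready = sorted(name for name, count in dependents.items() if count == 0)
    order = []
    while ready:
        name = ready.pop(0)
        order.append(name)
        # Removing `name` releases the containers it depended on.
        for dep in sorted(deps[name]):
            dependents[dep] -= 1
            if dependents[dep] == 0:
                ready.append(dep)

    if len(order) != len(deps):
        raise ValueError("dependency cycle: graph cannot be traversed")
    return order


def plan_pod_removal(deps):
    """Prefer graph order; fall back to an arbitrary order if no graph exists."""
    try:
        return removal_order(deps)
    except ValueError:
        # Assumed fallback, loosely mirroring the retained old removal path.
        return sorted(deps)


# Hypothetical pod: both containers depend on infra, sidecar also on mariadb.
pod = {"infra": set(), "mariadb": {"infra"}, "sidecar": {"infra", "mariadb"}}
print(plan_pod_removal(pod))
```

With this sample pod the infra container comes out last, which matches the guarantee the commit message is aiming for. The fallback here simply returns the names in sorted order; it stands in for the retained old method and does no validity checking of its own.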
@djnotes

djnotes commented Oct 5, 2022

Is this fully solved?
I'm still getting an empty list of containers on Windows with the latest pre-release version.
Tried Reload and Force Reload with no success.

main ↪️ PluginSystem: received dom-ready event from the UI
2index.ts:835 main ↪️ error in engine Podman Error: (HTTP code 500) server error - error getting container from store "8c8946fd410672133bb499554bc401b6ad6c395920ed9119b147a9f018b58e2a": container not known 
    at C:\Users\user\AppData\Local\Programs\podman-desktop\resources\app.asar\packages\main\dist\index.js:20:172262
    at c (C:\Users\user\AppData\Local\Programs\podman-desktop\resources\app.asar\packages\main\dist\index.js:20:172593)
    at ha.buildPayload (C:\Users\user\AppData\Local\Programs\podman-desktop\resources\app.asar\packages\main\dist\index.js:20:172234)
    at IncomingMessage.<anonymous> (C:\Users\user\AppData\Local\Programs\podman-desktop\resources\app.asar\packages\main\dist\index.js:20:171784)
    at IncomingMessage.emit (node:events:539:35)
    at endReadableNT (node:internal/streams/readable:1345:12)
    at process.processTicksAndRejections (node:internal/process/task_queues:83:21)

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 13, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 13, 2023