Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cont-finish.d regression in v2.2.0.0+ using trap #363

Closed
bryanlatten opened this issue Dec 23, 2021 · 23 comments
Closed

cont-finish.d regression in v2.2.0.0+ using trap #363

bryanlatten opened this issue Dec 23, 2021 · 23 comments

Comments

@bryanlatten
Copy link

bryanlatten commented Dec 23, 2021

During container shutdown, for a service run script where the program has any trap, it now seems to block any cont-finish.d and s6-finish processes from executing. Removing the trap and/or downgrading s6-overlay restores the correct shutdown behavior.

Scenario

  • Example container (below) is run, which is only executing s6-pause
  • Shutdown simulated via docker stop.
  • Expect: "typical" s6 shutdown logs, including cont-finish.d and s6-finish statements (and expected s6-svwait timeout).
  • Actual: after run script's trap output, output stops.
  • Likely related: zombie processes also pile up when supervised services restart

Expected (seen with <= v2.1.0.2)

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] done.
[services.d] starting services
[services.d] done.
[run] Trap TERM received
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
s6-svwait: fatal: timed out
[s6-finish] sending all processes the TERM signal.
[s6-finish] sending all processes the KILL signal and exiting.

Actual (seen with v2.2.0.0+)

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] done.
[services.d] starting services
[services.d] done.
[run] Trap TERM received

Dockerfile

FROM ubuntu
ADD https://github.com/just-containers/s6-overlay/releases/download/v2.2.0.0/s6-overlay-amd64-installer /tmp/
RUN chmod +x /tmp/s6-overlay-amd64-installer && /tmp/s6-overlay-amd64-installer /

RUN mkdir -p /etc/services.d/myapp

SHELL ["/bin/bash", "-c"]

RUN echo $'#!/usr/bin/execlineb -P \n\
trap\n\
{\n\
  term\n\
  {\n\
    echo [run] Trap TERM received\n\
  }\n\
}\n\
s6-pause' > /etc/services.d/myapp/run


ENTRYPOINT ["/init"]
@skarnet
Copy link
Contributor

skarnet commented Dec 23, 2021

Known issue, should be fixed in s6-overlay v3, which is in testing right now and should come out in a couple weeks or something, 😉

@bryanlatten
Copy link
Author

@skarnet thanks for the fast reply. looking forward to the release - let me know if you have it in a testable form to validate the fix.

@skarnet
Copy link
Contributor

skarnet commented Dec 24, 2021

Not right now unless you're willing to build the overlays yourself: it's in the v3 branch of the repository and you have some build instructions here (read the other text files in that directory as well).

Your Dockerfile will be slightly different: you'll need to ADD two tarballs, not one, and extract them instead of running an installer. If you have #!/usr/bin/execlineb as shebang in your scripts you'll want to extract the s6-overlay-symlinks-arch tarball as well. We'll document all that before the release.

@bryanlatten
Copy link
Author

@skarnet I've assembled the v3 branch (wow is it different). I'm not entirely sure what's happening during shutdown now...there's no logging whatsoever on my test case. Expanding here:

# Compilation environment
FROM alpine as builder

RUN apk update && apk add \
      git \
      make \
      gcc \
      tar \
      xz

RUN cd /tmp && \
    git clone --branch v3 https://github.com/just-containers/s6-overlay

WORKDIR /tmp/s6-overlay

RUN make

# Runtime testing environment
FROM alpine

ENV VERSION=3.0.0.0
ENV ARCH=x86_64

COPY --from=builder "/tmp/s6-overlay/output/s6-overlay-noarch-${VERSION}.tar.xz" /tmp/
COPY --from=builder "/tmp/s6-overlay/output/s6-overlay-symlinks-arch-${VERSION}.tar.xz" /tmp/
COPY --from=builder "/tmp/s6-overlay/output/s6-overlay-${ARCH}-${VERSION}.tar.xz" /tmp/

RUN tar -xvf "/tmp/s6-overlay-noarch-${VERSION}.tar.xz" --directory /
RUN tar -xvf "/tmp/s6-overlay-symlinks-arch-${VERSION}.tar.xz" --directory /
RUN tar -xvf "/tmp/s6-overlay-${ARCH}-${VERSION}.tar.xz" --directory /

RUN mkdir -p /etc/services.d/myapp

SHELL ["/bin/ash", "-c"]

RUN echo $'#!/usr/bin/execlineb -P \n\
trap\n\
{\n\
  term\n\
  {\n\
    echo [run] Trap TERM received\n\
  }\n\
}\n\
s6-pause' > /etc/services.d/myapp/run

RUN chmod +x /etc/services.d/myapp/run

ENTRYPOINT ["/init"]```

@skarnet
Copy link
Contributor

skarnet commented Jan 6, 2022

Sorry, I have a fever and am unable to think straight. I'll answer this as soon as I can.

@skarnet
Copy link
Contributor

skarnet commented Jan 6, 2022

@bryanlatten Works for me. Your Dockerfile doesn't declare a S6_LOGGING variable, so all the output goes to the container's stderr by default. Use the -t option to docker run so it displays to your terminal.

I am getting:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun myapp (no readiness notification)
s6-rc: info: service legacy-services successfully started
^Cs6-rc: info: service legacy-services: stopping
[run] Trap TERM received
s6-svwait: fatal: timed out
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
[run] Trap TERM received

The ^C happens when I press ^C in my terminal, which sends a SIGINT to the container's s6-svscan and triggers the shutdown sequence. During this sequence, first s6-rc brings the legacy-services service down, which sends a SIGTERM to myapp; the signal is caught by trap, which is why the service does not die and s6-svwait times out. Then, at the end of the shutdown sequence, a SIGTERM and a SIGKILL are sent to all the processes (including trap, still alive, which is why the message is printed again), then the container exits.

@bryanlatten
Copy link
Author

@skarnet if i build + run this provided dockerfile, here's what I get with ^C

➜  s6-overlay git:(v3) ✗ docker run -it --rm 101d20e3c9bb
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun myapp (no readiness notification)
s6-rc: info: service legacy-services successfully started
^Cs6-linux-init-shutdown: fatal: unable to talk to shutdownd: Operation not permitted

and if I run docker stop from another terminal

➜  s6-overlay git:(v3) ✗ docker run -it --rm 101d20e3c9bb
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun myapp (no readiness notification)
s6-rc: info: service legacy-services successfully started
➜  s6-overlay git:(v3) ✗

@bryanlatten
Copy link
Author

your shutdown sequence does fully match my expectations, and what was seen in the pre-regression version

@skarnet
Copy link
Contributor

skarnet commented Jan 6, 2022

My bad, I had not pushed the latest fixes. Done; please try rebuilding now, s6-overlay should pull the latest s6-linux-init.

@bryanlatten
Copy link
Author

I do see updates in v3 branch, but it hasn't yet change my outcome. Did you try building out my example dockerfile directly?

Still seeing

^Cs6-linux-init-shutdown: fatal: unable to talk to shutdownd: Operation not permitted

@skarnet
Copy link
Contributor

skarnet commented Jan 7, 2022

That's exactly what I did, and it's working for me. There's something else at play.
Could you please get a shell in the container (docker exec -it container_name sh) and give the output of s6-ps -H -o user,pid,vsize,rss,tty,s,start,cttime,args (precise list of all running processes in the container) and of ls -la /run/service/s6-linux-init-shutdownd? I'm particularly interested in the existence and permissions of a fifo pipe in that directory.
Thanks!

@bryanlatten
Copy link
Author

s6-ps:

/ # s6-ps -H -o user,pid,vsize,rss,tty,s,start,cttime,args
USER  PID   VSZ   RSS  TTY    STAT  START     CTTIME  COMMAND
root   67  1664  1060  pts/1  Ss    18:49:21  00.02s  ash
root   73   284   108  pts/1  R+    18:49:30  00.00s   \_ s6-ps -H -o user,pid,vsize,rss,tty,s,start,cttime,args
root    1   208    68  pts/0  Ss+   18:49:11  00.02s  /package/admin/s6/command/s6-svscan -d4 -- /run/service
root   16   212    64  pts/0  S+    18:49:11  00.00s  s6-supervise s6-linux-init-shutdownd
root   18   204     4  -      Ss    18:49:11  00.00s   \_ /package/admin/s6-linux-init/command/s6-linux-init-shutdownd -c /run/s6/basedir -g 3000 -C -B
root   22   212    64  pts/0  S+    18:49:11  00.00s  s6-supervise s6rc-fdholder
root   23   212    60  pts/0  S+    18:49:11  00.00s  s6-supervise s6rc-oneshot-runner
root   29   188     4  -      Ss    18:49:11  00.00s   \_ /package/admin/s6/command/s6-ipcserverd -1 -- /package/admin/s6/command/s6-ipcserver-access -v0 -E -l0 -i data/rules -- /package/admin/s6/command/s6-sudod -t 30000 -- /package/admin/s6-rc/command/s6-rc-oneshot-run -l ../.. --
root   57   212    64  pts/0  S+    18:49:11  00.00s  s6-supervise myapp
root   59   192     4  -      Ss    18:49:11  00.00s   \_ trap  term   echo   [run]   Trap   TERM   received    s6-pause
root   60   172     4  -      S     18:49:11  00.00s       \_ s6-pause

ls -la

/ # ls -la /run/service/s6-linux-init-shutdownd
total 20
drwxr-xr-x    4 root     root          4096 Jan  9 18:49 .
drwxr-xr-x    4 root     root          4096 Jan  9 18:49 ..
drwx-ws--T    2 root     root          4096 Jan  9 18:49 event
prw-------    1 root     root             0 Jan  9 18:49 fifo
-rwxr-xr-x    1 root     root           144 Jan  9 18:49 run
drwx------    2 root     root          4096 Jan  9 18:49 supervise

@skarnet
Copy link
Contributor

skarnet commented Jan 10, 2022

It's all looking good and I can't see a difference with what I did.
I'm sorry, but can you make sure that your image is rebuilt with the f81929f723e9e73bd67a7f788118181164a332e9 commit of s6-overlay v3, and try again? It should pull the 3151e4828a08586cdee65766c94a70488bdb0a86 commit of s6-linux-init, as shown by the conf/versions file.

If it's still not working, please add apk add strace to your Dockerfile, and paste the output of strace s6-linux-init-shutdown -r now run from a shell in your container. At this point only strace will help me figure out what's going on.

@bryanlatten
Copy link
Author

@skarnet hope you're feeling better! No change with the commit SHA (performed by doing a clone then git reset --hard $SHA, but here are my summary behaviors:

  • "Bare" docker run + ^C works appropriately
  • Attaching a terminal (-it) during docker run breaks ^C
  • docker stop still does not work or log anything with or without an attached terminal

Running without -it: ^C works as expected

➜  s6-overlay git:(v3) ✗ docker run --rm testsha
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun myapp (no readiness notification)
s6-rc: info: service legacy-services successfully started
^Cs6-rc: info: service legacy-services: stopping
[run] Trap TERM received
s6-svwait: fatal: timed out
s6-rc: info: service legacy-services successfully stopped
s6-rc: info: service legacy-cont-init: stopping
s6-rc: info: service legacy-cont-init successfully stopped
s6-rc: info: service fix-attrs: stopping
s6-rc: info: service fix-attrs successfully stopped
s6-rc: info: service s6rc-oneshot-runner: stopping
s6-rc: info: service s6rc-oneshot-runner successfully stopped
[run] Trap TERM received

Running with -it: ^C does not work as expected

➜  s6-overlay git:(v3) ✗ docker run -it --rm testsha
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service legacy-services: starting
services-up: info: copying legacy longrun myapp (no readiness notification)
s6-rc: info: service legacy-services successfully started
^Cs6-linux-init-shutdown: fatal: unable to talk to shutdownd: Operation not permitted

@bryanlatten
Copy link
Author

Strace outputs with an without terminal (spoiler alert - they are the same):

strace-no-terminal.txt
strace-terminal.txt

@skarnet
Copy link
Contributor

skarnet commented Jan 11, 2022

That's very interesting. 😁
I was testing without -i: you can check that with docker run -t, it works.
It seems you have uncovered a bug that manifests when stdin is kept in the container. I will investigate and fix it.

@bryanlatten
Copy link
Author

bryanlatten commented Jan 11, 2022

@skarnet thanks! just making sure you also have the non-functional docker stop on your radar too...it's quite curious.

@skarnet
Copy link
Contributor

skarnet commented Jan 16, 2022

That was actually a very complex issue, that sent me down a deep, deep rabbit hole. I eventually got to the bottom of it though, and I think it's fixed now.

Please rebuild with the latest commit of s6-overlay, and tell me if it works for you. The docker stop issue is fixed as well: it was related.

@bryanlatten
Copy link
Author

@skarnet just rebuilt:

  • docker run + ^C (bare)
  • docker run + ^C with -t
  • docker run + ^C with -i
  • docker run + ^C with -it
  • docker stop

@skarnet
Copy link
Contributor

skarnet commented Jan 16, 2022

That's expected, and unfortunately, I can't do anything about it.

The thing is, if you have a CMD, then docker run -it + ^C will correctly interrupt your CMD and shut down the container.
But if you have no CMD, then the controlling terminal disappears after the container is initialized, and ^C doesn't do anything anymore. This is a different situation from the other ones, where ^C is not handled by a controlling terminal passed to the container, but by docker run itself (which sends a SIGINT to the container's init, not to the foreground process group).

Yes, it's complicated. 😄

Fortunately, I don't think it matters, because if you're running a container without a CMD, then there is no reason at all to use an interactive terminal - there is no benefit in running docker run -it. So I think the behaviour here is acceptable.

@skarnet
Copy link
Contributor

skarnet commented Jan 19, 2022

v3.0.0.0 is out! The pre-built tarballs aren't yet accessible because @jprjr still needs to set up the CI thing, but the source is available in master.
If it's working for you @bryanlatten, can I close this issue?

@bryanlatten
Copy link
Author

bryanlatten commented Jan 20, 2022

@skarnet congrats on the release! I suppose the root issue is solved, but the secondary effect of an unhandled -it makes me a bit concerned from a regression standpoint, since this worked "fine" in previous versions. Happy to keep testing any other recommendations you have

@skarnet
Copy link
Contributor

skarnet commented Jan 20, 2022

-it working "fine" in previous versions was, I'm afraid, an accidental side effect of two combining errors. 😅
The only recommendation I have is "don't use -it for non-interactive runs, in particular when you have no CMD". The current difference in behaviour is something that's caused by the way docker run works, not by the way s6-overlay works - sorry about that, the current iteration is really the best I can do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants