Weblate cannot start container in docker swarm mode #2816

Closed
2 tasks done
FSeidinger-XI opened this issue Nov 19, 2024 · 15 comments · Fixed by #2820 or #2821
Labels
question This is more a question for the support than an issue.

Comments

@FSeidinger-XI

Describe the issue

I use the official docker image with a compose file in docker swarm. In this context the container cannot start, due to permission issues with the tmpfs mounts.

I already tried

  • I've read and searched the documentation.
  • I've searched for similar filed issues in this repository.

Steps to reproduce the behavior

  1. Create a compose file based on the official weblate docker-compose file.
  2. Deploy the weblate stack in docker swarm with docker stack deploy weblate -c compose.yml
  3. See with docker stack ps weblate that the weblate container does not come up. The redis and postgres containers however are starting fine.
  4. See the log outputs for the weblate container:
weblate_weblate.1.i53r2717rrgi@bosmang    | Starting Weblate 5.8.3...
weblate_weblate.1.i53r2717rrgi@bosmang    | /app/bin/start: 111: cannot create /tmp/localtime: Read-only file system

Expected behavior

The container should start and maybe fix the permissions upon start.

Screenshots

No response

Exception traceback

No response

How do you run Weblate?

Other

Weblate versions

Latest 5.8 on docker ce engine version 27.3.1

Weblate deploy checks

No response

Additional context

The compose file reference for docker swarm mode (docker stack deploy) differs from the one for local containers (docker compose). In swarm mode there is no way of configuring the uid or gid for the mounted tmpfs file system.

I use the following compose file:

networks:
  weblate:

volumes:
  weblate-postgres-data:
    external: true

  weblate-redis-data:
    external: true

  weblate-data:
    external: true

  weblate-cache:
    external: true

services:
  database:
    image: postgres:17-alpine
  
    networks:
      - weblate

    volumes:
      - weblate-postgres-data:/var/lib/postgresql/data

    env_file:
      - /srv/cloud/weblate/environment

    deploy:
      placement:
        constraints:
          - node.role != manager

  cache:
    image: redis:7-alpine

    networks:
      - weblate

    volumes:
    - weblate-redis-data:/data

    command: [redis-server, --save, '60', '1']

    deploy:
      placement:
        constraints:
          - node.role != manager

  weblate:
    image: weblate/weblate:5.8

    networks:
      - weblate
  
    volumes:
      - weblate-data:/app/data
      - weblate-cache:/app/cache     
      - type: tmpfs
        target: /tmp
      - type: tmpfs
        target: /run

    env_file:
      - /srv/cloud/weblate/environment

    depends_on:
    - database
    - cache

    deploy:
      placement:
        constraints:
          - node.role != manager
          - node.labels.capacity != small
@nijel
Member

nijel commented Nov 19, 2024

There should be a tmpfs mounted there. Do you have it in the compose file? Does it get mounted?

@FSeidinger-XI
Author

FSeidinger-XI commented Nov 19, 2024

Yes, it is there in the long form.

volumes:
      - type: tmpfs
        target: /tmp
      - type: tmpfs
        target: /run

It's the equivalent of

tmpfs:
    - /run
    - /tmp

I also tried this form, but switched because I thought I could set additional parameters with the long form. But tmpfs has none.

@nijel
Member

nijel commented Nov 19, 2024

As you don't have the Weblate container with a read-only root, you can skip all those tmpfs volumes, these are only needed in case root is read-only.

But still, it is strange that it fails with "Read-only file system". If it was a permission issue, it should fail with "Permission denied". This way it looks like tmpfs is actually not mounted.
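
The two messages correspond to two different errno values: EROFS ("Read-only file system") and EACCES ("Permission denied"). Below is a minimal Python sketch of telling them apart when a write fails. The function names classify_write_error and probe are made up for illustration and are not part of Weblate or Docker.

```python
import errno


def classify_write_error(exc: OSError) -> str:
    """Map an OSError from a failed write to its likely cause."""
    if exc.errno == errno.EROFS:
        # The kernel refused the write because the mount itself is read-only.
        return "read-only file system: nothing writable is mounted at the target"
    if exc.errno in (errno.EACCES, errno.EPERM):
        # The mount is writable, but the current user lacks the rights.
        return "permission problem: the mount is writable, but not for this user"
    return "other error: " + errno.errorcode.get(exc.errno, "unknown")


def probe(path: str) -> str:
    """Try to create a file at `path` and report why it failed, if it did."""
    try:
        with open(path, "w"):
            pass
    except OSError as exc:
        return classify_write_error(exc)
    return "writable"
```

Calling probe("/tmp/localtime") inside the container would be one way to check which of the two cases applies; note that it creates or truncates the file when the write succeeds.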

@nijel nijel added the question This is more a question for the support than an issue. label Nov 19, 2024

This issue has been marked as a question by a Weblate team member. Why? Because it belongs more to the professional Weblate Care or community Discussions than here. We strive to answer these reasonably fast here, too, but purchasing the support subscription is more responsible and faster for your business. And it makes Weblate stronger as well. Thanks!

In case your question is already answered, making a donation is the right way to say thank you!

@nijel nijel transferred this issue from WeblateOrg/weblate Nov 19, 2024
@FSeidinger-XI
Author

FSeidinger-XI commented Nov 19, 2024

As you don't have the Weblate container with a read-only root, you can skip all those tmpfs volumes, these are only needed in case root is read-only.

But still, it is strange that it fails with "Read-only file system". If it was a permission issue, it should fail with "Permission denied". This way it looks like tmpfs is actually not mounted.

You are right. I changed the entrypoint to a sleep and connected to the container with docker exec. The root file system and /tmp look like this:

weblate@55257ce84844:/$ ls -al
total 60
drwxr-xr-x   1 root root 4096 Nov 19 19:27 .
drwxr-xr-x   1 root root 4096 Nov 19 19:27 ..
drwxr-xr-x   1 root root 4096 Nov  7 12:38 app
lrwxrwxrwx   1 root root    7 Oct 16 02:00 bin -> usr/bin
drwxr-xr-x   2 root root 4096 Aug 14 18:10 boot
drwxr-xr-x   5 root root  340 Nov 19 19:27 dev
-rwxr-xr-x   1 root root    0 Nov 19 19:27 .dockerenv
drwxr-xr-x   1 root root 4096 Nov 19 19:27 etc
drwxrwx---   1 root root 4096 Nov  7 08:57 home
lrwxrwxrwx   1 root root    7 Oct 16 02:00 lib -> usr/lib
lrwxrwxrwx   1 root root    9 Oct 16 02:00 lib64 -> usr/lib64
drwxr-xr-x   2 root root 4096 Oct 16 02:00 media
drwxr-xr-x   2 root root 4096 Oct 16 02:00 mnt
drwxr-xr-x   1 root root 4096 Nov  7 08:57 opt
dr-xr-xr-x 416 root root    0 Nov 19 19:27 proc
drwx------   1 root root 4096 Oct 19 03:06 root
drwxrwx---   3 root root 4096 Nov 19 19:27 run
lrwxrwxrwx   1 root root    8 Oct 16 02:00 sbin -> usr/sbin
drwxr-xr-x   2 root root 4096 Oct 16 02:00 srv
dr-xr-xr-x  13 root root    0 Nov 19 19:27 sys
drwxrwxrwt   3 root root 4096 Nov 19 19:27 tmp
drwxr-xr-x   1 root root 4096 Oct 16 02:00 usr
drwxr-xr-x   1 root root 4096 Nov  7 08:57 var
weblate@55257ce84844:/$ cd tmp
weblate@55257ce84844:/tmp$ ls -al
total 16
drwxrwxrwt 3 root root 4096 Nov 19 19:27 .
drwxr-xr-x 1 root root 4096 Nov 19 19:27 ..
-rw-r--r-- 1 root root 2298 May  3  2024 localtime
drwxrwx--- 2 root root 4096 Nov  7 12:38 nginx
weblate@55257ce84844:/tmp$

So for sure there is no tmpfs mounted. The default content of the image, with localtime and nginx owned by root:root with modes 644 and 770 as listed above, is the reason the container cannot start.
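
To read the mode strings in the listing above as octal values, a small helper can be used. This is a sketch in Python; mode_to_octal is a hypothetical name, and the function only handles the ten-character form that ls -l prints.

```python
# Positions (within the nine permission characters) that can carry
# setuid, setgid, and the sticky bit.
SPECIAL_BITS = {2: 0o4000, 5: 0o2000, 8: 0o1000}


def mode_to_octal(mode: str) -> str:
    """Convert an `ls -l` mode string such as 'drwxrwx---' to octal text."""
    perms = mode[1:10]  # skip the leading file-type character
    if len(perms) != 9:
        raise ValueError(f"unexpected mode string: {mode!r}")
    value = 0
    for index, char in enumerate(perms):
        bit = 1 << (8 - index)
        if char in "rwx":
            value |= bit
        elif char in "st" and index in SPECIAL_BITS:
            # lowercase: execute bit plus the special bit
            value |= bit | SPECIAL_BITS[index]
        elif char in "ST" and index in SPECIAL_BITS:
            # uppercase: special bit without the execute bit
            value |= SPECIAL_BITS[index]
        elif char != "-":
            raise ValueError(f"unexpected character {char!r} in {mode!r}")
    return format(value, "04o")


if __name__ == "__main__":
    print(mode_to_octal("-rw-r--r--"))  # localtime in the volume -> 0644
    print(mode_to_octal("drwxrwx---"))  # nginx -> 0770
    print(mode_to_octal("drwxrwxrwt"))  # /tmp itself -> 1777
```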

I'm not 100% sure but I read somewhere that the tmpfs is ignored in docker swarm mode. The reason is, that local mounts, and tmpfs is a special case of that, make no sense in a distributed environment. The recommendation for docker swarm is to use shared mounts based on shared file systems or images.

@nijel
Member

nijel commented Nov 19, 2024

Isn't some real volume used instead of tmpfs? The localtime file is older than the container, which would indicate that. It also doesn't have 0770 permissions, as it has in the container.

@FSeidinger-XI
Author

FSeidinger-XI commented Nov 19, 2024

Isn't some real volume used instead of tmpfs? The localtime file is older than the container, what would indicate it. It also doesn't have 0770 permissions, as it has in the container.

I investigated further. When running the container, docker allocates two local volumes for /tmp and /run and mounts them into the container. I guess that comes from the VOLUME directives in the weblate Dockerfile. Here is the relevant excerpt from the docker container inspect output:

        "Mounts": [
            {
                "Type": "bind",
                "Source": "/etc/timezone",
                "Destination": "/etc/timezone",
                "Mode": "",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/etc/localtime",
                "Destination": "/etc/localtime",
                "Mode": "",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "volume",
                "Name": "weblate-data",
                "Source": "",
                "Destination": "/app/data",
                "Driver": "wetopi/rbd:latest",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            },
            {
                "Type": "volume",
                "Name": "weblate-cache",
                "Source": "",
                "Destination": "/app/cache",
                "Driver": "wetopi/rbd:latest",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            },
            {
                "Type": "volume",
                "Name": "23abfa154535879e088a4e17e9a77e26ffc4d913b934c3e73541bb426d74a9ba",
                "Source": "/var/lib/docker/volumes/23abfa154535879e088a4e17e9a77e26ffc4d913b934c3e73541bb426d74a9ba/_data",
                "Destination": "/tmp",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            },
            {
                "Type": "volume",
                "Name": "e76bcb15385bc3ce10312625a01bb4a285ee4ca49ea81a5e201ef77a3b4f6736",
                "Source": "/var/lib/docker/volumes/e76bcb15385bc3ce10312625a01bb4a285ee4ca49ea81a5e201ef77a3b4f6736/_data",
                "Destination": "/run",
                "Driver": "local",
                "Mode": "",
                "RW": true,
                "Propagation": ""
            }
        ],

As it is the default for local volumes, the content of the container is copied to the volume before mounting. Then the /tmp root directory gets the timestamp of the local volume (/var/lib/docker/volumes/xxxxx/_data) on the file system of the host and the content is the copied content from the container, hence the timestamps of May 3 for localtime and Nov 7 for nginx.
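
A quick way to spot such volumes is to filter the Mounts array of the docker container inspect output. The Python sketch below assumes that a volume with the local driver and a 64-character hexadecimal name is an anonymous one, which matches the entries shown above but is only a heuristic. The function name anonymous_volumes is hypothetical.

```python
import json
import re

# Heuristic (an assumption): anonymous volumes get a 64-char hex name.
ANON_NAME = re.compile(r"^[0-9a-f]{64}$")


def anonymous_volumes(inspect_json: str) -> list[str]:
    """Return mount destinations backed by anonymous local volumes.

    `inspect_json` is the JSON array printed by `docker container inspect`.
    """
    found = []
    for container in json.loads(inspect_json):
        for mount in container.get("Mounts", []):
            if (
                mount.get("Type") == "volume"
                and mount.get("Driver") == "local"
                and ANON_NAME.match(mount.get("Name", ""))
            ):
                found.append(mount["Destination"])
    return found


if __name__ == "__main__":
    # Trimmed sample built from the mounts quoted in this thread.
    sample = json.dumps([{"Mounts": [
        {"Type": "volume", "Name": "weblate-data",
         "Driver": "wetopi/rbd:latest", "Destination": "/app/data"},
        {"Type": "volume", "Driver": "local", "Destination": "/tmp",
         "Name": "23abfa154535879e088a4e17e9a77e26ffc4d913b934c3e73541bb426d74a9ba"},
        {"Type": "volume", "Driver": "local", "Destination": "/run",
         "Name": "e76bcb15385bc3ce10312625a01bb4a285ee4ca49ea81a5e201ef77a3b4f6736"},
    ]}])
    print(anonymous_volumes(sample))  # ['/tmp', '/run']
```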

As I said, the tmpfs config is ignored in docker swarm.

So for me there would be three possible solutions:

  1. Drop the volumes for /tmp and /run in the weblate Dockerfile
  2. Override the user to be root.
  3. Create an own image from the official one without these two volumes.

@FSeidinger-XI
Author

And here is some more investigation to prove my hypothesis. I created a simple Dockerfile with the following content:

FROM weblate/weblate:5.8

RUN ls -al /tmp /run

The output of running docker buildx build --progress=plain --no-cache . shows:

#0 building with "default" instance using docker driver

#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 83B done
#1 DONE 0.0s

#2 [internal] load metadata for docker.io/weblate/weblate:5.8
#2 DONE 0.0s

#3 [internal] load .dockerignore
#3 transferring context: 2B 0.0s done
#3 DONE 0.0s

#4 [1/2] FROM docker.io/weblate/weblate:5.8
#4 CACHED

#5 [2/2] RUN ls -al /tmp /run
#5 0.414 + ls -al /tmp /run
#5 0.421 /run:
#5 0.421 total 12
#5 0.421 drwxrwx--- 1 root root 4096 Nov  7 07:57 .
#5 0.421 drwxr-xr-x 1 root root 4096 Nov 19 22:39 ..
#5 0.421 -rwxrwx--- 1 root root    0 Nov  7 07:57 adduser
#5 0.421 drwxrwx--- 1 root root 4096 Oct 16 00:00 lock
#5 0.421 
#5 0.421 /tmp:
#5 0.421 total 16
#5 0.421 drwxrwxrwt 1 root root 4096 Nov  7 11:38 .
#5 0.421 drwxr-xr-x 1 root root 4096 Nov 19 22:39 ..
#5 0.421 -rwxrwx--- 1 root root  114 Nov  7 11:38 localtime
#5 0.421 drwxrwx--- 2 root root 4096 Nov  7 11:38 nginx
#5 DONE 0.4s

#6 exporting to image
#6 exporting layers 0.1s done
#6 writing image sha256:8176d2226be081619156bbcf753326c569d128e9fc807ab43858a28c39194fe1 0.0s done
#6 DONE 0.1s

As expected the /tmp and /run directories of the container images are not clean.

@nijel
Member

nijel commented Nov 20, 2024

Indeed, they are not clean, it probably could be fixed.

The /run and /tmp volumes are needed to support read-only root (see 384b4de, #1831 and #1840).

nijel added a commit to WeblateOrg/docker-compose that referenced this issue Nov 20, 2024
This can cause problems in some setups and should not be needed as the
container expects these to be empty.

See WeblateOrg/docker#2816
nijel added a commit to nijel/docker that referenced this issue Nov 20, 2024
This can cause issues in some setups as the content is copied, but some
permissions seem to be lost.

Fixes WeblateOrg#2816
@FSeidinger-XI
Author

FSeidinger-XI commented Nov 20, 2024

Indeed, they are not clean, it probably could be fixed.

The /run and /tmp volumes are needed to support read-only root (see 384b4de, #1831 and #1840).

I see two major problems with your approach.

Temporary files

The /tmp directory is used for short-lived data of users and processes. In a full-fledged OS it is usually managed by the system through some kind of upstart/cron script. Such infrastructure is usually not part of a container image and should not be.

If an application needs temporary or volatile data, then according to the Filesystem Hierarchy Standard, Chapter 2, the /var filesystem should be used. For weblate, /var/lib/weblate could be a good choice.

Run-time variable data

Regarding the /run directory, the Filesystem Hierarchy Standard, Chapter 2, states that it contains system information data describing the system since it was booted. This is also infrastructure for a full-fledged OS.

Exposing this as a volume and mapping it, e.g. to the /run directory of the host might expose very sensitive data, like the docker socket.

Also /var/lib/weblate might be a good place to store such data from the viewpoint of a container.

Including the /app/cache volume in this concept might then be an architectural approach worth thinking about.

Conclusion

My investigation of the current image structure of the 5.8 versions shows that altering the image to build one of your own cannot be done. Removing the volumes or the content of the directories is not possible in a docker build because of the read-only approach.

So I'm a little stuck here on how to use the docker image in a swarm mode context. Do you have any suggestions for me?

BTW, my company made a small sponsorship contribution to WeblateOrg to honor your effort.

nijel added a commit to nijel/docker that referenced this issue Nov 20, 2024
This can cause issues in some setups as the content is copied, but some
permissions seem to be lost.

Fixes WeblateOrg#2816
@nijel
Member

nijel commented Nov 20, 2024

While I understand your architecture points, the reality is more tricky as Weblate runs many tools internally, and some of them might have /tmp hard-coded. There is no easy way to audit this, so writable /tmp needs to stay to avoid breaking something.
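
The difference between a tool that honors the TMPDIR environment variable and one with /tmp hard-coded can be illustrated with Python's own tempfile module, which does consult TMPDIR. This is only an illustration of the general mechanism and makes no claim about which tools Weblate runs. The helper name effective_tmpdir is hypothetical.

```python
import os
import tempfile


def effective_tmpdir(env_value: str) -> str:
    """Return the directory `tempfile` picks when TMPDIR is set to `env_value`."""
    previous = os.environ.get("TMPDIR")
    os.environ["TMPDIR"] = env_value
    tempfile.tempdir = None  # drop the cached choice so TMPDIR is read again
    try:
        return tempfile.gettempdir()
    finally:
        # Restore the environment and the cache for the rest of the process.
        if previous is None:
            del os.environ["TMPDIR"]
        else:
            os.environ["TMPDIR"] = previous
        tempfile.tempdir = None


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()      # any existing, writable directory
    print(effective_tmpdir(scratch))  # prints the scratch directory
    os.rmdir(scratch)
```

A tool that writes to a literal /tmp path ignores this variable entirely, which is the case the comment above is concerned with.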

The question is whether adding volumes for these as in 384b4de is necessary. These are really only to be used with tmpfs and nothing else.


The issue you have reported is now resolved. If you don’t feel it’s right, please follow its labels to get a clue for further steps.

  • In case you see a similar problem, please open a separate issue.
  • If you are happy with the outcome, don’t hesitate to support Weblate by making a donation.

nijel added a commit to nijel/docker that referenced this issue Nov 20, 2024
This reverts commit 384b4de.

These volumes are only intended to be used as tmpfs, but when adding
them as VOLUME in Dockerfile, these will end up as real volumes in the
swarm mode.

Fixes WeblateOrg#2816
@FSeidinger-XI
Author

FSeidinger-XI commented Nov 20, 2024

While I understand your architecture points, the reality is more tricky as Weblate runs many tools internally, and some of them might have /tmp hard-coded. There is no easy way to audit this, so writable /tmp needs to stay to avoid breaking something.

Got that and accept the constraints.

The question is whether adding volumes for these as in 384b4de is necessary. These are really only to be used with tmpfs and nothing else.

But once again, tmpfs does not work with docker swarm. The tmpfs mounts are ignored and replaced by local volumes, and there is no way around that, because this is the intended behaviour of the docker daemon in swarm mode.

What I understand from this discussion, and let us be transparent and open here, is that you cannot or will not support weblate on docker swarm. Fair enough, that is the decision of the weblate project, but the documentation should then note that docker swarm is currently not supported.

@nijel
Member

nijel commented Nov 20, 2024

The tmpfs is only needed with a read-only file system. So, IMHO the only problem is the additional volumes, which are not supposed to be used as real volumes. I think #2821 should address the issue by removing the additional volumes. Let's see if the tests pass there and it doesn't break any of the existing expectations.
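
Whether a given image still declares /tmp and /run as volumes can be read from the Config.Volumes field of the docker image inspect output. A minimal Python sketch follows; the function name declared_volumes and the sample data are made up for illustration and do not describe any particular Weblate image.

```python
import json


def declared_volumes(image_inspect_json: str) -> list[str]:
    """List the VOLUME paths recorded in `docker image inspect` output."""
    paths = set()
    for image in json.loads(image_inspect_json):
        # "Volumes" is null when the image declares no volumes at all.
        volumes = (image.get("Config") or {}).get("Volumes") or {}
        paths.update(volumes)
    return sorted(paths)


if __name__ == "__main__":
    # Hypothetical sample, not taken from a real image.
    sample = json.dumps(
        [{"Config": {"Volumes": {"/app/data": {}, "/run": {}, "/tmp": {}}}}]
    )
    print(declared_volumes(sample))  # ['/app/data', '/run', '/tmp']
```

If /tmp and /run no longer appear in the list, swarm has no image-level reason to create anonymous volumes for them.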

nijel added a commit that referenced this issue Nov 20, 2024
This reverts commit 384b4de.

These volumes are only intended to be used as tmpfs, but when adding
them as VOLUME in Dockerfile, these will end up as real volumes in the
swarm mode.

Fixes #2816