
Don't supervise the docker binary with systemd #64

Open
jcgruenhage opened this issue Jan 7, 2019 · 41 comments
Labels: suggestion (This issue is a feature request)

Comments

@jcgruenhage
Contributor

The docker binary is just a REST client that talks to the docker daemon, which means that you aren't supervising the services, just the docker binary. I'm not sure what the reason for this is, but it means you can't use this playbook on Alpine Linux, Void Linux, Gentoo and possibly more.

I have multiple suggestions on how to solve this:

  • Switch to a different container stack that doesn't rely on a background daemon to run containers (podman.io for example), which interacts more nicely with supervision suites, and add support for more supervisors
  • Don't supervise the docker binary but let the docker daemon manage the containers completely (which it will anyway, it's just a matter of not wrapping it in systemd)

I strongly prefer option 1 for my use cases, but since podman isn't available to most users, that probably won't be possible.

@spantaleev
Owner

That's right. Unfortunately, we are not really supervising the actual containers, just the Docker client.

Still, having systemd "supervise" is useful because:

  • all output generated by each container propagates to its corresponding systemd service. The output is then logged in a standard way (journald). We intentionally suppress Docker logging (--log-driver=none) to prevent double-logging (once by Docker and once by systemd) or running out of disk space in /var/lib/docker due to excessive logging.

  • the Docker client exits when the container process dies. This is reported to the systemd service, and an automatic restart happens, as configured.
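To make the supervision pattern above concrete, here is a small Python generator for such a unit file. It is a minimal sketch, not the playbook's actual template: the service name `matrix-example` and the image `example/image:latest` are hypothetical, and the `ExecStartPre`/`ExecStop` cleanup lines are an assumption about how a unit like this would typically be written.

```python
def render_docker_unit(name: str, image: str, restart_sec: int = 30) -> str:
    """Render a systemd unit that supervises the `docker run` client process.

    systemd only sees the foreground `docker run` client. When the container's
    main process exits, the client exits too, and systemd applies Restart=.
    """
    lines = [
        "[Unit]",
        f"Description={name} (container)",
        "Requires=docker.service",
        "After=docker.service",
        "",
        "[Service]",
        "Type=simple",
        # Remove a stale container with the same name; the leading '-' tells
        # systemd to ignore failures (e.g. when no such container exists).
        f"ExecStartPre=-/usr/bin/env docker kill {name}",
        f"ExecStartPre=-/usr/bin/env docker rm {name}",
        # --log-driver=none: Docker keeps no copy of the output; the attached
        # client prints it, and systemd sends the client's stdout to journald.
        f"ExecStart=/usr/bin/env docker run --rm --name {name} --log-driver=none {image}",
        f"ExecStop=-/usr/bin/env docker kill {name}",
        "Restart=always",
        f"RestartSec={restart_sec}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_docker_unit("matrix-example", "example/image:latest"))
```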

Docker is an implementation detail (admittedly, one we can't easily get away from - see the next section below).
We'd rather Docker is not managing containers, restarts, logging or anything of that sort by itself. The fact that we can't really supervise the actual process is not ideal, but changing to "just let Docker do it all" (option 2 from your description) sounds even worse.


About option 1, I've thought about going for another container stack, but it seems like it won't be easy (or possible).

I've thought about rkt before, but it lacked some features. Things may be better now - I haven't checked.

Looking at podman, it doesn't seem like it's available for most distros, like you said.

Even if it is, we need to check if it supports everything that we use (or if we can squeeze into whatever it supports). While running multiple containers and having them talk to one another should be possible with podman too, we also rely on some implementation details in Docker (embedded DNS resolver being at 127.0.0.11 -- we use this in matrix-nginx-proxy vhosts).

We'll also probably want to replace all uses of docker_container and all docker run commands in the playbook with an alternative. It looks like a podman module is making its way into Ansible, but even that one is not released yet.

So this seems like a hard problem to solve. Despite this systemd + Docker supervision issue (which is ugly, but doesn't seem to cause trouble), Docker seems to help us more than it gets in the way.


I'm curious, what exactly is the problem with using this playbook to install on Gentoo, Alpine Linux, etc.?
Isn't it possible to use Docker and systemd there (in general)?
Of course, we'll need to adapt some things (package installs, possibly some paths, etc.) before this playbook can fully install all services on another distro.

@jcgruenhage
Contributor Author

> We'd rather Docker is not managing containers, restarts, logging or anything of that sort by itself. The fact that we can't really supervise the actual process is not ideal, but changing to "just let Docker do it all" (option 2 from your description) sounds even worse.

The thing is, though, docker will do all these things anyway. When the process outputs something, it is passed to the docker daemon, which routes the logs using its internal logging code. While the logs aren't persisted by docker and are just fetched by the REST client, we still have a lot of layers before any logs are written:

  1. Process, logs originate here
  2. Container runtime (runc usually)
  3. Container engine (docker daemon)
  4. Management CLI (docker binary)
  5. Supervisor (systemd)
  6. Logging (journald)
  7. (Optionally) syslogd

If we instead tell the container to use the syslog driver for logs, we can skip steps 4-6. The logs will be processed by docker anyway; the question is whether we use what docker offers.
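For comparison, option 2 would start each container detached, with a restart policy and the syslog logging driver, so that no foreground client is left for a service manager to watch. The sketch below only builds the argument list. The container name, image and syslog address are hypothetical placeholders, while `--detach`, `--restart`, `--log-driver` and the `tag=`/`syslog-address=` log options are standard `docker run` options.

```python
import shlex
from typing import List, Optional


def daemon_managed_run_args(name: str, image: str,
                            syslog_address: Optional[str] = None) -> List[str]:
    """Build a `docker run` argv for the "let dockerd manage it" option."""
    args = [
        "docker", "run",
        "--detach",                  # client returns at once; dockerd keeps the container
        "--name", name,
        "--restart=unless-stopped",  # dockerd, not a service manager, restarts it
        "--log-driver=syslog",       # dockerd sends output straight to syslog
        "--log-opt", f"tag={name}",  # label the syslog messages with the container name
    ]
    if syslog_address is not None:
        # Without this option the driver logs to the local syslog socket.
        args += ["--log-opt", f"syslog-address={syslog_address}"]
    args.append(image)
    return args


if __name__ == "__main__":
    print(shlex.join(daemon_managed_run_args("matrix-example", "example/image:latest")))
```

The trade-off is the one discussed in this thread: restarts and log routing move into dockerd, and there is no per-service unit left for `systemctl` or `journalctl -u` to act on.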

When a container dies, the docker daemon will look at its restart policy and act accordingly. We don't set one, so this (hopefully) won't do anything, but docker will still check whether it's supposed to do something.

Considering that we run everything in docker, using either docker_service or docker_container for running containers makes more sense IMO than wrapping them in another layer of systemd, which doesn't give us any benefits.

> what exactly is the problem with using this playbook to install on Gentoo, Alpine Linux, etc.?

It's that these distros don't use systemd. Gentoo and Alpine Linux use OpenRC; Void Linux uses runit. Letting the docker engine manage the containers would fix this.

> About option 1, I've thought about going for another container stack, but it seems like it won't be easy (or possible).

podman tries to be 100% compatible with the docker CLI, to the point where just doing alias docker=podman works in nearly all cases. We can't use the docker_* modules then, because they use the REST API instead of the CLI, but if we use podman commands everywhere, it should work.
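Because podman aims to accept the same command-line arguments, one way to keep a single code path is to pick the binary at run time and reuse the same argument list. This is a minimal sketch that assumes the arguments used stay within the subset both CLIs share. The helper names and the preference order are hypothetical; the `which` parameter exists only so that the lookup can be replaced in tests.

```python
import shutil
from typing import Callable, List, Optional, Sequence


def pick_container_cli(which: Callable[[str], Optional[str]] = shutil.which,
                       preference: Sequence[str] = ("podman", "docker")) -> str:
    """Return the path of the first container CLI found, in order of preference."""
    for binary in preference:
        path = which(binary)
        if path:
            return path
    raise RuntimeError(f"none of {list(preference)} found on PATH")


def build_command(cli: str, args: Sequence[str]) -> List[str]:
    """Prefix a docker-style argument list with whichever CLI was selected."""
    return [cli, *args]


if __name__ == "__main__":
    print(build_command(pick_container_cli(), ["ps", "--all"]))
```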

Podman isn't fully stable though; it does have the occasional bug (like docker, but a bit more often).

rkt is no longer being developed. It came from CoreOS, which was bought by Red Hat. Red Hat has been working on podman for a while now, which supports nearly everything that rkt supports but is more modern and relies less on systemd, while still working with systemd for those who want to use it.

@spantaleev
Owner

I see your point. It does make sense to use Docker and have support for those other (non-systemd) distros as well. Especially given that the name of the playbook is matrix-docker-ansible-deploy and not matrix-systemd-docker-ansible-deploy ;)

However, systemd is available and is the go-to way of managing services on all the popular, major distros one would want to use on a server (RedHat/CentOS, Debian, Ubuntu, ..).

systemd and journald are also fairly common tools for controlling those services.

By using systemd, we follow the common path of how services on these servers are managed. It's a little ugly underneath due to how Docker runs (client talking to daemon), but it doesn't seem like our abstraction breaks so far. That is to say: container process dies (service is restarted); you manually restart the systemd service (the Docker container gets restarted); etc.

It's a working abstraction that lets people manage their servers using standard tooling that they're already used to.

If we let Docker manage all services, it means:

  • we get locked in to Docker even more (well, we are anyway)

  • people need to use other tools to manage their services

  • standard logging configuration (journald) no longer applies. It's all shifted to Docker

  • service dependencies cannot be defined. How do we start Postgres before Synapse? Or maybe we don't? We start Synapse and then, if unfortunate to start before Postgres, let it fail and be restarted later?
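For reference, the ordering a service manager derives from declared dependencies (systemd's `Requires=`/`After=`) is a topological sort of the dependency graph. Below is a small sketch using Python's standard `graphlib` module (Python 3.9+). The service names are taken from this thread, but the edges are an illustrative subset, not the playbook's full dependency list.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each service maps to the services it must start after (illustrative subset).
DEPENDENCIES: Dict[str, Set[str]] = {
    "matrix-postgres": set(),
    "matrix-synapse": {"matrix-postgres"},
    "matrix-nginx-proxy": {"matrix-synapse"},
}


def start_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one valid start order; raises graphlib.CycleError on cycles."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(start_order(DEPENDENCIES))
```

Without such ordering, the fallback is the one described in the last bullet: start everything and let the restart policy retry Synapse until Postgres is up.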


Supporting other distros would be nice though. Most are systemd powered, so those would be easy.
Maybe we can have the playbook generate openrc/runit scripts for the others?
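As a rough idea of what a runit variant could look like: runit runs `./run` in the service directory and restarts it whenever it exits, so a run script that `exec`s the Docker client keeps the same shape as the current systemd units. This is a sketch only, with a hypothetical container name and image. It relies on `docker run` proxying the signals it receives to the container (the default for attached, non-TTY runs), so that `sv stop` reaches the containerized process.

```python
def render_runit_run_script(name: str, image: str) -> str:
    """Render a runit ./run script that execs the Docker client in the foreground."""
    return "\n".join([
        "#!/bin/sh",
        "# Send stderr to the same place as stdout (e.g. a ./log/run logger).",
        "exec 2>&1",
        "# Remove a leftover container with the same name from a previous run.",
        f"docker rm -f {name} >/dev/null 2>&1 || true",
        "# exec: runit then supervises the docker client directly.",
        f"exec docker run --rm --name {name} --log-driver=none {image}",
    ]) + "\n"


if __name__ == "__main__":
    print(render_runit_run_script("matrix-example", "example/image:latest"), end="")
```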


Hmm. I guess neither of these solutions sounds ideal yet.

@jcgruenhage
Contributor Author

> However, systemd is a available and the go-to way for managing services on all popular and major distros which one would want to use on a server (RedHat/CentOS, Debian, Ubuntu, ..).

My reason for using voidlinux is that it doesn't use systemd. More and more applications start to depend on systemd without the need to do so, just because it's available most of the time. If we ever want to migrate away from systemd, we will be stuck in hell, because everything breaks.

> By using systemd, we follow the common path of how services on these servers are managed. It's a little ugly underneath due to how Docker runs (client talking to daemon), but it doesn't seem like our abstraction breaks so far. That is to say: container process dies (service is restarted); you manually restart the systemd service (the Docker container gets restarted); etc.

Yes, it works, but adding unnecessary layers that make the whole setup more fragile just so that you can use systemctl and journalctl doesn't seem worth it.

> systemd and journald are also fairly common tools for controlling those services.

Adding a guide where people can check commands to use for docker instead would help here too.

> standard logging configuration (journald) no longer applies. It's all shifted to Docker

You can tell docker to log using syslog, which can then go into journald or rsyslogd or simple files. You don't need to keep the logs in docker.

> service dependencies cannot be defined. How do we start Postgres before Synapse? Or maybe we don't? We start Synapse and then, if unfortunate to start before Postgres, let it fail and be restarted later?

Docker supports service dependencies; otherwise, the services just fail and get restarted until their dependencies are up.

@fbruetting

fbruetting commented Aug 25, 2019

Please also don’t forget that podman allows rootless containers and is therefore much more secure! While Docker container exploits might give attackers root privileges, that won’t be the case when podman is used. And as we know, Matrix is sadly not known for being particularly secure or hardened (nor do I know of an audit). Additionally, multimedia applications are known to be amongst the most vulnerable types of software. So I’d consider the use of podman a vast security enhancement. And you might also consider that pods open up further possibilities like Kubernetes, rkt, etc.

@spantaleev
Owner

None of our containers run with a root user inside.

As far as I understand (which may be wrong), the benefit of "rootless containers" is that preparing and triggering container start can be done with a non-root user.

I guess Docker containers running with a non-root user inside (--user=...) already severely reduce the chance of escalating to root. Additionally, we drop capabilities as well (--cap-drop=ALL). I don't think (which may be wrong again) that the fact that root is required to initially set up the container is a big deal, given that there's no root usage after that. Of course, the container is monitored by some docker shim process which runs as root - that's probably not ideal, but it also doesn't sound too bad.

I think getting root access and escaping the container, etc., is already hard enough with our setup -- much, much harder than running all these services bare, without a container (which is how most people run Synapse, etc.).

I like that podman doesn't require a daemon. That is much more natural, and I would be happy if we could migrate to it.

The problem with podman is:

  • it's not widely available on the various distributions we support - Debian 9+ / Ubuntu 16.04+ / CentOS 7+. Things may have improved lately. Time is also passing and we may drop some old distro version at some point, helping our case for podman support.
  • networking between services is a pain (hardcoding IPs around or just using --network=host is off-putting). Or maybe there's some better way I'm not aware of?

@jcgruenhage
Contributor Author

https://github.com/containers/libpod/blob/master/install.md shows the supported distros. Only Debian is missing from those. CentOS has it available in its extras repository and Ubuntu has a PPA available here. You could theoretically write a bit of Ansible to build and install podman on Debian, following the guide available here.

@jcgruenhage
Contributor Author

jcgruenhage commented Aug 26, 2019

About networking: --network=host is really off-putting, but you can generate the IPs for the containers with Ansible. Would it still be off-putting then? You can still use DNS names by injecting entries into /etc/hosts in the containers.
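Generating the addresses up front could look roughly like this: hand out fixed addresses from a user-defined network's subnet, then give every container `--add-host=name:ip` flags for its peers, which `docker run` and `podman run` both write into the container's `/etc/hosts`. This is a sketch with a hypothetical subnet and service names. In the playbook it would be done with Ansible variables rather than Python, and pinning an address with `--ip` only works on a user-defined network.

```python
import ipaddress
from typing import Dict, List


def allocate_ips(subnet: str, services: List[str]) -> Dict[str, str]:
    """Assign each service a fixed address from the subnet, in the given order."""
    hosts = ipaddress.ip_network(subnet).hosts()
    next(hosts, None)  # leave the first usable address for the bridge gateway
    ips: Dict[str, str] = {}
    for service in services:
        ip = next(hosts, None)
        if ip is None:
            raise ValueError(f"subnet {subnet} is too small for {len(services)} services")
        ips[service] = str(ip)
    return ips


def add_host_flags(ips: Dict[str, str], exclude: str) -> List[str]:
    """Build --add-host flags so one container can resolve all of its peers by name."""
    return [f"--add-host={name}:{ip}" for name, ip in sorted(ips.items()) if name != exclude]


if __name__ == "__main__":
    ips = allocate_ips("172.28.0.0/24", ["matrix-postgres", "matrix-synapse"])
    print(ips)
    print(add_host_flags(ips, exclude="matrix-synapse"))
```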

@fbruetting

fbruetting commented Aug 26, 2019

@spantaleev Thanks for the detailed explanation! Seems like not as bad as I thought.

Regarding the distros: Why does anyone want to run podman in Alpine? Isn’t that just an OS image for containers? And I also would never target Gentoo for server software. 😅️ CentOS 8 will have podman installed by default, as far as I know.

Regarding the network: All containers in a pod are implicitly connected to each other, so that you can use localhost to reach e.g. your database container. The containers just have to connect to each other via the respective ports. And just the published port of the pod provides external access to the inside.

Here is a Nextcloud setup script I wrote yesterday and it runs fine. The Postgres port is not visible to the outside (at least I hope so very much), only via the published pod port 8080 you’re able to connect to the Apache service inside the Nextcloud container, which runs on port 80.

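POD_NAME=nextcloud

# Create the pod. Only the pod publishes a port: host port 8080 -> port 80 inside the pod.
podman pod create \
  --name                "${POD_NAME}" \
  --publish             8080:80

# Nextcloud shares the pod's network namespace, so Postgres is reachable via localhost.
podman create \
  --pod                 "${POD_NAME}" \
  --name                "${POD_NAME}_nextcloud" \
  --mount               "type=volume,source=${POD_NAME}_nextcloud,destination=/var/www/html" \
  --env                 POSTGRES_HOST=localhost \
  --env                 POSTGRES_DB=nextcloud \
  --env-file            db.env \
  nextcloud

# Postgres publishes no port of its own; it is only reachable from inside the pod.
podman create \
  --pod                 "${POD_NAME}" \
  --name                "${POD_NAME}_postgres" \
  --mount               "type=volume,source=${POD_NAME}_postgres,destination=/var/lib/postgresql/data" \
  --env-file            db.env \
  postgres

# Start all containers in the pod (reuse the variable instead of repeating the name).
podman pod start "${POD_NAME}"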

@jcgruenhage
Contributor Author

@fbruetting

> Why does anyone want to run podman in Alpine? Isn’t that just an OS image for containers?

No, it's a Linux Distro. The most popular place where it's being deployed is probably as a container image, but you can very well use it outside of containers too.

> All containers in a pod are implicitly connected to each other, so that you can use localhost to reach e.g. your database container. The containers just have to connect to each other via the respective ports.

That would mean that all containers need to be in the same pod though, which is not great.

@fbruetting

fbruetting commented Aug 26, 2019

You should be able to connect several pods to each other as well as to other containers.

@jcgruenhage
Contributor Author

Yes, sure, but then why merge those containers into a pod anyway?

@jcgruenhage
Contributor Author

You don't need pods to connect containers to each other

@fbruetting

fbruetting commented Aug 26, 2019

Why have data structures when you can have all variables separate on their own?

The answer is encapsulation, logical grouping (especially if you have a lot of containers or several instances of the same service) and manageability. That’s also the reason why Docker Compose exists. Why make things complicated?

@jcgruenhage
Contributor Author

Yes, pods do make sense for some use cases, but where do they make sense in this specific case?

@Samonitari
Contributor

Hi all!

It was a good read! Any chance on continuing the discussion?
Although I am not well-versed in containers, migrating to podman seems productive.

@jcgruenhage

> Yes, pods do make sense for some usecases, but where do they make sense for this specific case?

Just as @fbruetting said: for encapsulation. If you run a Matrix server instance and nothing else, bare containers may be fine. But this project is only isolated in theory; once deployed, it mostly isn't. The docs of this project are so good that they even anticipate you having other stuff hosted on the same server...
In my case, I am struggling with setting up this project alongside a Nextcloud/Postfix/Dovecot/etc.-based ecloud instance. The latter is set up by a shell script (although development is moving towards Ansible) and uses docker-compose (podman should be possible there too), and the whole server setup is starting to give me headaches.
Sometimes, I wish both of them would run without containers, as systemd-services.

In an ideal world, I would run both of them in separate pods and be able to, e.g., upgrade Matrix without causing mail server downtime, and vice versa.
On second thought, no, maybe the containerless would be best still, as non-root user.
But definitely podman after that.

Regarding systemd.
I am okay with it, as an openSUSE user. The good thing about podman is that you could use systemd on top of it. I am all in for the idea that you already have a PID 1/service manager/init daemon (be it systemd or something else), so throw out the container manager daemon. You run "services" in those containers anyway, don't you? Mostly. It needs work, but generating service files (or whatever is needed for OpenRC and friends) based on podman configs should be possible too.
By the way, options are a good thing to have, and systemd is definitely overused and bloated in places, but I disagree with you here.
If we want to migrate to something_better (assuming it will exist), migrating everything away from systemd is MUCH easier than migrating from SysVinit, Upstart, OpenRC, runit, systemd, and whatever else. Oh yeah, Upstart is not used anymore, wonder why.

@fbruetting

> Sometimes, I wish both of them would run without containers, as systemd-services.

You can do systemd containers with systemd-nspawn; nsbox already uses that, for example. 😛

If you can go for containers, please forget installing on the host. There are no brakes on the hype train for a reason! 😄

@mooomooo

As a proof of concept, I wrangled the ansible playbook into generating a docker-compose.yml instead, and it seems to work just fine. (The process of wrangling, however, is not fine at all -- the following process is an extreme hack based on me not knowing anything about ansible or docker-compose when I started.)

It does just the basic setup, with none of the optional additions (I haven't tried, maybe they would work?). Perhaps someone more keyed in than me can adapt this more cleanly. The key points:

  • Every docker run command that would have gone in a systemd script becomes a service in the docker-compose file. Requires/After/Wants become a depends_on, every service gets a restart: always, and nothing else matters.
  • Those docker-compose service snippets get put in {{ matrix_base_data_path }}/compose
  • Additional docker-compose snippets are also made there for networks (that were otherwise direct docker commands in the Ansible)
  • Anything that would have gone in /usr/local/bin gets put in {{ matrix_base_data_path }}/bin
  • Other docker run commands in the ansible also get turned into scripts in bin
  • We don't need to do any package management or anything, so I added a tag to the few Ansible tasks that we do use (or at least, the ones that don't muck with the base system at all). Unfortunately, Ansible doesn't allow for tags = setup_all AND safetodo (it only does OR), so I manually added a bunch of --skip-tags and compared the outputs of --list-tasks to make sure I didn't do anything beyond setup.
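The core of the dependency translation above (Requires/After/Wants become `depends_on`, every service gets `restart: always`) can be sketched in a few lines. This is not the attached composify.py, which is not reproduced here. It is a simplified illustration that only reads the dependency keys, assumes each service is named after its `.service` file, and leaves out translating the `docker run` flags themselves, so its output is a skeleton rather than a complete compose file. It emits JSON, which docker-compose's YAML parser accepts for plain ASCII content like this.

```python
import json
from pathlib import Path
from typing import Dict, List

DEPENDENCY_KEYS = ("Requires", "After", "Wants")


def unit_dependencies(unit_text: str) -> List[str]:
    """Collect container-service dependencies from a unit's [Unit] section."""
    deps: List[str] = []
    section = None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Unit" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() not in DEPENDENCY_KEYS:
            continue
        for unit in value.split():
            # docker.service is the daemon itself, not a compose service.
            if unit.endswith(".service") and unit != "docker.service":
                name = unit.removesuffix(".service")
                if name not in deps:
                    deps.append(name)
    return deps


def compose_skeleton(unit_dir: Path) -> str:
    """Map every *.service file in unit_dir to a compose service entry."""
    services: Dict[str, dict] = {}
    for path in sorted(unit_dir.glob("*.service")):
        service: dict = {"restart": "always"}
        deps = unit_dependencies(path.read_text())
        if deps:
            service["depends_on"] = deps
        services[path.stem] = service
    return json.dumps({"services": services}, indent=2)
```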

You'll need the python3 script attached.
composify.py.gz
Put it in the directory within which you'll git clone this repo.

Set up your matrix user and groups on your system on your own. Make sure you include those in the host_vars file, along with the path that will contain all the generated config files.

Then the step by step procedure:

# git clone https://github.com/spantaleev/matrix-docker-ansible-deploy.git
  • create/edit matrix-docker-ansible-deploy/inventory/host and matrix-docker-ansible-deploy/inventory/host_vars/* as usual
  • Make sure you include in the host_vars/*/vars.yml file --
    • matrix_user_username: "username"
    • matrix_user_uid: ###
    • matrix_user_gid: ###
    • matrix_base_data_path: "path-to-matrix-config-files"
$ python3 composify.py matrix-docker-ansible-deploy
$ cd matrix-docker-ansible-deploy
$ ansible-playbook -i inventory/hosts setup.yml --tags=safetodo --skip-tags=self-check,run-postgres-vacuum,import-media-store,run-postgres-synapse-janitor,upgrade-postgres,import-postgres,import-sqlite-db,register-user,update-user-password,start 
$ cd "path-to-matrix-config-files"
$ ./bin/matrix-initialize 

Everything above can in theory be done on any computer, then all you actually need are the files in path-to-matrix-config-files to get put on the actual host. But all the paths are hardcoded so you may want to do everything on the host computer anyway.

$ ./bin/matrix-initialize-key
$ ./bin/matrix-initialize-certs
  • if you're going to be doing this a bunch, then save the ./ssl folder somewhere and just copy it back instead of re-initializing certs every time to avoid letsencrypt timing out.
$ docker-compose up 
  • it may complain about permissions, sudo chown -R xxx:xxx . if necessary
  • it may complain about a ports: being wrong in the docker-compose.yml file, if so delete the line in there that has a ports: but nothing after it. I tried to get rid of it using a fancy sed script, but I don't know what I'm doing.

That should be it! You should be able to control your install using docker-compose. Add a user using ./bin/matrix-synapse-register-user, etc.

@mooomooo

I have one main request that would make this process a lot less hackish, that I can't really figure out on my own easily. Can we have separate tags for the tasks that configure the host vs tasks that configure the containers? Then we can only run the tag that handles the containers, along with the .service -> .yml switch, and then the containers can be managed however people want. Of course, this means users and cron (and anything else?) would not be automatically configured.

@mooomooo
Copy link

Incidentally, with #418 this gets a lot cleaner, with two helper python scripts:
addsafetodo.py.gz : adds tag safetodo to the tasks we want to run.
composify.py.gz : converts a directory of .service files into a docker-compose.yml.

  1. git clone
  2. Create entries in inventory/host_vars/*/vars.yml: e.g.
matrix_systemd_path: "{{ matrix_base_data_path }}/compose"
matrix_cron_path: "{{ matrix_base_data_path }}/cron"
matrix_local_bin_path: "{{ matrix_base_data_path }}/bin"
  3. Make the directories pointed to above if they don't already exist
  4. python3 addsafetodo.py matrix-docker-ansible-deploy
  5. $ cd matrix-docker-ansible-deploy && ansible-playbook -i inventory/hosts setup.yml --tags=safetodo --skip-tags=self-check,run-postgres-vacuum,import-media-store,run-postgres-synapse-janitor,upgrade-postgres,import-postgres,import-sqlite-db,register-user,update-user-password,start
  6. python3 composify.py <what you used for matrix_systemd_path above>

and now you have your docker-compose.yml in the current directory! You'll still need to matrix-initialize-key, do the certs thing, and create users as above.

@spantaleev
Owner

Wow, you've spent a lot of time on this! Happy to hear you're finding an alternative way to make use of this playbook! Hopefully others will find it useful as well.

If it proves useful to others alike, in the future we could probably even add your scripts to the playbook and introduce some new tags to help run them.

@christianlupus
Contributor

I'm coming back to this issue. I need to run multiple instances on one machine. A pure container-based approach (without systemd) is a requirement, as otherwise the systemd service files of one instance get overwritten by another.

I consider the best option (in my current position and given my current knowledge) to be using docker-compose. I know that the scripts from @mooomooo exist but I would like to be able to update the ansible scripts from time to time. This is simpler if the corresponding files for docker-compose are generated directly.

My personal suggestion is to start a new branch here in the repo that gets rid of the systemd stuff completely. Once it is stable enough, users could be encouraged to migrate to the new structure. Then the legacy code base could be kept in a separate branch and the new branch could be renamed to master or whatever. As the main data is stored in the containers' volumes, the difference is only in the host's configuration.

Normally, I would start a fork migrating the current state to the docker-compose version. However, I would like to know from you @spantaleev if this is something you would be comfortable with. Otherwise it will consume quite some work and be one of many stalled and soulless open source forks/projects.

To sum things up: I would migrate the current systemd-based approach to a docker-compose-based one. Logging would be done via journald to stay as compatible as possible. If needed, some documentation could be added on how to start/stop a service using docker-compose (though this should not be necessary, as we are using Ansible anyway, right?).

@ptman
Contributor

ptman commented Jul 21, 2020

systemd supports instantiation, e.g. matrix-synapse@example.com
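A template unit for that could look roughly like the sketch below. systemd substitutes the instance name (the part after `@`) for `%i`, so a single `matrix-synapse@.service` file can serve several installations. The container naming scheme and the per-instance `/matrix/%i/...` path are hypothetical, and instance names with special characters may need escaping (see `systemd-escape`).

```python
def render_synapse_template_unit(image: str) -> str:
    """Render a hypothetical matrix-synapse@.service template; %i is the instance name."""
    return "\n".join([
        "[Unit]",
        "Description=Synapse (instance %i)",
        "Requires=docker.service",
        "After=docker.service",
        "",
        "[Service]",
        # Per-instance container name, so instances never collide.
        "ExecStartPre=-/usr/bin/env docker rm -f matrix-synapse-%i",
        "ExecStart=/usr/bin/env docker run --rm --name matrix-synapse-%i "
        "--log-driver=none "
        f"-v /matrix/%i/synapse/config:/data {image}",
        "Restart=always",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]) + "\n"


def instance_unit_name(template: str, instance: str) -> str:
    """Turn 'matrix-synapse@.service' plus 'example.com' into 'matrix-synapse@example.com.service'."""
    prefix, at, suffix = template.partition("@")
    if not at:
        raise ValueError(f"{template!r} is not a template unit")
    return f"{prefix}@{instance}{suffix}"
```

Each installation would then be managed as, for example, `systemctl restart matrix-synapse@example.com`.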

systemd (with systemctl, journalctl) is standard tooling on most linux distributions. System administrators should be able to keep using the standard tools as much as possible. You still need standard tooling, e.g. cron. Or how would you replace those with docker-compose? Standard tooling is less complex than standard tooling + docker-compose.

-1 for docker-compose
(+1 for podman or similar instead of/in addition to docker to get rid of dockerd)

@spantaleev
Owner

I also think that adjusting the existing systemd setup in a way that's part of the playbook would probably be better (and won't need any maintenance, once done).

When going with systemd, besides instantiation, we can also go for service prefixing/suffixing. instance-1-matrix-synapse.service, instance-2-matrix-synapse.service, etc. The playbook can be adjusted to support such prefixes (or suffixes) and prevent systemd service conflict.

Similarly, we'd need to figure out some other things (cronjobs conflicting, etc.)

As for reverse proxying, a multi-instance installation would probably need to disable matrix-nginx-proxy and use an external webserver (something we already support). In such a setup, we currently expose ports by services on localhost, so we'll get some port conflicts when multiple installations try to expose exactly the same ports. I believe all (or at least most) of these port numbers are overrideable, so such port conflicts can be worked around. We may even make this easier (automatic), by introducing some new variable (matrix_base_port_number: 0), and calculating all ports (by all roles) based on that. This way you can bump it for each installation and have all ports readjust. Of course, that would take some reworking as well.
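The proposed `matrix_base_port_number` idea boils down to adding a per-installation offset to every default port and checking that installations don't overlap. In the playbook this would be Jinja arithmetic in role defaults; the Python below only shows the arithmetic. The port names and default numbers are illustrative placeholders, not the playbook's actual variables.

```python
from typing import Dict, Set

# Illustrative defaults only; not the playbook's real variable names or values.
DEFAULT_PORTS: Dict[str, int] = {
    "synapse_client_api": 8008,
    "synapse_federation_api": 8048,
    "element_web": 8765,
}


def instance_ports(base_port_number: int,
                   defaults: Dict[str, int] = DEFAULT_PORTS) -> Dict[str, int]:
    """Shift every default port by the per-installation offset."""
    ports = {name: port + base_port_number for name, port in defaults.items()}
    too_high = [name for name, port in ports.items() if port > 65535]
    if too_high:
        raise ValueError(f"offset {base_port_number} pushes {too_high} past 65535")
    return ports


def find_conflicts(*installations: Dict[str, int]) -> Set[int]:
    """Return the ports claimed by more than one installation."""
    seen: Set[int] = set()
    conflicts: Set[int] = set()
    for ports in installations:
        for port in ports.values():
            if port in seen:
                conflicts.add(port)
            seen.add(port)
    return conflicts
```

The conflict check matters because a small offset can map one installation's port onto another's: with the defaults above, an offset of 40 moves 8008 onto 8048.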

@christianlupus
Contributor

The idea of using the instantiation feature of systemd is a nice one. I had not thought of that yet.

What else (except for cron) do we need? If doing it the real container-based way, cron will run in a container, too. So no dependencies on the host needed.

The main point in this whole discussion is the fact that systemd assumes to control a process while with docker involved, it only controls the client process. Thus systemd has no clue about the fact that the "real business" is happening somewhere completely different. This causes a whole bunch of drawbacks (see above). Therefore I suggested to drop the systemd requirement in the long-term and use some other management tool instead.

If we want to stick with docker (as the name of the repo suggests), it should be something that is capable of using docker features. Here, systemd does not fit very well. I know docker-compose fits. I have not used podman yet. My quick research led me to the impression that its focus is running a container as a non-privileged user (on the host) while faking root privileges inside the container. This is nice from the perspective of developers who do not have root access but want to test something out.

In fact, as far as I understand things, docker-compose files can be converted comparatively easily to pod descriptions. So a first step towards #520 might in fact be the generation of valid docker-compose files. I see that docker is not the only toolset for containers, but currently it is at least one of the most used ones. If at a later stage dockerd is replaced by some other toolset, I am not against this in general. However, this is better discussed in #520 or similar.

I would say that users of an Ansible playbook are not simple script kiddies who just hack in systemctl restart ..., but have at least a basic understanding of what they are doing. It would even be possible to define tags in the playbook to selectively restart/reload individual services from Ansible. If you use Ansible, you should stop messing with the machine manually and use the tool wholeheartedly.

If you really fear that a completely learning-resistant admin needs systemd by all means, you could add small service files that call docker-compose accordingly in a oneshot manner. Just similar to our current approach with the docker client invocation.
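Such a wrapper could be a `Type=oneshot` unit with `RemainAfterExit=yes`, so that `systemctl start`/`stop` map to `docker-compose up -d`/`down` while dockerd owns the containers. This is a minimal sketch; the description, the project directory and the compose binary path are hypothetical.

```python
def render_compose_wrapper_unit(project_dir: str,
                                compose_bin: str = "/usr/bin/docker-compose") -> str:
    """Render a oneshot unit that delegates the container lifecycle to docker-compose."""
    return "\n".join([
        "[Unit]",
        "Description=Matrix services (docker-compose project)",
        "Requires=docker.service",
        "After=docker.service",
        "",
        "[Service]",
        "Type=oneshot",
        # Keep the unit 'active' after `up -d` returns, so that stopping it runs ExecStop.
        "RemainAfterExit=yes",
        f"WorkingDirectory={project_dir}",
        f"ExecStart={compose_bin} up -d",
        f"ExecStop={compose_bin} down",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
    ]) + "\n"
```

With this design, systemd only tracks the compose invocation, not the individual containers, and container output reaches journald only if the services use Docker's `journald` logging driver.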

@aaronraimist
Contributor

> Please also don’t forget that podman allows rootless containers and is therefore much more secure! While Docker container exploits might give attackers root privileges, that won’t be the case when podman is used.

> I guess Docker containers running with a non-root user inside (--user=...) already severely reduce the chance of escalating to root. Additionally, we drop capabilities as well (--cap-drop=ALL). I don't think (which may be wrong again) that the fact that root is required to initially set up the container is a big deal, given that there's no root usage after that. Of course, the container is monitored by some docker shim process which runs as root - that's probably not ideal, but it also doesn't sound too bad.

Docker seems to have a rootless mode nowadays. https://docs.docker.com/engine/security/rootless/

@d-513

d-513 commented Feb 17, 2022

This is a deal breaker for me, I don't want a dozen systemd services on my system. I'm using alpine for my container host as well, most containers use alpine anyway, so no need to add the additional heavy stuff normal distros come with.

I could use another distro, but I don't really want to manage this this way.

@spantaleev
Owner

You can work on adding openrc service scripts to the playbook. They could work similarly to how our systemd .services work (invoking docker run ..).
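An OpenRC counterpart could keep the same "run the client in the foreground" approach by letting OpenRC background the process and track it through a pidfile. This is a sketch under assumptions: the container name and image are hypothetical, the Docker daemon's init script is assumed to be called `docker` (as on Alpine), and `command_background`/`pidfile` are the standard `openrc-run` variables for daemons that do not fork on their own. Since there is no journald to catch the client's output here, the sketch uses Docker's syslog logging driver instead of `--log-driver=none`.

```python
def render_openrc_script(name: str, image: str) -> str:
    """Render an openrc-run script that keeps `docker run` in the foreground."""
    return "\n".join([
        "#!/sbin/openrc-run",
        f'description="{name} container"',
        'command="/usr/bin/docker"',
        f'command_args="run --rm --name {name} --log-driver=syslog --log-opt tag={name} {image}"',
        # docker run does not daemonize, so OpenRC backgrounds it and writes the pidfile.
        "command_background=true",
        'pidfile="/run/${RC_SVCNAME}.pid"',
        "",
        "depend() {",
        "\tneed docker",
        "}",
        "",
        "start_pre() {",
        "\t# Remove a leftover container with the same name; ignore errors if none exists.",
        f"\t/usr/bin/docker rm -f {name} >/dev/null 2>&1 || true",
        "}",
    ]) + "\n"


if __name__ == "__main__":
    print(render_openrc_script("matrix-example", "example/image:latest"), end="")
```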

@d-513

d-513 commented Feb 17, 2022

> You can work on adding openrc service scripts to the playbook. They could work similarly to how our systemd .services work (invoking docker run ..).

Yeah, well, I would prefer to just create containers with Podman. I might look into that, but it could require a fork to be viable.

@spantaleev
Owner

Podman with the dnsname plugin may be becoming a viable alternative to Docker.

Someone in our Matrix room said they'd be experimenting with that soon, so the playbook may be getting Podman support. Still, I don't see us switching away from using a service manager to start the containers.

@d-513

d-513 commented Feb 17, 2022

> Podman with the dnsname plugin may be becoming a viable alternative to Docker.

> Someone in our Matrix room said they'd be experimenting with that soon, so the playbook may be getting Podman support. Still, I don't see us switching away from using a service manager to start the containers.

Yeah, then I might end up forking it and just replacing the systemd stuff with podman_container for my personal deployments.

@DanH42

DanH42 commented Feb 18, 2022

I'd been sort of meaning to leave a more detailed writeup here at some point, but since this issue has been getting a bit of traffic again, I may as well brain dump a high-level overview of how I successfully used the playbook in this repo to get a systemd-less Matrix setup. Last January, I used @mooomooo's Python script to spin up a Matrix server managed entirely through a docker-compose file, and it's been running great ever since. I can try to go back and refresh my memory to provide more details if someone would like, but here's what I remember now:

The Python script needed some small tweaks to get it working, since it hasn't been kept up to date with this repo, but I seem to recall they were pretty obvious fixes; I'm not a super strong Python dev and I didn't have any trouble turning error messages into fixes. Given that a year has passed since I did any of this, more changes will probably be needed.

I ran the process in a local throwaway VM completely isolated from the target server. At no point did Ansible touch that server, nor does that server have any systemd dependencies. It was theorized earlier that this method was probably possible, but I don't think anyone ever confirmed that they'd gotten it to work previously. The process I used looked basically like this:

  • Spin up a brand new VM on a local machine. I used CentOS 7.
  • Install Ansible and other prereqs (I forget if I preinstalled Docker or let the playbook do it, but it shouldn't matter)
  • Follow mooomooo's instructions
  • Fix the Python scripts until the errors go away (I don't remember details here), re-running as needed
    • Once I had a working pair of Python scripts, I think I blew away the VM and started over from scratch just to make sure my past partial runs didn't leave anything in a weird state, as I've often seen Ansible do. Unsure if this was important.
  • Once you've got a valid docker-compose.yaml file, move it and the corresponding matrix/ config directory out of the VM and onto a real server with Docker and docker-compose
  • Optionally, clean up the formatting a bit in the generated docker-compose.yaml file
  • docker-compose up -d
  • Manually run any DB actions that would normally be run by the playbook to initialize tables and whatnot (I don't remember details here either, but I used SQL scripts I grabbed directly from this repo that weren't too hard to find)

The only extra step I had to take that wasn't mentioned in the instructions for the Python conversion script was that very last step. The Ansible playbook expects to be able to initialize databases itself, since it assumes it can talk to the database when it runs. To be fair, it can, it's just talking to a database that's about to be blown away. In a proper Docker setup, these steps would normally be run by the container that needs them on its first boot, but since I only needed them this one time, I just exec'd a shell in the postgres container and stuck them in by hand.
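For reference, an SQL file can also be fed into a running Postgres container without an interactive shell. This sketch assumes a container named matrix-postgres plus a database user and database name you would substitute; the SQL file itself is whatever init script you grabbed. docker exec -i keeps stdin open so psql reads the script from it, and psql's ON_ERROR_STOP=1 variable makes it stop at the first failing statement.

```python
import subprocess
from pathlib import Path


def psql_exec_args(container: str, user: str, database: str) -> list[str]:
    # -i keeps stdin open so psql can read the SQL script piped into it.
    # ON_ERROR_STOP=1 makes psql exit non-zero on the first failing statement.
    return ["docker", "exec", "-i", container,
            "psql", "-v", "ON_ERROR_STOP=1", "-U", user, "-d", database]


def run_sql_file(container: str, user: str, database: str, sql_file: Path) -> None:
    """Pipe a local .sql file into psql inside the (hypothetical) container."""
    with sql_file.open("rb") as fh:
        subprocess.run(psql_exec_args(container, user, database), stdin=fh, check=True)
```

Usage would look like run_sql_file("matrix-postgres", "synapse", "synapse", Path("init.sql")), with all four values adjusted to your setup.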

You can probably avoid the DB init issue by running the Ansible playbook directly on the server you're deploying to, as was its original intention. I wanted to host Matrix on a server that was already running some other important stuff, and I didn't take the time to read and understand this playbook enough to be fully confident that it wasn't going to do anything disruptive (messing with my existing Docker installation, creating new system users, restarting services, etc) and I didn't want to take any chances.

Over the past year, I've added services and upgraded others entirely inside my docker-compose.yaml file, which has been just as painless as managing any other collection of services with docker-compose (if you didn't like it before, you probably won't like it here either). The initial conversion process may involve a pretty janky Python script that's converting between two formats that were never meant to be converted between, but the end result has been great. I started off having never used Matrix before, and a weekend of effort got me a maintainable config that gives me a small but stable multi-user Matrix homeserver with a handful of bridges that's seen daily use for a year and change.

To each their own, I guess, but I probably wouldn't be running Matrix today if I had to manage it through systemd services (or any other init system, really) and use Ansible to manage component versions. Being able to manage the entire stack (runtime, process monitoring, logs, networking, updates, etc.) with a single tool that I'm already comfortable with is way easier as far as I'm concerned.

For example, if I'm not getting all my logs as plain flat files in /var/log/, I'd strongly prefer to manage them with Docker's tooling rather than with journalctl, which I still can't use successfully without regularly consulting the man pages after all these years of being essentially forced to manage my systems with it. Docker's logging system also doesn't inherit journald's longstanding critical bug that causes it to lose track of log lines from processes that just exited (frequently dropping only the error message you're looking for while keeping everything before it, which makes troubleshooting extremely confusing).

I've used Ansible professionally for years and have always found it to be fragile, messy, and a pain to manage at basically any scale, but this repo has FAR more complete documentation on the general process and details of setting up a usable Matrix server than I was able to find anywhere else, and I think there's a lot of value in being able to carry that over to a docker-compose setup rather than starting from scratch. Maybe that happens by way of someone keeping the composify Python scripts up to date with the rest of the repo, or maybe we get a way of selecting an init system in the playbook, just like you'd specify systemd vs. OpenRC, with a third target for docker-compose.

@spantaleev
Owner

Thanks for the writeup!

We try not to contaminate the system too much. We have variables that can be toggled to prevent the playbook from installing Docker (matrix_docker_installation_enabled: false).

We only create a matrix user and group on the system, which own the files in /matrix. We try to start containers as this user and group, so that any data they write to the filesystem is owned by matrix:matrix outside the container.

We currently contaminate /etc/systemd/system with .service and .timer files. To someone who wishes not to use systemd, this may seem bad. It may even fail, due to the /etc/systemd/system directory tree not existing.

This is configurable using matrix_systemd_path though. You may be able to use something like matrix_systemd_path: /matrix/systemd to prevent dirtying up /etc/systemd on a non-systemd system.

I suppose you can automate your setup by:

  • making the playbook install docker-compose for you, in addition to Docker (optional - you can do it manually, if you prefer)
  • using matrix_systemd_path: /matrix/systemd - so that no /etc/systemd contamination will happen
  • running some composify script as an "after task" - it converts the .service files to a docker-compose.yml somewhere
  • having the playbook not try to start services (you can simply not pass --tags=start, or we could introduce a variable to have it do something else, perhaps run a custom command)
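A minimal sketch of the "composify" step: pull the docker run command out of a unit's ExecStart= and map a few flags onto a Compose service entry. This is not mooomooo's script; it is a simplified illustration that only understands long options written as --flag=value (plus --rm and --init), reports any option it cannot translate instead of silently dropping it, and would misread short options such as -v. The resulting dict could then be serialized into docker-compose.yml with a YAML library.

```python
import shlex

# Illustrative subset of docker run options and their Compose service keys.
SCALAR_FLAGS = {"--name": "container_name", "--user": "user", "--restart": "restart"}
LIST_FLAGS = {"--cap-drop": "cap_drop", "--cap-add": "cap_add",
              "--volume": "volumes", "--publish": "ports"}


def exec_start_argv(unit_text: str) -> list[str]:
    """Return the ExecStart= command of a systemd unit as an argv list."""
    # systemd lets a trailing backslash continue a line; join those first.
    joined = unit_text.replace("\\\n", " ")
    for raw in joined.splitlines():
        line = raw.strip()
        if line.startswith("ExecStart="):  # does not match ExecStartPre=
            return shlex.split(line[len("ExecStart="):])
    raise ValueError("no ExecStart= line found")


def docker_run_to_service(argv: list[str]) -> tuple[dict, list[str]]:
    """Convert `docker run ...` argv into (compose_service, unsupported_options)."""
    if len(argv) < 2 or not argv[0].endswith("docker") or argv[1] != "run":
        raise ValueError("not a docker run command")
    service: dict = {}
    unsupported: list[str] = []
    args = argv[2:]
    i = 0
    # Options come before the image; stop at the first non "--" argument.
    while i < len(args) and args[i].startswith("--"):
        flag, sep, value = args[i].partition("=")
        if flag in SCALAR_FLAGS and sep:
            service[SCALAR_FLAGS[flag]] = value
        elif flag in LIST_FLAGS and sep:
            service.setdefault(LIST_FLAGS[flag], []).append(value)
        elif flag == "--init" and not sep:
            service["init"] = True
        elif flag == "--rm" and not sep:
            pass  # Compose manages the container lifecycle itself.
        else:
            unsupported.append(args[i])
        i += 1
    if i == len(args):
        raise ValueError("no image found")
    service["image"] = args[i]
    if args[i + 1:]:
        service["command"] = args[i + 1:]
    return service, unsupported
```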

@christianlupus
Contributor

May I suggest a variant of this approach that might suffice for all users (hopefully):

We could create a docker-compose.yml file in /matrix. The main containers would then be managed internally via docker-compose. Users who do not want the systemd approach can stop here (i.e., all further steps are optional).
To manage the services, we could keep the same systemd unit files as today; the difference is that they would run docker-compose instead of plain docker. The logs can be redirected to journald from within docker-compose.

The benefit would be that

  1. Docker manages and keeps track of the services. Problems during startup no longer occur.
  2. systemd users can keep using their tools
  3. Replacement of docker-compose by the podman equivalent should be straightforward
  4. Users without systemd are satisfied
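One way the proposed per-service units could look: each unit runs docker-compose in the foreground for a single service, so that service's output reaches journald through the unit's stdout, while ordering stays in systemd's Requires=/After=. This is a sketch under assumptions: the compose file path, the docker-compose binary location, and the convention that a unit is named after its Compose service are all hypothetical. The --no-deps option of docker-compose up keeps Compose from starting dependencies itself, leaving that to systemd.

```python
from typing import Optional


def render_compose_service_unit(
    service: str,
    requires: Optional[list[str]] = None,
    compose_file: str = "/matrix/docker-compose.yml",
    compose_bin: str = "/usr/bin/docker-compose",
) -> str:
    """Render a systemd unit that runs one Compose service in the foreground."""
    # Assumed naming convention: Compose service "foo" <-> systemd unit "foo.service".
    deps = " ".join(["docker.service"] + [f"{name}.service" for name in (requires or [])])
    compose = f"{compose_bin} -f {compose_file}"
    lines = [
        "[Unit]",
        f"Description={service} (via docker-compose)",
        f"Requires={deps}",
        f"After={deps}",
        "",
        "[Service]",
        "Type=simple",
        # Foreground 'up' keeps the unit alive while the container runs and
        # sends its output to journald via stdout/stderr.
        f"ExecStart={compose} up --no-deps {service}",
        f"ExecStop={compose} stop {service}",
        "Restart=always",
        "RestartSec=30",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ]
    return "\n".join(lines)
```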

I would be willing to give this a try if it is feasible and acceptable to most users. However, I am not willing to rework the playbook only to get a "No, we do not want docker-compose at all" answer.

@spantaleev
Owner

> Docker manages and keeps track of the services. Problems during startup no longer occur.

How so? We still need to use depends_on, etc., for defining service order, just like we use After for systemd.
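The correspondence is fairly mechanical. A minimal sketch, assuming each unit is named after its Compose service (matrix-foo.service corresponds to matrix-foo) and that dependencies on units outside the stack, such as docker.service, are simply dropped. The semantics are not identical: After= only orders startup, whereas Compose's short-form depends_on both orders startup and pulls the dependency in, which is closer to Requires= plus After=.

```python
def after_to_depends_on(units: dict[str, list[str]]) -> dict[str, dict]:
    """Translate systemd After= lists into Compose `depends_on` entries.

    `units` maps a Compose service name to the unit names listed in its
    After= directive. Units that are not part of the stack are skipped.
    """
    def service_name(unit: str) -> str:
        return unit[: -len(".service")] if unit.endswith(".service") else unit

    known = set(units)
    services: dict[str, dict] = {}
    for name, after in units.items():
        deps = sorted(({service_name(u) for u in after} & known) - {name})
        services[name] = {"depends_on": deps} if deps else {}
    return services
```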


The Docker-Compose situation is somewhat messy. It's one more thing to install. Then there's v1 and v2, as well as compose-switch which exposes Compose v2 (the docker compose command) as a docker-compose command.

Having Docker act as a service manager is somewhat ugly; it doesn't seem like something it should do. Just like systemd should not be supervising docker run (because docker run just forwards commands to the Docker daemon), it feels like a program for spawning containers should not be acting as a service manager.

From what I've heard, podman-compose does not work well (yet).

@christianlupus
Contributor

> Docker manages and keeps track of the services. Problems during startup no longer occur.

> How so? We still need to use depends_on, etc., for defining service order, just like we use After for systemd.

I have sometimes experienced problems with incompletely started instances, as some containers failed to start (or came up too early, IDK). I had to run systemctl start matrix-... manually to get things working again (or run the complete ansible-playbook).

> The Docker-Compose situation is somewhat messy. It's one more thing to install. Then there's v1 and v2, as well as compose-switch which exposes Compose v2 (the docker compose command) as a docker-compose command.

It's true that it means one more installation requirement. However, it is installed by Ansible, so it's no big problem for the end user. Or are you concerned about breaking changes in the future?

> Having Docker act as a service manager is somewhat ugly; it doesn't seem like something it should do. Just like systemd should not be supervising docker run (because docker run just forwards commands to the Docker daemon), it feels like a program for spawning containers should not be acting as a service manager.

I think we have different notions of which program does what. Why would you call Docker a service manager in this setup?
By design, the Docker daemon is an abstraction layer that separates the containers from the user. So starting, monitoring, and managing containers are tasks of the Docker daemon.
In contrast, systemd is a service manager that starts, monitors, and manages system services/programs such as the X11 server, an Apache HTTP server, or even the Docker daemon.

When the Docker daemon starts, it restores the state of all containers that were running before it shut down (assuming an appropriate container configuration, i.e. a suitable restart policy). So, if a few containers were running just before shutdown, the Docker daemon will spawn them again when it restarts. Why do you call it a service manager, then? A container manager, yes; that is its inherent task.
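Whether the daemon brings a container back is governed by its restart policy. A small sketch that validates a policy string before turning it into a docker run flag; the accepted forms (no, always, unless-stopped, on-failure with an optional maximum retry count) are Docker's documented restart policies, while the helper itself is hypothetical. With always or unless-stopped, containers that were running when the daemon stopped are started again when it comes back.

```python
import re

# Docker's restart policies: no | always | unless-stopped | on-failure[:max-retries]
_RESTART_POLICY = re.compile(r"(no|always|unless-stopped|on-failure(:[0-9]+)?)")


def restart_flag(policy: str) -> str:
    """Return a `--restart=...` flag for docker run, rejecting unknown policies."""
    if not _RESTART_POLICY.fullmatch(policy):
        raise ValueError(f"unsupported restart policy: {policy!r}")
    return f"--restart={policy}"
```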

> From what I've heard, podman-compose does not work well (yet).

I have not looked into Podman. I am a happy user of Docker and docker-compose for various services I run. I just saw it pop up here now and then and wanted to pick up the topic and offer an option as well.

@Dima-Kal

Dima-Kal commented Mar 9, 2022

What prevents sharing a docker-compose.yml file or template? Why is a Python script or Ansible needed?
I want to have a clean server without Ansible installed, and only load a docker-compose.yml into Portainer to manage all the containers needed for Matrix.

@ptman
Contributor

ptman commented Mar 9, 2022

@Dima-Kal Ansible isn't installed on the server; it's installed on a machine that uses SSH to access the server. But this whole project is an Ansible playbook, so getting rid of Ansible makes no sense. You should probably start a new project if you want to go without Ansible.

@Samonitari
Contributor

> I think we have different notions of which program does what. Why would you call Docker a service manager in this setup?
> By design, the Docker daemon is an abstraction layer that separates the containers from the user. So starting, monitoring, and managing containers are tasks of the Docker daemon.
> In contrast, systemd is a service manager that starts, monitors, and manages system services/programs such as the X11 server, an Apache HTTP server, or even the Docker daemon.

I think your differentiation is a bit arbitrary.
Many people (like the users of this playbook with default settings) run Nginx through Docker (not Apache, thank God), which IS a typical service, as you said.
Single-purpose containers for Nginx, Synapse, ma1sd, and PostgreSQL are a thin layer on top of software that is packaged with systemd unit files in any reasonable distribution (and this abstraction often gets in the way, except during deployment). Do these services sound familiar?

AFAIC, the better version would be one without Docker at all. But distribution fragmentation makes it much more difficult to write a multi-distro playbook (I had to add something like 4 repos to openSUSE Leap 15.3 for a non-Docker Matrix deployment), so thanks to @spantaleev and all the contributors!
(On a side note, systemd (and Linux) can do most of the security separation Docker gets glorified for, so I prefer "native" packages for a non-scaling deployment. It's l'art pour l'art containerization, and you don't even get a frickin' less inside any of those Alpine crap images.)

@kevinveenbirkenbach

Hey,
first I want to say thanks for developing such a sophisticated tool.
Are there any new developments concerning a docker-compose implementation?
I would be happy to use docker compose.
Over the last few years I developed this software as an all-in-one solution for setting up various platform services with Ansible and docker compose.
At the moment I'm forced to use my own role to set up Docker and to implement all the bridges etc. manually with that role.
I would prefer to use this role instead. It uses this repository as a base.
The problem with this repository, for me, is that it is not sufficiently encapsulated from the Linux system.
I also don't understand why system-level systemd services are needed. Theoretically this could all be done inside containers. I don't understand the architectural idea behind using the systemd services. If you could explain to me what the benefit is compared to using docker compose, I would be super glad.
Looking forward to your reply.
Best
Kevin

@spantaleev
Owner

It's great to hear about your software and I'm sorry that us not being based on docker-compose natively makes things harder for you!

Some attempts at giving answers are below:

  1. Why Ansible and not a static docker-compose.yml file?

This playbook currently manages about 100 components, most of which are optional.

All of these services are possibly interconnected and wired together in dynamic ways - something that a static docker-compose.yml file and an env file cannot do. The Ansible playbook has lots of "code" to manage all this complexity.

Replacing the whole playbook with a huge "static" docker-compose file wouldn't work.

  2. Why not keep Ansible, but have it generate a docker-compose.yml file and ultimately run that?

This may be an alternative. Have Ansible do all it needs to do, but then generate one huge docker-compose.yml file and then have it start it somehow (via systemd or otherwise).

This adds an extra dependency on docker-compose. Thankfully, it's not awful Python software anymore, so such a dependency is not so terrible.

Generating the docker-compose.yml file would need to be done using some "central" role which all other roles inject data into. We don't like such huge central components in the playbook. Each role should be as independent as possible, with the playbook wiring them together.

For this reason, matrix-nginx-proxy is going away soon in our bye-bye-nginx-proxy branch.

Yes, the playbook could wire various services into a huge "docker compose configuration" variable in a similar way to how services are injected into variables like: devture_systemd_service_manager_services_list_auto, matrix_homeserver_container_extra_arguments_auto, etc., via the group_vars/matrix_servers file.
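A toy model of that wiring, just to show its shape: each role contributes a list of Compose service fragments (much as roles already append to lists like devture_systemd_service_manager_services_list_auto), and a central step merges them, refusing duplicate service names. The fragment format and the merge function are hypothetical; in the playbook this would live in Jinja2/YAML in group_vars, not in Python.

```python
def merge_compose_fragments(fragments_by_role: dict[str, list[dict]]) -> dict:
    """Merge per-role service fragments into a single Compose document.

    Each fragment is {"name": <service name>, "definition": {...}}
    (a hypothetical format chosen for this sketch).
    """
    services: dict[str, dict] = {}
    owner: dict[str, str] = {}
    for role, fragments in fragments_by_role.items():
        for fragment in fragments:
            name = fragment["name"]
            if name in services:
                # Two roles claiming the same service is a wiring error.
                raise ValueError(
                    f"service {name!r} defined by both {owner[name]!r} and {role!r}"
                )
            services[name] = fragment["definition"]
            owner[name] = role
    return {"services": services}
```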

Right now, disabling a service allows each role to clean up after itself as it wishes (stopping its systemd services, etc.). If roles just inject services into docker-compose.yml and something else starts it later, orphan containers may be left behind. As a workaround, roles would not only need to inject configuration, but also possibly stop containers (just like they stop and remove systemd services right now). It's possible, but it's a different way of doing things, and one that is more complex, I believe.
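The cleanup problem boils down to a set difference between what is running and what is still declared. A hypothetical sketch, assuming the stack's containers share a matrix- name prefix; the list of running names could come from docker ps --format '{{.Names}}'. For containers created by Compose itself, docker-compose up --remove-orphans covers similar ground.

```python
def orphan_containers(running: list[str], declared: list[str], prefix: str = "matrix-") -> list[str]:
    """Names of running containers that carry the stack's (assumed) prefix but
    are no longer declared by any enabled role: candidates for docker stop/rm."""
    declared_set = set(declared)
    return sorted(n for n in running if n.startswith(prefix) and n not in declared_set)
```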

So, it may be possible, and it may be similar to what we do now, but it's a different way of doing things. And we consider it an uglier and "less native" way (at least on systemd-based distros).

  3. Why not have the playbook just start containers directly?

Each role in the playbook could start and stop containers and have Docker manage containers and auto-restarts, etc.

There are a few problems with this:

  • As far as I know, Docker does not support dependencies between containers (no depends_on as in Docker Compose). It may be fine for simple stuff, but we heavily rely on dependency definitions
  • See Why systemd below
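Without something like depends_on, whoever starts containers directly has to compute the start order. A minimal sketch using Python's standard graphlib (3.9+); the container names are made-up examples, and a dependency cycle would raise graphlib.CycleError. start_batches groups containers that could be started in parallel.

```python
from graphlib import TopologicalSorter


def start_order(dependencies: dict[str, list[str]]) -> list[str]:
    """Order container names so each starts after everything it depends on."""
    return list(TopologicalSorter(dependencies).static_order())


def start_batches(dependencies: dict[str, list[str]]) -> list[list[str]]:
    """Group containers into batches; each batch only depends on earlier ones."""
    ts = TopologicalSorter(dependencies)
    ts.prepare()
    batches = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        batches.append(ready)
        ts.done(*ready)
    return batches
```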
  4. Why systemd?

I also don't see why we need to have Docker (or docker-compose) supervise services and the dependencies between them when the host already has a much better system for doing that: systemd.

Side-story: Yes, systemd cannot really supervise the container process it starts, because docker run merely tells the Docker daemon to start a container.. so the systemd service that "started" the container does not really own that child container process. In practice, this Docker implementation problem (which Podman solves) doesn't matter. Even right now, systemd can efficiently handle service dependencies/ordering, restarting of failed services, logging via systemd-journald, etc. All services that the playbook creates feel "native" on systemd-based distros.
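For comparison, a simplified sketch of the docker run plus systemd pattern described here. This is not the playbook's actual template; the unit layout, the /usr/bin/docker path and the names are assumptions. The leading "-" on ExecStartPre= and ExecStop= tells systemd to tolerate failure (e.g. no stale container to remove), Requires=/After= express ordering, Restart= handles failures, and because docker run stays attached, the container's output reaches journald even with --log-driver=none.

```python
def render_docker_run_unit(name: str, image: str, requires: list[str],
                           docker_bin: str = "/usr/bin/docker") -> str:
    """Render a systemd unit that supervises an attached `docker run` client."""
    deps = " ".join(["docker.service"] + [f"{r}.service" for r in requires])
    return "\n".join([
        "[Unit]",
        f"Description={name}",
        f"Requires={deps}",
        f"After={deps}",
        "",
        "[Service]",
        "Type=simple",
        # '-' prefix: ignore the failure if no stale container exists.
        f"ExecStartPre=-{docker_bin} rm -f {name}",
        # Attached (no -d): stdout/stderr flow to journald via this unit.
        f"ExecStart={docker_bin} run --rm --name={name} --log-driver=none {image}",
        f"ExecStop=-{docker_bin} stop {name}",
        "Restart=always",
        "RestartSec=30",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])
```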

Docker itself cannot manage service dependencies from what I know. It's just Docker Compose with its depends_on and other such configuration that can do it.

Relying on just Docker and systemd allows us to nicely support all distros that are based on these 2 technologies, which covers 99.99% of people. Some niche distros do not meet these criteria, unfortunately, but we can't support everything.

Also, using just Docker + systemd possibly allows us to support Podman as an alternative as well (see #520). It's still a pipe dream, but it may happen some day. However, if we add Docker Compose into the mix, we'd need to hope that Podman would also play nicely with our "compose file" via podman-compose (or whatever alternative they are trying to build).


In the end, there's always tradeoffs and people/setups that get excluded.

Some wish to run on a distro without systemd.. Others wish to run with Podman.. Others wish to use docker-compose. Others wish to run on Kubernetes.. Others wish to run on HashiCorp Nomad. Others hate Ansible and would rather this tool were written in something else.

One project cannot possibly accommodate everyone. The fact that this Ansible playbook is currently the most popular deployment choice is proof that, for most people, the tradeoffs were made correctly:

  • few people run non-systemd distros
  • few people wish to deploy Matrix on Kubernetes and that's why the Kubernetes charts are so immature (in terms of number of supported services, etc.)
  • few people require docker-compose added to the mix, and that's why there's no popular project doing that
  • few people need Podman and Podman is a pain in the ass (and alias docker=podman is a lie), and that's why Podman support has neither landed here (via a PR), nor an alternative playbook has appeared

If the majority of people's requirements were different, this playbook would have been dead and something else would have taken off in its place.

That said, I'm not against this playbook trying to accommodate some of these other communities and requirements. It's just a difficult problem to support everything, and most of us have no incentive to work on it.
