podman-3.3.0 - Changes to default /etc/hosts handling are breaking container workloads #11282

Closed
srcshelton opened this issue Aug 19, 2021 · 19 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. stale-issue


@srcshelton
Contributor

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

With podman-3.2.3 and prior, podman could (optionally) write hosts entries to the container /etc/hosts file.

With the podman-3.3.0 release candidates, this behaviour has changed and /etc/hosts within the container is now bind-mounted by default. This is causing failures with a widespread and unpredictable blast radius, as a variety of services turn out to write a new temporary file and then move it over /etc/hosts, which now fails.

There's the --no-hosts option, which prevents /etc/hosts from being written at all, but there doesn't appear to be any option to revert to the pre-3.3.0 behaviour.

I've only tested very briefly, but the new failures are in containers performing software builds/deployments (which legitimately try to deploy a new /etc/hosts file) and, randomly, in the spampd service, which it turns out tries to replace /etc/hosts on startup and otherwise fails with the message sed: can't move '/etc/hostsCGNQu3' to '/etc/hosts': Device or resource busy.
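The failure mode described above can be sketched in Python. On Linux, rename(2) onto a path that is itself a mount point (such as a bind-mounted /etc/hosts) fails with EBUSY, which is the "Device or resource busy" error in the sed message. This is an illustration only, not code from podman, sed, or spampd: the helper name replace_hosts_file is hypothetical, and it shows the usual write-temp-then-rename pattern plus one possible fallback of rewriting the file in place when the rename returns EBUSY.

```python
import errno
import os
import tempfile


def replace_hosts_file(path: str, content: str) -> str:
    """Replace `path` with `content`; return which strategy succeeded.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(path) or "."
    # Write the new content to a temporary file next to the target
    # (e.g. "/etc/hostsXXXXXXXX"), as `sed -i` does before renaming it.
    fd, tmp_path = tempfile.mkstemp(prefix=os.path.basename(path), dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
        try:
            # rename(2) over the target; if the target is a bind mount,
            # the kernel refuses with EBUSY ("Device or resource busy").
            os.replace(tmp_path, path)
            return "renamed"
        except OSError as exc:
            if exc.errno != errno.EBUSY:
                raise
        # Fallback: keep the existing inode (the mount point) and
        # overwrite its contents in place instead of replacing it.
        with open(path, "w") as target:
            target.write(content)
        return "rewritten-in-place"
    finally:
        # After a successful rename the temporary file no longer exists;
        # otherwise clean it up.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
```

Called as replace_hosts_file("/etc/hosts", new_text), the rename succeeds on an ordinary file and only the in-place branch works on a bind-mounted one. A real tool would also preserve the original file's mode and ownership (mkstemp creates the file as 0600), which this sketch omits.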

In general, I'd hope any potentially-breaking change such as this would be opt-in rather than opt-out, especially when first introduced.

Output of podman version:

podman version 3.3.0-rc3

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.22.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: app-emulation/conmon-2.0.29
    path: /usr/bin/conmon
    version: 'conmon version 2.0.29, commit: 7e6de6678f6ed8a18661e1d5721b81ccee293b9b'
  cpus: 8
  distribution:
    distribution: gentoo
    version: unknown
  eventLogger: file
  hostname: dellr330
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.13.7-gentoo
  linkmode: dynamic
  memFree: 2497138688
  memTotal: 67267272704
  ociRuntime:
    name: crun
    package: app-emulation/crun-0.21
    path: /usr/bin/crun
    version: |-
      crun version 0.21
      commit: c4c3cdf2ce408ed44a9e027c618473e6485c635b
      spec: 1.0.0
      +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_AUDIT_WRITE,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_MKNOD,CAP_NET_BIND_SERVICE,CAP_NET_RAW,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: app-emulation/slirp4netns-1.1.12
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 25234563072
  swapTotal: 25769787392
  uptime: 195h 15m 38.63s (Approximately 8.12 days)
registries:
  localhost:5000:
    Blocked: false
    Insecure: true
    Location: localhost:5000
    MirrorByDigestOnly: false
    Mirrors: []
    Prefix: localhost:5000
  search:
  - docker.io
  - docker.pkg.github.com
  - quay.io
  - public.ecr.aws
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 24
    paused: 0
    running: 22
    stopped: 2
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev
  graphRoot: /space/podman/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 69
  runRoot: /var/run/podman
  volumePath: /space/podman/volumes
version:
  APIVersion: 3.3.0-rc3
  Built: 1629323346
  BuiltTime: Wed Aug 18 22:49:06 2021
  GitCommit: 88559c197da3d05c7758920bce90d07e0f066101
  GoVersion: go1.16.7
  OsArch: linux/amd64
  Version: 3.3.0-rc3

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

Yes

@openshift-ci openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 19, 2021
@Luap99
Member

Luap99 commented Aug 19, 2021

Can you provide a reproducer? Podman always bind-mounts a custom /etc/hosts by default; this is not new.

@rhatdan
Member

rhatdan commented Aug 24, 2021

@srcshelton Are you still seeing this issue?

@srcshelton srcshelton changed the title podman-3.3.0_rc* - Changes to default /etc/hosts handling is breaking container workloads podman-3.3.0 - Changes to default /etc/hosts handling is breaking container workloads Aug 24, 2021
@srcshelton
Contributor Author

This is also an issue in podman-3.3.0 release.

I'm working on a reproducer, but something has definitely changed since podman-3.2.3 :(

@srcshelton srcshelton changed the title podman-3.3.0 - Changes to default /etc/hosts handling is breaking container workloads podman-3.3.0 - Changes to default /etc/hosts handling are breaking container workloads Aug 24, 2021
@srcshelton
Contributor Author

Hmm - so this is weird...

The general problem is:

$ sudo podman run --name portage docker.io/gentoo/portage:latest true
$ sudo podman container run -it --name stage3 --volumes-from portage:ro docker.io/gentoo/stage3:amd64-nomultilib /bin/bash
# CONFIG_PROTECT="-*" FEATURES="-ipc-sandbox -network-sandbox -pid-sandbox" emerge -v baselayout

… but I now see that even with podman-3.2.3 this fails with:

!!! copy /var/tmp/portage/sys-apps/baselayout-2.7/image/etc/hosts -> /etc/hosts failed.
!!! [Errno 16] Device or resource busy: b'/etc/hosts#new' -> b'/etc/hosts'

>>> Failed to install sys-apps/baselayout-2.7, Log file:

>>>  '/var/tmp/portage/sys-apps/baselayout-2.7/temp/build.log'

However, there's a process to update spamassassin/spampd which with podman-3.2.3 outputs the following:

Update available for channel updates.spamassassin.org: 1892539 -> 1892562
http: (curl) GET http://sa-update.verein-clean.net/1892562.tar.gz, success
http: (curl) GET http://sa-update.verein-clean.net/1892562.tar.gz.sha512, success
http: (curl) GET http://sa-update.verein-clean.net/1892562.tar.gz.asc, success
Update was available, and was downloaded and installed successfully

… whereas the exact same thing under podman-3.3.0 instead fails with:

sed: can't move '/etc/hostskVTNq3' to '/etc/hosts': Device or resource busy

But previously both used to work!

So there appear to be two separate problems: a regression I didn't spot at the time, introduced in podman-3.2.3 or earlier, which causes the baselayout build to fail when it attempts to overwrite /etc/hosts; and a further change between podman-3.2.3 and podman-3.3.0 which reliably breaks the spamassassin update process.

@srcshelton
Contributor Author

Ah, I've done some digging, and the code causing the sed error above is only executed if the system's hostname isn't already present in /etc/hosts. So the actual difference appears to be that, with identical invocation options, hostname -s returns a non-empty value under both podman-3.2.3 and podman-3.3.0, but only under podman-3.2.3 does that value also appear in /etc/hosts.
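The condition described above can be modelled in a few lines of Python. This is not spampd's actual code (which, judging by the error, uses sed); hosts_has_name is a hypothetical helper that parses hosts-file text, ignores comments, and reports whether a name appears as a hostname or alias on any line. The sample inputs are assumptions: the localhost line is illustrative, and the 127.0.1.1 line is the entry podman-3.2.3 wrote with --network=host and --hostname=testhost, as reported later in this thread.

```python
def hosts_has_name(hosts_text: str, name: str) -> bool:
    """Return True if `name` appears as a hostname or alias in hosts_text.

    Hypothetical helper modelling the check described above.
    """
    wanted = name.lower()  # hostnames compare case-insensitively
    for line in hosts_text.splitlines():
        # Drop comments, then split "address name [aliases...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (f.lower() for f in fields[1:]):
            return True
    return False


# Illustrative host-networking /etc/hosts contents with --hostname=testhost.
hosts_323 = """\
127.0.0.1   localhost localhost.localdomain
127.0.1.1 dellr330 testhost pensive_jemison
"""
hosts_330 = """\
127.0.0.1   localhost localhost.localdomain
"""

print(hosts_has_name(hosts_323, "testhost"))  # True
print(hosts_has_name(hosts_330, "testhost"))  # False
```

Under podman-3.3.0 the check fails, so the script falls through to the branch that rewrites /etc/hosts, which is where the rename error appears.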

@srcshelton
Contributor Author

srcshelton commented Aug 25, 2021

… although, having said this, something has changed: with podman-3.2.3 all builds succeed, whereas with podman-3.3.0 and the exact same code, all builds fail with the above error.

Observations:

When containers are started with --network=host, with podman-3.2.3, /etc/hosts ends with:

127.0.1.1 <hostname> <hostname> <container_name>

… whereas this entry does not exist for the identical invocation with podman-3.3.0.

Even if run with --hostname=<hostname>, podman-3.3.0 no longer writes an entry for the specified name to /etc/hosts when --network=host is also specified.

Example cases:

(… with --hostname=testhost)

| podman version | Additional /etc/hosts entries with container networking | Additional /etc/hosts entries with host networking |
| --- | --- | --- |
| 3.2.3 | 172.18.9.233 testhost great_tereshkova; 172.18.0.1 host.containers.internal | 127.0.1.1 dellr330 testhost pensive_jemison |
| 3.3.0 | 172.18.9.234 testhost nervous_johnson; 172.18.0.1 host.containers.internal | (none) |

(I'm not sure how this affects the baselayout failure above, but it may behave differently when the apparent system hostname is missing from /etc/hosts than when it is present...)

@mheon
Member

mheon commented Aug 25, 2021

OK, this one was intended - see #10319

I listed this as a bugfix and not a change in the release notes given it was unintended behavior.

@mheon
Member

mheon commented Aug 25, 2021

I wonder if we need a flag or option to add these entries back.

@srcshelton
Contributor Author

I've also just realised that across various rebuilds of the physical host itself, I somehow ended up with a generic /etc/hosts file without any entries for the host itself 😲

Restoring these entries actually seems to have fixed the problem, including (for reasons still not entirely clear to me…) the baselayout build issue.

@mheon
Member

mheon commented Aug 28, 2021

That is a separate issue, please open a new GH issue with full details.

@srcshelton
Contributor Author

> I wonder if we need a flag or option to add these entries back.

There's already the --no-hosts option... perhaps this could be updated to something along the lines of --hosts=<none|add>, or a corresponding --write-hosts boolean could be added (which would perhaps better maintain backwards compatibility, but add yet more options)?

But it feels desirable to me to provide an option to add the container name to /etc/hosts with host networking; otherwise the container can't resolve its own hostname without the missing entry being added manually.

@Luap99
Member

Luap99 commented Sep 8, 2021

Can you use --add-host name:ip?

@adelton
Contributor

adelton commented Sep 15, 2021

The change from #10319 / #11118 also breaks rootless containers.

Consider the following use-case where we set the hostname for a rootless pod and an IP address for a rootless container with --add-host. (We also use sudo ip link / sudo ip netns to configure that IP address to actually work for rootless containers, see https://github.com/freeipa/freeipa-container/blob/master/tests/run-master-and-replica.sh#L114-L128 for the full setup if interested.)

With podman-3.2.3-1.fc34.x86_64 I get:

$ podman pod create --name test-hostname --hostname machine.example.test --add-host machine.example.test:172.29.0.1
40b60852dfd48d40ff084d230572dd33c3fcf72ad8efc57a54aa42af98ef5a62
$ podman run --rm --pod test-hostname registry.fedoraproject.org/fedora:34 cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.29.0.1 machine.example.test
# used by slirp4netns
10.0.2.100	machine.example.test 40b60852dfd4-infra
10.0.2.2 host.containers.internal
127.0.1.1 machine.example.test brave_nightingale
$ podman run --rm --pod test-hostname registry.fedoraproject.org/fedora:34 python3 -c 'import socket; print(socket.gethostbyaddr("machine.example.test"))'
('machine.example.test', ['brave_nightingale'], ['127.0.1.1'])

Not the best result: my 172.29.0.1 machine.example.test entry is overshadowed by that 127.0.1.1 line, but at least gethostbyaddr gives me the correct hostname.

However, with podman-3.3.1-1.fc34.x86_64, that is now completely broken:

$ podman pod create --name test-hostname --hostname machine.example.test --add-host machine.example.test:172.29.0.1
be4669ba40ba018668c5aa057299f70328a3cdc9b9326b79a041ccdf4121b4ea
$ podman run --rm --pod test-hostname registry.fedoraproject.org/fedora:34 cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.29.0.1 machine.example.test
# used by slirp4netns
10.0.2.100	machine.example.test be4669ba40ba-infra
10.0.2.2 host.containers.internal
127.0.0.1 machine.example.test cranky_ritchie
$ podman run --rm --pod test-hostname registry.fedoraproject.org/fedora:34 python3 -c 'import socket; print(socket.gethostbyaddr("machine.example.test"))'
('localhost', ['localhost.localdomain', 'localhost4', 'localhost4.localdomain4'], ['127.0.0.1'])
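Both results are consistent with Python's socket.gethostbyaddr() first resolving the name to an address and then doing a reverse lookup that returns the first /etc/hosts line whose address matches. The sketch below models only that second step, under the simplifying assumption that a files-based reverse lookup takes the first matching line and compares addresses as plain strings; reverse_lookup is a hypothetical helper, not glibc's implementation, and it does not model which address the forward lookup picks.

```python
def reverse_lookup(hosts_text: str, address: str):
    """Return (name, aliases, [address]) for the first line matching address.

    Simplified, hypothetical model of a files-based reverse lookup.
    """
    for line in hosts_text.splitlines():
        # Drop comments, then split "address name [aliases...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == address:
            return (fields[1], fields[2:], [address])
    raise LookupError(address)


# The /etc/hosts content shown above for podman-3.3.1.
hosts_331 = """\
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.29.0.1 machine.example.test
# used by slirp4netns
10.0.2.100  machine.example.test be4669ba40ba-infra
10.0.2.2 host.containers.internal
127.0.0.1 machine.example.test cranky_ritchie
"""

# machine.example.test now maps to 127.0.0.1, and the first line for
# 127.0.0.1 is the localhost entry, so "localhost" wins.
print(reverse_lookup(hosts_331, "127.0.0.1"))
# ('localhost', ['localhost.localdomain', 'localhost4', 'localhost4.localdomain4'], ['127.0.0.1'])
```

With podman-3.2.3 the name mapped to 127.0.1.1, whose only line is "127.0.1.1 machine.example.test brave_nightingale", so the same first-match rule yields the hostname instead of localhost.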

I'd really appreciate it if podman stopped changing the automagic logic and just gave users tools to configure exactly what they want.

@mheon
Member

mheon commented Sep 15, 2021

Can you open a new bug about this? This seems like a different issue (adding duplicate hosts entries under some circumstances).

If you really want to manage /etc/hosts entirely yourself, we provide this ability already (via the --no-hosts flag).

@adelton
Contributor

adelton commented Sep 15, 2021

Thanks, I've filed #11596 now.

The problem with --no-hosts is that it prevents me from using --add-host. So I cannot really manage the /etc/hosts content myself: I would not be able to populate it from outside the container, and in read-only containers /etc/hosts is read-only, so I could not change it from within the container either.

adelton added a commit to adelton/freeipa-container that referenced this issue Sep 23, 2021
adelton added a commit to adelton/freeipa-container that referenced this issue Sep 27, 2021
@github-actions

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Oct 18, 2021

This is what we currently have in the main branch. Does this fix the original issue?

$ ./bin/podman run fedora cat /etc/hosts 
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# used by slirp4netns
10.0.2.100	ab21f5748bdd infallible_galileo
10.0.2.2 host.containers.internal

The container's name and ID are mapped to the container's address. host.containers.internal will point at the default (first) IP of the host machine, if Podman can determine it.

@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Member

rhatdan commented Nov 18, 2021

Since I never heard back, I am going to consider this issue fixed. Reopen if I am mistaken.

@rhatdan rhatdan closed this as completed Nov 18, 2021
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023