Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libct/cg/sd: fix dbus connection leak #2936

Closed
wants to merge 3 commits into from

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented May 4, 2021

TL;DR: fixes a regression (open fd leak) caused by PR #2923. This is an alternative to #2937.

When running libcontainer/integration/TestSystemdFreeze 2000 times,
I got "too many open files" error after a while:

[vagrant@localhost integration]$ sudo -E PATH=$PATH ./integration.test -test.v -test.run Freeze -test.count 2000
<...>
=== RUN   TestFreeze
--- PASS: TestFreeze (0.28s)
=== RUN   TestSystemdFreeze
--- PASS: TestSystemdFreeze (0.42s)
=== RUN   TestFreeze
    utils_test.go:86: exec_test.go:536: unexpected error: container_linux.go:380: starting container process caused: process_linux.go:338: starting init process command caused: fork/exec /proc/self/exe: too many open files
        
--- FAIL: TestFreeze (0.16s)
=== RUN   TestSystemdFreeze
    utils_test.go:86: exec_test.go:536: unexpected error: container_linux.go:380: starting container process caused: process_linux.go:338: starting init process command caused: fork/exec /proc/self/exe: too many open files

<...>

--- FAIL: TestFreeze (0.00s)
=== RUN   TestSystemdFreeze
    utils_test.go:86: exec_test.go:536: unexpected error: container_linux.go:380: starting container process caused: process_linux.go:338: starting init process command caused: open /dev/null: too many open files
        
--- FAIL: TestSystemdFreeze (0.16s)
=== RUN   TestFreeze
    utils_test.go:86: exec_test.go:512: unexpected error: untar error "fork/exec /bin/sh: too many open files": ""
        
--- FAIL: TestFreeze (0.00s)
=== RUN   TestSystemdFreeze
    utils_test.go:86: exec_test.go:536: unexpected error: container_linux.go:364: creating new parent process caused: container_linux.go:496: including execfifo in cmd.Exec setup caused: open /tmp/libcontainer548959440/test/exec.fifo: too many open files
       
 --- FAIL: TestSystemdFreeze (0.17s)
=== RUN   TestFreeze
    utils_test.go:86: exec_test.go:512: unexpected error: untar error "open /dev/null: too many open files": ""
        
<...>

Indeed, systemd cgroup manager never closes its dbus connection.

This was not a problem before PR #2923 because we only had a single connection
for the whole runtime. Now it is per manager instance, so we leak a
connection (apparently two sockets) per cgroup manager instance.
One possible way to fix this is to go back to using a single global dbus connection. That is implemented by #2937. The alternative, that this PR implements, is to

  1. Close the connection from cgroup manager's Destroy (assuming the
    cgroup manager won't be used after Destroy -- even in case it will be,
    a new connection will be initiated). This helps for cases when
    the container (and its cgroup) are actually removed.

  2. Set a finalizer in newDbusConnManager(), so once it is garbage
    collected, the closeConnection() method is called. This helps for
    cases of a long-running runtime (such as kubernetes) which uses
    a libcontainer/cgroup/systemd manager to create (but not remove)
    a cgroup.

A test case is added in a separate commit. Before the fix, it always fails
like this:

        exec_test.go:2030: extra fd 8 -> socket:[659703]
        exec_test.go:2030: extra fd 11 -> socket:[658715]
        exec_test.go:2033: found 2 extra fds after container.Run
    --- FAIL: TestFdLeaksSystemd (0.20s)

@kolyshkin kolyshkin added this to the 1.0.0-rc94 milestone May 4, 2021
@kolyshkin kolyshkin force-pushed the dbus-cm-leak branch 2 times, most recently from 39ff971 to 5237064 Compare May 5, 2021 00:03
@kolyshkin kolyshkin mentioned this pull request May 5, 2021
7 tasks
@kolyshkin
Copy link
Contributor Author

The alternative is to revert back to a single per-runtime connection, rather than per-cgroup manager. Let me work on that.

Comment on lines +26 to +28
runtime.SetFinalizer(m, func(d *dbusConnManager) {
d.closeConnection()
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a better solution than switching back to a single-connection setup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only issue with this PR I can think of is, if a runtime creates a lot of containers and keeps references to all of them (and thus to cgroup manager for every container), there will be N*2 opened sockets rather than just 2.

The issue with the other PR is it mixes root and rootless connection into the same variable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubelet could be operating on up to ~200 pods (typical per node limit) on a single node so there is the overhead of having to create/destroy connections to dbus frequently. We'll have to measure the impact of that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Takes about 1 ms to connect on my laptop.

Here is a a benchmark:

package systemd

import (
	"os"
	"testing"

	systemdDbus "github.com/coreos/go-systemd/v22/dbus"
)

func benchmarkDbusConn(b *testing.B, closeConn bool) {
	cm := newDbusConnManager(os.Getuid() != 0)

	var c *systemdDbus.Conn
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var err error
		if closeConn {
			cm.closeConnection()
		}
		c, err = cm.getConnection()
		if err != nil {
			b.Fatal(err)
		}
	}
	b.Log(c.GetManagerProperty("Version"))
	cm.closeConnection()
}

func BenchmarkDbusConnReuse(b *testing.B) {
	benchmarkDbusConn(b, false)
}

func BenchmarkDbusConnReopen(b *testing.B) {
	benchmarkDbusConn(b, true)
}

and the results:

kir@kir-rhat systemd]$ go test -v -run 345 -bench .
goos: linux
goarch: amd64
pkg: github.com/opencontainers/runc/libcontainer/cgroups/systemd
cpu: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
BenchmarkDbusConnReuse
    dbus_test.go:25: "v246.10-1.fc33" <nil>
    dbus_test.go:25: "v246.10-1.fc33" <nil>
    dbus_test.go:25: "v246.10-1.fc33" <nil>
    dbus_test.go:25: "v246.10-1.fc33" <nil>
    dbus_test.go:25: "v246.10-1.fc33" <nil>
BenchmarkDbusConnReuse-4    	78536199	        13.88 ns/op
BenchmarkDbusConnReopen
    dbus_test.go:25: "v246.10-1.fc33" <nil>
    dbus_test.go:25: "v246.10-1.fc33" <nil>
    dbus_test.go:25: "v246.10-1.fc33" <nil>
    dbus_test.go:25: "v246.10-1.fc33" <nil>
BenchmarkDbusConnReopen-4   	    1520	    804216 ns/op
PASS
ok  	github.com/opencontainers/runc/libcontainer/cgroups/systemd	3.421s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely, there's also some other overhead, most notably opened fds.

cyphar
cyphar previously approved these changes May 5, 2021
Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@kolyshkin
Copy link
Contributor Author

Rebased (minor conflicts in libct/int), addressed a review nit.

@kolyshkin
Copy link
Contributor Author

CI failure is a known flake (#2907) which is appearing a lot lately.

AkihiroSuda
AkihiroSuda previously approved these changes May 6, 2021
kolyshkin added 3 commits May 6, 2021 12:22
Commit 47ef9a1 forgot to wrap GetManagerProperty("ControlGroup")
into retryOnDisconnect. Since there's one other user of
GetManagerProperty, add getManagerProperty wrapper and use it.

Fixes: 47ef9a1

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
When running libcontainer/integration/TestSystemdFreeze 2000 times,
I got "too many open files" error. Indeed, systemd cgroup manager
never closes its dbus connection.

This was not a problem before because we only had a single connection
for the whole runtime. Now it is per manager instance, so we leak a
connection (apparently two sockets) per cgroup manager instance.

The fix is to

1. Close the connection from cgroup manager's Destroy (assuming the
   cgroup manager won't be used after Destroy -- in case it will be,
   a new connection will be initiated). This helps for cases when
   the container (and its cgroup) are actually removed.

2. Set a finalizer in newDbusConnManager(), so once it is garbage
   collected, the closeConnection() method is called. This helps for
   cases of a long-running runtime (such as kubernetes) which uses
   libcontainer's cgroup manager to create a cgroup.

A test case is to be added in the next commit.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Add a test to check that container.Run do not leak file descriptors.

Before the previous commit, it fails like this:

    exec_test.go:2030: extra fd 8 -> socket:[659703]
    exec_test.go:2030: extra fd 11 -> socket:[658715]
    exec_test.go:2033: found 2 extra fds after container.Run

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@AkihiroSuda
Copy link
Member

Merged #2937

@AkihiroSuda AkihiroSuda closed this May 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants