Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

In Docker, RustBoard can't bind to ::1 #4801

Closed
wchargin opened this issue Mar 19, 2021 · 3 comments · Fixed by #4804
Closed

In Docker, RustBoard can't bind to ::1 #4801

wchargin opened this issue Mar 19, 2021 · 3 comments · Fixed by #4804
Labels
core:rustboard //tensorboard/data/server/... type:bug

Comments

@wchargin
Copy link
Contributor

On current tensorflow/tensorflow@nightly:

$ docker run --rm -it tensorflow/tensorflow@sha256:31775f136e5fe2d50ae075100dff335e9ae0954b8f9b529791a8bf739f3dd415 sh
# rustboard="$(python -c 'import tensorboard_data_server as tds; print(tds.server_binary())')"
# "${rustboard}" --version
rustboard 0.4.0
# mkdir logs
# "${rustboard}" --logdir logs  
Error: Os { code: 99, kind: AddrNotAvailable, message: "Cannot assign requested address" }
# "${rustboard}" --logdir logs --host ::0
listening on [::]:6806
^C
# "${rustboard}" --logdir logs --host ::1
Error: Os { code: 99, kind: AddrNotAvailable, message: "Cannot assign requested address" }
# "${rustboard}" --logdir logs --host 0.0.0.0
listening on 0.0.0.0:6806
^C
# "${rustboard}" --logdir logs --host 127.0.0.1
listening on 127.0.0.1:6806
^C

It looks like we can bind to IPv4 wildcard and loopback, and IPv6
wildcard, but not IPv6 loopback. Poking around the internet, this seems
to be a common issue with Docker, but I can’t find an explanation;
everyone just says to bind to IPv4 instead.

We should do something nicer here: presumably, attempt to bind to IPv6,
and fall back to IPv4.

(Thanks to @tgolsson for the original report!)

@wchargin wchargin added type:bug core:rustboard //tensorboard/data/server/... labels Mar 19, 2021
@wchargin
Copy link
Contributor Author

Workaround in the meantime: apply the following patch inside your
container:

--- /usr/local/lib/python3.6/dist-packages/tensorboard/data/server_ingester.py	2021-03-19 16:52:22.000000000 +0000
+++ /usr/local/lib/python3.6/dist-packages/tensorboard/data/server_ingester.py.new	2021-03-19 17:19:44.675063636 +0000
@@ -116,6 +116,7 @@
             "--reload=%s" % reload,
             "--samples-per-plugin=%s" % samples_per_plugin,
             "--port=0",
+            "--host=127.0.0.1",
             "--port-file=%s" % (port_file_path,),
             "--die-after-stdin",
         ]

Or, if launching TensorBoard directly from the docker(1) cmdline,
invoke it like this:

$ docker run --rm -it tensorflow/tensorflow@sha256:31775f136e5fe2d50ae075100dff335e9ae0954b8f9b529791a8bf739f3dd415 sh -c 'p=/usr/local/lib/python3.6/dist-packages/tensorboard/data/server_ingester.py; awk '\''1; /port=0/ { printf "% 12s%s\n", "", "\"--host=127.0.0.1\"," }'\'' "$p" >"$p.tmp"; mv "$p.tmp" "$p"; tensorboard --logdir foobar --load_fast=true'

wchargin added a commit that referenced this issue Mar 19, 2021
Summary:
We now interpret `--host` as a string that can be `getaddrinfo`d rather
than requiring it to be a literal IP address. Thus, we can support
addresses like `localhost` that [may have multiple resolutions][play]:

```
[src/main.rs:3] "localhost:6006".to_socket_addrs().unwrap().collect::<Vec<_>>() = [
    127.0.0.1:6006,
    [::1]:6006,
]
```

[play]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=922073bea0c690fd3d249ff3d822cde6

Thus, we change the default `--host` from `::1` to `localhost` to
support systems where `::1` may not be available. Should fix #4801.

We also improve the error message on bind failure: it’s pretty-printed
instead of `Debug`-formatted, and it includes the problematic address.

Test Plan:
From `tensorboard/data/server/`, run `cargo build`, then reproduce the
failure in Docker:

```
$ img=tensorflow/tensorflow@sha256:31775f136e5fe2d50ae075100dff335e9ae0954b8f9b529791a8bf739f3dd415
$ v="$PWD/target/debug:/rustboard-bin:ro"
$ docker run --rm -it -v "$v" "$img" /rustboard-bin/rustboard --logdir /tmp --host ::1
fatal: failed to bind to ("::1", 6806): Cannot assign requested address (os error 99)
```

…then try again with default `--host` and note that it works:

```
$ docker run --rm -it -v "$v" "$img" /rustboard-bin/rustboard --logdir /tmp
listening on 127.0.0.1:6806
```

(You may need to `docker kill $(docker ps -q)` the thing afterward.)

wchargin-branch: rust-host-getaddrinfo
wchargin-source: 0b612771320a6f14203137fbed1b6fc627b6d1c1
@tgolsson
Copy link

Thank you @wchargin, that patch worked perfectly.

@wchargin
Copy link
Contributor Author

Awesome; glad to hear it! Excited to hear any more feedback you may have
to offer now that you’ve gotten the thing to work, and I’ll get a proper
fix out for the IPv6 issue when I can.

wchargin added a commit that referenced this issue Mar 20, 2021
Summary:
We now interpret `--host` as a string that can be `getaddrinfo`d rather
than requiring it to be a literal IP address. Thus, we can support
hostnames like `localhost` that [may have multiple resolutions][play]:

```
[src/main.rs:3] ("localhost", 6006).to_socket_addrs().unwrap().collect::<Vec<_>>() = [
    127.0.0.1:6006,
    [::1]:6006,
]
```

[play]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=15504b49f87e983b4748ca976dcb3bd1

Thus, we change the default `--host` from `::1` to `localhost` to
support systems where `::1` may not be available. Should fix #4801.

We also improve the error message on bind failure: it’s pretty-printed
instead of `Debug`-formatted, and it includes the problematic address.

Test Plan:
From `tensorboard/data/server/`, run `cargo build`, then reproduce the
failure in Docker:

```
$ img=tensorflow/tensorflow@sha256:31775f136e5fe2d50ae075100dff335e9ae0954b8f9b529791a8bf739f3dd415
$ v="$PWD/target/debug:/rustboard-bin:ro"
$ docker run --rm -it -v "$v" "$img" /rustboard-bin/rustboard --logdir /tmp --host ::1
fatal: failed to bind to ("::1", 6806): Cannot assign requested address (os error 99)
```

…then try again with default `--host` and note that it works:

```
$ docker run --rm -it -v "$v" "$img" /rustboard-bin/rustboard --logdir /tmp
listening on 127.0.0.1:6806
```

(You may need to `docker kill $(docker ps -q)` the thing afterward.)

wchargin-branch: rust-host-getaddrinfo
wchargin added a commit that referenced this issue Mar 22, 2021
Summary:
The main `tensorboard(1)` now proxies any `--extra_data_server_flags` to
the data server binary invoked as a subprocess. This would have been
useful as a less invasive workaround for #4801, and may serve as a
similar escape hatch in the future.

The argument to `--extra_data_server_flags` undergoes POSIX word
splitting, so you can pass multiple arguments by separating them by
spaces, quoting appropriately. Thus:

```
tensorboard --extra_data_server_flags="--host 'localhost or something'"
```

You can’t override flags that `tensorboard(1)` already passes, though,
because the data server will complain about conflicting arguments.
Thus, this interface is marked as unstable and is not intended to ever
be stabilized.

Test Plan:
Unit tests suffice.

wchargin-branch: cli-extra-data-server-flags
wchargin-source: 614d5bd9939c1833a816ff9a08363ffd8596c244
wchargin added a commit that referenced this issue Mar 30, 2021
Summary:
The main `tensorboard(1)` now proxies any `--extra_data_server_flags` to
the data server binary invoked as a subprocess. This would have been
useful as a less invasive workaround for #4801, and may serve as a
similar escape hatch in the future.

The argument to `--extra_data_server_flags` undergoes POSIX word
splitting, so you can pass multiple arguments by separating them by
spaces, quoting appropriately. Thus:

```
tensorboard --extra_data_server_flags="--host 'localhost or something'"
```

You can’t override flags that `tensorboard(1)` already passes, though,
because the data server will complain about conflicting arguments.
Thus, this interface is marked as unstable and is not intended to ever
be stabilized.

Test Plan:
Unit tests suffice.

wchargin-branch: cli-extra-data-server-flags
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core:rustboard //tensorboard/data/server/... type:bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants