In Docker, RustBoard can't bind to `::1` #4801

wchargin · 2021-03-19T17:24:26Z

On current tensorflow/tensorflow@nightly:

$ docker run --rm -it tensorflow/tensorflow@sha256:31775f136e5fe2d50ae075100dff335e9ae0954b8f9b529791a8bf739f3dd415 sh
# rustboard="$(python -c 'import tensorboard_data_server as tds; print(tds.server_binary())')"
# "${rustboard}" --version
rustboard 0.4.0
# mkdir logs
# "${rustboard}" --logdir logs  
Error: Os { code: 99, kind: AddrNotAvailable, message: "Cannot assign requested address" }
# "${rustboard}" --logdir logs --host ::0
listening on [::]:6806
^C
# "${rustboard}" --logdir logs --host ::1
Error: Os { code: 99, kind: AddrNotAvailable, message: "Cannot assign requested address" }
# "${rustboard}" --logdir logs --host 0.0.0.0
listening on 0.0.0.0:6806
^C
# "${rustboard}" --logdir logs --host 127.0.0.1
listening on 127.0.0.1:6806
^C

It looks like we can bind to IPv4 wildcard and loopback, and IPv6
wildcard, but not IPv6 loopback. Poking around the internet, this seems
to be a common issue with Docker, but I can’t find an explanation;
everyone just says to bind to IPv4 instead.

We should do something nicer here: presumably, attempt to bind to IPv6,
and fall back to IPv4.

(Thanks to @tgolsson for the original report!)

The text was updated successfully, but these errors were encountered:

wchargin · 2021-03-19T17:26:31Z

Workaround in the meantime: apply the following patch inside your
container:

--- /usr/local/lib/python3.6/dist-packages/tensorboard/data/server_ingester.py	2021-03-19 16:52:22.000000000 +0000
+++ /usr/local/lib/python3.6/dist-packages/tensorboard/data/server_ingester.py.new	2021-03-19 17:19:44.675063636 +0000
@@ -116,6 +116,7 @@
             "--reload=%s" % reload,
             "--samples-per-plugin=%s" % samples_per_plugin,
             "--port=0",
+            "--host=127.0.0.1",
             "--port-file=%s" % (port_file_path,),
             "--die-after-stdin",
         ]

Or, if launching TensorBoard directly from the docker(1) cmdline,
invoke it like this:

$ docker run --rm -it tensorflow/tensorflow@sha256:31775f136e5fe2d50ae075100dff335e9ae0954b8f9b529791a8bf739f3dd415 sh -c 'p=/usr/local/lib/python3.6/dist-packages/tensorboard/data/server_ingester.py; awk '\''1; /port=0/ { printf "% 12s%s\n", "", "\"--host=127.0.0.1\"," }'\'' "$p" >"$p.tmp"; mv "$p.tmp" "$p"; tensorboard --logdir foobar --load_fast=true'

Summary: We now interpret `--host` as a string that can be `getaddrinfo`d rather than requiring it to be a literal IP address. Thus, we can support addresses like `localhost` that [may have multiple resolutions][play]: ``` [src/main.rs:3] "localhost:6006".to_socket_addrs().unwrap().collect::<Vec<_>>() = [ 127.0.0.1:6006, [::1]:6006, ] ``` [play]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=922073bea0c690fd3d249ff3d822cde6 Thus, we change the default `--host` from `::1` to `localhost` to support systems where `::1` may not be available. Should fix #4801. We also improve the error message on bind failure: it’s pretty-printed instead of `Debug`-formatted, and it includes the problematic address. Test Plan: From `tensorboard/data/server/`, run `cargo build`, then reproduce the failure in Docker: ``` $ img=tensorflow/tensorflow@sha256:31775f136e5fe2d50ae075100dff335e9ae0954b8f9b529791a8bf739f3dd415 $ v="$PWD/target/debug:/rustboard-bin:ro" $ docker run --rm -it -v "$v" "$img" /rustboard-bin/rustboard --logdir /tmp --host ::1 fatal: failed to bind to ("::1", 6806): Cannot assign requested address (os error 99) ``` …then try again with default `--host` and note that it works: ``` $ docker run --rm -it -v "$v" "$img" /rustboard-bin/rustboard --logdir /tmp listening on 127.0.0.1:6806 ``` (You may need to `docker kill $(docker ps -q)` the thing afterward.) wchargin-branch: rust-host-getaddrinfo wchargin-source: 0b612771320a6f14203137fbed1b6fc627b6d1c1

tgolsson · 2021-03-19T20:46:43Z

Thank you @wchargin, that patch worked perfectly.

wchargin · 2021-03-19T20:48:20Z

Awesome; glad to hear it! Excited to hear any more feedback you may have
to offer now that you’ve gotten the thing to work, and I’ll get a proper
fix out for the IPv6 issue when I can.

Summary: We now interpret `--host` as a string that can be `getaddrinfo`d rather than requiring it to be a literal IP address. Thus, we can support hostnames like `localhost` that [may have multiple resolutions][play]: ``` [src/main.rs:3] ("localhost", 6006).to_socket_addrs().unwrap().collect::<Vec<_>>() = [ 127.0.0.1:6006, [::1]:6006, ] ``` [play]: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=15504b49f87e983b4748ca976dcb3bd1 Thus, we change the default `--host` from `::1` to `localhost` to support systems where `::1` may not be available. Should fix #4801. We also improve the error message on bind failure: it’s pretty-printed instead of `Debug`-formatted, and it includes the problematic address. Test Plan: From `tensorboard/data/server/`, run `cargo build`, then reproduce the failure in Docker: ``` $ img=tensorflow/tensorflow@sha256:31775f136e5fe2d50ae075100dff335e9ae0954b8f9b529791a8bf739f3dd415 $ v="$PWD/target/debug:/rustboard-bin:ro" $ docker run --rm -it -v "$v" "$img" /rustboard-bin/rustboard --logdir /tmp --host ::1 fatal: failed to bind to ("::1", 6806): Cannot assign requested address (os error 99) ``` …then try again with default `--host` and note that it works: ``` $ docker run --rm -it -v "$v" "$img" /rustboard-bin/rustboard --logdir /tmp listening on 127.0.0.1:6806 ``` (You may need to `docker kill $(docker ps -q)` the thing afterward.) wchargin-branch: rust-host-getaddrinfo

Summary: The main `tensorboard(1)` now proxies any `--extra_data_server_flags` to the data server binary invoked as a subprocess. This would have been useful as a less invasive workaround for #4801, and may serve as a similar escape hatch in the future. The argument to `--extra_data_server_flags` undergoes POSIX word splitting, so you can pass multiple arguments by separating them by spaces, quoting appropriately. Thus: ``` tensorboard --extra_data_server_flags="--host 'localhost or something'" ``` You can’t override flags that `tensorboard(1)` already passes, though, because the data server will complain about conflicting arguments. Thus, this interface is marked as unstable and is not intended to ever be stabilized. Test Plan: Unit tests suffice. wchargin-branch: cli-extra-data-server-flags wchargin-source: 614d5bd9939c1833a816ff9a08363ffd8596c244

Summary: The main `tensorboard(1)` now proxies any `--extra_data_server_flags` to the data server binary invoked as a subprocess. This would have been useful as a less invasive workaround for #4801, and may serve as a similar escape hatch in the future. The argument to `--extra_data_server_flags` undergoes POSIX word splitting, so you can pass multiple arguments by separating them by spaces, quoting appropriately. Thus: ``` tensorboard --extra_data_server_flags="--host 'localhost or something'" ``` You can’t override flags that `tensorboard(1)` already passes, though, because the data server will complain about conflicting arguments. Thus, this interface is marked as unstable and is not intended to ever be stabilized. Test Plan: Unit tests suffice. wchargin-branch: cli-extra-data-server-flags

wchargin added type:bug core:rustboard //tensorboard/data/server/... labels Mar 19, 2021

wchargin mentioned this issue Mar 19, 2021

Fast data loading feedback (--load_fast=true; “RustBoard”) #4784

Open

wchargin mentioned this issue Mar 19, 2021

rust: support and default to --host localhost #4804

Merged

wchargin closed this as completed in #4804 Mar 20, 2021

wchargin mentioned this issue Mar 22, 2021

tensorboard-data-server 0.6.0 #4810

Merged

wchargin mentioned this issue Mar 22, 2021

cli: add --extra_data_server_flags #4811

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

In Docker, RustBoard can't bind to `::1` #4801

In Docker, RustBoard can't bind to `::1` #4801

wchargin commented Mar 19, 2021

wchargin commented Mar 19, 2021

tgolsson commented Mar 19, 2021

wchargin commented Mar 19, 2021

In Docker, RustBoard can't bind to ::1 #4801

In Docker, RustBoard can't bind to ::1 #4801

Comments

wchargin commented Mar 19, 2021

wchargin commented Mar 19, 2021

tgolsson commented Mar 19, 2021

wchargin commented Mar 19, 2021

In Docker, RustBoard can't bind to `::1` #4801

In Docker, RustBoard can't bind to `::1` #4801