TensorBoard 2.0.2 #2970

Merged: 10 commits into tensorflow:2.0 from release-2.0.2 on Nov 25, 2019

wchargin

Cherrypicks:

wchargin and others added 10 commits November 25, 2019 13:32
Summary:
This commit adds an RPC definition by which the uploader can connect to
the frontend web server at the start of an upload session. This resolves
a number of outstanding issues:

  - The frontend can tell the uploader which backend server to connect
    to, rather than requiring a hard-coded endpoint in the uploader.
  - The frontend can tell the uploader how to generate experiment URLs,
    rather than requiring the backend server to provide this information
    (which it can’t, really, in general).
  - The frontend can check whether the uploader client is recent enough
    and instruct the end user to update if it’s not.
  - The frontend can warn the user about transient issues in case the
    service is down, degraded, under maintenance, etc.

An endpoint `https://tensorboard.dev/api/uploader` on the server will
provide this information.

Test Plan:
Unit tests suffice.

wchargin-branch: uploader-serverinfo-protos
Summary:
The new `tensorboard dev list` command prints links to your experiments.
This is implemented by repurposing the `StreamExperiments` export RPC,
which only includes experiment IDs. We can expand this to show
additional metadata, such as experiment creation time and last-modified
time; total number of scalars; counts of runs, tags, or time series; and
selected run and tag names.

Test Plan:
Ran `tensorboard dev list` on an account with 12 experiments and an
account with no experiments, starting from both logged-in and logged-out
states. Verified that the printed experiment links resolve correctly.
Verified that the normal export flow still works.

wchargin-branch: uploader-list
Test Plan:
Running against a local frontend server, `tensorboard/2.1.0a0` shows up
in the server logs where previously there was `python-requests/2.22.0`.
Unit tests also included.
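
As a rough illustration of the user-agent change, the sketch below builds a `tensorboard/<version>` string and attaches it to a request. It uses the standard library's `urllib.request` as a stand-in, not the HTTP client the uploader actually uses, and the helper names and the URL are hypothetical.

```python
import urllib.request


def make_user_agent(version):
    """Build a user-agent string of the form seen in the server logs."""
    return "tensorboard/%s" % version


def build_request(url, version):
    """Create a request object that identifies the client version.

    No network traffic happens here; the request is only constructed.
    """
    headers = {"User-Agent": make_user_agent(version)}
    return urllib.request.Request(url, headers=headers)


# Hypothetical local frontend URL, for illustration only.
request = build_request("http://localhost:8080/", "2.1.0a0")
# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(request.get_header("User-agent"))
```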

wchargin-branch: uploader-user-agent
Summary:
This commit integrates the new `ServerInfo` RPC with the uploader. It’s
not yet enabled by default: behavior is unchanged, except that
experiment URLs now properly have a trailing slash. We’ll soon remove
the hard-coded API backend endpoint to enable the RPC by default.

Test Plan:
Running a test frontend and a test backend, we observe the following
behavior with different arguments:

| `--origin` | `--api_endpoint` | → | URL origin | Backend |
|------------|------------------|---|------------|---------|
| empty      | empty            |   | prod       | prod    |
| empty      | prod             |   | prod       | prod    |
| empty      | test             |   | prod       | test    |
| test       | empty            |   | test       | test    |
| test       | test             |   | test       | test    |
| test       | prod             |   | test       | prod    |

Here, “test” in the `--origin` column is like `http://localhost:8080`,
and “test” in the `--api_endpoint` column is like `localhost:10000`.
Note that the no-argument case is equivalent to the explicitly-empty
argument case because both arguments have empty default values.
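
The table above can be read as a small resolution rule, sketched here under stated assumptions. The function name, the `backend_for_origin` callback (standing in for whatever the frontend reports during the handshake), and the prod backend placeholder address are all hypothetical; only the two flag names and the example test addresses come from the description above.

```python
PROD_ORIGIN = "https://tensorboard.dev"


def resolve_targets(origin, api_endpoint, backend_for_origin):
    """Return (url_origin, backend) for the given flag values.

    Empty strings mean "flag not given", matching the empty defaults.
    `backend_for_origin` is a hypothetical callable that maps a frontend
    origin to the backend that frontend would advertise.
    """
    url_origin = origin or PROD_ORIGIN
    # An explicit --api_endpoint overrides the backend only.
    backend = api_endpoint or backend_for_origin(url_origin)
    return url_origin, backend


# Hypothetical lookup standing in for the frontend's answer.
KNOWN_BACKENDS = {
    PROD_ORIGIN: "prod-backend.example:443",  # placeholder, not a real address
    "http://localhost:8080": "localhost:10000",
}

print(resolve_targets("", "localhost:10000", KNOWN_BACKENDS.__getitem__))
```

With both flags empty this yields the prod origin and the prod backend, and with only `--origin` set to a test frontend it yields the test origin and the backend that frontend advertises, as in the corresponding table rows.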

Explicitly specifying `--origin https://tensorboard.dev`, with any value
of `--api_endpoint`, fails with “Corrupt response from backend” because
server-side support has not yet been rolled out. This is expected.

Specifying `--origin http://localhost:0` or any other unreachable host
fails with `ECONNREFUSED` and a nice message.

My test frontend is configured to reject clients below version 2.0.0 and
warn on clients below version 2.0.1. Changing the local `version.py` to
`2.0.0a0` or `2.0.1a0` exercises these cases.
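
For illustration, a version gate like the one configured on the test frontend could look roughly like the sketch below. This is not the actual server logic: the function names, the verdict strings, and the rule that an `aN` pre-release sorts before the matching release are assumptions made for the example.

```python
import re


def parse_version(version):
    """Parse 'X.Y.Z' or 'X.Y.ZaN' into a tuple that sorts correctly.

    The fourth element is 0 for alpha builds and 1 for releases, so
    that '2.0.0a0' sorts before '2.0.0'.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:a(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version: %r" % (version,))
    major, minor, patch, alpha = match.groups()
    if alpha is None:
        return (int(major), int(minor), int(patch), 1, 0)
    return (int(major), int(minor), int(patch), 0, int(alpha))


# Thresholds taken from the test frontend configuration described above.
MIN_SUPPORTED = parse_version("2.0.0")
MIN_RECOMMENDED = parse_version("2.0.1")


def check_client(version):
    """Return 'reject', 'warn', or 'ok' for a client version string."""
    parsed = parse_version(version)
    if parsed < MIN_SUPPORTED:
        return "reject"
    if parsed < MIN_RECOMMENDED:
        return "warn"
    return "ok"


for candidate in ("2.0.0a0", "2.0.1a0", "2.0.2"):
    print(candidate, check_client(candidate))
```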

Finally, double-checked that building the Pip package, installing it,
and running `tensorboard dev list` properly uses the production backend
and prints URLs that resolve to the production frontend.

wchargin-branch: uploader-serverinfo-request
Summary:
This extends the `StreamExperiments` RPC such that the client can
specify a set of additional metadata fields that the server should
include, like “creation time” or “number of scalar points”.

The format is both forward- and backward-compatible. Servers are
expected to send responses with both `experiment_ids` and `experiments`
until we drop support for clients that do not support `experiments`, at
which point they need only send `experiments`.
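
A client that tolerates both old and new servers might read a response along these lines. This is a minimal sketch that uses plain dictionaries in place of the real protobuf messages; the `experiments` and `experiment_ids` field names come from the description above, while the per-experiment keys (`experiment_id`, `num_scalars`) and the function name are hypothetical.

```python
def experiments_from_response(response):
    """Return a list of (experiment_id, metadata_or_None) pairs.

    Prefers the richer `experiments` field when the server sends it and
    falls back to the bare `experiment_ids` list for older servers.
    """
    experiments = response.get("experiments")
    if experiments:
        return [(exp["experiment_id"], exp) for exp in experiments]
    return [(eid, None) for eid in response.get("experiment_ids", [])]


# An older server sends IDs only.
old_response = {"experiment_ids": ["abc", "def"]}
# A transitional server sends both fields, as described above.
new_response = {
    "experiment_ids": ["abc"],
    "experiments": [{"experiment_id": "abc", "num_scalars": 42}],
}

print(experiments_from_response(old_response))
print(experiments_from_response(new_response))
```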

Test Plan:
Unit test added to simulate the future behavior of servers.

wchargin-branch: streamexperiments-metadata
Summary:
We’ve deployed production servers that support the handshake protocol
specified in tensorflow#2878 and implemented on the client in tensorflow#2879. This commit
enables that protocol by default.

Test Plan:
Running `bazel run //tensorboard -- dev list` still properly connects
and prints valid URLs. Re-running with the TensorBoard version patched
to `2.0.0a0` (in `version/version.py`) properly causes a handshake
failure. Setting `--origin` to point to a non-prod frontend properly
connects to the appropriate backend. Setting `--api_endpoint` to point
to a non-prod backend connects directly, skipping the handshake, and
prints `https://tensorboard.dev` URLs. Specifying both `--origin` and
`--api_endpoint` performs the handshake and overrides the backend
server only, printing URLs corresponding to the listed frontend.

Running `git grep api.tensorboard.dev` no longer finds any code results.

As a double check, building the Pip package and running it in a new
virtualenv still has a working `tensorboard dev upload` flow.

wchargin-branch: uploader-handshake
Summary:
This commit teaches the uploader to display experiment metadata included
in `StreamExperiments` responses by supported servers. For servers
without this support, the change is a backward-compatible no-op.

The format is intentionally undocumented and not under any compatibility
guarantees, but is designed to be easily parseable for ad hoc usage. For
instance, this simple one-liner finds experiments with lots of points so
that the user can delete them:

```
tensorboard dev list |
    awk '$1 == "Id" { id = $2 } $1 == "Scalars" && $2 > 1000 { print id }'
```

Test Plan:
Running against current prod, which does not yet support the new RPCs,
the behavior is unchanged:

```
$ bazel run //tensorboard -- dev list
https://tensorboard.dev/experiment/IAVF94GPSWWBTvonQe4kgQ/
https://tensorboard.dev/experiment/LiQNYkOHRSGEWj42xtgtjA/
<snip>
Total: 12 experiment(s)
```

Running against a local server with support for the new RPCs, we see
lots of additional data (tested on both Linux and Windows):

```
$ bazel run //tensorboard -- dev --origin http://localhost:8080 --grpc_creds_type ssl_dev list
http://localhost:8080/experiment/WtPawgPIQXi2SZ1fQszOFA/
	Id         WtPawgPIQXi2SZ1fQszOFA
	Created    2019-11-25 10:30:18 (23 seconds ago)
	Updated    2019-11-25 10:30:39 (just now)
	Scalars    18814
	Runs       21
	Tags       7
http://localhost:8080/experiment/jD7Qc7l6S8Wy5gWKYTAHOA/
	Id         jD7Qc7l6S8Wy5gWKYTAHOA
	Created    2019-11-13 18:32:06
	Updated    2019-11-13 18:32:06
	Scalars    0
	Runs       0
	Tags       0
http://localhost:8080/experiment/do8uvvEOSNWOUEANmQIprQ/
	Id         do8uvvEOSNWOUEANmQIprQ
	Created    2019-11-13 18:15:25
	Updated    2019-11-13 18:15:37
	Scalars    3208
	Runs       8
	Tags       4
<snip>
Total: 9 experiment(s)
```
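
For ad hoc scripting beyond the `awk` one-liner, output in the shape shown above could be parsed with a short Python sketch like the following. It assumes the layout in the sample (an unindented URL line followed by tab-indented `Key Value` lines); since the format is undocumented and not under any compatibility guarantees, this is illustrative only, and the function names are hypothetical.

```python
def parse_list_output(text):
    """Parse `tensorboard dev list` style output into a list of dicts."""
    experiments = []
    current = None
    for line in text.splitlines():
        if line.startswith("\t"):
            if current is None:
                continue  # metadata line with no preceding URL; skip it
            key, _, value = line.strip().partition(" ")
            current[key] = value.strip()
        elif line.startswith(("http://", "https://")):
            current = {"url": line.strip()}
            experiments.append(current)
        # Other lines (such as the "Total:" footer) are ignored.
    return experiments


def large_experiments(text, threshold=1000):
    """Return IDs of experiments with more than `threshold` scalars."""
    return [
        exp["Id"]
        for exp in parse_list_output(text)
        if "Id" in exp and int(exp.get("Scalars", 0)) > threshold
    ]


sample = (
    "http://localhost:8080/experiment/WtPawgPIQXi2SZ1fQszOFA/\n"
    "\tId         WtPawgPIQXi2SZ1fQszOFA\n"
    "\tScalars    18814\n"
    "http://localhost:8080/experiment/jD7Qc7l6S8Wy5gWKYTAHOA/\n"
    "\tId         jD7Qc7l6S8Wy5gWKYTAHOA\n"
    "\tScalars    0\n"
    "Total: 2 experiment(s)\n"
)
print(large_experiments(sample))
```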

Also tested that the `tensorboard dev export` service still works
against both old and new servers.

wchargin-branch: uploader-list-metadata

@psybuzz psybuzz left a comment

LGTM

@wchargin wchargin merged commit 2ef7dd2 into tensorflow:2.0 Nov 25, 2019
@wchargin wchargin deleted the release-2.0.2 branch November 25, 2019 23:32