Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uploader: add --json flag to the list command #3480

Merged
merged 11 commits into from
Apr 7, 2020
16 changes: 16 additions & 0 deletions tensorboard/uploader/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,21 @@ py_test(
],
)

py_library(
name = "formatters",
srcs = ["formatters.py"],
srcs_version = "PY3",
)

py_test(
name = "formatters_test",
srcs = ["formatters_test.py"],
deps = [
":formatters",
"//tensorboard:test",
],
)

py_binary(
name = "uploader",
srcs = ["uploader_main.py"],
Expand All @@ -60,6 +75,7 @@ py_library(
":dev_creds",
":exporter_lib",
":flags_parser",
":formatters",
":server_info",
":uploader_lib",
":util",
Expand Down
5 changes: 5 additions & 0 deletions tensorboard/uploader/flags_parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -177,6 +177,11 @@ def define_flags(parser):
"list", help="list previously uploaded experiments"
)
list_parser.set_defaults(**{SUBCOMMAND_FLAG: SUBCOMMAND_KEY_LIST})
list_parser.add_argument(
"--json",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bileschi As pointed out by @wchargin, "prior art" varies. So I opted to err on the side of conciseness. Also see my earlier comments regarding this.

action="store_true",
help="print the experiments as JSON objects",
)

export = subparsers.add_parser(
"export", help="download all your experiment data"
Expand Down
99 changes: 99 additions & 0 deletions tensorboard/uploader/formatters.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: 2019 → 2020 (and test)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done here and in formatters_test.py.

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Helpers that format experiment metadata as strings."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import json

EXPERIMENT_METADATA_URL_JSON_KEY = "url"
ExperimentMetadataField = collections.namedtuple(
"ExperimentMetadataField",
("json_key", "readable_name", "value", "formatter"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused by the name collision of "formatter". Can this one be "field_formatter" or "to_readable" or similar?

Also, it still feels a bit intertwined to pass a lambda that is specifically used only in the ReadableFormatter case. I think you can remove the field entirely and automate this. In ReadableFormatter.format_experiment, you could:

if not metadata_field.value and not typeof(metadata_field.value) == 'int':
  readable_value = "[No %s]".format(metadata_field.readable_name)
else:
  readable_value = str(metadata_field.value)

The typecheck there is a little iffy (because, as written, None will trigger the substitution even for fields that should be int, and a future float-valued field containing 0.0 would trigger too). I guess ExperimentMetadataField could carry a Type enum, but that seems overwrought. I think it'd be fine to gloss over all that for now (maybe with a code comment about the limitations).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a formatter interface sounds reasonable to me, but this doesn’t
really feel like the right structure. The ExperimentMetadataField
struct is tightly coupled to the two implementations of formatters. The
readable_name nad formatter fields are only relevant to the
ReadableFormatter, and the json_key field is only relevant to the
JsonFormatter. And as written, although the name Experiment* is
everywhere, nothing actually knows about experiments.

Why introduce a new type with a big list of vague “fields” when we
already have the experiment proto? The interface could just be

class ExperimentFormatter(metaclass=abc.ABCMeta):
    def format(self, experiment):
        """Format an experiment.

        Args:
          experiment: An `experiment_pb2.Experiment` value.

        Returns:
          A string.
        """
        pass

The ReadableExperimentFormatter would encapsulate the readable field
names and how to format them, and the JsonExperiment would encapsulate
the JSON keys and how to order and indent them. The tests would be on
the same output that we would actually see from the binary rather than
just an arbitrary set of fields defined in the test case.

Then, you could remove the giant block of field definitions in
uploader_main.py, too: just print(formatter.format(experiment)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done effecting the refactoring that you suggested. This revision has pushed a lot of the variables related to the formatting into formatters.py, which is indeed nicer than before.

Also note that I changed the signature of the format_experiment() method slightly so that it takes two input arguments: experiment (a experiment_pb2.Experiment proto) and experiment_url (a string). Getting the latter from an Experiment proto requires the server_info module, which I think shouldn't be known to the formatter module, which is purely concerned with string formatting.

)


class BaseExperimentMetadataFormatter(object):
"""Abstract base class for formatting experiment metadata as a string."""

def format_experiment(self, experiment_metadata):
"""Format a list of `ExperimentMetadataField`s as a representing string.

Args:
experiment_metadata: A list of `ExperimentMetadataField`s that
describes an experiment.

Returns:
A string that represents the `experiment_metadata`.
"""
raise NotImplementedError()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python it’s generally preferable to pass in abstract base class
methods, because otherwise implementations cannot call super().(),
which precludes cooperative multiclassing.

If you want to ensure that all concrete subclasses actually implement
this, consider making this an actual abc.ABCMeta rather than just
calling it an abstract base class in the docstring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.



class ReadableFormatter(BaseExperimentMetadataFormatter):
"""A formatter implementation that outputs human-readable text."""

def __init__(self, name_column_width):
"""Constructor of ReadableFormatter.

Args:
name_column_width: The width of the column that contains human-readable
field names (i.e., `readable_name` in `ExperimentMetadataField`).
Must be greater than the longest human-readable field name.
"""
super(ReadableFormatter, self).__init__()
self._name_column_width = name_column_width

def format_experiment(self, experiment_metadata):
output = []
for metadata_field in experiment_metadata:
if metadata_field.json_key == EXPERIMENT_METADATA_URL_JSON_KEY:
output.append(metadata_field.value)
else:
output.append(
"\t%s %s"
% (
metadata_field.readable_name.ljust(
self._name_column_width
),
metadata_field.formatter(metadata_field.value),
)
)
return "\n".join(output)


class JsonFormatter(object):
"""A formatter implementation: outputs metadata of an experiment as JSON."""

def __init__(self, indent):
"""Constructor of JsonFormatter.

Args:
indent: Size of indentation (in number of spaces) used for JSON
formatting.
"""
super(JsonFormatter, self).__init__()
self._indent = indent

def format_experiment(self, experiment_metadata):
return json.dumps(
collections.OrderedDict(
(metadata_field.json_key, metadata_field.value)
for metadata_field in experiment_metadata
),
indent=self._indent,
)
124 changes: 124 additions & 0 deletions tensorboard/uploader/formatters_test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# Lint as: python3
"""Tests for tensorboard.uploader.formatters."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorboard import test as tb_test
from tensorboard.uploader import formatters


class TensorBoardExporterTest(tb_test.TestCase):
def testReadableFormatterWithNonemptyNameAndDescription(self):
data = [
formatters.ExperimentMetadataField(
formatters.EXPERIMENT_METADATA_URL_JSON_KEY,
"URL",
"http://tensorboard.dev/deadbeef",
str,
),
formatters.ExperimentMetadataField(
"name",
"Name",
"A name for the experiment",
lambda x: x or "[No Name]",
),
formatters.ExperimentMetadataField(
"description",
"Description",
"A description for the experiment",
lambda x: x or "[No Description]",
),
]
formatter = formatters.ReadableFormatter(12)
output = formatter.format_experiment(data)
lines = output.split("\n")
self.assertLen(lines, 3)
self.assertEqual(lines[0], "http://tensorboard.dev/deadbeef")
self.assertEqual(lines[1], "\tName A name for the experiment")
self.assertEqual(
lines[2], "\tDescription A description for the experiment"
)

def testReadableFormatterWithEmptyNameAndDescription(self):
data = [
formatters.ExperimentMetadataField(
formatters.EXPERIMENT_METADATA_URL_JSON_KEY,
"URL",
"http://tensorboard.dev/deadbeef",
str,
),
formatters.ExperimentMetadataField(
"name", "Name", "", lambda x: x or "[No Name]",
),
formatters.ExperimentMetadataField(
"description",
"Description",
"",
lambda x: x or "[No Description]",
),
]
formatter = formatters.ReadableFormatter(12)
output = formatter.format_experiment(data)
lines = output.split("\n")
self.assertLen(lines, 3)
self.assertEqual(lines[0], "http://tensorboard.dev/deadbeef")
self.assertEqual(lines[1], "\tName [No Name]")
self.assertEqual(lines[2], "\tDescription [No Description]")

def testJsonFormatterWithEmptyNameAndDescription(self):
data = [
formatters.ExperimentMetadataField(
formatters.EXPERIMENT_METADATA_URL_JSON_KEY,
"URL",
"http://tensorboard.dev/deadbeef",
str,
),
formatters.ExperimentMetadataField(
"name", "Name", "", lambda x: x or "[No Name]",
),
formatters.ExperimentMetadataField(
"description",
"Description",
"",
lambda x: x or "[No Description]",
),
formatters.ExperimentMetadataField("runs", "Runs", 8, str,),
formatters.ExperimentMetadataField("tags", "Tags", 12, str,),
formatters.ExperimentMetadataField(
"binary_object_bytes", "Binary object bytes", 2000, str,
),
]
formatter = formatters.JsonFormatter(2)
output = formatter.format_experiment(data)
lines = output.split("\n")
self.assertLen(lines, 8)
self.assertEqual(lines[0], "{")
self.assertEqual(
lines[1], ' "url": "http://tensorboard.dev/deadbeef",'
)
self.assertEqual(lines[2], ' "name": "",')
self.assertEqual(lines[3], ' "description": "",')
self.assertEqual(lines[4], ' "runs": 8,')
self.assertEqual(lines[5], ' "tags": 12,')
self.assertEqual(lines[6], ' "binary_object_bytes": 2000')
self.assertEqual(lines[7], "}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just

expected_lines = [
    "{",
    '  "url": "http://tensorboard.dev/deadbeef",',
    # ...
    "}"
]
self.assertEqual(lines, expected_lines)

rather than asserting on each line individually? It’s easier to read,
wraps better, and gives better failure messages.

Alternative, you could just json.loads the thing and verify that the
object is correct rather than worrying about the details of whitespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.



if __name__ == "__main__":
tb_test.main()
74 changes: 62 additions & 12 deletions tensorboard/uploader/uploader_main.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@
from tensorboard.uploader import auth
from tensorboard.uploader import exporter as exporter_lib
from tensorboard.uploader import flags_parser
from tensorboard.uploader import formatters
from tensorboard.uploader import server_info as server_info_lib
from tensorboard.uploader import uploader as uploader_lib
from tensorboard.uploader import util
Expand Down Expand Up @@ -65,6 +66,11 @@
_EXPERIMENT_NAME_MAX_CHARS = 100
_EXPERIMENT_DESCRIPTION_MAX_CHARS = 600

_EXPERIMENT_METADATA_JSON_INDENT = 2
_EXPERIMENT_METADATA_READABLE_NAME_COLUMN_WIDTH = 20
_EXPERIMENT_METADATA_URL_JSON_KEY = formatters.EXPERIMENT_METADATA_URL_JSON_KEY
_ExperimentMetadataField = formatters.ExperimentMetadataField


def _prompt_for_user_ack(intent):
"""Prompts for user consent, exiting the program if they decline."""
Expand Down Expand Up @@ -332,6 +338,15 @@ class _ListIntent(_Intent):
"""
)

def __init__(self, json=None):
"""Constructor of _ListIntent.

Args:
json: If and only if `True`, will print the list as pretty-formatted
JSON objects, one object for each experiment.
"""
self.json = json

def get_ack_message_body(self):
return self._MESSAGE

Expand All @@ -348,23 +363,58 @@ def execute(self, server_info, channel):
)
gen = exporter_lib.list_experiments(api_client, fieldmask=fieldmask)
count = 0

if self.json:
caisq marked this conversation as resolved.
Show resolved Hide resolved
formatter = formatters.JsonFormatter(
_EXPERIMENT_METADATA_JSON_INDENT
)
else:
formatter = formatters.ReadableFormatter(
_EXPERIMENT_METADATA_READABLE_NAME_COLUMN_WIDTH
)
for experiment in gen:
count += 1
experiment_id = experiment.experiment_id
url = server_info_lib.experiment_url(server_info, experiment_id)
print(url)
data = [
("Name", experiment.name or "[No Name]"),
("Description", experiment.description or "[No Description]"),
("Id", experiment.experiment_id),
("Created", util.format_time(experiment.create_time)),
("Updated", util.format_time(experiment.update_time)),
("Scalars", str(experiment.num_scalars)),
("Runs", str(experiment.num_runs)),
("Tags", str(experiment.num_tags)),
_ExperimentMetadataField(
_EXPERIMENT_METADATA_URL_JSON_KEY, "URL", url, str,
),
_ExperimentMetadataField(
"name", "Name", experiment.name, lambda x: x or "[No Name]",
),
_ExperimentMetadataField(
"description",
"Description",
experiment.description,
lambda x: x or "[No Description]",
),
_ExperimentMetadataField(
"id", "Id", experiment.experiment_id, str
),
_ExperimentMetadataField(
"created",
"Created",
util.format_time(experiment.create_time),
str,
),
_ExperimentMetadataField(
"updated",
"Updated",
util.format_time(experiment.update_time),
str,
),
_ExperimentMetadataField(
"scalars", "Scalars", experiment.num_scalars, str,
),
_ExperimentMetadataField(
"runs", "Runs", experiment.num_runs, str,
),
_ExperimentMetadataField(
"tags", "Tags", experiment.num_tags, str,
),
]
for (name, value) in data:
print("\t%s %s" % (name.ljust(12), value))
print(formatter.format_experiment(data))
sys.stdout.flush()
if not count:
sys.stderr.write(
Expand Down Expand Up @@ -550,7 +600,7 @@ def _get_intent(flags):
"Must specify experiment to delete via `--experiment_id`."
)
elif cmd == flags_parser.SUBCOMMAND_KEY_LIST:
return _ListIntent()
return _ListIntent(json=flags.json)
elif cmd == flags_parser.SUBCOMMAND_KEY_EXPORT:
if flags.outdir:
return _ExportIntent(flags.outdir)
Expand Down