
uploader: add --json flag to the list command #3480

Merged: 11 commits merged into tensorflow:master on Apr 7, 2020

Conversation

@caisq (Contributor, Author) commented Apr 5, 2020

  • Motivation for features / changes
    • Fulfill feature request bug b/153232102
  • Technical description of changes
    • Add the --json flag to the list subcommand of tensorboard dev.
    • If the flag is used, the experiments will be printed as a JSON object mapping experiment URLs to experiment data (name, description, runs, tags, etc.)
  • Screenshots of UI changes
    • ![image](https://user-images.githubusercontent.com/16824702/78626883-0f77f480-785e-11ea-88ca-b8d653d302c6.png)
  • Detailed steps to verify changes work correctly (as executed by you)
    • Manually ran tensorboard dev list --json (see screenshot above)
  • Alternate designs / implementations considered
    • Output a single big json array at the end:
      • Pro: may be easier to parse programmatically
      • Con: no streaming

@caisq caisq marked this pull request as ready for review April 5, 2020 17:50
@caisq caisq requested review from davidsoergel and wchargin April 6, 2020 13:08
@bileschi (Collaborator) commented Apr 6, 2020

Do we envision more formats being desired here? (csv?). Did you consider whether this should be a more general --output_format flag? Is there guidance or precedent we can follow?

@caisq (Contributor, Author) commented Apr 6, 2020

@bileschi Good question. I opted for the current approach (i.e., --json instead of --output_format=json) for the following reasons:

  1. I don't think it's a good idea to display the data in CSV format on screen in general. The experiment description can contain newline characters, leading to potentially messy and hard-to-read output (see https://stackoverflow.com/questions/566052/can-you-encode-cr-lf-in-into-csv-files). JSON handles this well by using the \n escape character in its string values (see the small sketch after this list).
  2. Given that --json will be supported, I don't see much extra value in additionally supporting CSV.
  3. Even if we do support CSV in the future, it would make sense to write the results directly to a file, in which case a separate arg such as --csv_file=... would be appropriate.
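
To illustrate the first point, a minimal sketch (not from the PR; the record is hypothetical) of how JSON keeps embedded newlines unambiguous:

import json

# A hypothetical description containing a newline: json.dumps escapes it as
# "\n" inside the string value, so the record stays on one line and remains
# easy to read and to parse, unlike a raw newline inside a CSV cell.
record = {"name": "my-experiment", "description": "first line\nsecond line"}
print(json.dumps(record))
# {"name": "my-experiment", "description": "first line\nsecond line"}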

@wchargin (Contributor) left a review:

Thanks for writing this! Appreciate it.

+1 to @bileschi’s question. In my experience, precedent is a bit mixed.
The JavaScript world often uses --json (flow, webpack, npm show,
jest, …), but JSON is obviously more common there (it’s the “JS” in
“JSON”, after all). Tools like gcloud have --format json, but tools
like grpc_cli have --json_output.

Not sure that I have a strong opinion here.

"""Constructor of _ListIntent.

Args:
json: If and ony if `True`, will print the list as a pretty-formatted
wchargin (Contributor):

sp.: “ony” → “only”

caisq (Contributor, Author):

Done.

Comment on lines 385 to 386
if self.json:
print(json.dumps(experiments_json, indent=2))
wchargin (Contributor):

So the output here looks like this:

[
  {
    "URL": "https://tensorboard.dev/experiment/abcdefghijklmnopqrstuv/",
    "Name": "[No Name]",
    "Description": "[No Description]",
    "Id": "abcdefghijklmnopqrstuv",
    "Created": "2019-10-25 07:22:30",
    "Updated": "2019-10-25 07:22:34",
    "Scalars": "3208",
    "Runs": "8",
    "Tags": "4"
  }
]

There are a few problems with this:

  1. The keys are the same as the human-readable form, and in particular
    are capitalized and (after #3464, “uploader: display binary object bytes in tensorboard dev list output”) may have whitespace.
  2. Numeric values like runs and tags are strings, not ints.
  3. Name and description have dummy values instead of proper ""s.
  4. The output is slurped and not streamed.

On (1): After #3464, this will have an extra "Binary object bytes"
key. But this is pretty unidiomatic. JSON keys do not typically have
whitespace; you can’t use such a key in a jq dot-selector:

$ printf '{"one": 1, "two three": 23}' | jq '.one'
1
$ printf '{"one": 1, "two three": 23}' | jq '.two three'
jq: error: syntax error, unexpected IDENT, expecting $end (Unix shell quoting issues?) at <top-level>, line 1:
.two three     
jq: 1 compile error

On (2): The integers are being converted to strings for human
presentation, but should be left as ints in the JSON output for
programmatic manipulation.

On (3): Similar to (2). The dummy values are just to prevent confusing
human-readable output with a blank space. There’s no ambiguity in using
JSON "".

On (4): I see that you considered a streaming response but didn’t like
that it was “less easy to parse programmatically”. Is it? You can
convert a streaming response to a buffered response by simply piping
through jq -s, but you can’t go the other way around without waiting
for the whole buffer. And as long as we make sure to print one record
per line, people using tools other than JQ should be able to deal with
it handily (for line in file: json.loads(line) works in Python, e.g.).
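
As a concrete sketch of that last point (assuming one JSON record per line; the subprocess invocation is illustrative, not from the PR):

import json
import subprocess

# Read `tensorboard dev list --json` line by line, parsing each record as it
# arrives rather than buffering the whole output.
proc = subprocess.Popen(
    ["tensorboard", "dev", "list", "--json"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
)
for line in proc.stdout:
    line = line.strip()
    if not line:
        continue
    experiment = json.loads(line)
    print(experiment)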

One way to resolve these could be to change data from a 2-column table
to a 4-column table with human readable name, JSON key, raw value, and
formatter used when displaying to humans: e.g.,

            data = [
                ("Name", "name", experiment.name, lambda x: x or "[No Name]"),
                ("Description", "description", experiment.description, lambda x: x or "[No Description]"),
                ("Id", "id", experiment.experiment_id, str),
                ("Created", "created", util.format_time(experiment.create_time), str),
                ("Updated", "updated", util.format_time(experiment.update_time), str),
                ("Scalars", "scalars", experiment.num_scalars, str),
                ("Runs", "runs", experiment.num_runs, str),
                ("Tags", "tags", experiment.num_tags, str),
            ]

and project out the appropriate fields in the JSON and non-JSON cases;
of course feel free to pick a different implementation if you prefer.
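
For concreteness, a rough sketch (not the PR's code) of projecting such a 4-column table into the two output modes; `data` and `url` are assumed to be as above:

import collections
import json

def render(url, data, as_json):
    # `data` rows are (readable name, JSON key, raw value, human formatter).
    if as_json:
        obj = collections.OrderedDict([("url", url)])
        obj.update((json_key, value) for _, json_key, value, _ in data)
        return json.dumps(obj, indent=2)
    lines = [url]
    for readable_name, _, value, fmt in data:
        lines.append("\t%s %s" % (readable_name.ljust(12), fmt(value)))
    return "\n".join(lines)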

@caisq (Contributor, Author) commented Apr 7, 2020:

Thanks for the detailed comments and suggestions. I made the following changes to address them.

  1. The point that whitespace should be avoided in keys is taken. I created a Namedtuple called ExperimentMetadataField to hold the data for each row of the experiment metadata. It has the arg name for the name that's appropriate for a JSON key and the arg readable_name that holds the human-readable key.

  2. I also agree that numbers should be printed as numbers, not strings, in the JSON output. In this revision, this is done by using the aforementioned new ExperimentMetadataField namedtuple, in addition to the two refactored classes JsonFormatter and ReadableFormatter. Both have the format_experiment() method to convert a list of ExperimentMetadataFields into a (potentially multi-line) string. I also created a common abstract ancestor class for the two classes for clarity.

  3. Addressed by the above-described approach.

  4. The refactored and revised code now uses streaming consistently across the readable and json formats.

Also note:

  • The new helper classes ReadableFormatter and JsonFormatter are factored out into a new Python module, tensorboard/uploader/formatters.py (a rough sketch follows below).
  • A unit test for them is added in the new module tensorboard/uploader/formatters_test.py.
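
A rough sketch of the intermediate design described above (the field names match the diff quoted later in this thread; the example row is hypothetical):

import collections

ExperimentMetadataField = collections.namedtuple(
    "ExperimentMetadataField",
    ("json_key", "readable_name", "value", "formatter"),
)

# Example row: the raw int value is kept for JSON output, while the
# human-readable output goes through the per-field formatter.
runs_field = ExperimentMetadataField(
    json_key="runs", readable_name="Runs", value=8, formatter=str
)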

print("\t%s %s" % (name.ljust(12), value))
if self.json:
experiments_json.append(
collections.OrderedDict([("URL", url)] + data)
wchargin (Contributor):

Could we just pass sort_keys=True instead of using an OrderedDict?
The determinism is nice, but I don’t know that we really need to enforce
the same order as with the human-readable output; sorting keys is the
standard “stable stringify”.
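
A tiny sketch of the sort_keys alternative (values are illustrative):

import json

record = {"URL": "https://tensorboard.dev/experiment/abc/", "Runs": 8, "Tags": 4}
# sort_keys=True yields a deterministic key order ("stable stringify")
# without an OrderedDict, though the order differs from the readable output.
print(json.dumps(record, indent=2, sort_keys=True))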

caisq (Contributor, Author):

I prefer using an OrderedDict and making the key order consistent between the readable and json formats.

I find printed outputs such as the following slightly cognitively overloading, because it takes some effort

  • to see what the identifying fields are (url and id in this case),
  • to mentally associate the logically related but physically separated fields like name and description, and
  • to mentally sort the logically ordered but actually shuffled fields like runs, tags, and scalars:
{
  "created": "2020-03-26 10:23:53",
  "description": "",
  "id": "dpi2D3lPTbe84YPSLw0giw",
  "name": "",
  "runs": "2",
  "scalars": "6",
  "tags": "1",
  "updated": "2020-03-26 10:23:53",
  "url": "https://tensorboard.dev/experiment/dpi2D3lPTbe84YPSLw0giw/"
}

@caisq (Contributor, Author) left a review:

Thank you both for the insightful reviews!

"""Constructor of _ListIntent.

Args:
json: If and ony if `True`, will print the list as a pretty-formatted
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

tensorboard/uploader/uploader_main.py Show resolved Hide resolved
Comment on lines 385 to 386
if self.json:
print(json.dumps(experiments_json, indent=2))
Copy link
Contributor Author

@caisq caisq Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed comments and suggestions. I made the following changes to address them.

  1. The point that whitespace should be avoided in keys is taken. I created a Namedtuple called ExperimentMetadataField to hold the data for each row of the experiment metadata. It has the arg name for the name that's appropriate for a JSON key and the arg readable_name that holds the human-readable key.

  2. I also agree that numbers should be printed as numbers, not strings in the JSON output. In this revision, this is done by using the aforementioned new ExperimentMetadataField namedtuple, in addition to the two refactored classes JsonFormatter and ReadableFormatter. Both have the format_experiment() method to convert a list of ExpeimentMetadataFields into a (potentially multi-line) string. I also created a common ancestor abstract class for the two classes for clarity.

  3. Addressed by the above-described approach.

  4. The refactored and revised code now uses streaming consistently across the readable and json formats.

Also note:

  • the new helper classes ReadableFormatter and JsonFormatter are refactored to a new Python module tensorboard/uploader/foramtters.py
  • A unit test is written for them in the newly-added unit test module tensorboard/uploader/formatters_test.py.

print("\t%s %s" % (name.ljust(12), value))
if self.json:
experiments_json.append(
collections.OrderedDict([("URL", url)] + data)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer using an OrderedDict and making the key order consistent between the readable and json formats.

I find printed outputs such the following slightly cognitive overloading because it takes some efforts

  • to see what the identifying fields are (url and id in this case),
  • to mentally associate the logically related by actually separated fields like name and description.
  • to mentally sort the logically sorted by actually shuffled fields like runs, tags and scalars
{
  "created": "2020-03-26 10:23:53",
  "description": "",
  "id": "dpi2D3lPTbe84YPSLw0giw",
  "name": "",
  "runs": "2",
  "scalars": "6",
  "tags": "1",
  "updated": "2020-03-26 10:23:53",
  "url": "https://tensorboard.dev/experiment/dpi2D3lPTbe84YPSLw0giw/"
}

@@ -177,6 +177,11 @@ def define_flags(parser):
"list", help="list previously uploaded experiments"
)
list_parser.set_defaults(**{SUBCOMMAND_FLAG: SUBCOMMAND_KEY_LIST})
list_parser.add_argument(
"--json",
caisq (Contributor, Author):

@bileschi As pointed out by @wchargin, "prior art" varies. So I opted to err on the side of conciseness. Also see my earlier comments regarding this.
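
The quoted diff is truncated above; a plausible completion of the flag definition, continuing from the `list_parser` in that diff (the help wording and defaults are assumptions, not copied from the PR):

list_parser.add_argument(
    "--json",
    action="store_true",  # boolean flag: present means "emit JSON"
    default=False,
    help="Print experiment metadata as JSON instead of the default "
    "human-readable format.",  # help wording is an assumption
)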

@caisq caisq requested review from wchargin and davidsoergel April 7, 2020 03:36
@davidsoergel (Member) left a review:

LGTM mod one remaining comment. Thanks!

EXPERIMENT_METADATA_URL_JSON_KEY = "url"
ExperimentMetadataField = collections.namedtuple(
"ExperimentMetadataField",
("json_key", "readable_name", "value", "formatter"),
davidsoergel (Member):

I was confused by the name collision of "formatter". Can this one be "field_formatter" or "to_readable" or similar?

Also, it still feels a bit intertwined to pass a lambda that is specifically used only in the ReadableFormatter case. I think you can remove the field entirely and automate this. In ReadableFormatter.format_experiment, you could:

if not metadata_field.value and not isinstance(metadata_field.value, int):
  readable_value = "[No %s]" % metadata_field.readable_name
else:
  readable_value = str(metadata_field.value)

The typecheck there is a little iffy (because, as written, None will trigger the substitution even for fields that should be int, and a future float-valued field containing 0.0 would trigger too). I guess ExperimentMetadataField could carry a Type enum, but that seems overwrought. I think it'd be fine to gloss over all that for now (maybe with a code comment about the limitations).
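
A sketch of how that could look inside ReadableFormatter, with the limitation noted in a comment as suggested (the helper name is hypothetical):

def _readable_value(metadata_field):
    # Substitute a "[No <Name>]" placeholder for empty, non-int values.
    # Known limitation, glossed over for now: a None in an int-typed field,
    # or a future float-typed field holding 0.0, would also hit this branch.
    if not metadata_field.value and not isinstance(metadata_field.value, int):
        return "[No %s]" % metadata_field.readable_name
    return str(metadata_field.value)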

Returns:
A string that represents the `experiment_metadata`.
"""
raise NotImplementedError()
wchargin (Contributor):

In Python it’s generally preferable to just `pass` in abstract base class
methods, because otherwise implementations cannot call the method on
`super()`, which precludes cooperative multiple inheritance.

If you want to ensure that all concrete subclasses actually implement
this, consider making this an actual abc.ABCMeta rather than just
calling it an abstract base class in the docstring.

caisq (Contributor, Author):

Done.

Comment on lines 109 to 120
lines = output.split("\n")
self.assertLen(lines, 8)
self.assertEqual(lines[0], "{")
self.assertEqual(
lines[1], ' "url": "http://tensorboard.dev/deadbeef",'
)
self.assertEqual(lines[2], ' "name": "",')
self.assertEqual(lines[3], ' "description": "",')
self.assertEqual(lines[4], ' "runs": 8,')
self.assertEqual(lines[5], ' "tags": 12,')
self.assertEqual(lines[6], ' "binary_object_bytes": 2000')
self.assertEqual(lines[7], "}")
wchargin (Contributor):

Can we just

expected_lines = [
    "{",
    '  "url": "http://tensorboard.dev/deadbeef",',
    # ...
    "}"
]
self.assertEqual(lines, expected_lines)

rather than asserting on each line individually? It’s easier to read,
wraps better, and gives better failure messages.

Alternatively, you could just json.loads the thing and verify that the
object is correct rather than worrying about the details of whitespace.
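
A sketch of the json.loads variant inside the same test (expected values taken from the assertions above):

import json

# Parse the formatter's output and compare objects, so indentation and
# whitespace details no longer matter.
actual = json.loads(output)
expected = {
    "url": "http://tensorboard.dev/deadbeef",
    "name": "",
    "description": "",
    "runs": 8,
    "tags": 12,
    "binary_object_bytes": 2000,
}
self.assertEqual(actual, expected)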

caisq (Contributor, Author):

Done.

Comment on lines 25 to 27
ExperimentMetadataField = collections.namedtuple(
"ExperimentMetadataField",
("json_key", "readable_name", "value", "formatter"),
wchargin (Contributor):

Adding a formatter interface sounds reasonable to me, but this doesn’t
really feel like the right structure. The ExperimentMetadataField
struct is tightly coupled to the two implementations of formatters. The
readable_name and formatter fields are only relevant to the
ReadableFormatter, and the json_key field is only relevant to the
JsonFormatter. And as written, although the name Experiment* is
everywhere, nothing actually knows about experiments.

Why introduce a new type with a big list of vague “fields” when we
already have the experiment proto? The interface could just be

class ExperimentFormatter(metaclass=abc.ABCMeta):
    def format(self, experiment):
        """Format an experiment.

        Args:
          experiment: An `experiment_pb2.Experiment` value.

        Returns:
          A string.
        """
        pass

The ReadableExperimentFormatter would encapsulate the readable field
names and how to format them, and the JsonExperimentFormatter would encapsulate
the JSON keys and how to order and indent them. The tests would be on
the same output that we would actually see from the binary rather than
just an arbitrary set of fields defined in the test case.

Then, you could remove the giant block of field definitions in
uploader_main.py, too: just print(formatter.format(experiment)).

caisq (Contributor, Author):

Done; I applied the refactoring you suggested. This revision pushes a lot of the formatting-related variables into formatters.py, which is indeed nicer than before.

Also note that I changed the signature of the format_experiment() method slightly so that it takes two input arguments: experiment (an experiment_pb2.Experiment proto) and experiment_url (a string). Getting the latter from an Experiment proto requires the server_info module, which I think shouldn't be known to the formatter module, since that module is purely concerned with string formatting. (A rough sketch of the resulting interface follows.)
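
A rough sketch of the resulting formatters module (the class names ReadableFormatter/JsonFormatter and the format_experiment signature are from this thread; the base-class name, the exact field set, and their ordering here are assumptions):

import abc
import collections
import json


class BaseExperimentFormatter(metaclass=abc.ABCMeta):
    """Abstract base class: formats an experiment as a string."""

    @abc.abstractmethod
    def format_experiment(self, experiment, experiment_url):
        """Formats an `experiment_pb2.Experiment` and its URL as a string."""
        pass


class ReadableFormatter(BaseExperimentFormatter):
    _NAME_COLUMN_WIDTH = 12

    def format_experiment(self, experiment, experiment_url):
        output = [experiment_url]
        data = [
            ("Name", experiment.name or "[No Name]"),
            ("Description", experiment.description or "[No Description]"),
            ("Runs", str(experiment.num_runs)),
            ("Tags", str(experiment.num_tags)),
        ]
        for name, value in data:
            output.append(
                "\t%s %s" % (name.ljust(self._NAME_COLUMN_WIDTH), value)
            )
        return "\n".join(output)


class JsonFormatter(BaseExperimentFormatter):
    _JSON_INDENT = 2

    def format_experiment(self, experiment, experiment_url):
        data = [
            ("url", experiment_url),
            ("name", experiment.name),
            ("description", experiment.description),
            ("runs", experiment.num_runs),  # ints stay ints in JSON
            ("tags", experiment.num_tags),
        ]
        return json.dumps(
            collections.OrderedDict(data), indent=self._JSON_INDENT
        )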

@caisq caisq requested a review from wchargin April 7, 2020 19:13
@wchargin (Contributor) left a review:

Looks good once tensorboard dev list --json is fixed.

Getting the [experiment URL] from an Experiment proto requires the
server_info module, which I think shouldn't be known to the
formatter module, which is purely concerned with string formatting.

Yep, agreed; sounds good to me.

@@ -348,23 +358,16 @@ def execute(self, server_info, channel):
)
gen = exporter_lib.list_experiments(api_client, fieldmask=fieldmask)
count = 0

if self.json:
formatter = formatters.JsonFormatterI()
wchargin (Contributor):

Presumably JsonFormatterI should be JsonFormatter?

This breaks when run:

    formatter = formatters.JsonFormatterI()
AttributeError: module 'tensorboard.uploader.formatters' has no attribute 'JsonFormatterI'

caisq (Contributor, Author):

Hmmm. I don't know how this got in. I'll be more careful with my keyboard setup. Thanks for catching this.

("scalars", experiment.num_scalars),
]
return json.dumps(
collections.OrderedDict(data), indent=self._JSON_INDENT,
wchargin (Contributor):

Optional, but: if you’re concerned about ease of parsing the JSON output
without jq, you could consider using indent=None (the default), so
that the output is one JSON object per line. This makes it easy to parse
with a parser that can only take a complete JSON string, like Python’s
json.loads or JS’s JSON.parse, because the user can easily identify
the framing boundaries (just split by newline). Otherwise, they have to
identify the actual object boundaries themselves, which is slightly less
trivial.

Up to you; just mentioning because you noted this concern in the
original PR description.

caisq (Contributor, Author):

I did a test with tensorboard dev list --json | jq -s under this PR. jq seems to be able to handle the indent=2 just fine. So I'll leave it as is.

wchargin (Contributor):

Right, yes, jq handles it fine, hence “[…] ease of parsing the JSON
output without jq.” This is fine with me.

caisq (Contributor, Author):

Ack. I think the slight difficulty of parsing with other tools should be acceptable.

import collections
import json

import six
wchargin (Contributor):

No need to support Python 2 in new srcs_version = "PY3" modules.

caisq (Contributor, Author):

Done. There are a few other modules in tensorboard/uploader that are still using six.add_metaclass(). Those can be corrected later.
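
For reference, a minimal sketch contrasting the six-based form still used elsewhere in tensorboard/uploader with the Python 3-only form appropriate for new srcs_version = "PY3" modules:

import abc

import six


@six.add_metaclass(abc.ABCMeta)
class LegacyBase(object):
    """Python 2/3-compatible abstract base (six-based form)."""


class ModernBase(metaclass=abc.ABCMeta):
    """Python 3-only equivalent; no six dependency needed."""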

Comment on lines 92 to 93
("created", util.format_time(experiment.create_time)),
("updated", util.format_time(experiment.update_time)),
@wchargin (Contributor) commented Apr 7, 2020:

It probably makes more sense to use format_time_absolute for the JSON
formatter? Since the point is to have a simple, parseable format,
outputs like "just now" probably aren’t what’s wanted.

caisq (Contributor, Author):

Done. Unit test is updated accordingly.

@caisq caisq requested a review from wchargin April 7, 2020 19:45

@@ -0,0 +1,99 @@
# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
wchargin (Contributor):

nit: 2019 → 2020 (and test)

caisq (Contributor, Author):

Done here and in formatters_test.py.

@caisq caisq merged commit 2edd0ea into tensorflow:master Apr 7, 2020
bileschi pushed a commit to bileschi/tensorboard that referenced this pull request Apr 15, 2020
bileschi pushed a commit that referenced this pull request Apr 15, 2020