Labeled repr #1044

chris-b1 · 2016-10-12T21:26:42Z

It may be nice to take advantage of labels to show a different, labeled repr. Especially for more than 3 dimensions, I personally find the numpy array one hard to read.

Some sample data and the current repr

In [103]: d = xr.DataArray(np.arange(200).reshape((2,5,2,10)), dims=('a', 'b', 'c', 'd'),
     ...:                  coords={'a': ['A', 'B'], 'b': ['Cat 1', 'Cat 2', 'Cat 3', 'Cat 4', 'Cat 5'],
     ...:                          'c': ['J', 'K']})

In [104]: d
Out[104]: 
<xarray.DataArray (a: 2, b: 5, c: 2, d: 10)>
array([[[[  0,   1,   2,   3,   4,   5,   6,   7,   8,   9],
         [ 10,  11,  12,  13,  14,  15,  16,  17,  18,  19]],

        [[ 20,  21,  22,  23,  24,  25,  26,  27,  28,  29],
         [ 30,  31,  32,  33,  34,  35,  36,  37,  38,  39]],

        [[ 40,  41,  42,  43,  44,  45,  46,  47,  48,  49],
         [ 50,  51,  52,  53,  54,  55,  56,  57,  58,  59]],

        [[ 60,  61,  62,  63,  64,  65,  66,  67,  68,  69],
         [ 70,  71,  72,  73,  74,  75,  76,  77,  78,  79]],

        [[ 80,  81,  82,  83,  84,  85,  86,  87,  88,  89],
         [ 90,  91,  92,  93,  94,  95,  96,  97,  98,  99]]],


       [[[100, 101, 102, 103, 104, 105, 106, 107, 108, 109],
         [110, 111, 112, 113, 114, 115, 116, 117, 118, 119]],

        [[120, 121, 122, 123, 124, 125, 126, 127, 128, 129],
         [130, 131, 132, 133, 134, 135, 136, 137, 138, 139]],

        [[140, 141, 142, 143, 144, 145, 146, 147, 148, 149],
         [150, 151, 152, 153, 154, 155, 156, 157, 158, 159]],

        [[160, 161, 162, 163, 164, 165, 166, 167, 168, 169],
         [170, 171, 172, 173, 174, 175, 176, 177, 178, 179]],

        [[180, 181, 182, 183, 184, 185, 186, 187, 188, 189],
         [190, 191, 192, 193, 194, 195, 196, 197, 198, 199]]]])
Coordinates:
  * a        (a) <U1 'A' 'B'
  * b        (b) <U5 'Cat 1' 'Cat 2' 'Cat 3' 'Cat 4' 'Cat 5'
  * c        (c) <U1 'J' 'K'
  * d        (d) int64 0 1 2 3 4 5 6 7 8 9

The labeled repr could instead look something like this (not exactly):

<xarray.DataArray (a: 2, b: 5, c: 2, d: 10)>

a: 'A'
b: 'Cat 1'
c x d: 
         0   1   2   3   4   5   6   7   8   9
     J   0   1   2   3   4   5   6   7   8   9
     K  10  11  12  13  14  15  16  17  18  19


a: 'A'
b: 'Cat 2'
c x d
    <repeat>
...

Coordinates:
  * a        (a) <U1 'A' 'B'
  * b        (b) <U5 'Cat 1' 'Cat 2' 'Cat 3' 'Cat 4' 'Cat 5'
  * c        (c) <U1 'J' 'K'
  * d        (d) int64 0 1 2 3 4 5 6 7 8 9
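
A minimal sketch of how such a labeled layout could be generated, to make the idea concrete. It is not xarray code: `labeled_repr` and its `values`/`dims`/`coords` arguments are hypothetical names, and it works on plain nested sequences so that it runs without any dependencies. It walks every combination of the leading dimensions and prints the last two dimensions as a labeled table.

```python
from itertools import product


def labeled_repr(values, dims, coords):
    """Render an N-d array (N >= 2) as labeled 2-D tables, one per leading index.

    values: nested sequences, indexed one axis at a time
    dims:   dimension names, in axis order
    coords: mapping from dimension name to its sequence of labels
    (all three names are hypothetical, chosen for this sketch)
    """
    *outer, row_dim, col_dim = dims
    rows = [str(r) for r in coords[row_dim]]
    cols = [str(c) for c in coords[col_dim]]
    blocks = []
    # Visit every combination of positions along the leading dimensions.
    for idx in product(*(range(len(coords[name])) for name in outer)):
        table = values
        for i in idx:
            table = table[i]  # peel off one leading axis at a time
        cells = [[str(v) for v in row] for row in table]
        # One column width that fits every column label and value in this table.
        width = max(len(s) for s in cols + [c for row in cells for c in row])
        stub = max(len(r) for r in rows)
        lines = [f"{name}: {coords[name][i]!r}" for name, i in zip(outer, idx)]
        lines.append(f"{row_dim} x {col_dim}:")
        lines.append(" " * stub + "".join(f"  {c:>{width}}" for c in cols))
        for label, row in zip(rows, cells):
            lines.append(f"{label:>{stub}}" + "".join(f"  {c:>{width}}" for c in row))
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Small 2 x 2 x 3 example with the same kind of labels as above.
values = [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]
text = labeled_repr(
    values, ("a", "c", "d"), {"a": ["A", "B"], "c": ["J", "K"], "d": [0, 1, 2]}
)
print(text)
```

For the DataArray `d` above, one could presumably pass `d.values.tolist()`, `d.dims` and `{k: d[k].values.tolist() for k in d.dims}`; a real implementation would also need to truncate long dimensions.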


shoyer · 2016-10-12T21:31:58Z

Agreed, I've never been really happy with our use of the NumPy repr for >2 dimensions. It's quite hard to match up the labels.

Something like this would be a meaningful improvement! I would encourage experimentation on this.

fmaussion · 2016-10-12T21:36:37Z

Good idea! I am in favor of as little repr as possible, i.e. maybe the first few values in each dimension.

max-sixty · 2016-10-12T22:31:40Z

I think dupe of #680

benbovy · 2016-10-13T10:37:24Z

After seeing the discussion in #680, I'm wondering if showing the first few values of the flattened array wouldn't be enough here, e.g., something like this:

>>> d
<xarray.DataArray (a: 2, b: 5, c: 2, d: 10)>
  array          int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 ...
Coordinates:
  * a        (a) <U1 'A' 'B'
  * b        (b) <U5 'Cat 1' 'Cat 2' 'Cat 3' 'Cat 4' 'Cat 5'
  * c        (c) <U1 'J' 'K'
  * d        (d) int64 0 1 2 3 4 5 6 7 8 9

This example is more consistent with the repr of Dataset data variables, and similarly we could customize the repr of dask arrays and lazy arrays (loaded from netcdf files) like this:

>>> d.chunk((10, 5, 5, 10))
<xarray.DataArray (a: 2, b: 5, c: 2, d: 10)>
  dask.array     int64 chunksize=(10, 5, 5, 10)
Coordinates:
  * a        (a) <U1 'A' 'B'
  * b        (b) <U5 'Cat 1' 'Cat 2' 'Cat 3' 'Cat 4' 'Cat 5'
  * c        (c) <U1 'J' 'K'
  * d        (d) int64 0 1 2 3 4 5 6 7 8 9

>>> d.name = 'myvar'
>>> d.to_netcdf('data.nc')
>>> xr.open_dataset('data.nc').myvar
<xarray.DataArray 'myvar' (a: 2, b: 5, c: 2, d: 10)>
  lazy-array     int64
Coordinates:
  * a        (a) <U1 'A' 'B'
  * b        (b) <U5 'Cat 1' 'Cat 2' 'Cat 3' 'Cat 4' 'Cat 5'
  * c        (c) <U1 'J' 'K'
  * d        (d) int64 0 1 2 3 4 5 6 7 8 9
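
A rough sketch of how the flat `array` line in the first example could be built, assuming the values are available in memory. `iter_flat` and `flat_summary` are hypothetical helper names, not xarray functions, and nested lists stand in for the array so the snippet has no dependencies. Values are appended in row-major order until the line would exceed `max_width`, then the line is closed with `...`.

```python
def iter_flat(values):
    """Yield scalars from nested lists/tuples in row-major (C) order."""
    for item in values:
        if isinstance(item, (list, tuple)):
            yield from iter_flat(item)
        else:
            yield item


def flat_summary(values, dtype_name, max_width=80):
    """One-line preview: kind, dtype, then as many leading values as fit."""
    line = f"  {'array':<15}{dtype_name}"
    for v in iter_flat(values):
        piece = " " + str(v)
        # Always leave room for the trailing " ..." marker.
        if len(line) + len(piece) + len(" ...") > max_width:
            return line + " ..."
        line += piece
    return line


short = flat_summary([[0, 1], [2, 3]], "int64")
clipped = flat_summary(list(range(100)), "int64", max_width=30)
print(short)
print(clipped)
```

With a NumPy-backed array, iterating over `arr.flat` would serve the same purpose as `iter_flat`. The dask and lazy cases would skip the values entirely, as in the examples above.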

fmaussion · 2016-10-13T16:33:08Z

I agree, but I see one or two cases where it could be useful to have the first few values for each dim. For example with geopotential data on pressure levels, it could be interesting to see how the data varies with height on the third dim. But this is a detail, not very important.

chris-b1 · 2016-10-13T20:11:38Z

There could be some display options exposed to manage this - for instance I personally would not like a flat array, but I can see how it could make sense.

Additionally / alternatively, the repr I'm talking about (a small slice of values laid out with coordinate labels) could be called something other than __repr__ - something like pandas' .head(), although there may be a better name to use here.

benbovy · 2016-10-13T21:52:46Z

In most cases I found the DataArray repr useful for quickly checking the dimensions (both names and sizes), the attributes and the types/values of both data and labels (I mean just checking here if the values are consistent regarding their units, acceptable ranges, etc.), but rarely for in-depth checking of the data values along each dimension, hence my suggestion of a flat (subset) array.

To inspect the data of high-dimensional DataArrays, I've mainly used the indexing logic of xarray to extract slices of <3 dimensions. However, I admit that for quick inspection purposes I actually like your suggestion of having a specific repr method that would allow showing small data slices as labeled tables, especially if we choose to always use a flat array for the repr of DataArray (i.e., even when the number of dimensions is <3). Why not something like:

>>> d.slice_repr(a=0, b=0)
d   0   1   2   3   4   5   6   7   8   9
c                                        
J   0   1   2   3   4   5   6   7   8   9
K  10  11  12  13  14  15  16  17  18  19

This is equivalent to

>>> dslice = d.isel(a=0, b=0)
>>> pd.DataFrame(data=dslice.data, index=dslice.c, columns=dslice.d)

Except that slice_repr() would return a string instead of a data object (such as an array or a DataFrame).
Not sure about the name and/or signature of slice_repr(), though.
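
For illustration only, here is a dependency-free sketch of what such a method might do. `slice_repr` is the hypothetical name from above, written as a free function over nested sequences rather than a DataArray method; the signature is an assumption. Keyword arguments fix dimensions by integer position, in the spirit of `isel`, and the two remaining dimensions are rendered in a layout similar to the pandas one.

```python
def slice_repr(values, dims, coords, **indexers):
    """Return a labeled table for the 2-D slice left after fixing the other dims.

    indexers maps a dimension name to an integer position (like isel).
    """
    free = [name for name in dims if name not in indexers]
    if len(free) != 2:
        raise ValueError(f"expected exactly 2 free dimensions, got {free}")

    def select(block, axis):
        # Walk the axes in order: index into fixed dims, keep free dims whole.
        if axis == len(dims):
            return block
        name = dims[axis]
        if name in indexers:
            return select(block[indexers[name]], axis + 1)
        return [select(sub, axis + 1) for sub in block]

    table = select(values, 0)
    row_dim, col_dim = free
    rows = [str(r) for r in coords[row_dim]]
    cols = [str(c) for c in coords[col_dim]]
    cells = [[str(v) for v in row] for row in table]
    stub = max(len(s) for s in rows + [row_dim, col_dim])
    width = max(len(s) for s in cols + [c for row in cells for c in row])

    def fmt(label, items):
        return f"{label:<{stub}}" + "".join(f"  {s:>{width}}" for s in items)

    # Header with column labels, a line naming the row dimension, then the data.
    lines = [fmt(col_dim, cols), fmt(row_dim, [""] * len(cols))]
    lines += [fmt(r, row) for r, row in zip(rows, cells)]
    return "\n".join(lines)


values = [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]
out = slice_repr(
    values, ("a", "c", "d"), {"a": ["A", "B"], "c": ["J", "K"], "d": [0, 1, 2]}, a=1
)
print(out)
```

Returning a string keeps the function usable from a repr, while raising on anything other than two free dimensions keeps the output a single table.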

stale · 2019-01-25T04:43:30Z

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.
If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically.

shoyer added the contrib-help-wanted label Nov 10, 2016

fmaussion mentioned this issue Dec 6, 2016

"ncdump -h" like repr? #1150

Closed

jhamman mentioned this issue Dec 21, 2016

add info method to dataset #1176

Merged

shoyer mentioned this issue Jan 15, 2017

Shortened display of NumPy arrays in DataArray.__repr__ #1207

Merged

stale bot added the stale label Jan 25, 2019

stale bot closed this as completed Feb 24, 2019
