unstructured grid #105

mathause · 2021-10-14T13:38:33Z

This issue is relevant for #65

Internally mesmer uses an unstructured grid. That is, the data are not on a 2D lat x lon grid; all grid points lie along a single vector, with lat and lon given per point. When we start using xarray internally we need to name the dimension of that vector, which has no coordinate of its own. @leabeusch suggests to use "gp" (for gridpoint). The most likely alternative would be "cell" (see details).

This is what some other models do:

ICON: cell
MPAS: nCells
CAM with unstructured grid(?) (CAMSE?): ncol
CLM (used internally only): column and gridcell
AWI ocean model: ncells

Example:

import xarray as xr
xr.set_options(display_style="text")

lat = [0.5, 0.5, 1.5, 1.5]
lon = [0.5, 1.5, 0.5, 1.5]
data = [0.5, 0.7, 0.8, 0.2]

ds = xr.Dataset(
    data_vars=dict(data=("gp", data)),
    coords={"lon": ("gp", lon), "lat": ("gp", lat)}
)

and the repr of ds["data"] would look like

<xarray.DataArray 'data' (gp: 4)>
array([0.5, 0.7, 0.8, 0.2])
Coordinates:
    lon      (gp) float64 0.5 1.5 0.5 1.5
    lat      (gp) float64 0.5 0.5 1.5 1.5
Dimensions without coordinates: gp
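
For readers coming from a regular lat/lon grid, here is a minimal sketch of how such a vector layout could be obtained from a 2D array. It is not taken from mesmer; the variable names (`gridded`, `stacked`, `unstructured`) are made up for illustration. It stacks the two dimensions with xarray and then rebuilds the array from plain values, so that "lat" and "lon" end up as ordinary coordinates along "gp" and no MultiIndex remains.

```python
import xarray as xr

# A tiny regular 2 x 2 grid (hypothetical example data).
gridded = xr.DataArray(
    [[0.5, 0.7], [0.8, 0.2]],
    dims=("lat", "lon"),
    coords={"lat": [0.5, 1.5], "lon": [0.5, 1.5]},
    name="data",
)

# stack() flattens (lat, lon) into one dimension, but indexes it with a MultiIndex.
stacked = gridded.stack(gp=("lat", "lon"))

# Rebuild from plain numpy values so that "gp" has no index at all and
# lat/lon are ordinary (non-index) coordinates along "gp".
unstructured = xr.DataArray(
    stacked.values,
    dims="gp",
    coords={
        "lon": ("gp", stacked["lon"].values),
        "lat": ("gp", stacked["lat"].values),
    },
    name="data",
)
```

With this input the result has the same lat, lon and data values as the hand-written example above.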

Note: I would not go for a MultiIndex because (i) it brings its own set of problems, (ii) it should no longer be necessary after the index refactor of xarray (which is actually finally underway), and (iii) we probably seldom need to select individual grid points from the
Obviously the array has more dimensions, likely "time" and "member" (or "realization").

Dimensions:  (member, time, gp)
Coordinates:
    time (time)
    lon  (gp)
    lat  (gp)
Dimensions without coordinates: member, gp
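
A minimal sketch of what working with such an array could look like, assuming random placeholder data and the dimension names from the layout above. Since "gp" carries no index, a grid point is picked by building a boolean mask from the lat/lon coordinates and passing the matching integer positions to `isel`. This is one possible approach, not necessarily how mesmer does it.

```python
import numpy as np
import xarray as xr

lat = [0.5, 0.5, 1.5, 1.5]
lon = [0.5, 1.5, 0.5, 1.5]
time = np.arange(2000, 2003)

# Placeholder data: 2 members, 3 time steps, 4 grid points.
rng = np.random.default_rng(0)
values = rng.normal(size=(2, time.size, len(lat)))

da = xr.DataArray(
    values,
    dims=("member", "time", "gp"),
    coords={"time": time, "lon": ("gp", lon), "lat": ("gp", lat)},
    name="data",
)

# Select the grid point at lat=1.5, lon=0.5 without any index on "gp":
# build a mask from the coordinates, then index by integer position.
mask = (da.lat == 1.5) & (da.lon == 0.5)
point = da.isel(gp=np.flatnonzero(mask.values))

# Reductions over the other dimensions keep lat/lon attached to "gp".
ens_mean = da.mean("member")
```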

xarray does not support two dimensions with the same name. Therefore we need new names for the geo distance matrix and the correlation matrix. We thought to subscript "gp", "lon", and "lat" with "_i", and "_j":

geodist = xr.Dataset(
    data_vars=dict(
        dist=(("gp_j", "gp_i"), [
        [0.5, 0.7, 0.8, 0.2],
        [0.5, 0.7, 0.8, 0.2],
        [0.5, 0.7, 0.8, 0.2],
        [0.5, 0.7, 0.8, 0.2]
        ])
    ),
    coords={
        "lon_j": ("gp_j", lon),
        "lat_j": ("gp_j", lat),
        "lon_i": ("gp_i", lon),
        "lat_i": ("gp_i", lat)
    }
)

I.e. the array would look like:

<xarray.Dataset>
Dimensions:  (gp_i: 4, gp_j: 4)
Coordinates:
    lon_j    (gp_j) float64 0.5 ...
    lat_j    (gp_j) float64 0.5 ...
    lon_i    (gp_i) float64 0.5 ...
    lat_i    (gp_i) float64 0.5 ...
Dimensions without coordinates: gp_i, gp_j
Data variables:
    dist     (gp_j, gp_i) float64 0.5 ...
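
The numbers in the example above are placeholders. As a rough sketch of how an actual distance matrix could be filled in, the snippet below computes great-circle distances with the haversine formula and numpy broadcasting. The function name `haversine_matrix` and the Earth radius of 6371 km are assumptions made for this illustration; mesmer may compute its distances differently.

```python
import numpy as np
import xarray as xr


def haversine_matrix(lat, lon, radius_km=6371.0):
    """Pairwise great-circle distances in km (hypothetical helper)."""
    lat = np.deg2rad(np.asarray(lat, dtype=float))
    lon = np.deg2rad(np.asarray(lon, dtype=float))
    # Broadcasting a column against a row gives all pairwise differences.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (
        np.sin(dlat / 2) ** 2
        + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2
    )
    # Clip guards against tiny floating point overshoots outside [0, 1].
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


lat = [0.5, 0.5, 1.5, 1.5]
lon = [0.5, 1.5, 0.5, 1.5]

geodist = xr.Dataset(
    data_vars={"dist": (("gp_i", "gp_j"), haversine_matrix(lat, lon))},
    coords={
        "lon_i": ("gp_i", lon),
        "lat_i": ("gp_i", lat),
        "lon_j": ("gp_j", lon),
        "lat_j": ("gp_j", lat),
    },
)
```

The matrix is symmetric, so the order of the two dimensions does not change the values, only the labels.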

leabeusch · 2021-10-20T14:10:13Z

This is based on a discussion @mathause & I had last week (obviously very much driven by Mathias' actual knowledge on these things and me learning about them ;)) -> @znicholls maybe it would make sense for you to have a quick look at it before our meeting tomorrow? Especially the part about avoiding MultiIndex.

@leabeusch suggests to use "gp" (for gridpoint). The most likely alternative would be "cell" (see details).

I can live with "cell" too, if there is someone with a clear preference for it. In my head it was just always called "gp". ^^

Obviously the array has more dimensions, likely "time" and "member".

@mathause, I remember we talked about "member" vs "realization" but I cannot remember why we leaned towards "member" at the end? Currently "realization" seems more intuitive to me. But I'm sure I could be convinced otherwise again.

dist (gp_j, gp_i) float64 0.5 ...

Usually, we'd put i before j, no? (I know, extremely relevant point)

znicholls · 2021-10-21T06:37:27Z

Especially the part about avoiding MultiIndex

Avoiding multiindex is totally fine for me, I was literally just hacking anything together which would sort of work but I am glad we now have an actual xarray expert.

I can live with "cell" too, if there is someone with a clear preference for it. In my head it was just always called "gp"

I'm happy with whatever. I have always preferred longer names (so gridpoint rather than gp) because I find myself being like, "wtf is gp" for too long at the start of doing work and the extra characters are free. Given that, perhaps cell is better because it's shorter but not an abbreviation (but I really have no preference).

Currently "realization" seems more intuitive to me

Realisation (and if you want to use american spelling I will live) makes more sense to me too (given I think of emulations leading to realisations or draws). Member is also totally fine, though, given CMIP always talks about members and member_id.

xarray does not support two dimensions with the same name. Therefore we need new names for the geo distance matrix and the correlation matrix. We thought to subscript "gp", "lon", and "lat" with "_i", and "_j"

I would make it a more explicit name e.g. "gridpoint_correlation_matrix". I know it starts to get long but I would have no idea what the difference between "gp" and "gp_i" was without stopping and thinking whereas "gridpoint" and "gridpoint_correlation_matrix" are immediately obvious to me (and given we use black the code will never look that horrendous anyway). "gridpoint_crossterms" would also work if we want a more general thing.

yquilcaille · 2021-10-27T12:44:12Z

Note: I would not go for a MultiIndex

I agree, it would bring more problems over the long term than it would solve now.

I can live with "cell" too, if there is someone with a clear preference for it. In my head it was just always called "gp"

I have a slight preference for cell, for the same reasons that @znicholls mentions. "gp" is not very clear and not very user-friendly, and cell is shorter than gridpoint.

@mathause, I remember we talked about "member" vs "realization"

I have a preference for member. We are using ESM runs of scenarios with different ensemble members. "Member" is the term used in the climate community, so it would make more sense from my perspective. Plus, it is shorter than realization :)
One quick note on this point: we should refer to the members (or realizations if you prefer) by their full id and not by number, to be sure that we use the same ones, for instance when using tas and hfds in a training.
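
A small sketch of why full ids help, assuming "member" is stored as a string coordinate. The arrays and member ids below are invented for illustration. With labelled members, `xr.align` with `join="inner"` keeps only the members present in both variables and puts them in the same order, which a bare running number could not guarantee.

```python
import numpy as np
import xarray as xr

time = [2000, 2001]

# Hypothetical tas data with three members, labelled by full id.
tas = xr.DataArray(
    np.zeros((3, 2)),
    dims=("member", "time"),
    coords={"member": ["r1i1p1f1", "r2i1p1f1", "r3i1p1f1"], "time": time},
    name="tas",
)

# Hypothetical hfds data with only two members, in a different order.
hfds = xr.DataArray(
    np.ones((2, 2)),
    dims=("member", "time"),
    coords={"member": ["r3i1p1f1", "r1i1p1f1"], "time": time},
    name="hfds",
)

# Keep only the members that exist in both arrays, matched by label.
tas_common, hfds_common = xr.align(tas, hfds, join="inner")
```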

leabeusch · 2021-11-02T22:11:31Z

Just to follow up on this one more time: seems you all do have nice arguments for "cell" over "gp" -> consider me convinced of "cell" too.

Realisation (and if you want to use american spelling I will live)

Funny that this is already coming up again (@mathause & I had a moderately important discussion about American vs British English in the context of a comma a few weeks ago). I think we may actually have to make a decision on the type of English we use sometime soonish. ^^

& more importantly, on the "realization" vs "member" topic: I see @yquilcaille's point as long as it's the actual ESM output. But for the emulations we generate, I find "realization" a lot more descriptive for the individual emulations, as they are realizations of a stochastic process... & the part about the full id: I also see advantages for clear ESM run identification, but it feels like a bit of a strange overhead for the emulations? & are these member ids even defined outside of CMIP experiments? On the other hand: it would probably also be very counterintuitive to have different naming conventions for the ESM simulations & the emulations.

mathause · 2021-11-03T13:15:51Z

Yes, I am not always consistent in my choice of dialect.

I did just go check the cmip6 definitions (see note 8). The whole thing is the variant_label = "r1i1p1f1" and

For a given experiment, the realization_index, initialization_index, physics_index, and forcing_index are used to uniquely identify each simulation of an ensemble of runs contributed by a single model.

but we don't have to adopt this nomenclature. E.g. we generally want to pool everything that is below the level of "model" (but see also #113).
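
To make the four indices concrete, here is a minimal sketch that splits a variant_label such as "r1i1p1f1" into the fields named in the quoted CMIP6 text. The `VariantLabel` type and the `parse_variant_label` function are hypothetical names for this illustration and are not part of mesmer or any CMIP tooling.

```python
import re
from typing import NamedTuple


class VariantLabel(NamedTuple):
    """The four indices encoded in a CMIP6 variant_label (illustrative type)."""

    realization_index: int
    initialization_index: int
    physics_index: int
    forcing_index: int


_VARIANT_PATTERN = re.compile(r"r(\d+)i(\d+)p(\d+)f(\d+)")


def parse_variant_label(label: str) -> VariantLabel:
    """Split e.g. 'r1i1p1f1' into its integer indices."""
    match = _VARIANT_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not a valid variant_label: {label!r}")
    return VariantLabel(*(int(group) for group in match.groups()))


label = parse_variant_label("r10i1p2f1")
```

Pooling "everything below the level of model" would then amount to ignoring all four fields when grouping runs.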

leabeusch mentioned this issue Oct 20, 2021: internal data structure #106 (Open)
mathause mentioned this issue Oct 27, 2021: draft roadmap #111 (Closed)
mathause mentioned this issue Dec 2, 2021: round-trip land-only unstructured grid #117 (Closed)
mathause mentioned this issue Dec 23, 2021: Fix config files #34 (Open)
mathause mentioned this issue Nov 2, 2022: convert grids to/ from unstructured #217 (Merged)

mathause closed this as completed in #217 Nov 11, 2022
