
Concatenation of scalar cubes #4720

Closed

valeriupredoi opened this issue Apr 28, 2022 · 19 comments

Comments

@valeriupredoi

Hey guys, I have the following workflow in mind, originally pointed out to me by @ledm:

Problem: concatenate N scalar cubes by simply stacking their data

  • I have 2 (or N) cubes
  • I am getting the means of the data for each cube, along axis=0 (time means)
  • the result is 2 (or N) scalar cubes (after using iris.analysis.MEAN)
  • I want my resulting two (or N) scalar cubes concatenated into a single cube, storing the two (or N) data points (their means) in an array that becomes the new cube's data object

Currently, iris lets me compute the means, but offers no option to have the results concatenated and stored in a single cube. POC code below:

POC code

import iris.analysis
import iris.coords
import iris.cube
import iris.util
import numpy as np
from cf_units import Unit

c1 = iris.cube.Cube(np.arange(0, 5), var_name='co2', units='J')
c1.add_dim_coord(
    iris.coords.DimCoord(
        np.arange(0., 5., 1.),
        standard_name='time',
        units=Unit('days since 1950-01-01 00:00:00', calendar='360_day'),
    ),
    0,
)

c2 = iris.cube.Cube(np.arange(0, 5), var_name='co2', units='J')
c2.add_dim_coord(
    iris.coords.DimCoord(
        np.arange(0., 5., 1.),
        standard_name='time',
        units=Unit('days since 1950-01-01 00:00:00', calendar='360_day'),
    ),
    0,
)

# collapse each cube over time -> two scalar cubes holding the time means
c1 = c1.collapsed('time', iris.analysis.MEAN)
c2 = c2.collapsed('time', iris.analysis.MEAN)

cube_list = iris.cube.CubeList([c1, c2])

# promote each scalar time coord to a length-1 leading dimension so that
# concatenation has an axis to join along
new_cube_list = []
for cube in cube_list:
    cube.attributes = {}
    cube = iris.util.new_axis(cube, scalar_coord="time")
    new_cube_list.append(cube)

cube_mean = iris.cube.CubeList(new_cube_list).concatenate_cube()
print(cube_mean)

This doesn't work, failing with the usual (and rather nondescript) error:

Traceback (most recent call last):
  File "/home/users/valeriu/lee_mean.py", line 39, in <module>
    cube_mean = iris.cube.CubeList(new_cube_list).concatenate_cube()
  File "/home/users/valeriu/miniconda3-June2021/envs/esmvaltool-latest/lib/python3.10/site-packages/iris/cube.py", line 565, in concatenate_cube
    raise iris.exceptions.ConcatenateError(msgs)
iris.exceptions.ConcatenateError: failed to concatenate into a single cube.
  An unexpected problem prevented concatenation.
  Expected only a single cube, found 2.

The issue here is that, even if I promote a scalar coordinate to DimCoord in

cube = iris.util.new_axis(cube, scalar_coord="time")

the (single-point) coords are identical, so iris is incapable of concatenating; it's equally incapable even if I don't promote that time coord. Unless I fully shift one cube's time axis so the coords don't overlap, I can't concatenate. We have fixed concatenation of cubes that overlap in time in ESMValCore, so it'd be worth implementing that straight into iris. But I digress...
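
For reference, here's a minimal sketch of that shifting workaround (which is exactly what I'd rather not have to do; it continues from the POC above, and assumes the collapsed time coord's points and bounds can simply be offset):

# shift c2's collapsed time coord so the two scalar times no longer coincide
tcoord = c2.coord('time')
tcoord.points = tcoord.points + 5.0  # move the single point clear of c1's
tcoord.bounds = tcoord.bounds + 5.0  # keep the cell bounds consistent

# now the new_axis + concatenate_cube route from the POC succeeds
shifted = [iris.util.new_axis(c, scalar_coord='time') for c in (c1, c2)]
print(iris.cube.CubeList(shifted).concatenate_cube())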

What I want

I want concatenate_cube() to work and give me a single cube in this case. For that, I'd like the following:

  • a descriptive error message explaining why the cubes can't be concatenated into a single one
  • said error message should offer clues as to what I can do to avoid the issue, e.g. an option such as a kwarg scalar_cubes=True - at the end of the day, one doesn't care about an axis if a cube is scalar, it's just a stacking of floats in an array that becomes the cube's data (see the sketch just below this list)
  • the above should work without me having to promote any scalar coords to DimCoords and then shift my time axis so the coords don't overlap
  • the above should also work even if the cubes' metadata differs, since a multi-model cube like this is inherently assembled from cubes with different metadata attributes and values
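
To be concrete, this is roughly the behaviour I'm after, done by hand with numpy (a minimal sketch; coordinates and attributes are deliberately ignored, and stack_scalar_cubes is just an illustrative name, not an existing iris function):

import iris.cube
import numpy as np


def stack_scalar_cubes(cubes):
    # stack the scalar cubes' 0-d data arrays into one 1-d array and wrap it
    # in a fresh cube; the leading dimension is left anonymous on purpose
    data = np.stack([cube.data for cube in cubes])
    template = cubes[0]
    return iris.cube.Cube(data, var_name=template.var_name, units=template.units)


# e.g. with the collapsed c1 and c2 from the POC above:
# stack_scalar_cubes([c1, c2]).data -> array([2., 2.])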

How's this sound to you? Cheers muchly 🍺

@zklaus

zklaus commented Apr 28, 2022

Have you tried merging?

@valeriupredoi
Author

alternatively, you could generalize cube.collapsed(axis, operation) to CubeList.collapsed(axis, operation) that performs two things: collapsing and concatenation into a single cube ❓
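
Something like this, just to sketch the idea (collapsed_cubelist and the cube_index coordinate are made-up names, not existing iris API; it tags each collapsed cube so that merge can tell them apart):

import iris.analysis
import iris.coords
import iris.cube


def collapsed_cubelist(cubes, coord, aggregator):
    collapsed = iris.cube.CubeList()
    for i, cube in enumerate(cubes):
        scalar_cube = cube.collapsed(coord, aggregator)
        # distinguishing scalar coordinate so merge_cube() can stack the results
        scalar_cube.add_aux_coord(iris.coords.AuxCoord(i, long_name="cube_index"))
        collapsed.append(scalar_cube)
    return collapsed.merge_cube()


# e.g. collapsed_cubelist([c1, c2], 'time', iris.analysis.MEAN)
# with the original (uncollapsed) c1 and c2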

@zklaus

zklaus commented Apr 28, 2022

See also the general explanation of merge vs. concatenate in the user guide.

@valeriupredoi
Author

I have tried that too (good point, I should have mentioned it) - it still doesn't work - it complains about cube duplication:

iris.exceptions.DuplicateDataError: failed to merge into a single cube.
  Duplicate 'co2' cube, with scalar coordinates

@ledm

ledm commented Apr 28, 2022

(I apologise profusely for this. I naively thought I was being silly and that there would be a simple answer to my question - not a 3 page bug report.)

@valeriupredoi
Author

See also the general explanation of merge vs. concatenate in the user guide.

cheers Klaus, I am aware of that, but I want a working solution for something that is currently not permitted due to obvious rules for regular cubes, but should be allowed for scalar cubes. Me and my corner cases 😁

@zklaus

zklaus commented Apr 28, 2022

No worries, @ledm. @valeriupredoi, great, we agree then that this is definitely about merge and not concatenate, but that there might be a bug/shortcoming in the merge routine, right?

@zklaus

zklaus commented Apr 28, 2022

Ah, I see the problem now. It is that both cubes have the same time. Hence they have no distinguishing scalar coordinate.

If the goal is to combine two cubes that represent consecutive timespans, use a different time for the second cube, for example np.arange(6., 11.) (the trailing , 1. step argument isn't really doing anything here, since 1 is the default).

If the goal is to combine two cubes that represent the same timespan, they should be differentiated by something different, perhaps a model name?

Either way, in my tests both concatenate and merge then work: concatenate after using the new_axis call, merge without it.
Full MWE:

import iris.analysis
import iris.coords
import iris.cube
import iris.util
import numpy as np
from cf_units import Unit

c1 = iris.cube.Cube(np.arange(0, 5), var_name='co2', units='J')
c1.add_dim_coord(
    iris.coords.DimCoord(
        np.arange(0., 5., 1.),
        standard_name='time',
        units=Unit('days since 1950-01-01 00:00:00', calendar='360_day'),
    ),
    0,
)

c2 = iris.cube.Cube(np.arange(0, 5), var_name='co2', units='J')
c2.add_dim_coord(
    iris.coords.DimCoord(
        np.arange(6., 11., 1.),
        standard_name='time',
        units=Unit('days since 1950-01-01 00:00:00', calendar='360_day'),
    ),
    0,
)

c1 = c1.collapsed('time', iris.analysis.MEAN)
c2 = c2.collapsed('time', iris.analysis.MEAN)

cube_list = iris.cube.CubeList([c1, c2])

print(cube_list.merge_cube())

new_cube_list = []
for cube in cube_list:
    cube.attributes = {}
    cube = iris.util.new_axis(cube, scalar_coord="time")
    new_cube_list.append(cube)

cube_mean = iris.cube.CubeList(new_cube_list).concatenate_cube()
print(cube_mean)

@ledm

ledm commented Apr 28, 2022

In the original code, this was an attempt at taking a mean of two different ensemble members for the same model and experiment covering the same time period. So while your solution works, it's not the solution to my problem.

@rcomer
Member

rcomer commented Apr 28, 2022

I was going to say the same as @zklaus 🙂. My version adds string coords to distinguish two different models. You could equally add a "realization" coordinate to distinguish members from the same ensemble.

import iris.analysis
import iris.coords
import iris.cube
import numpy as np
from cf_units import Unit

c1 = iris.cube.Cube(np.arange(0, 5), var_name='co2', units='J')
c1.add_dim_coord(
    iris.coords.DimCoord(
        np.arange(0., 5., 1.),
        standard_name='time',
        units=Unit('days since 1950-01-01 00:00:00', calendar='360_day'),
    ),
    0,
)

c2 = iris.cube.Cube(np.arange(0, 5), var_name='co2', units='J')
c2.add_dim_coord(
    iris.coords.DimCoord(
        np.arange(0., 5., 1.),
        standard_name='time',
        units=Unit('days since 1950-01-01 00:00:00', calendar='360_day'),
    ),
    0,
)

c1 = c1.collapsed('time', iris.analysis.MEAN)
c2 = c2.collapsed('time', iris.analysis.MEAN)

cube_list = iris.cube.CubeList([c1, c2])

new_cube_list = []
for cube, model in zip(cube_list, ["model1", "model2"]):
    cube.attributes = {}
    cube.add_aux_coord(iris.coords.AuxCoord(model, long_name="model_id"))
    new_cube_list.append(cube)

cube_mean = iris.cube.CubeList(new_cube_list).merge_cube()
print(cube_mean)
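
For ensemble members of the same model, a minimal variant of the loop above (used instead of it) could tag the cubes with a "realization" coordinate; realization is a CF standard name, so it merges in exactly the same way:

member_list = iris.cube.CubeList()
for i, cube in enumerate(cube_list):
    cube.attributes = {}
    # integer scalar coordinate distinguishing the ensemble members
    cube.add_aux_coord(iris.coords.AuxCoord(i, standard_name="realization"))
    member_list.append(cube)

print(member_list.merge_cube())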

@zklaus

zklaus commented Apr 28, 2022

@rcomer beat me to it.

@ledm

ledm commented Apr 28, 2022

"""
c1 = c1.collapsed('time', iris.analysis.MEAN)
c2 = c2.collapsed('time', iris.analysis.MEAN)
"""

But the arrays are no longer scalar if you take the mean of the only dimension, right?

@zklaus

zklaus commented Apr 28, 2022

They are even more scalar, aren't they? I mean, before they were vectors, now they are scalars. But that doesn't really matter at all. The important point is that there is a scalar coordinate, in this case model_id, that allows Iris to aggregate everything. The other dimensions of the cubes play no role.

@valeriupredoi
Author

aha! Good call! But this will still not work if the metadata dictionaries differ - and they usually do differ, as in Lee's case: different ensemble names, parent experiment IDs, etc.

@valeriupredoi
Author

at any rate, it'd be cool if iris did this under the hood for the user, rather than the user having to MacGyver the whole thing 😁

@zklaus

zklaus commented Apr 28, 2022

Other metadata getting in the way is a bit off-topic here, imho. It's debatable whether handling that automatically is a good idea as default behaviour, but to keep things simple, there are already unify_time_units and equalise_attributes.
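
For example (a minimal sketch, assuming iris >= 3.0, where both helpers live in iris.util; both modify the cubes in place, before the merge/concatenate step):

from iris.util import equalise_attributes, unify_time_units

cubes = iris.cube.CubeList([c1, c2])  # any of the cube pairs above
equalise_attributes(cubes)  # drop attributes that are not identical across the cubes
unify_time_units(cubes)     # convert all time coordinates to a common time unit
# ...then merge_cube() / concatenate_cube() as above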

@valeriupredoi
Author

ah, I had completely forgotten about equalise_attributes - good call, Klaus!

@rcomer
Member

rcomer commented Apr 28, 2022

There is also #4446 for the question of mismatched metadata.

@valeriupredoi
Author

cheers @rcomer - I guess closing this is OK - it'd be nice to have this concat/merge stuff be more lenient, as per the issue you link - threw a bunch of ❤️'s all over there just to show my support 😁
