Multi-model statistics shift time coordinate #665
Comments
I got almost the same problem while using multi_model_statistics preprocessor.
CC: @Peter9192 |
@LisaBock I don't think you can perform multimodel statistics over two timeseries with (completely) different time points, as the statistics are computed for each time point that the datasets have in common. So far as I understand, keyword @SarahAlidoost as we discussed, this issue seems to come from the dataset fix that changes the calendar of the dataset. I'll look into the reason for why this seems to break, rather than fix our time coordinate. |
about |
@SarahAlidoost I made a new issue about the access data: #669 |
Are you sure about this? As far as I can see, this just creates a new time array containing all unique time points in the input data. But if they are offset with respect to one another, you'll get multiple indices. Let's say dataset 1 contains [01-15, 02-15, 03-15, ...], and dataset 2 contains [01-16, 02-16, 03-16, ...], then the new time array will contain [01-15, 01-16, 02-15, 02-16, 03-15, 03-16, ...]. But perhaps your point is that these monthly time points should never differ if the data are CMOR standard? |
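To make the point above concrete, here is a minimal sketch in plain Python, assuming the merged time axis is simply the sorted union of all input time points. The helper `union_time_axis` and the year 2000 are made up for illustration; this is not the ESMValCore code.

```python
from datetime import date


def union_time_axis(*axes):
    """Return the sorted union of all time points (hypothetical helper)."""
    return sorted(set().union(*axes))


# Two monthly datasets whose mid-month time points are offset by one day.
ds1 = [date(2000, month, 15) for month in (1, 2, 3)]
ds2 = [date(2000, month, 16) for month in (1, 2, 3)]

merged = union_time_axis(ds1, ds2)
for point in merged:
    print(point.isoformat())
```

With three months per dataset the union holds six points, so the merged axis is twice as long as either input.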
yes, |
Ah okay, I see. But then I don't understand how both Lisa and we are getting the unexpected behaviour where the length of the time coordinate gets doubled. Let me dive into it a bit deeper. |
can you pls post the actual times you mean by double? 🍺 |
...
File "esmvalcore/preprocessor/_multimodel.py", line 297, in _assemble_full_data
new_datas_array = _full_time_slice(cubes, empty_arr, indices_list,
File "esmvalcore/preprocessor/_multimodel.py", line 237, in _full_time_slice
ndat[indices[idx_cube]] = cube.data
File "lib/python3.8/site-packages/numpy/ma/core.py", line 3343, in __setitem__
_data[indx] = dval
ValueError: shape mismatch: value array of shape (1668,192,288) could not be broadcast to indexing result of shape (1668,145,192) |
But I see it's not actually the time coordinate that's causing the problem in our case. |
Okay never mind. |
I managed to reproduce our problem. Here's a MWE:

---
documentation:
  description: mwe
  authors:
    - kalverla_peter
    - alidoost_sarah
datasets:
  - {dataset: ACCESS1-0, project: CMIP5, mip: Amon, exp: [historical, rcp85], ensemble: r1i1p1, start_year: 1961, end_year: 2099}
  - {dataset: CCSM4, project: CMIP5, mip: Amon, exp: [historical, rcp85], ensemble: r1i1p1, start_year: 1961, end_year: 2099}
preprocessors:
  preprocessor1:
    custom_order: True
    area_statistics:
      operator: mean
    anomalies:
      period: full  # 'full' requires https://github.com/ESMValGroup/ESMValCore/pull/652, use
                    # 'month' for now.
      reference: &reference
        start_year: 1980
        start_month: 1
        start_day: 1
        end_year: 2009
        end_month: 12
        end_day: 31
      standardize: false
    annual_statistics:
      operator: mean
    multi_model_statistics:
      span: full
      statistics: [mean, median]  # might want to add percentiles here, but not supported
diagnostics:
  mwe:
    description: minimal working example
    variables:
      tas:
        preprocessor: preprocessor1
    scripts: null

The preprocessed output files of this recipe contain 139 time points for both the access1-0 and the CCSM4 dataset, but 278 time points for the multimodelmean. Perhaps this is caused by different calendars, still looking into it. |
I guess the problem is that we're passing yearly data to the multimodel stats, and due to the unfortunate time difference between our datasets, the annually average time points end up in the beginning of July for the one dataset, and end of June for the other. |
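A quick stdlib check of that guess, assuming the annual time point is the midpoint of the annual bounds. The bounds used are the 1961 ones from the coordinate printouts further down in this thread; `midpoint` is a hypothetical helper, and plain `datetime` stands in for both calendars, which should be harmless here because 1961 has no leap day.

```python
from datetime import datetime


def midpoint(start, end):
    """Midpoint of a time interval (hypothetical helper)."""
    return start + (end - start) / 2


# Annual bounds for 1961 as printed for the two preprocessed datasets.
access_like = midpoint(datetime(1960, 12, 30), datetime(1961, 12, 30))
ccsm4_like = midpoint(datetime(1961, 1, 1), datetime(1962, 1, 1))

print(access_like)  # 1961-06-30 12:00:00
print(ccsm4_like)   # 1961-07-02 12:00:00
```

The two midpoints are only two days apart, but they fall in different months, so any logic that matches time points by month treats them as different.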
Final diagnosis: CCSM4 data has a no-leap calendar, and ACCESS-1 a gregorian calendar (that seems to need an extra fix: #669). The multimodel preprocessor can deal with this as long as it's monthly data, but it fails if we first compute annual means, because the two calendars' time points average out into different months. So my questions are:
|
@Peter9192 a few points:
For MM: overlap, the MM will use the time axis of the first cube in the cube list as the MM cube time axis (after all cubes have been sliced on the common overlap period, so dates may differ from cube to cube, but as long as they are monthly dates the MM doesn't care). For MM: full, the MM will reset the monthly dates to the first of the month; no calendar manipulation is done 🍺 |
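A rough sketch of the overlap behaviour described above, in plain Python. It assumes that "common overlap period" can be approximated by matching on (year, month), and that the output simply reuses the first dataset's dates. `month_key` and `overlap_axis` are hypothetical names, not functions from `_multimodel.py`.

```python
from datetime import date


def month_key(point):
    """Identify a monthly time point by year and month only."""
    return (point.year, point.month)


def overlap_axis(axes):
    """First dataset's dates, restricted to months present in every dataset."""
    common = set.intersection(*(set(map(month_key, axis)) for axis in axes))
    return [point for point in axes[0] if month_key(point) in common]


ds1 = [date(2000, month, 15) for month in range(1, 7)]  # Jan..Jun, on the 15th
ds2 = [date(2000, month, 16) for month in range(3, 9)]  # Mar..Aug, on the 16th

axis = overlap_axis([ds1, ds2])
print([point.isoformat() for point in axis])
```

The result covers March to June and keeps the 15th from the first dataset, which matches the remark that the day of the month doesn't matter for monthly data.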
@valeriupredoi Yep, thanks for the explanation. I was trying to reproduce our earlier workflow, but overlooked the custom_order setting. I'm still a bit confused about this, though:
I understand that it shouldn't be possible, but still something weird happens there. And in general: should all calendars be 'gregorian', or is it okay for datasets to have different calendars? |
@Peter9192 what type of data is it (monthly, yearly means?), also can you pls post sample minimal recipe with which I could attempt to replicate the behaviour? 🍺 |
Hey @valeriupredoi thanks for looking into this. It's monthly data, on which we first compute yearly means, and then multimodel means.
See my earlier comment. We need the custom order and the anomalies. However, the behaviour can also be replicated with this:

---
documentation:
  description: mwe
  authors:
    - kalverla_peter
    - alidoost_sarah
datasets:
  - {dataset: ACCESS1-0, project: CMIP5, mip: Amon, exp: [historical], ensemble: r1i1p1, start_year: 1961, end_year: 1965}
  - {dataset: CCSM4, project: CMIP5, mip: Amon, exp: [historical], ensemble: r1i1p1, start_year: 1961, end_year: 1965}
preprocessors:
  preprocessor1:
    custom_order: True
    area_statistics:
      operator: mean
    annual_statistics:
      operator: mean
    # regrid_time:
    #   frequency: yr
    multi_model_statistics:
      span: full
      statistics: [mean]
diagnostics:
  mwe:
    description: minimal working example
    variables:
      tas:
        preprocessor: preprocessor1
    scripts: null

Without the

In [2]: time = iris.load_cube('CMIP5_ACCESS1-0_Amon_historical_r1i1p1_tas_1961-1965.nc').coord('time')
In [3]: print(time)
DimCoord([1961-06-30 12:00:00, 1962-06-30 12:00:00, 1963-06-30 12:00:00,
1964-06-30 00:00:00, 1965-06-30 12:00:00], bounds=[[1960-12-30 00:00:00, 1961-12-30 00:00:00],
[1961-12-30 00:00:00, 1962-12-30 00:00:00],
[1962-12-30 00:00:00, 1963-12-30 00:00:00],
[1963-12-30 00:00:00, 1964-12-30 00:00:00],
[1964-12-30 00:00:00, 1965-12-30 00:00:00]], standard_name='time', calendar='gregorian', long_name='time', var_name='time')
In [4]: time = iris.load_cube('CMIP5_CCSM4_Amon_historical_r1i1p1_tas_1961-1965.nc').coord('time')
In [5]: print(time)
DimCoord([1961-07-02 12:00:00, 1962-07-02 12:00:00, 1963-07-02 12:00:00,
1964-07-02 12:00:00, 1965-07-02 12:00:00], bounds=[[1961-01-01 00:00:00, 1962-01-01 00:00:00],
[1962-01-01 00:00:00, 1963-01-01 00:00:00],
[1963-01-01 00:00:00, 1964-01-01 00:00:00],
[1964-01-01 00:00:00, 1965-01-01 00:00:00],
[1965-01-01 00:00:00, 1966-01-01 00:00:00]], standard_name='time', calendar='365_day', long_name='time', var_name='time')
In [6]: time = iris.load_cube('MultiModelMean_Amon_tas_1961-1965.nc').coord('time')
In [7]: print(time)
DimCoord([1961-06-28 00:00:00, 1961-07-28 00:00:00, 1962-06-28 00:00:00,
1962-07-28 00:00:00, 1963-06-28 00:00:00, 1963-07-28 00:00:00,
1964-06-29 00:00:00, 1964-07-29 00:00:00, 1965-06-29 00:00:00,
1965-07-29 00:00:00], standard_name='time', calendar='365_day', var_name='time')

Note that the source data, from ESGF, are all monthly data with exactly matching datetimes, but different calendars:

In [8]: time = iris.load_cube('ACCESS1-0/r1i1p1/tas_Amon_ACCESS1-0_historical_r1i1p1_185001-200512.nc').coord('time')[:5]
In [9]: print(time)
DimCoord([1850-01-16 12:00:00, 1850-02-15 00:00:00, 1850-03-16 12:00:00,
1850-04-16 00:00:00, 1850-05-16 12:00:00], bounds=[[1850-01-01 00:00:00, 1850-02-01 00:00:00],
[1850-02-01 00:00:00, 1850-03-01 00:00:00],
[1850-03-01 00:00:00, 1850-04-01 00:00:00],
[1850-04-01 00:00:00, 1850-05-01 00:00:00],
[1850-05-01 00:00:00, 1850-06-01 00:00:00]], standard_name='time', calendar='proleptic_gregorian', long_name='time', var_name='time')
In [10]: time = iris.load_cube('tas_Amon_CCSM4_historical_r1i1p1_185001-200512.nc').coord('time')[:5]
In [11]: print(time)
DimCoord([1850-01-16 12:00:00, 1850-02-15 00:00:00, 1850-03-16 12:00:00,
1850-04-16 00:00:00, 1850-05-16 12:00:00], bounds=[[1850-01-01 00:00:00, 1850-02-01 00:00:00],
[1850-02-01 00:00:00, 1850-03-01 00:00:00],
[1850-03-01 00:00:00, 1850-04-01 00:00:00],
[1850-04-01 00:00:00, 1850-05-01 00:00:00],
[1850-05-01 00:00:00, 1850-06-01 00:00:00]], standard_name='time', calendar='365_day', long_name='time', var_name='time')

Adding the |
I see. Cheers for the exact pinpointing! It's actually a combination of two things: a missing feature on the multimodel preprocessor side and the garden variety of input data time axes. The MM cannot actually handle this case correctly since it computes the set on the days only, and not on the months too. Let me try to fix it and write a test for it too 🍺 |
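A minimal sketch of why resetting only the day is not enough for yearly data, in plain Python. It assumes the merged axis is the set of time points after normalisation. `merged_axis` and its `reset_month` flag are hypothetical, and the dates are rounded versions of the annual time points posted above. It is an illustration of the idea, not what #677 actually does.

```python
from datetime import date


def merged_axis(axes, reset_month=False):
    """Sorted set of time points after resetting the day (and optionally the month)."""
    points = set()
    for axis in axes:
        for point in axis:
            month = 1 if reset_month else point.month
            points.add(point.replace(month=month, day=1))
    return sorted(points)


years = range(1961, 1966)
access_like = [date(year, 6, 30) for year in years]  # annual means landing in late June
ccsm4_like = [date(year, 7, 2) for year in years]    # annual means landing in early July

day_only = merged_axis([access_like, ccsm4_like])
day_and_month = merged_axis([access_like, ccsm4_like], reset_month=True)

print(len(day_only))       # 10: June and July stay distinct
print(len(day_and_month))  # 5: one point per year
```

Resetting only the day leaves two points per year, which matches the ten time points seen in the MultiModelMean file for 1961-1965. Resetting the month as well collapses them to one point per year.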
@Peter9192 could you test your failed recipe with #677 please mate? That should fix the yearly data behaviour 🍺 |
@valeriupredoi I tested again with #677 and sorry, but it fails again. recipe:
And the timestamps of my preproc file
|
I tried now to add to the preprocessor
And now it works! |
also those timestamps look fine to me (cheers for posting them) - what is the issue? There is no risk of spillover since they reset back to 1 if they reach the end of the month - that is, they look good if the data is monthly means |
OK then #677 will not change anything in your data; the monthly data MM dates seem fine to me. Can you please state again what the problem is? Also bear in mind that if you see dates in the |
The problem is that it does not simply reset back to 1 but shifts to 1 in the next month... I extracted two parts of the dates: You see that there is always one month missing when changing from 30 to 1. And therefore it adds 14 months at the end. All models end in December 1999 but the MultiModelMean ends 14 months later... But as I said with |
I can confirm that this fixes our recipe. Let me have a better look at the PR though, see you there. 🍺 |
@LisaBock yes but the actual data point that spills over will be masked so will not affect the actual computation. Can you pls confirm that? 🍺 |
also - beats me if I can see any missing month - can you post that bit of time points (+/- 1 month before and after), it might be just me being blind 😁 |
@valeriupredoi the problem is that for every month which is missing (here: 1960-05 and 2000-12) the time period is extended by one month. As I said, all models end in 1999-12 but the MultiModelMean is extended to 2001-04. |
aaah now I see - cheers for the clarification! Not good 😁 - let me extend #677 to monthly data too |
Seems I was too quick to confirm. The duplicates are gone because of #671, but the time array is still showing these strange offsets, and for longer time ranges than in my previous example, they also start wandering, like Lisa described. |
this time round for yearly means data right? |
yep. So the MWE now produces single time stamps per year (which is good, much better than before), but they're not at the first of the month, and for longer time arrays they start shifting. |
can you pls post a snippet of the time points? |
|
right, cheers, I know where the bugger is! 🍺 |
OK can you guys please give it one more test with #677 - I have shifted the points for |
Works for me! |
brilliant! cheers for the quick test @Peter9192 🍺 @LisaBock please don't dash my hopes 😁 |
@valeriupredoi I am very sorry... |
looking at it right now. I am smelling a CMIP3-specific issue, just noticed your issue is from using CMIP3 data. Just got back to Jasmin now so I can do proper debugging 🍺 |
@LisaBock a few points:
tas:
  preprocessor: clim_ref
  reference_dataset: HadCRUT4
  project: CMIP3
  mip: A1
  modeling_realm: atmos
  exp: 20c3m
  frequency: mon
  ensemble: r1
  start_year: 1850
  end_year: 1999
|
@LisaBock could you give it one more test please: I managed to reproduce your issue (am back on Jasmin yay!) and @Peter9192 spotted where it actually stemmed from so we fixed it in #677 (hopefully!) 🍺 |
@valeriupredoi and @Peter9192 Thanks a lot! It works now for me! Thank you! Thank you! Thank you! |
I tried to calculate the multi-model mean of several models with different time axes, so I chose span: full. After that, the time axis of the MultiModelMean preproc file has different time points, and it no longer fits the overall time period of the models. Did I do something wrong?
Could anybody help me?
Thanks!
recipe to test: