Multiple time units for MFDatasets #435

Closed
huard opened this issue Jan 24, 2017 · 13 comments

@huard
Contributor

huard commented Jan 24, 2017

This bug affects time indexing when the following conditions are met:

  • opening multiple files (netCDF4.MFDataset) along the time dimension
  • each file has a different time.units (for example "days since 2000-01-01" for the first file, "days since 2010-01-01" for the second file, etc.)

What happens is that ocgis does not recognize that the time units are different for each file, and is not able to group data over a period (for example an average from 2030 to 2050).

netCDF4 has an MFTime class that handles this case, but it does not appear to be used in ocgis.
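
To illustrate the symptom, here is a minimal standard-library sketch of what goes wrong when per-file offsets are concatenated without accounting for their units. It is not the netCDF4 or ocgis implementation: the two "files" are hypothetical dictionaries, the helpers `parse_days_since` and `rebase` are made up for this example, only "days since YYYY-MM-DD" units are handled, and the standard (Gregorian) calendar is assumed.

```python
from datetime import datetime


def parse_days_since(units: str) -> datetime:
    """Return the reference date of a 'days since YYYY-MM-DD' units string."""
    prefix = "days since "
    if not units.startswith(prefix):
        raise ValueError(f"unsupported units: {units!r}")
    return datetime.strptime(units[len(prefix):], "%Y-%m-%d")


def rebase(values, units, target_units):
    """Re-express day offsets given in `units` relative to `target_units`."""
    shift = (parse_days_since(units) - parse_days_since(target_units)).days
    return [v + shift for v in values]


# Two hypothetical files, each counting days from its own reference date.
file1 = {"units": "days since 2000-01-01", "time": [0.0, 1.0, 2.0]}
file2 = {"units": "days since 2010-01-01", "time": [0.0, 1.0, 2.0]}

# Naive aggregation keeps the raw numbers, so the second file's values
# collide with the first file's once the first file's units are applied.
naive = file1["time"] + file2["time"]

# Rebasing every file onto the first file's units yields a monotonic axis.
target = file1["units"]
aggregated = []
for f in (file1, file2):
    aggregated.extend(rebase(f["time"], f["units"], target))

print(naive)       # [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
print(aggregated)  # [0.0, 1.0, 2.0, 3653.0, 3654.0, 3655.0]
```

Grouping over a period only makes sense on the second axis, where every value shares one reference date.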

@bekozi bekozi self-assigned this Jan 24, 2017
@bekozi bekozi added the bug label Jan 24, 2017
@bekozi
Contributor

bekozi commented Jan 24, 2017

> netCDF4 has an MFTime class that handles this case, but it does not appear to be used in ocgis.

Correct. I'll post once I've looked into it a bit more.

bekozi added a commit that referenced this issue Jan 27, 2017
Added MFTime unit test for the netCDF4-python library
@bekozi
Contributor

bekozi commented Jan 27, 2017

Note: MFTime will have the same format limitations as MFDataset.

======================================================================
ERROR: test_netCDF4_MFTime (ocgis.test.test_simple.test_dependencies.TestDependencies)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/benkoziol/l/ocgis/src/ocgis/test/test_simple/test_dependencies.py", line 41, in test_netCDF4_MFTime
    mfd = MFDataset(paths)
  File "netCDF4/_netCDF4.pyx", line 5444, in netCDF4._netCDF4.MFDataset.__init__ (netCDF4/_netCDF4.c:64536)
ValueError: MFNetCDF4 only works with NETCDF3_* and NETCDF4_CLASSIC formatted files, not NETCDF4

----------------------------------------------------------------------
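
A small pre-flight check can flag files that would trigger this error before MFDataset is called. The sketch below encodes only the rule stated in the error message (NETCDF3_* and NETCDF4_CLASSIC are accepted). The function names and file names are hypothetical, and the format strings are assumed to be what `netCDF4.Dataset(path).data_model` reports for each file.

```python
def mfdataset_compatible(data_model: str) -> bool:
    """Apply the rule from the error message to a data_model string."""
    return data_model.startswith("NETCDF3_") or data_model == "NETCDF4_CLASSIC"


def incompatible_files(data_models: dict) -> list:
    """Return the paths whose format would be rejected, in input order."""
    return [path for path, model in data_models.items()
            if not mfdataset_compatible(model)]


# Hypothetical mapping of path -> data_model, e.g. collected by opening each
# file with netCDF4.Dataset and reading its data_model attribute.
models = {
    "a.nc": "NETCDF4_CLASSIC",
    "b.nc": "NETCDF4",
    "c.nc": "NETCDF3_CLASSIC",
}
bad = incompatible_files(models)
print(bad)  # ['b.nc']
```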

bekozi added a commit that referenced this issue Jan 27, 2017
Time variables now use MFTime if the source dataset is multi-file.
@bekozi
Contributor

bekozi commented Jan 27, 2017

@huard: See commit (e978a65) introducing MFTime. When you have time to test, let me know how it goes.

@bekozi
Contributor

bekozi commented Mar 27, 2017

Fixed in next and v-2.0.0.dev1.

@bekozi bekozi closed this as completed Mar 27, 2017
@huard
Contributor Author

huard commented Apr 27, 2017

No go. MFTime is instantiated on time_bnds instead of (or as well as) time. Since time_bnds has no calendar attribute, MFTime raises an error:

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.MFTime.__init__ (netCDF4/_netCDF4.c:71210)()

ValueError: MFTime requires that the time variable in all files have a calendar attribute

Tested on pr_Amon_GFDL-ESM2M_rcp45_r1i1p1_200601-201012.nc, pr_Amon_GFDL-ESM2M_rcp45_r1i1p1_200601-201012.nc.
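
The condition behind this error can be checked ahead of time by looking for time-like variables that lack a calendar attribute. This is a minimal sketch, not the check netCDF4 performs internally: `missing_calendar` is a hypothetical helper, and the attribute dictionaries are hand-written to mirror the situation described here (a time variable with a calendar, a bounds variable without one).

```python
def missing_calendar(attrs_by_variable: dict, names) -> list:
    """Return the variable names in `names` that have no 'calendar' attribute."""
    return [name for name in names
            if "calendar" not in attrs_by_variable[name]]


# Hand-written attributes standing in for what the files would report.
attrs = {
    "time": {
        "units": "days since 2006-01-01 00:00:00",
        "calendar": "noleap",
        "bounds": "time_bnds",
    },
    "time_bnds": {
        "long_name": "time axis boundaries",
        "units": "days since 2006-01-01 00:00:00",
    },
}

problem_variables = missing_calendar(attrs, ["time", "time_bnds"])
print(problem_variables)  # ['time_bnds']
```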

@bekozi bekozi reopened this Apr 27, 2017
@bekozi
Contributor

bekozi commented Apr 27, 2017

@huard If it's not too much trouble, could you pass along the metadata dumps for the two files?

@huard
Contributor Author

huard commented Apr 27, 2017

netcdf pr_Amon_GFDL-ESM2M_rcp45_r1i1p1_200601-201012 {
dimensions:
	time = UNLIMITED ; // (60 currently)
	lat = 90 ;
	lon = 144 ;
	bnds = 2 ;
variables:
	double average_DT(time) ;
		average_DT:long_name = "Length of average period" ;
		average_DT:units = "days" ;
	double average_T1(time) ;
		average_T1:long_name = "Start time for average period" ;
		average_T1:units = "days since 2006-01-01 00:00:00" ;
	double average_T2(time) ;
		average_T2:long_name = "End time for average period" ;
		average_T2:units = "days since 2006-01-01 00:00:00" ;
	double lat(lat) ;
		lat:long_name = "latitude" ;
		lat:units = "degrees_north" ;
		lat:standard_name = "latitude" ;
		lat:axis = "Y" ;
		lat:bounds = "lat_bnds" ;
	double lon(lon) ;
		lon:long_name = "longitude" ;
		lon:units = "degrees_east" ;
		lon:standard_name = "longitude" ;
		lon:axis = "X" ;
		lon:bounds = "lon_bnds" ;
	double bnds(bnds) ;
		bnds:long_name = "vertex number" ;
		bnds:cartesian_axis = "N" ;
	float pr(time, lat, lon) ;
		pr:long_name = "Precipitation" ;
		pr:units = "kg m-2 s-1" ;
		pr:cell_methods = "time: mean" ;
		pr:interp_method = "conserve_order1" ;
		pr:missing_value = 1.e+20f ;
		pr:_FillValue = 1.e+20f ;
		pr:standard_name = "precipitation_flux" ;
		pr:original_units = "kg/m2/s" ;
		pr:original_name = "precip" ;
		pr:cell_measures = "area: areacella" ;
		pr:associated_files = "baseURL: http://cmip-pcmdi.llnl.gov/CMIP5/dataLocation areacella: areacella_fx_GFDL-ESM2M_rcp45_r0i0p0.nc" ;
	double time(time) ;
		time:long_name = "time" ;
		time:units = "days since 2006-01-01 00:00:00" ;
		time:cartesian_axis = "T" ;
		time:calendar_type = "noleap" ;
		time:calendar = "noleap" ;
		time:bounds = "time_bnds" ;
		time:standard_name = "time" ;
		time:axis = "T" ;
	double time_bnds(time, bnds) ;
		time_bnds:long_name = "time axis boundaries" ;
		time_bnds:units = "days since 2006-01-01 00:00:00" ;
	double lat_bnds(lat, bnds) ;
	double lon_bnds(lon, bnds) ;

// global attributes:
		:title = "NOAA GFDL GFDL-ESM2M, RCP4.5 (run 1) experiment output for CMIP5 AR5" ;
		:institute_id = "NOAA GFDL" ;
		:source = "GFDL-ESM2M 2010 ocean: MOM4 (MOM4p1_x1_Z50_cCM2M,Tripolar360x200L50); atmosphere: AM2 (AM2p14,M45L24); sea ice: SIS (SISp2,Tripolar360x200L50); land: LM3 (LM3p7_cESM,M45)" ;
		:contact = "gfdl.climate.model.info@noaa.gov" ;
		:project_id = "CMIP5" ;
		:table_id = "Table Amon (31 Jan 2011)" ;
		:experiment_id = "rcp45" ;
		:realization = 1 ;
		:modeling_realm = "atmos" ;
		:tracking_id = "ca6e0315-c881-4063-9f7c-751c9d7426ea" ;
		:Conventions = "CF-1.4" ;
		:references = "The GFDL Data Portal (http://nomads.gfdl.noaa.gov/) provides access to NOAA/GFDL\'s publicly available model input and output data sets. From this web site one can view and download data sets and documentation, including those related to the GFDL coupled models experiments run for the IPCC\'s 5th Assessment Report and the US CCSP." ;
		:comment = "GFDL experiment name = ESM2M-HC1_2006-2100_all_rcp45_XC1. PCMDI experiment name = RCP4.5 (run1). Initial conditions for this experiment were taken from 1 January 2006 of the parent experiment, ESM2M-C1_all_historical_HC1 (historical). Several forcing agents varied during the 95 year duration of the RCP4.5 experiment based upon the MINICAM integrated assessment model for the 21st century. The time-varying forcing agents include the well-mixed greenhouse gases (CO2, CH4, N2O, halons), tropospheric and stratospheric O3, model-derived aerosol concentrations (sulfate, black and organic carbon, sea salt and dust), and land use transitions. Volcanic aerosols were zero and solar irradiance varied seasonally based upon late 20th century averages but with no interannual variation. The direct effect of tropospheric aerosols is calculated by the model, but not the indirect effects." ;
		:gfdl_experiment_name = "ESM2M-HC1_2006-2100_all_rcp45_XC1" ;
		:creation_date = "2011-08-12T01:56:03Z" ;
		:model_id = "GFDL-ESM2M" ;
		:branch_time = "52925" ;
		:experiment = "RCP4.5" ;
		:forcing = "GHG,SD,Oz,LU,SS,BC,MD,OC (GHG includes CO2, CH4, N2O, CFC11, CFC12, HCFC22, CFC113)" ;
		:frequency = "mon" ;
		:initialization_method = 1 ;
		:parent_experiment_id = "historical" ;
		:physics_version = 1 ;
		:product = "output1" ;
		:institution = "NOAA GFDL(201 Forrestal Rd, Princeton, NJ, 08540)" ;
		:history = "File was processed by fremetar (GFDL analog of CMOR). TripleID: [exper_id_K92MrW6Oa4,realiz_id_GX4D0HOU9Z,run_id_y4SDyPhdCz]" ;
		:parent_experiment_rip = "r1i1p1" ;

@huard
Contributor Author

huard commented Apr 28, 2017

It wasn't clear, but except for the tracking_id and creation date, the metadata is identical for both files.

@bekozi
Contributor

bekozi commented Apr 28, 2017

Ha, I figured. Always good to check though. Thanks for passing the metadata along.

I think the correct approach is to have the bounds variable inherit the calendar from its parent variable. Interesting that they listed the units on the time bounds and not the calendar. The calendar is also non-standard in this file.

@bekozi
Contributor

bekozi commented Apr 28, 2017

This data does present a problem. The calendar is already inherited from the parent for bounds variables in ocgis, but MFTime operates from-file so there is no opportunity to intercept the metadata. There are two options as I see it:

  1. Add the appropriate attribute to the source data.
  2. Use the dimension_map to ignore the time bounds altogether:
import ocgis

rd = ocgis.RequestDataset(paths)
rd.dimension_map.set_bounds(ocgis.constants.DimensionMapKey.TIME, None)
ops = ocgis.OcgOperations(dataset=rd, ...)

What do you think?
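
For option 1, one possible way to add the attribute is to copy the calendar from the time variable onto its bounds variable. The sketch below is an assumption about how that could be done, not something ocgis provides: `copy_calendar_to_bounds` is a hypothetical helper. It relies only on the `ncattrs`, `getncattr` and `setncattr` methods of netCDF4 variables, and it expects the time variable to carry both a `calendar` and a `bounds` attribute.

```python
def copy_calendar_to_bounds(variables, time_name="time"):
    """Copy the time variable's calendar onto its bounds variable.

    `variables` is a mapping of name -> variable, such as the `variables`
    attribute of a netCDF4.Dataset opened in append mode. Returns the name
    of the bounds variable.
    """
    time_var = variables[time_name]
    calendar = time_var.getncattr("calendar")
    bounds_name = time_var.getncattr("bounds")
    bounds_var = variables[bounds_name]
    # Leave an existing calendar untouched; only fill in a missing one.
    if "calendar" not in bounds_var.ncattrs():
        bounds_var.setncattr("calendar", calendar)
    return bounds_name


# Intended use against a real file (this modifies the file in place, so
# work on a copy; `path` is a placeholder):
#
#     from netCDF4 import Dataset
#     with Dataset(path, "a") as ds:
#         copy_calendar_to_bounds(ds.variables)
```

Because this rewrites the source data, it would need to be applied to every file in the multi-file set.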

@huard
Contributor Author

huard commented Apr 28, 2017 via email

@bekozi
Contributor

bekozi commented Apr 28, 2017

> How would time subsetting work then if time_bnds is not available? It would use time instead?

Yes. time_bnds will still be sliced, but it won't be used for time subsetting. Only the time centroids in time will be used.

> I've tried your proposal but I'm missing something. rd.dimension_map is a dict without a set_bounds method.

The dimension map was just recently objectified. You'll need to pull and re-install. Sorry, I should have mentioned that.
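
The practical effect of subsetting on centroids rather than bounds can be seen in a small sketch. The two selection rules below are illustrative assumptions (an inclusive range test on centroids, and a simple interval-overlap test on bounds), not necessarily the exact rules ocgis applies, and the values are made-up day offsets for three monthly cells.

```python
def subset_by_centroid(times, start, end):
    """Indices of cells whose centroid lies within [start, end]."""
    return [i for i, t in enumerate(times) if start <= t <= end]


def subset_by_bounds(bounds, start, end):
    """Indices of cells whose [lower, upper] interval overlaps [start, end]."""
    return [i for i, (lower, upper) in enumerate(bounds)
            if lower <= end and upper >= start]


# Three monthly cells (Jan, Feb, Mar of a noleap year) as day offsets.
bounds = [(0, 31), (31, 59), (59, 90)]
times = [15.5, 45.0, 74.5]

start, end = 20, 50
by_centroid = subset_by_centroid(times, start, end)
by_bounds = subset_by_bounds(bounds, start, end)
print(by_centroid)  # [1]
print(by_bounds)    # [0, 1]
```

With bounds ignored, a cell that only partially overlaps the requested period is dropped unless its centroid falls inside it.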

@huard
Contributor Author

huard commented Apr 28, 2017

Seems to work. Thanks!
