Add an `index` attribute of model-scenario combinations #438

danielhuppmann · 2020-10-10T14:34:57Z

Please confirm that this PR has done the following:

Tests Added
Documentation Added
Description in RELEASE_NOTES.md Added

Description of PR

Per suggestions by @Rlamboll and @znicholls in the discussion related to #432, this PR adds an index attribute, where the index dimensions are those common to both the timeseries data and meta dataframes, i.e., model and scenario.

This makes it easy to loop over the available models and scenarios, using

for model, scenario in df.index:
    ...

This snippet is also shown in the docstring of the new function.
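To illustrate what the loop unpacks, here is a pandas-only sketch. It assumes the index behaves like a pandas MultiIndex with the levels model and scenario; the `meta` frame below is a hypothetical stand-in, not an actual IamDataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the meta table: one row per model-scenario pair.
meta = pd.DataFrame(
    {"exclude": [False, False]},
    index=pd.MultiIndex.from_tuples(
        [("model_a", "scen_a"), ("model_a", "scen_b")],
        names=["model", "scenario"],
    ),
)

# Iterating over a MultiIndex yields one tuple per row,
# so each item unpacks directly into (model, scenario).
pairs = []
for model, scenario in meta.index:
    pairs.append((model, scenario))
```

Each iteration yields one model-scenario combination, which is why the two-name unpacking in the loop above works without any further reshaping.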

To make the meaning of index intuitive for new users of pyam, the info() output is changed to the following (inspired by xarray):

<class 'pyam.core.IamDataFrame'>
Index dimensions:
 * model    : model_a (1)
 * scenario : scen_a, scen_b (2)
Timeseries data coordinates:
   region   : World (1)
   variable : Primary Energy, Primary Energy|Coal (2)
   unit     : EJ/yr (1)
   year     : 2005, 2010 (2)
Meta indicators:
   exclude (bool) False (1)
   number (int64) 1, 2 (2)
   string (object) foo, nan (2)

Rlamboll · 2020-10-15T13:28:53Z

This works perfectly if we have metadata - are there no circumstances under which we don't (other than calling "del df.meta")? I’m happier calling df.meta.index myself, and this has less chance of being confused with df._data.index or df.data.index (which is what I’d assume the function would return). A more descriptive name would solve this problem, but would probably take longer to type than the actual command.

danielhuppmann · 2020-10-15T16:07:01Z

The whole point of the IamDataFrame is that you have timeseries data and a meta indicators table that are always in sync - in particular after renaming or filtering...

Now that you have inspired this idea, I like it because it lets us show this simple approach for looping over all model-scenario combinations in the API docs. Users might not even think of using df.meta.index and end up doing something like the roundabout way suggested by @znicholls here...
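The roundabout approach itself is not quoted in this thread. One plausible version, assumed here purely for illustration, is to de-duplicate the model and scenario columns of the long-format timeseries data. The pandas sketch below compares that with reading the pairs off a meta table that is kept in sync; both frames are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical long-format timeseries data.
data = pd.DataFrame(
    [
        ["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2005, 1.0],
        ["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2010, 6.0],
        ["model_a", "scen_b", "World", "Primary Energy", "EJ/yr", 2005, 2.0],
    ],
    columns=["model", "scenario", "region", "variable", "unit", "year", "value"],
)

# Hypothetical meta table, assumed to be in sync with the data above.
meta = pd.DataFrame(
    {"exclude": [False, False]},
    index=pd.MultiIndex.from_tuples(
        [("model_a", "scen_a"), ("model_a", "scen_b")],
        names=["model", "scenario"],
    ),
)

# Roundabout (assumed): de-duplicate two columns of the timeseries data.
roundabout = list(
    data[["model", "scenario"]].drop_duplicates().itertuples(index=False, name=None)
)

# Direct: the meta index already holds exactly one entry per pair.
direct = list(meta.index)
```

Under the assumption that data and meta are in sync, both lists contain the same pairs; the direct version simply skips the de-duplication step.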

Rlamboll · 2020-10-15T16:39:20Z

It's definitely showcasing a useful feature of the metadata and currently I do do things like Zeb suggested in my code, which makes it ugly. The confusing aspect to me is the name - I'd be much happier if it were called "metaindex" or similar.
Are you irrevocably committed to the idea that metadata can only be model/scenario dependent? I imagine that as regional analysis becomes more common, model/scenario/region metadata will become useful. Silicone does not really permit multiple regions to be involved at the moment but at some point should do, and this feature will stop being useful in many contexts then. Also if you want metadata to hold processing history, you won't be able to combine pre- and post-processed data in the same df unless you add variable name to the meta index. Should a development like this happen, it will make no sense that this function is called "metaindex" or anything else similar.

danielhuppmann · 2020-10-16T06:02:29Z

I try to avoid the word metadata because this usually refers to the license, authors, etc. of the entire database, scenario ensemble, ...

I see our data model moving in the direction of xarray, where each "unit" (i.e., a scenario quantified by a particular model, colloquially also called a scenario) is like a DataArray, and the IamDataFrame is a Dataset. There are dimension coordinates of this unit (model and scenario) that are common across all attributes (currently data and meta), and non-dimension coordinates that are specific to some attributes. This is why I think the more general name index makes sense - it is the intersection of the indices/indexes? of all attributes.

There are a number of validation functions (require_variable, validate, check_aggregate, ...) that have an exclude_on_fail option - if True, mark the model-scenario as exclude=True in meta if the validation fails. This is a key feature and will not change (per your question above).
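A minimal pandas sketch of the exclude_on_fail idea follows. It is not pyam's implementation: the helper `validate_upper_bound`, its `upper` parameter, and both frames are hypothetical and exist only to show why the flag relies on meta being indexed by model and scenario.

```python
import pandas as pd

# Hypothetical long-format timeseries data.
data = pd.DataFrame(
    [
        ["model_a", "scen_a", "Primary Energy", 2005, 1.0],
        ["model_a", "scen_a", "Primary Energy", 2010, 6.0],
        ["model_a", "scen_b", "Primary Energy", 2005, 2.0],
        ["model_a", "scen_b", "Primary Energy", 2010, 7.0],
    ],
    columns=["model", "scenario", "variable", "year", "value"],
)

# Hypothetical meta table with one row per model-scenario pair.
meta = pd.DataFrame(
    {"exclude": [False, False]},
    index=pd.MultiIndex.from_tuples(
        [("model_a", "scen_a"), ("model_a", "scen_b")],
        names=["model", "scenario"],
    ),
)


def validate_upper_bound(data, meta, upper, exclude_on_fail=False):
    """Hypothetical check: return the rows whose value exceeds `upper`."""
    failed = data[data["value"] > upper]
    if exclude_on_fail and not failed.empty:
        # Collect every model-scenario pair with at least one failing row
        # and flag it in meta (modified in place).
        failed_pairs = list(set(zip(failed["model"], failed["scenario"])))
        meta.loc[meta.index.isin(failed_pairs), "exclude"] = True
    return failed


failed = validate_upper_bound(data, meta, upper=6.5, exclude_on_fail=True)
```

Because a failing data row maps to exactly one row of meta through its model and scenario, the flag can be set without ambiguity. With a further index level such as region, the same failure would match several meta rows or none.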

If we want to have other types of information (e.g., indicators by model/scenario/region), I see this as being implemented as a new attribute, not adding it to meta because this would break the validation feature.

danielhuppmann added 4 commits October 10, 2020 16:29

add index attribute

8e1d1cf

update info to clarify the difference between index and data coordinates

a72eb6d

add to release notes

8ccd7c2

Merge branch 'master' into feature/index

eaed98e

danielhuppmann marked this pull request as ready for review October 15, 2020 11:49

danielhuppmann merged commit 20414e9 into IAMconsortium:master Oct 16, 2020

danielhuppmann deleted the feature/index branch October 19, 2020 04:47
