Add an index attribute of model-scenario combinations #438

Merged
merged 4 commits into from
Oct 16, 2020


danielhuppmann
Member

@danielhuppmann danielhuppmann commented Oct 10, 2020

Please confirm that this PR has done the following:

  • Tests Added
  • Documentation Added
  • Description in RELEASE_NOTES.md Added

Description of PR

Per suggestions by @Rlamboll and @znicholls in the discussion related to #432, this PR adds an index attribute, where the index dimensions are those common to both the timeseries data and meta dataframes, i.e., model and scenario.

This makes it easy to loop over the available model-scenario combinations:

for model, scenario in df.index:
    ...

This snippet is also shown in the docstring of the new function.

To make the meaning of index intuitive for new users of pyam, the info() output is changed to the following (inspired by xarray):

<class 'pyam.core.IamDataFrame'>
Index dimensions:
 * model    : model_a (1)
 * scenario : scen_a, scen_b (2)
Timeseries data coordinates:
   region   : World (1)
   variable : Primary Energy, Primary Energy|Coal (2)
   unit     : EJ/yr (1)
   year     : 2005, 2010 (2)
Meta indicators:
   exclude (bool) False (1)
   number (int64) 1, 2 (2)
   string (object) foo, nan (2)

@danielhuppmann danielhuppmann marked this pull request as ready for review October 15, 2020 11:49
@Rlamboll
Collaborator

This works perfectly if we have metadata - are there no circumstances under which we don't (other than calling "del df.meta")? I’m happier calling df.meta.index myself, and this has less chance of being confused with df._data.index or df.data.index (which is what I’d assume the function would return). A more descriptive name would solve this problem, but would probably take longer to type than the actual command.

@danielhuppmann
Member Author

The whole point of the IamDataFrame is that you have timeseries data and a meta indicators table that are always in sync - in particular after renaming or filtering...

Now that you have inspired this idea, I like it because it lets us show this simple approach for looping over all model-scenario combinations in the API docs. Users might not even think of using df.meta.index and end up doing something like the roundabout way suggested by @znicholls here...

@Rlamboll
Collaborator

It's definitely showcasing a useful feature of the metadata and currently I do do things like Zeb suggested in my code, which makes it ugly. The confusing aspect to me is the name - I'd be much happier if it were called "metaindex" or similar.
Are you irrevocably committed to the idea that metadata can only be model/scenario dependent? I imagine that as regional analysis becomes more common, model/scenario/region metadata will become useful. Silicone does not really permit multiple regions to be involved at the moment but at some point should do, and this feature will stop being useful in many contexts then. Also, if you want metadata to hold processing history, you won't be able to combine pre- and post-processed data in the same df unless you add variable name to the meta index. Should a development like this happen, it will make no sense that this function is called "metaindex" or anything similar.

@danielhuppmann
Member Author

I try to avoid the word metadata because this usually refers to the license, authors, etc. of the entire database, scenario ensemble, ...

I see our data model moving in the direction of xarray, where each "unit" (i.e., a scenario quantified by a particular model, colloquially also called a scenario) is like a DataArray, and the IamDataFrame is a Dataset. There are dimension coordinates of this unit (model and scenario) that are common across all attributes (currently data and meta), and non-dimension coordinates that are specific to some attributes. This is why I think the more general name index makes sense - it is the intersection of the indices/indexes? of all attributes.
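
To make the "intersection of the indices" idea concrete, here is a minimal pure-Python sketch. It is not how pyam stores anything internally: the dictionaries, the helper names `common_dimensions` and `project`, and the numeric values are all assumptions made up for illustration. Only the dimension names and the model/scenario labels are taken from the info() output above.

```python
# Simplified stand-ins for the two attributes; NOT pyam's internal layout.
DATA_DIMS = ("model", "scenario", "region", "variable", "unit", "year")
META_DIMS = ("model", "scenario")

# Timeseries data keyed by its full set of dimensions (values are made up).
data = {
    ("model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2005): 1.0,
    ("model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2010): 6.0,
    ("model_a", "scen_a", "World", "Primary Energy|Coal", "EJ/yr", 2005): 0.5,
    ("model_a", "scen_b", "World", "Primary Energy", "EJ/yr", 2005): 2.0,
}

# Meta indicators keyed by model and scenario only.
meta = {
    ("model_a", "scen_a"): {"exclude": False, "number": 1},
    ("model_a", "scen_b"): {"exclude": False, "number": 2},
}


def common_dimensions(first, *others):
    """Return the dimensions present in every attribute, in the order of `first`."""
    return tuple(d for d in first if all(d in other for other in others))


def project(keys, dims, onto):
    """Reduce full keys to the unique, sorted combinations of the `onto` dimensions."""
    positions = [dims.index(d) for d in onto]
    return sorted({tuple(key[p] for p in positions) for key in keys})


index_dims = common_dimensions(DATA_DIMS, META_DIMS)
index = project(data, DATA_DIMS, index_dims)

# Data and meta are expected to stay in sync, so both give the same index.
assert index == project(meta, META_DIMS, index_dims)

for model, scenario in index:
    print(model, scenario)
```

The loop at the end mirrors the `for model, scenario in df.index` pattern from the PR description, with the shared dimensions derived rather than hard-coded.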

There are a number of validation functions (require_variable, validate, check_aggregate, ...) that have an exclude_on_fail option - if True, mark the model-scenario as exclude=True in meta if the validation fails. This is a key feature and will not change (per your question above).
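
The exclude_on_fail behaviour can be sketched roughly as follows. This is a simplified stand-in, not the pyam implementation: `require_variable_sketch` is a hypothetical function, and the plain set and dict used for data and meta are assumptions for illustration. It shows why the feature relies on meta being keyed by exactly model and scenario.

```python
def require_variable_sketch(data, meta, variable, exclude_on_fail=False):
    """Return the (model, scenario) pairs that lack `variable`.

    Hypothetical helper: if `exclude_on_fail` is True, each failing pair is
    also marked with exclude=True in `meta`.
    """
    present = {(model, scenario) for (model, scenario, var) in data if var == variable}
    failing = [key for key in meta if key not in present]
    if exclude_on_fail:
        for key in failing:
            meta[key]["exclude"] = True
    return failing


# Which variables each model-scenario reports (made-up example data).
data = {
    ("model_a", "scen_a", "Primary Energy"),
    ("model_a", "scen_a", "Primary Energy|Coal"),
    ("model_a", "scen_b", "Primary Energy"),
}
meta = {
    ("model_a", "scen_a"): {"exclude": False},
    ("model_a", "scen_b"): {"exclude": False},
}

failing = require_variable_sketch(
    data, meta, "Primary Energy|Coal", exclude_on_fail=True
)
```

In this sketch the validation result maps one-to-one onto rows of meta. If meta were keyed by model, scenario and region, a failed check would no longer identify a single row to mark.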

If we want to have other types of information (e.g., indicators by model/scenario/region), I see this as being implemented as a new attribute, not adding it to meta because this would break the validation feature.

@danielhuppmann danielhuppmann merged commit 20414e9 into IAMconsortium:master Oct 16, 2020
@danielhuppmann danielhuppmann deleted the feature/index branch October 19, 2020 04:47