Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check output consistency of .validate and similar #428

Closed
byersiiasa opened this issue Sep 23, 2020 · 6 comments · Fixed by #429
Closed

Check output consistency of .validate and similar #428

byersiiasa opened this issue Sep 23, 2020 · 6 comments · Fixed by #429
Assignees
Labels
data back-end Anything related to the (timeseries) data back end implementation

Comments

@byersiiasa
Copy link
Collaborator

Came across an issue with the newest version where an output of .validate(), which I think was previously a pd.DataFrame is now a pd.Series.
Not a problem in itself - but scripts that then take this output and perform operations that only work with DataFrame will throw an error. - So backward compatibility is reduced.

In our case it was .drop(..., axis=1) (i.e. drop columns), which is not possible on pd.Series

Error given is:
ValueError: No axis named 1 for object type <class 'pandas.core.series.Series'>

@danielhuppmann
Copy link
Member

Thanks @byersiiasa for identifying this issue!

Let me flip this back to you (also pinging @kvanderwijst @gidden @jkikstra): what would be the "natural" return type of this function?

For the aggregation features, we recently switched all returned objects to an IamDataFrame - would that make sense here as well? Also considering that there is now a info() function (see #427) which is shown by default in Jupyter notebooks...

@danielhuppmann danielhuppmann added data back-end Anything related to the (timeseries) data back end implementation and removed next release labels Sep 25, 2020
@gidden
Copy link
Member

gidden commented Sep 25, 2020

If the return type is always guaranteed to be 1-dimensional, then I would say pd.Series in order to stay in the pandas-verse. Assumptions here based on the docstring:

Returns all scenarios which do not match the criteria and prints a log message, or returns None if all scenarios match the criteria.

@danielhuppmann
Copy link
Member

If staying in the pandas-verse is preferred, I'd rather revert the behavior to a DataFrame rather than a Series. From my own experience, I usually use this in a notebook context where the DataFrame is quicker to parse (visually) than a Series with a complex MultiIndex...

@byersiiasa
Copy link
Collaborator Author

I would probably favour either type DF over Series - Series almost always give me headaches

@jkikstra
Copy link
Collaborator

I also prefer pandas.dataframe.
I find myself using .reset_index() in my workflows basically everytime I "happen to find" a MultiIndex coming from a pyam function.

@danielhuppmann
Copy link
Member

Thanks @jkikstra - good that I implemented it that way yesterday evening... 😜 Added you as a reviewer.

Please create a new issue for any other functions or methods where a non-intuitive or impractical type is returned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data back-end Anything related to the (timeseries) data back end implementation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants