diff --git a/RELEASE_NOTES.md b/RELEASE_NOTES.md index e4f9a2eb3..57a64d1d4 100644 --- a/RELEASE_NOTES.md +++ b/RELEASE_NOTES.md @@ -2,6 +2,16 @@ ## API changes +PR [#413](https://github.com/IAMconsortium/pyam/pull/413) changed the +return type of `pyam.read_iiasa()` and `pyam.iiasa.Connection.query()` +to an `IamDataFrame` (instead of a `pandas.DataFrame`) +and loads meta-indicators by default. + +Also, the following functions were deprecated for package consistency: +- `index()` replaces `scenario_list()` for an overview of all scenarios +- `meta_columns` (attribute) replaces `available_metadata()` +- `meta()` replaces `metadata()` + PR [#402](https://github.com/IAMconsortium/pyam/pull/402) changed the default behaviour of `as_pandas()` to include all columns of `meta` in the returned dataframe, or only merge columns given by the renamed argument `meta_cols`. @@ -10,6 +20,7 @@ a utility function `pyam.plotting.mpl_args_to_meta_cols()`. ## Individual Updates +- [#413](https://github.com/IAMconsortium/pyam/pull/413) Refactor IIASA-connection-API and rework all related tests. - [#412](https://github.com/IAMconsortium/pyam/pull/412) Add building the docs to GitHub Actions CI. - [#410](https://github.com/IAMconsortium/pyam/pull/410) Activate tutorial tests on GitHub Actions CI (py3.8). - [#409](https://github.com/IAMconsortium/pyam/pull/409) Remove travis and appveyor CI config. 
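The deprecations listed above (`scenario_list()` → `index()`, `available_metadata()` → `meta_columns`, `metadata()` → `meta()`) all follow the same thin-wrapper pattern that this PR applies in `pyam/iiasa.py`: the old name stays callable, emits a deprecation message, and delegates to the new implementation. A minimal standalone sketch of that pattern — the `Connection` class, `index()` body, and return value below are illustrative placeholders, not the actual pyam implementation:

```python
import warnings


def deprecation_warning(msg):
    # minimal stand-in for pyam.logging.deprecation_warning (illustrative)
    warnings.warn('This method is deprecated. ' + msg,
                  DeprecationWarning, stacklevel=2)


class Connection:
    """Stripped-down illustration of the renamed-method pattern."""

    def index(self, default=True):
        # new canonical method; the real pyam version queries the database API
        return ['model_a/scen_a']  # placeholder result

    def scenario_list(self, default=True):
        """Deprecated, use :meth:`Connection.index`"""
        deprecation_warning('Use `Connection.index()` instead.')
        return self.index(default)


conn = Connection()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = conn.scenario_list()  # old name still works...
assert result == conn.index()
# ...but a DeprecationWarning was emitted
assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```

The advantage of this shape is that downstream code keeps running for one release cycle while the warning points users to the replacement.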
diff --git a/doc/source/tutorials/iiasa_dbs.ipynb b/doc/source/tutorials/iiasa_dbs.ipynb index 232c13433..1e324eb7a 100644 --- a/doc/source/tutorials/iiasa_dbs.ipynb +++ b/doc/source/tutorials/iiasa_dbs.ipynb @@ -10,7 +10,7 @@ "High-profile use cases include the [IAMC 1.5°C Scenario Explorer hosted by IIASA](https://data.ene.iiasa.ac.at/iamc-1.5c-explorer) supporting the *IPCC Special Report on Global Warming of 1.5°C* (SR15) and the Horizon 2020 project [CD-LINKS](https://data.ene.iiasa.ac.at/cd-links).\n", "\n", "IIASA's [modeling platform infrastructure](http://software.ene.iiasa.ac.at/ixmp-server) and the Scenario Explorer UI is not only a great resource on its own, but it also allows the underlying datasets to be directly queried.\n", - "**pyam** takes advantage of this ability to allow you to easily pull data and work with it." + "**pyam** takes advantage of this ability to allow you to easily pull data and work with it in your Python data processing and analysis workflow." ] }, { @@ -26,7 +26,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Accessing an explorer is done via a `Connection` object.\n", + "## Connecting to a data resource (aka the database API of a Scenario Explorer instance)\n", + "\n", + "Accessing a data resource is done via a **Connection** object.\n", "By default, you can connect to all public Scenario Explorer instances. 
" ] }, @@ -44,21 +46,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "If you have credentials to connect to a non-public or restricted database,\n", - "you can set this (in a separate Python console) using the following command:\n", + "If you have credentials to connect to a non-public or restricted Scenario Explorer instance,\n", + "you can store this information by running the following command in a separate Python console:\n", "\n", "```\n", "import pyam\n", "pyam.iiasa.set_config(, )\n", "```\n", - "When initializing a new `Connection` instance, **pyam** will automatically search for the configuration in a known location." + "When initializing a new **Connection** instance, **pyam** will automatically search for the configuration in a known location." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "In this example, we will be pulling data from the Special Report on 1.5C explorer. This can be done either via the constructor:\n", + "In this example, we will be retrieving data from the *IAMC 1.5°C Scenario Explorer hosted by IIASA*\n", + "([link](https://data.ene.iiasa.ac.at/iamc-1.5c-explorer)),\n", + "which provides the quantiative scenario ensemble underpinning\n", + "the *IPCC Special Report on Global Warming of 1.5C* (SR15).\n", + "\n", + "This can be done either via the constructor:\n", "\n", "```\n", "pyam.iiasa.Connection('iamc15')\n", @@ -76,7 +83,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We also provide some convenience functions to shorten the amount of code you have to write. Under the hood, `read_iiasa()` is just opening an connection to a database and making a query on that data.\n", + "We also provide some convenience functions to shorten the amount of code you have to write. 
Under the hood, `read_iiasa()` is just opening a connection to a database API and sending a query to the resource.\n", + "\n", "In this tutorial, we will query specific subsets of data in a manner similar to `pyam.IamDataFrame.filter()`." ] }, @@ -99,7 +107,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Here we pulled out all times series data for model(s) that start with 'MESSAGEix' that are in the 'World' region and associated with the two named variables. We also added the \"category\" metadata, which tells us the climate impact categorisation of each scenario as assessed in the IPCC SR15.\n", + "Here we pulled out all time series data for model(s) that start with 'MESSAGEix' that are in the 'World' region and associated with the two named variables. We also added the meta column \"category\", which tells us the climate impact categorisation of each scenario as assessed in the IPCC SR15.\n", "\n", "Let's plot CO2 emissions." ] @@ -143,7 +151,7 @@ "source": [ "## Exploring the data resource\n", "\n", - "If you're interested in what data is actually in the data source, you can use **pyam.iiasa.Connection** to do so." + "If you're interested in what data is available in the data source, you can use **pyam.iiasa.Connection** to do so." ] }, { @@ -159,7 +167,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The `conn` object has a number of useful functions for listing what's in the dataset. A few of them are shown below." + "The **Connection** object has a number of useful functions for listing what's available in the data resource.\n", + "These functions follow the conventions of the **IamDataFrame** class (where possible).\n", + "\n", + "A few of them are shown below." ] }, { @@ -202,8 +213,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "A number of different kinds of indicators are available for model/scenario combinations.\n", - "We queried the \"category\" metadata in the above example, but there are many more. 
You can see them with" + "A number of different categorization and quantitative indicators are available for model/scenario combinations.\n", + "These are usually called `meta` indicators in **pyam**.\n", + "\n", + "We queried the meta-indicator \"category\" in the above example, but there are many more.\n", + "You can get a list with the following command:" ] }, { @@ -212,14 +226,14 @@ "metadata": {}, "outputs": [], "source": [ - "conn.available_metadata().head()" + "conn.meta_columns.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "You can directly query the **Connection**, which will give you a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html)." + "You can directly query the **Connection**, which will return a **pyam.IamDataFrame**..." ] }, { @@ -232,15 +246,14 @@ " model='MESSAGEix*', \n", " variable=['Emissions|CO2', 'Primary Energy|Coal'], \n", " region='World'\n", - ")\n", - "df.head()" + ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "And you can easily turn this into a **pyam.IamDataFrame** to continue your analysis." + "...so that you can directly continue with your analysis and visualization workflow using **pyam**!" 
] }, { @@ -249,12 +262,18 @@ "metadata": {}, "outputs": [], "source": [ - "df = pyam.IamDataFrame(df)\n", "ax = df.filter(variable='Primary Energy|Coal').line_plot(\n", " color='scenario', \n", " legend=dict(loc='center left', bbox_to_anchor=(1.0, 0.5))\n", ")" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { @@ -273,7 +292,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.4" + "version": "3.7.7" } }, "nbformat": 4, diff --git a/pyam/iiasa.py b/pyam/iiasa.py index 34800ce25..a280178e0 100644 --- a/pyam/iiasa.py +++ b/pyam/iiasa.py @@ -11,14 +11,14 @@ from collections.abc import Mapping from pyam.core import IamDataFrame -from pyam.utils import META_IDX, islistable, isstr, pattern_match +from pyam.utils import META_IDX, IAMC_IDX, isstr, pattern_match from pyam.logging import deprecation_warning logger = logging.getLogger(__name__) -# quiet this fool +# set requests-logger to WARNING only logging.getLogger('requests').setLevel(logging.WARNING) -_BASE_URL = 'https://db1.ene.iiasa.ac.at/EneAuth/config/v1' +_AUTH_URL = 'https://db1.ene.iiasa.ac.at/EneAuth/config/v1' _CITE_MSG = """ You are connected to the {} scenario explorer hosted by IIASA. If you use this data in any published format, please cite the @@ -83,9 +83,9 @@ def _get_token(creds, base_url): if plaintextcreds: logger.warning('You provided credentials in plain text. 
DO NOT save ' 'these in a repository or otherwise post them online') - deprecation_warning('Providing credentials in plain text', - 'Please use `pyam.iiasa.set_config(, )`' - ' to store your credentials in a file!') + deprecation_warning('Please use `pyam.iiasa.set_config(, )`' + ' to store your credentials in a file!', + 'Providing credentials in plain text') # get user token headers = {'Accept': 'application/json', @@ -98,13 +98,14 @@ def _get_token(creds, base_url): class Connection(object): - """A class to facilitate querying an IIASA scenario explorer database + """A class to facilitate querying an IIASA Scenario Explorer database API Parameters ---------- name : str, optional - A valid database name. For available options, see - valid_connections(). + The name of a database API. + See :attr:`pyam.iiasa.Connection.valid_connections` for a list + of available APIs. creds : str, :class:`pathlib.Path`, list-like, or dict, optional By default, this function will (try to) read user credentials which were set using :meth:`pyam.iiasa.set_config(, )`. @@ -119,9 +120,9 @@ class Connection(object): for backwards compatibility. However, this option is NOT RECOMMENDED and will be deprecated in future releases of pyam. 
""" - def __init__(self, name=None, creds=None, base_url=_BASE_URL): - self._base_url = base_url - self._token, self._user = _get_token(creds, base_url=self._base_url) + def __init__(self, name=None, creds=None, auth_url=_AUTH_URL): + self._auth_url = auth_url + self._token, self._user = _get_token(creds, base_url=self._auth_url) # connect if provided a name self._connected = None @@ -131,12 +132,12 @@ def __init__(self, name=None, creds=None, base_url=_BASE_URL): if self._user: logger.info(f'You are connected as user `{self._user}`') else: - logger.info(f'You are connected as an anonymous user') + logger.info('You are connected as an anonymous user') @property @lru_cache() def _connection_map(self): - url = '/'.join([self._base_url, 'applications']) + url = '/'.join([self._auth_url, 'applications']) headers = {'Authorization': 'Bearer {}'.format(self._token)} r = requests.get(url, headers=headers) _check_response(r, 'Could not get valid connection list') @@ -164,14 +165,11 @@ def _connection_map(self): @property @lru_cache() def valid_connections(self): - """ Show a list of valid connection names (application aliases or - names when alias is not available or duplicated) - - :return: list of str - """ + """Return available resources (database API connections)""" return list(self._connection_map.keys()) def connect(self, name): + """Connect to a specific resource (database API)""" if name in self._connection_map: name = self._connection_map[name] @@ -186,104 +184,129 @@ def connect(self, name): {} not recognized as a valid connection name. Choose from one of the supported connections for your user: {}. 
""" - raise ValueError(msg.format(name, valid)) + raise ValueError(msg.format(name, self._connection_map.keys())) - url = '/'.join([self._base_url, 'applications', name, 'config']) + url = '/'.join([self._auth_url, 'applications', name, 'config']) headers = {'Authorization': 'Bearer {}'.format(self._token)} r = requests.get(url, headers=headers) _check_response(r, 'Could not get application information') response = r.json() idxs = {x['path']: i for i, x in enumerate(response)} - self._base_url = response[idxs['baseUrl']]['value'] - # TODO: request the full citation to be added to this metadata instead - # of linking to the about page - about = '/'.join([response[idxs['uiUrl']]['value'], '#', 'about']) - logger.info(_CITE_MSG.format(name, about)) + self._auth_url = response[idxs['baseUrl']]['value'] + # TODO: proper citation (as metadata) instead of link to the about page + if 'uiUrl' in idxs: + about = '/'.join([response[idxs['uiUrl']]['value'], '#', 'about']) + logger.info(_CITE_MSG.format(name, about)) + # TODO: use API "nice-name" self._connected = name @property def current_connection(self): + """Currently connected resource (database API connection)""" return self._connected - @lru_cache() - def scenario_list(self, default=True): - """ - Metadata regarding the list of scenarios (e.g., models, scenarios, - run identifier, etc.) in the connected data source. + def index(self, default=True): + """Return the index of models and scenarios in the connected resource Parameters ---------- default : bool, optional - Return *only* the default version of each Scenario. - Any (`model`, `scenario`) without a default version is omitted. - If :obj:`False`, return all versions. + If `True`, return *only* the default version of a model/scenario. + Any model/scenario without a default version is omitted. + If `False`, returns all versions. 
""" + cols = ['version'] if default else ['version', 'is_default'] + return self._query_index(default)[META_IDX + cols].set_index(META_IDX) + + def scenario_list(self, default=True): + """Deprecated, use :meth:`Connection.index`""" + deprecation_warning('Use `Connection.index()` instead.') + return self._query_index(default) + + @lru_cache() + def _query_index(self, default=True): + # TODO merge this function with `meta()` default = 'true' if default else 'false' add_url = 'runs?getOnlyDefaultRuns={}' - url = '/'.join([self._base_url, add_url.format(default)]) + url = '/'.join([self._auth_url, add_url.format(default)]) headers = {'Authorization': 'Bearer {}'.format(self._token)} r = requests.get(url, headers=headers) - _check_response(r, 'Could not get scenario list') + _check_response(r, 'Could not retrieve the resource index') return pd.read_json(r.content, orient='records') + @property @lru_cache() - def available_metadata(self): - """List all available meta indicators in the instance""" - url = '/'.join([self._base_url, 'metadata/types']) + def meta_columns(self): + """Return the list of meta indicators in the connected resource""" + url = '/'.join([self._auth_url, 'metadata/types']) headers = {'Authorization': 'Bearer {}'.format(self._token)} r = requests.get(url, headers=headers) _check_response(r) return pd.read_json(r.content, orient='records')['name'] + def available_metadata(self): + """Deprecated, use :attr:`Connection.meta_columns`""" + # TODO: deprecate/remove this function in release >=0.8 + deprecation_warning('Use `Connection.meta_columns` instead.') + return self.meta_columns + @lru_cache() - def metadata(self, default=True): - """All meta categories and indicators of scenarios + def meta(self, default=True): + """Return categories and indicators (meta) of scenarios Parameters ---------- default : bool, optional - Return *only* the default version of each Scenario. + Return *only* the default version of each scenario. 
Any (`model`, `scenario`) without a default version is omitted. If :obj:`False`, return all versions. """ - # at present this reads in all data for all scenarios, it could be sped - # up in the future to try to query a subset - default = 'true' if default else 'false' + # TODO: at present this reads in all data for all scenarios, + # it could be sped up in the future to try to query a subset + _default = 'true' if default else 'false' add_url = 'runs?getOnlyDefaultRuns={}&includeMetadata=true' - url = '/'.join([self._base_url, add_url.format(default)]) + url = '/'.join([self._auth_url, add_url.format(_default)]) headers = {'Authorization': 'Bearer {}'.format(self._token)} r = requests.get(url, headers=headers) _check_response(r) df = pd.read_json(r.content, orient='records') + cols = ['version'] if default else ['version', 'is_default'] + def extract(row): return ( - pd.concat([row[['model', 'scenario']], + pd.concat([row[META_IDX + cols], pd.Series(row.metadata)]) .to_frame() .T .set_index(['model', 'scenario']) ) - return pd.concat([extract(row) for idx, row in df.iterrows()], - sort=False).reset_index() + return pd.concat([extract(row) for i, row in df.iterrows()], + sort=False) + + def metadata(self, default=True): + """Deprecated, use :meth:`Connection.meta`""" + # TODO: deprecate/remove this function in release >=0.8 + deprecation_warning('Use `Connection.meta()` instead.') + return self.meta(default=default) def models(self): - """All models in the connected data source""" - return pd.Series(self.scenario_list()['model'].unique(), + """List all models in the connected resource""" + return pd.Series(self._query_index()['model'].unique(), name='model') def scenarios(self): - """All scenarios in the connected data source""" - return pd.Series(self.scenario_list()['scenario'].unique(), + """List all scenarios in the connected resource""" + return pd.Series(self._query_index()['scenario'].unique(), name='scenario') @lru_cache() def variables(self): - """All 
variables in the connected data source""" - url = '/'.join([self._base_url, 'ts']) + """List all variables in the connected resource""" + url = '/'.join([self._auth_url, 'ts']) headers = {'Authorization': 'Bearer {}'.format(self._token)} r = requests.get(url, headers=headers) _check_response(r) @@ -292,7 +315,7 @@ def variables(self): @lru_cache() def regions(self, include_synonyms=False): - """All regions in the connected data source + """List all regions in the connected resource Parameters ---------- @@ -301,7 +324,7 @@ def regions(self, include_synonyms=False): (possibly leading to duplicate region names for regions with more than one synonym) """ - url = '/'.join([self._base_url, 'nodes?hierarchy=%2A']) + url = '/'.join([self._auth_url, 'nodes?hierarchy=%2A']) headers = {'Authorization': 'Bearer {}'.format(self._token)} params = {'includeSynonyms': include_synonyms} r = requests.get(url, headers=headers, params=params) @@ -329,7 +352,8 @@ def convert_regions_payload(response, include_synonyms): def _query_post_data(self, **kwargs): def _get_kwarg(k): - x = kwargs.pop(k, []) + # TODO refactor API to return all models if model-list is empty + x = kwargs.pop(k, '*' if k == 'model' else []) return [x] if isstr(x) else x m_pattern = _get_kwarg('model') @@ -348,7 +372,7 @@ def _match(data, patterns): return data[matches].unique() # get unique run ids - meta = self.scenario_list() + meta = self._query_index() meta = meta[meta.is_default] models = _match(meta['model'], m_pattern) scenarios = _match(meta['scenario'], s_pattern) @@ -385,81 +409,112 @@ def _match(data, patterns): } return data - def query(self, **kwargs): - """Query the data source with filters + def query(self, default=True, meta=True, **kwargs): + """Query the connected resource for timeseries data (with filters) + + Parameters + ---------- + default : bool, optional + Return *only* the default version of each scenario. + Any (`model`, `scenario`) without a default version is omitted. 
+ If :obj:`False`, return all versions. + meta : bool or list, optional + If :obj:`True`, merge all meta indicators + (or subset if list is given). + kwargs + Available keyword arguments include - Available keyword arguments include + - model + - scenario + - region + - variable - - model - - scenario - - region - - variable + Returns + ------- + IamDataFrame Examples -------- - You can read from a :class:`pyam.iiasa.Connection` instance using keyword arguments similar to filtering an :class:`IamDataFrame`: .. code-block:: python - Connection.query(model='MESSAGE', scenario='SSP2*', + Connection.query(model='MESSAGE*', scenario='SSP2*', variable=['Emissions|CO2', 'Primary Energy']) """ + # TODO: API returns timeseries data for non-default versions + if default is not True: + msg = 'Querying for non-default scenarios is not (yet) supported' + raise ValueError(msg) + + # retrieve data headers = { 'Authorization': 'Bearer {}'.format(self._token), 'Content-Type': 'application/json', } - data = json.dumps(self._query_post_data(**kwargs)) - url = '/'.join([self._base_url, 'runs/bulk/ts']) - logger.debug('Querying timeseries data ' - 'from {} with filter {}'.format(url, data)) - r = requests.post(url, headers=headers, data=data) + _args = json.dumps(self._query_post_data(**kwargs)) + url = '/'.join([self._auth_url, 'runs/bulk/ts']) + logger.debug(f'Query timeseries data from {url} with data {_args}') + r = requests.post(url, headers=headers, data=_args) _check_response(r) # refactor returned json object to be castable to an IamDataFrame dtype = dict(model=str, scenario=str, variable=str, unit=str, region=str, year=int, value=float, version=int) - df = pd.read_json(r.content, orient='records', dtype=dtype) - logger.debug('Response size is {0} bytes, ' - '{1} records'.format(len(r.content), len(df))) - columns = ['model', 'scenario', 'variable', 'unit', - 'region', 'year', 'value', 'time', 'meta', - 'version'] + data = pd.read_json(r.content, orient='records', 
dtype=dtype) + logger.debug(f'Response: {len(r.content)} bytes, {len(data)} records') + cols = IAMC_IDX + ['year', 'value', 'subannual', 'version'] # keep only known columns or init empty df - df = pd.DataFrame(data=df, columns=columns) - # replace missing meta (for backward compatibility) - df.fillna({'meta': 0}, inplace=True) - df.fillna({'time': 0}, inplace=True) - df.rename(columns={'time': 'subannual'}, inplace=True) - # check if returned dataframe has subannual disaggregation, drop if not - if pd.Series([i in [-1, 'year'] for i in df.subannual]).all(): - df.drop(columns='subannual', inplace=True) + data = pd.DataFrame(data=data, columns=cols) + + # check if timeseries data has subannual disaggregation, drop if not + if 'subannual' in data: + timeslices = data.subannual.dropna().unique() + if all([i in [-1, 'Year'] for i in timeslices]): + data.drop(columns='subannual', inplace=True) + # check if there are multiple version for any model/scenario lst = ( - df[META_IDX + ['version']].drop_duplicates() + data[META_IDX + ['version']].drop_duplicates() .groupby(META_IDX).count().version ) + # checking if there are multiple versions # for every model/scenario combination + # TODO this is probably not necessary if len(lst) > 1 and max(lst) > 1: raise ValueError('multiple versions for {}'.format( lst[lst > 1].index.to_list())) - df.drop(columns='version', inplace=True) + data.drop(columns='version', inplace=True) + + # cast to IamDataFrame + df = IamDataFrame(data) + + # merge meta categorization and quantitative indications + if meta: + _meta = self.meta().loc[df.meta.index] + for i in _meta.columns if meta is True else meta + ['version']: + df.set_meta(_meta[i]) - return df + return IamDataFrame(df) -def read_iiasa(name, meta=False, creds=None, base_url=_BASE_URL, **kwargs): - """Read data from an IIASA scenario explorer and return as IamDataFrame +def read_iiasa(name, default=True, meta=True, creds=None, base_url=_AUTH_URL, + **kwargs): + """Query an IIASA Scenario 
Explorer database API and return as IamDataFrame Parameters ---------- name : str A valid name of an IIASA scenario explorer instance, see :attr:`pyam.iiasa.Connection.valid_connections` - meta : bool or list of strings - If :obj:`True`, include all meta categories & quantitative indicators + default : bool, optional + Return *only* the default version of each scenario. + Any (`model`, `scenario`) without a default version is omitted. + If :obj:`False`, return all versions. + meta : bool or list of strings, optional + If `True`, include all meta categories & quantitative indicators (or subset if list is given). creds : dict Credentials to access scenario explorer instance and @@ -469,20 +524,5 @@ def read_iiasa(name, meta=False, creds=None, base_url=_BASE_URL, **kwargs): kwargs Arguments for :meth:`pyam.iiasa.Connection.query` """ - conn = Connection(name, creds, base_url) - # data - df = IamDataFrame(conn.query(**kwargs)) - # meta: categorization and quantitative indications - if meta: - mdf = conn.metadata() - # only data for models/scenarios in df - mdf = mdf[mdf.model.isin(df['model'].unique()) & - mdf.scenario.isin(df['scenario'].unique())] - # get subset of data if meta is a list - if islistable(meta): - mdf = mdf[['model', 'scenario'] + meta] - mdf = mdf.set_index(['model', 'scenario']) - # we have to loop here because `set_meta()` can only take series - for col in mdf: - df.set_meta(mdf[col]) - return df + return Connection(name, creds, base_url)\ + .query(default=default, meta=meta, **kwargs) diff --git a/pyam/testing.py b/pyam/testing.py index 941d15a11..cc44a421e 100644 --- a/pyam/testing.py +++ b/pyam/testing.py @@ -8,4 +8,5 @@ def assert_iamframe_equal(a, b, **assert_kwargs): msg = 'IamDataFrame.data are different: \n {}' raise AssertionError(msg.format(diff.head())) - pdt.assert_frame_equal(a.meta, b.meta, **assert_kwargs) + pdt.assert_frame_equal(a.meta, b.meta, check_dtype=False, check_like=True, + **assert_kwargs) diff --git a/tests/conftest.py 
b/tests/conftest.py index 0520e4f06..db833a983 100644 --- a/tests/conftest.py +++ b/tests/conftest.py @@ -18,6 +18,9 @@ except SSLError: IIASA_UNAVAILABLE = True +TEST_API = 'integration-test' +TEST_API_NAME = 'IXSE_INTEGRATION_TEST' + here = os.path.dirname(os.path.realpath(__file__)) IMAGE_BASELINE_DIR = os.path.join(here, 'expected_figs') @@ -181,3 +184,9 @@ def plot_df(): def plot_stack_plot_df(): df = IamDataFrame(TEST_STACKPLOT_DF) yield df + + +@pytest.fixture(scope="session") +def conn(): + if not IIASA_UNAVAILABLE: + return iiasa.Connection(TEST_API) diff --git a/tests/test_iiasa.py b/tests/test_iiasa.py index 0704491d3..6ae3ef43c 100644 --- a/tests/test_iiasa.py +++ b/tests/test_iiasa.py @@ -1,22 +1,19 @@ import os import copy -import yaml import pytest +import pandas as pd +import numpy as np + import numpy.testing as npt -from requests.exceptions import SSLError +import pandas.testing as pdt -from pyam import iiasa -from conftest import IIASA_UNAVAILABLE +from pyam import IamDataFrame, iiasa, read_iiasa, META_IDX +from pyam.testing import assert_iamframe_equal +from conftest import IIASA_UNAVAILABLE, TEST_API, TEST_API_NAME if IIASA_UNAVAILABLE: pytest.skip('IIASA database API unavailable', allow_module_level=True) -# verify whether IIASA database API can be reached, skip tests otherwise -try: - iiasa.Connection() -except SSLError: - pytest.skip('IIASA database API unavailable', allow_module_level=True) - # check to see if we can do online testing of db authentication TEST_ENV_USER = 'IIASA_CONN_TEST_USER' TEST_ENV_PW = 'IIASA_CONN_TEST_PW' @@ -25,74 +22,87 @@ TEST_ENV_USER, TEST_ENV_PW ) +VERSION_COLS = ['version', 'is_default'] +META_COLS = ['number', 'string'] +META_DF = pd.DataFrame([ + ['model_a', 'scen_a', 1, True, 1, 'foo'], + ['model_a', 'scen_b', 1, True, 2, np.nan], + ['model_a', 'scen_a', 2, False, 1, 'bar'], + ['model_b', 'scen_a', 1, True, 3, 'baz'] +], columns=META_IDX + VERSION_COLS + META_COLS).set_index(META_IDX) + +MODEL_B_DF = 
pd.DataFrame([ + ['Primary Energy', 'EJ/yr', 'Summer', 1, 3], + ['Primary Energy', 'EJ/yr', 'Year', 3, 8], + ['Primary Energy|Coal', 'EJ/yr', 'Summer', 0.4, 2], + ['Primary Energy|Coal', 'EJ/yr', 'Year', 0.9, 5] +], columns=['variable', 'unit', 'subannual', 2005, 2010]) + + +def test_unknown_conn(): + # connecting to an unknown API raises an error + pytest.raises(ValueError, iiasa.Connection, 'foo') + -def test_anon_conn(): - conn = iiasa.Connection('IXSE_SR15') - assert conn.current_connection == 'IXSE_SR15' +def test_valid_connections(): + # check that the test API is in the list of valid connections + assert TEST_API in iiasa.Connection().valid_connections -def test_anon_conn_warning(): - conn = iiasa.Connection('iamc15') - assert conn.current_connection == 'IXSE_SR15' +def test_anon_conn(conn): + assert conn.current_connection == TEST_API_NAME @pytest.mark.skipif(not CONN_ENV_AVAILABLE, reason=CONN_ENV_REASON) -def test_conn_creds_file(tmp_path): - user, pw = os.environ[TEST_ENV_USER], os.environ[TEST_ENV_PW] - path = tmp_path / 'config.yaml' - with open(path, 'w') as f: - yaml.dump({'username': user, 'password': pw}, f) - conn = iiasa.Connection('IXSE_SR15', creds=path) - assert conn.current_connection == 'IXSE_SR15' +def test_conn_creds_config(): + iiasa.set_config(os.environ[TEST_ENV_USER], os.environ[TEST_ENV_PW]) + conn = iiasa.Connection(TEST_API) + assert conn.current_connection == TEST_API_NAME @pytest.mark.skipif(not CONN_ENV_AVAILABLE, reason=CONN_ENV_REASON) def test_conn_creds_tuple(): user, pw = os.environ[TEST_ENV_USER], os.environ[TEST_ENV_PW] - conn = iiasa.Connection('IXSE_SR15', creds=(user, pw)) - assert conn.current_connection == 'IXSE_SR15' - - -def test_conn_bad_creds(): - pytest.raises(RuntimeError, iiasa.Connection, - 'IXSE_SR15', creds=('_foo', '_bar')) - - -def test_anon_conn_tuple_raises(): - pytest.raises(ValueError, iiasa.Connection, 'foo') + conn = iiasa.Connection(TEST_API, creds=(user, pw)) + assert conn.current_connection == TEST_API_NAME 
 @pytest.mark.skipif(not CONN_ENV_AVAILABLE, reason=CONN_ENV_REASON)
 def test_conn_creds_dict():
     user, pw = os.environ[TEST_ENV_USER], os.environ[TEST_ENV_PW]
-    conn = iiasa.Connection(
-        'IXSE_SR15', creds={'username': user, 'password': pw})
-    assert conn.current_connection == 'IXSE_SR15'
+    conn = iiasa.Connection(TEST_API, creds={'username': user, 'password': pw})
+    assert conn.current_connection == TEST_API_NAME
+
+
+def test_conn_bad_creds():
+    # connecting with invalid credentials raises an error
+    creds = ('_foo', '_bar')
+    pytest.raises(RuntimeError, iiasa.Connection, TEST_API, creds=creds)
 
 
 def test_conn_creds_dict_raises():
-    pytest.raises(KeyError, iiasa.Connection,
-                  'IXSE_SR15', creds={'username': 'foo'})
+    # connecting with incomplete credentials as dictionary raises an error
+    creds = {'username': 'foo'}
+    pytest.raises(KeyError, iiasa.Connection, TEST_API, creds=creds)
 
 
-def test_variables():
-    conn = iiasa.Connection('IXSE_SR15')
-    obs = conn.variables().values
-    assert 'Emissions|CO2' in obs
+def test_variables(conn):
+    # check that connection returns the correct variables
+    npt.assert_array_equal(conn.variables(),
+                           ['Primary Energy', 'Primary Energy|Coal'])
 
 
-def test_regions():
-    conn = iiasa.Connection('IXSE_SR15')
-    obs = conn.regions().values
-    assert 'World' in obs
+def test_regions(conn):
+    # check that connection returns the correct regions
+    npt.assert_array_equal(conn.regions(), ['World', 'region_a'])
 
 
-def test_regions_with_synonyms():
-    conn = iiasa.Connection('IXSE_SR15')
+
+def test_regions_with_synonyms(conn):
     obs = conn.regions(include_synonyms=True)
-    assert 'synonym' in obs.columns
-    assert (obs[obs.region == 'R5ROWO']
-            .synonym == 'Rest of the World (R5)').all()
+    exp = pd.DataFrame([['World', None], ['region_a', 'ISO_a']],
+                       columns=['region', 'synonym'])
+    pdt.assert_frame_equal(obs, exp)
 
 
 def test_regions_empty_response():
@@ -131,105 +141,121 @@ def test_regions_with_synonyms_response():
             .synonym.isin(['Deutschland', 'DE'])).all()
 
 
-def test_metadata():
-    conn = iiasa.Connection('IXSE_SR15')
-    obs = conn.scenario_list()['model'].values
-    assert 'MESSAGEix-GLOBIOM 1.0' in obs
-
-
-def test_available_indicators():
-    conn = iiasa.Connection('IXSE_SR15')
-    obs = conn.available_metadata()
-    assert 'carbon price|2050' in list(obs)
-
-
-QUERY_DATA_EXP = {
-    "filters": {
-        "regions": [],
-        "variables": [],
-        "runs": [],
-        "years": [],
-        "units": [],
-        "timeslices": []
-    }
-}
-
-
-def test_query_data_model_scen():
-    conn = iiasa.Connection('IXSE_SR15')
-    obs = conn._query_post_data(model='AIM*', scenario='ADVANCE_2020_Med2C')
-    exp = copy.deepcopy(QUERY_DATA_EXP)
-    exp['filters']['runs'] = [2]
-    assert obs == exp
-
-
-def test_query_data_region():
-    conn = iiasa.Connection('IXSE_SR15')
-    obs = conn._query_post_data(model='AIM*', scenario='ADVANCE_2020_Med2C',
-                                region='*World*')
-    exp = copy.deepcopy(QUERY_DATA_EXP)
-    exp['filters']['runs'] = [2]
-    exp['filters']['regions'] = ['World']
-    assert obs == exp
-
-
-def test_query_data_variables():
-    conn = iiasa.Connection('IXSE_SR15')
-    obs = conn._query_post_data(model='AIM*', scenario='ADVANCE_2020_Med2C',
-                                variable='Emissions|CO2*')
-    exp = copy.deepcopy(QUERY_DATA_EXP)
-    exp['filters']['runs'] = [2]
-    exp['filters']['variables'] = [
-        'Emissions|CO2', 'Emissions|CO2|AFOLU', 'Emissions|CO2|Energy',
-        'Emissions|CO2|Energy and Industrial Processes',
-        'Emissions|CO2|Energy|Demand', 'Emissions|CO2|Energy|Demand|AFOFI',
-        'Emissions|CO2|Energy|Demand|Industry',
-        'Emissions|CO2|Energy|Demand|Other Sector',
-        'Emissions|CO2|Energy|Demand|Residential and Commercial',
-        'Emissions|CO2|Energy|Demand|Transportation',
-        'Emissions|CO2|Energy|Supply',
-        'Emissions|CO2|Energy|Supply|Electricity',
-        'Emissions|CO2|Energy|Supply|Gases',
-        'Emissions|CO2|Energy|Supply|Heat',
-        'Emissions|CO2|Energy|Supply|Liquids',
-        'Emissions|CO2|Energy|Supply|Other Sector',
-        'Emissions|CO2|Energy|Supply|Solids',
-        'Emissions|CO2|Industrial Processes', 'Emissions|CO2|Other'
-    ]
-    for k in obs['filters']:
-        npt.assert_array_equal(obs['filters'][k], exp['filters'][k])
-
-
-def test_query_IXSE_SR15():
-    df = iiasa.read_iiasa('IXSE_SR15',
-                          model='AIM*',
-                          scenario='ADVANCE_2020_Med2C',
-                          variable='Emissions|CO2',
-                          region='World',
-                          )
-    assert len(df) == 20
-
-
-def test_query_IXSE_AR6():
-    with pytest.raises(RuntimeError) as excinfo:
-        variable = 'Emissions|CO2|Energy|Demand|Transportation'
-        creds = dict(username='mahamba', password='verysecret')
-        iiasa.read_iiasa('IXSE_AR6',
-                         scenario='ADVANCE_2020_WB2C',
-                         model='AIM/CGE 2.0',
-                         region='World',
-                         variable=variable,
-                         creds=creds)
-    assert str(excinfo.value).startswith('Login failed for user: mahamba')
-
-
-def test_query_IXSE_SR15_with_metadata():
-    df = iiasa.read_iiasa('IXSE_SR15',
-                          model='MESSAGEix*',
-                          variable=['Emissions|CO2', 'Primary Energy|Coal'],
-                          region='World',
-                          meta=['carbon price|2100 (NPV)', 'category'],
-                          )
-    assert len(df) == 168
-    assert len(df.data) == 168
-    assert len(df.meta) == 7
+def test_meta_columns(conn):
+    # test that connection returns the correct list of meta indicators
+    npt.assert_array_equal(conn.meta_columns, META_COLS)
+
+    # test for deprecated version of the function
+    npt.assert_array_equal(conn.available_metadata(), META_COLS)
+
+
+@pytest.mark.parametrize("default", [True, False])
+def test_index(conn, default):
+    # test that connection returns the correct index
+    if default:
+        exp = META_DF.loc[META_DF.is_default, ['version']]
+    else:
+        exp = META_DF[VERSION_COLS]
+
+    pdt.assert_frame_equal(conn.index(default=default), exp, check_dtype=False)
+
+
+@pytest.mark.parametrize("default", [True, False])
+def test_meta(conn, default):
+    # test that connection returns the correct meta dataframe
+    if default:
+        exp = META_DF.loc[META_DF.is_default, ['version'] + META_COLS]
+    else:
+        exp = META_DF[VERSION_COLS + META_COLS]
+
+    pdt.assert_frame_equal(conn.meta(default=default), exp, check_dtype=False)
+
+    # test for deprecated version of the function
+    pdt.assert_frame_equal(conn.metadata(default=default), exp,
+                           check_dtype=False)
+
+
+@pytest.mark.parametrize("kwargs", [
+    {},
+    dict(variable='Primary Energy'),
+    dict(scenario='scen_a', variable='Primary Energy')
+])
+def test_query_year(conn, test_df_year, kwargs):
+    # test reading timeseries data (`model_a` has only yearly data)
+    exp = test_df_year.copy()
+    for i in ['version'] + META_COLS:
+        exp.set_meta(META_DF.iloc[[0, 1]][i])
+
+    # test method via Connection
+    df = conn.query(model='model_a', **kwargs)
+    assert_iamframe_equal(df, exp.filter(**kwargs))
+
+    # test top-level method
+    df = read_iiasa(TEST_API, model='model_a', **kwargs)
+    assert_iamframe_equal(df, exp.filter(**kwargs))
+
+
+@pytest.mark.parametrize("kwargs", [
+    {},
+    dict(variable='Primary Energy'),
+    dict(scenario='scen_a', variable='Primary Energy')
+])
+def test_query_with_subannual(conn, test_pd_df, kwargs):
+    # test reading timeseries data (including subannual data)
+    exp = IamDataFrame(test_pd_df, subannual='Year')\
+        .append(MODEL_B_DF, model='model_b', scenario='scen_a', region='World')
+    for i in ['version'] + META_COLS:
+        exp.set_meta(META_DF.iloc[[0, 1, 3]][i])
+
+    # test method via Connection
+    df = conn.query(**kwargs)
+    assert_iamframe_equal(df, exp.filter(**kwargs))
+
+    # test top-level method
+    df = read_iiasa(TEST_API, **kwargs)
+    assert_iamframe_equal(df, exp.filter(**kwargs))
+
+
+@pytest.mark.parametrize("kwargs", [
+    {},
+    dict(variable='Primary Energy'),
+    dict(scenario='scen_a', variable='Primary Energy')
+])
+def test_query_with_meta_arg(conn, test_pd_df, kwargs):
+    # test reading timeseries data with an explicit subset of meta indicators
+    exp = IamDataFrame(test_pd_df, subannual='Year')\
+        .append(MODEL_B_DF, model='model_b', scenario='scen_a', region='World')
+    for i in ['version', 'string']:
+        exp.set_meta(META_DF.iloc[[0, 1, 3]][i])
+
+    # test method via Connection
+    df = conn.query(meta=['string'], **kwargs)
+    assert_iamframe_equal(df, exp.filter(**kwargs))
+
+    # test top-level method
+    df = read_iiasa(TEST_API, meta=['string'], **kwargs)
+    assert_iamframe_equal(df, exp.filter(**kwargs))
+
+
+@pytest.mark.parametrize("kwargs", [
+    {},
+    dict(variable='Primary Energy'),
+    dict(scenario='scen_a', variable='Primary Energy')
+])
+def test_query_with_meta_false(conn, test_pd_df, kwargs):
+    # test reading timeseries data without meta indicators
+    exp = IamDataFrame(test_pd_df, subannual='Year')\
+        .append(MODEL_B_DF, model='model_b', scenario='scen_a', region='World')
+
+    # test method via Connection
+    df = conn.query(meta=False, **kwargs)
+    assert_iamframe_equal(df, exp.filter(**kwargs))
+
+    # test top-level method
+    df = read_iiasa(TEST_API, meta=False, **kwargs)
+    assert_iamframe_equal(df, exp.filter(**kwargs))
+
+
+def test_query_non_default(conn):
+    # querying for non-default scenario data raises an error
+    pytest.raises(ValueError, conn.query, default=False)
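
Note for reviewers unfamiliar with the assertion helpers used throughout the reworked tests (`npt` / `pdt` aliasing `numpy.testing` / `pandas.testing`): a minimal standalone sketch of their behaviour, not part of the patch itself.

```python
import numpy.testing as npt
import pandas as pd
import pandas.testing as pdt

# assert_array_equal compares any array-likes element-wise,
# so a plain list can be checked against a pandas object
npt.assert_array_equal(['World', 'region_a'], pd.Series(['World', 'region_a']))

# assert_frame_equal compares values, index and columns;
# check_dtype=False tolerates dtype mismatches such as float64 vs int64,
# which is why the index/meta tests above pass it explicitly
obs = pd.DataFrame({'version': [1.0, 2.0]})
exp = pd.DataFrame({'version': [1, 2]})
pdt.assert_frame_equal(obs, exp, check_dtype=False)
```

Both helpers raise an `AssertionError` on mismatch and return `None` on success, so pytest reports the full diff of the compared objects on failure.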