Handle context=None in convert_unit when input/output GHG species match #369

Merged (11 commits) on Apr 22, 2020
11 changes: 6 additions & 5 deletions RELEASE_NOTES.md
@@ -2,13 +2,14 @@

## Individual Updates
- [#370](https://github.com/IAMconsortium/pyam/pull/370) Allowed filter to work with np.int64 years and np.datetime64 dates.
- [#369](https://github.com/IAMconsortium/pyam/pull/369) `convert_unit()` supports GWP conversion of same GHG species without context, lower-case aliases for species symbols.
- [#361](https://github.com/IAMconsortium/pyam/pull/361) iam-units refactored from a Git submodule to a Python dependency of pyam.

# Release v0.5.0

## Highlights

- Improved feature for unit conversion
- Improved feature for unit conversion
using the [pint package](https://pint.readthedocs.io) and
the [IAMconsortium/units](https://github.com/IAMconsortium/units) repository,
providing out-of-the-box conversion of unit definitions commonly used
@@ -17,7 +18,7 @@
for more information
- Increased support for operations on timeseries data with continuous-time
resolution
- New tutorial for working with various input data formats;
- New tutorial for working with various input data formats;
[take a look](https://pyam-iamc.readthedocs.io/en/stable/tutorials/data_table_formats.html)
- Rewrite and extension of the documentation pages for the API;
[read the new docs](https://pyam-iamc.readthedocs.io/en/stable/api.html)!
@@ -30,7 +31,7 @@ the API of `IamDataFrame.convert_unit()` from a dictionary to explicit kwargs

PR [#334](https://github.com/IAMconsortium/pyam/pull/334) changed the arguments
of `IamDataFrame.interpolate()` and `pyam.fill_series()` to `time`. It can still
be an integer (i.e., a year).
be an integer (i.e., a year).

With PR [#337](https://github.com/IAMconsortium/pyam/pull/337), initializing
an IamDataFrame with `n/a` entries in columns other than `value` raises an error.
@@ -74,7 +75,7 @@ behaviour of `aggregate_region()` regarding the treatment of components at the
region-level. To keep the previous behaviour, add `components=True`.

PR [#315](https://github.com/IAMconsortium/pyam/pull/314) changed the return
type of `aggregate[_region]()` to an `IamDataFrame` instance.
type of `aggregate[_region]()` to an `IamDataFrame` instance.
To keep the previous behaviour, add `timeseries()`.
The object returned by `[check_]aggregate[_region]()` now includes both the
actual and the expected value as a `pd.DataFrame` instance.
@@ -91,7 +92,7 @@ rather than a dictionary and also includes optional treatment of components
- [#305](https://github.com/IAMconsortium/pyam/pull/305) Add `method` and `weight` options to the (region) aggregation functions
- [#302](https://github.com/IAMconsortium/pyam/pull/302) Rework the tutorials
- [#301](https://github.com/IAMconsortium/pyam/pull/301) Bugfix when using `to_excel()` with a `pd.ExcelWriter`
- [#297](https://github.com/IAMconsortium/pyam/pull/297) Add `empty` attribute, better error for `timeseries()` on empty dataframe
- [#297](https://github.com/IAMconsortium/pyam/pull/297) Add `empty` attribute, better error for `timeseries()` on empty dataframe
- [#295](https://github.com/IAMconsortium/pyam/pull/295) Include `meta` table when writing to or reading from `xlsx` files
- [#292](https://github.com/IAMconsortium/pyam/pull/292) Add warning message if `data` is empty at initialization (after formatting)
- [#288](https://github.com/IAMconsortium/pyam/pull/288) Put `pyam` logger in its own namespace (see [here](https://docs.python-guide.org/writing/logging/#logging-in-a-library>))
26 changes: 16 additions & 10 deletions pyam/core.py
@@ -766,7 +766,7 @@ def rename(self, mapping=None, inplace=False, append=False,

def convert_unit(self, current, to=None, factor=None, registry=None,
context=None, inplace=False):
"""Convert all data having *current* units to new units.
r"""Convert all data having *current* units to new units.
Review comment (Member):

Is the `r` on purpose?

Reply (Contributor Author):

Yes. I built the docs and saw that 'gwp_' further down was being interpreted as a ReST link (because of the trailing underscore), which I did not intend. So I added `\` to escape the underscore, but then my Python linter complained that the backslash was a special character. The `r` prefix makes the docstring a "raw string", in which `\` is a literal character.
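
The raw-string behaviour described in the reply above can be checked directly. This is a standalone sketch and not part of the PR; `documented` is a hypothetical function used only for illustration.

```python
# In a regular string literal, a literal backslash must be doubled.
escaped = 'gwp\\_'

# In a raw string literal, the backslash is kept exactly as written.
raw = r'gwp\_'

# Both spell the same five characters: g, w, p, backslash, underscore.
assert escaped == raw
assert len(raw) == 5


def documented():
    r"""*context* must contain 'gwp\_' followed by a metric name."""


# The backslash survives in the docstring, so ReST sees an escaped
# underscore rather than a trailing-underscore link target.
assert 'gwp\\_' in documented.__doc__
```

Without the `r` prefix, the docstring would need `\\_` to produce the same text.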


If *factor* is given, existing values are multiplied by it, and the
*to* units are assigned to the 'unit' column.
@@ -784,29 +784,35 @@ def convert_unit(self, current, to=None, factor=None, registry=None,

from iam_units import registry

Using the :mod:`iam_units` *registry*, *current* and *to* may contain
the names of greenhouse gas (GHG) species, such as 'CO2e', 'C', 'CH4',
'N2O', 'HFC236fa', etc. In this case, *context* must contain 'gwp_'
followed by the name of a specific global warming potential (GWP)
metric supported by :mod:`iam_units`, e.g. 'gwp_AR5GWP100'.
When using this registry, *current* and *to* may contain the symbols of
greenhouse gas (GHG) species, such as 'CO2e', 'C', 'CH4', 'N2O',
'HFC236fa', etc., as well as lower-case aliases like 'co2' supported by
:mod:`pyam`. In this case, *context* must contain 'gwp\_' followed by
the name of a specific global warming potential (GWP) metric supported
by :mod:`iam_units`, e.g. 'gwp_AR5GWP100'.

Rows with units other than *current* are not altered.

Parameters
----------
current : str (or mapping, deprecated)
Name of current unit (to be converted from).
Current units to be converted.
to : str
Name of new unit (to be converted to) or target GHG species.
New unit (to be converted to) or symbol for target GHG species. If
only the GHG species is provided, the units (e.g. :code:`Mt /
year`) will be the same as `current`, and an expression combining
units and species (e.g. 'Mt CO2e / yr') will be placed in the
'unit' column.
factor : value, optional
Explicit conversion factor.
Explicit factor for conversion without `pint`.
registry : pint.UnitRegistry, optional
Specific unit registry to use for conversion. Default: the
`iam-units <https://github.com/IAMconsortium/units>`_ registry.
context : str or pint.Context, optional
(Name of) a :ref:`pint context <pint:context>` to use in
conversion. Required when converting between GHG species using GWP
metrics.
metrics, unless the species indicated by *current* and *to* are the
same.
inplace : bool, optional
Whether to return a new IamDataFrame.

107 changes: 73 additions & 34 deletions pyam/units.py
@@ -1,4 +1,5 @@
import logging
import re

import iam_units
import pandas as pd
@@ -8,14 +9,6 @@
logger = logging.getLogger(__name__)


# Thin wrapper around pint.UndefinedUnitError to provide a usage hint
class UndefinedUnitError(pint.UndefinedUnitError):
def __str__(self):
return super().__str__() + (
"\nMust provide IamDataFrame.convert_unit(..., context=...) to "
"convert GHG species")


def convert_unit(df, current, to, factor=None, registry=None, context=None,
inplace=False):
"""Internal implementation of unit conversion with explicit kwargs"""
@@ -33,18 +26,14 @@ def convert_unit(df, current, to, factor=None, registry=None, context=None,
# Convert using a pint.UnitRegistry; default the one from iam_units
registry = registry or iam_units.registry

# Tuple of (magnitude, unit)
qty = (ret.data.loc[where, 'value'].values, current)
# Pair of (magnitude, unit)
qty = [ret.data.loc[where, 'value'].values, current]

try:
# Create a vector pint.Quantity
qty = registry.Quantity(*qty)
except pint.UndefinedUnitError as exc:
# *current* might include a GHG species
if not context:
# Can't do anything without a context
raise UndefinedUnitError(*exc.args) from None

except pint.UndefinedUnitError:
# *qty* might include a GHG species; try GWP conversion
result, to = convert_gwp(context, qty, to)
except AttributeError:
# .Quantity() did not exist
@@ -62,38 +51,88 @@ def convert_unit(df, current, to, factor=None, registry=None, context=None,
return None if inplace else ret


# GWP conversion using iam_units

#: Supported lower-case aliases for chemical symbols of GHG species. See
#: :meth:`.convert_unit`.
# Keys and values can only differ in case; other items will have no effect.
SPECIES_ALIAS = {
'ch4': 'CH4',
'co2': 'CO2',
'co2_eq': 'CO2_eq',
'co2e': 'CO2e',
'co2eq': 'CO2eq',
'n2o': 'N2O',
'nh3': 'NH3',
'nox': 'NOx',
}


# Thin wrapper around pint.UndefinedUnitError to provide a usage hint
class UndefinedUnitError(pint.UndefinedUnitError):
def __str__(self):
return super().__str__() + (
"\nGWP conversion with IamDataFrame.convert_unit() requires a "
"'gwp_...' *context* and mass-based *to* units.")


def extract_species(expr):
"""Handle supported expressions for GHG species and units."""
# Split *expr* into 1 or 3 strings. Unlike iam_units, re.IGNORECASE is used
# to match e.g. lower-case 'ch4'.
parts = re.split(iam_units.emissions.pattern.pattern, expr,
maxsplit=1, flags=re.IGNORECASE)

if len(parts) == 1:
# No split occurred. *expr* is only a species ('CO2e') without units.
species, units = parts[0], None
else:
# An expression with both units and species name ('kg CO2e / year').
# parts[1] is the species, others are pre-/suffix ('kg ', ' / year').
species, units = parts[1], (parts[0] + parts[2])

# Convert allowed lower-case aliases to chemical symbols
return SPECIES_ALIAS.get(species, species), units


def convert_gwp(context, qty, to):
"""Helper for :meth:`convert_unit` to perform GWP conversions."""
# Remove a leading 'gwp_' to produce the metric name
metric = context.split('gwp_')[1]

# Split *to* into a 1- or 3-tuple of str. This allows for *to* to be:
_to = iam_units.emissions.pattern.split(to, maxsplit=1)
if len(_to) == 1:
# Only a species name ('CO2e') without any unit
species_to = _to[0]
units_to = None
else:
# An expression with both units and species name ('kg CO2e / year');
# to[1] is the species
species_to = _to[1]
# Other elements are pre- and suffix, e.g. 'kg ' and ' / year'
units_to = _to[0] + _to[2]
metric = context.split('gwp_')[1] if context else context

# Convert GWP using the (magnitude, unit-and-species) tuple in *qty*
result = iam_units.convert_gwp(metric, qty, species_to)
# Extract the species from *qty* and *to*, allowing supported aliases
species_from, units_from = extract_species(qty[1])
species_to, units_to = extract_species(to)

try:
# Convert using a (magnitude, unit) tuple with only units, and explicit
# input and output units
result = iam_units.convert_gwp(metric, (qty[0], units_from),
species_from, species_to)
except (AttributeError, ValueError):
# Missing *metric*, or *species_to* contains invalid units. pyam
# promises UndefinedUnitError in these cases. Use a subclass (above) to
# add a usage hint.
raise UndefinedUnitError(species_to) from None
except pint.DimensionalityError:
# Provide an exception with the user's inputs
raise pint.DimensionalityError(qty[1], to) from None

# Other exceptions are not caught and will pass up through convert_unit()

if units_to:
# Also convert the units
result = result.to(units_to)
else:
# *to* was only a species name; provide units based on input and the
# output species name
# *to* was only a species name. Provide units based on input and the
# output species name.
to = iam_units.format_mass(result, species_to, spec=':~')

return result, to


# Deprecated methods

def convert_unit_with_mapping(df, conversion_mapping, inplace=False):
"""Internal implementation of unit conversion by mapping (deprecated)"""
# TODO: deprecate in next release (>=0.6.0)
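
The splitting logic in `extract_species()` above can be tried without installing iam_units. The sketch below is illustrative only: `SPECIES_PATTERN` is a hypothetical stand-in for `iam_units.emissions.pattern` (the real pattern is not shown in this diff and covers many more species), `ALIASES` is a shortened copy of `SPECIES_ALIAS`, and `split_species` is a made-up name.

```python
import re

# Hypothetical stand-in for iam_units.emissions.pattern. One capture group,
# which must follow a space, so a bare symbol such as 'CO2e' is not split.
SPECIES_PATTERN = r'(?<= )(CO2_eq|CO2eq|CO2e|CO2|CH4|N2O)(?= |$)'

# Shortened copy of SPECIES_ALIAS from the diff above.
ALIASES = {'ch4': 'CH4', 'co2': 'CO2', 'co2e': 'CO2e', 'n2o': 'N2O'}


def split_species(expr):
    """Return (species, units) for an expression such as 'kg CO2e / year'."""
    # With one capture group and maxsplit=1, re.split() returns either
    # [expr] (no match) or [prefix, species, suffix].
    parts = re.split(SPECIES_PATTERN, expr, maxsplit=1, flags=re.IGNORECASE)

    if len(parts) == 1:
        # Only a species symbol, no units
        species, units = parts[0], None
    else:
        species, units = parts[1], parts[0] + parts[2]

    # Map lower-case aliases to chemical symbols; other strings pass through
    return ALIASES.get(species, species), units


print(split_species('kg CO2e / year'))
print(split_species('Mt ch4 / yr'))
print(split_species('co2e'))
```

The prefix and suffix are joined as they are, so the units string keeps both spaces that surrounded the species ('kg  / year').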
81 changes: 52 additions & 29 deletions tests/test_units.py
@@ -15,24 +15,19 @@ def get_units_test_df(test_df):
return df


def assert_converted_units(df, current, to, exp, exp_factor=1.0, **kwargs):
def assert_converted_units(df, current, to, exp, **kwargs):
# testing for `inplace=False` - converted values and expected unit
_df = df.convert_unit(current, to, **kwargs, inplace=False)
pd.testing.assert_series_equal(
_df.data.value,
exp * exp_factor,
**PRECISE_ARG)
# For GWP conversion with a species name (e.g. 'CO2e'), units are added
# (e.g. 'Mt CO2e'). Compare using 'in'.
assert to in _df.data.unit[5]
pd.testing.assert_series_equal(_df.data.value, exp, **PRECISE_ARG)
# When *to* is only a species symbol (e.g. 'co2e'), units are added and a
# non-aliased symbol is returned (e.g. 'Mt CO2e'). Compare using 'in' and
# lower().
assert to.lower() in _df.data.unit[5].lower()

# testing for `inplace=True` - converted values and expected unit
df.convert_unit(current, to, **kwargs, inplace=True)
pd.testing.assert_series_equal(
df.data.value,
exp * exp_factor,
**PRECISE_ARG)
assert to in df.data.unit[5]
pd.testing.assert_series_equal(df.data.value, exp, **PRECISE_ARG)
assert to.lower() in df.data.unit[5].lower()


@pytest.mark.parametrize("current,to", [
@@ -75,7 +70,27 @@ def test_convert_unit_with_custom_registry(test_df):
assert_converted_units(df, 'foo', 'baz', exp, registry=ureg)


@pytest.mark.parametrize('current, to, exp_factor', [
# This test is parametrized as the product of three sets:
# 1. The test_df fixture.
# 2. Current species, context, and expected output magnitude.
# 3. Input and output expressions, and any factor on the output magnitude due
# to differences in units between these.
@pytest.mark.parametrize('context, current_species, exp', [
('AR5GWP100', 'CH4', 28),
('AR4GWP100', 'CH4', 25),
('SARGWP100', 'CH4', 21),

# Without context, CO2e → CO2e works
(None, 'CO2e', 1.),

# Lower-case symbol, handled as alias for CH4
('AR5GWP100', 'ch4', 28),

# Lower-case alias for CO2_eq handled *and* convertible to 'CO2e' without a
# context/metric
(None, 'co2_eq', 1.)
])
@pytest.mark.parametrize('current_expr, to_expr, exp_factor', [
# exp_factor is used when the conversion includes both a species *and* unit
# change.

@@ -85,32 +100,40 @@ def test_convert_unit_with_custom_registry(test_df):
('Mt {} / yr', 'Mt {} / yr', 1),
('g {} / sec', 'g {} / sec', 1),

# Only a species name as the *to* argument
# Only a species symbol as the *to* argument
('Mt {}', '{}', 1),

# *to* contains units, but no mass units → DimensionalityError
pytest.param('Mt {} / yr', '{} / yr', 1,
marks=pytest.mark.xfail(raises=pint.DimensionalityError)),
# *to* contains units, but no mass units. UndefinedUnitError when no
# context is given, otherwise DimensionalityError.
pytest.param(
'Mt {} / yr', '{} / yr', 1,
marks=pytest.mark.xfail(raises=(pint.UndefinedUnitError,
pint.DimensionalityError))),

# *to* contains both species *and* mass units that are different than
# *current*
('t {} / year', 'kt {} / year', 1e-3),
])
def test_convert_unit_with_context(test_df, current, to, exp_factor):
# unit conversion with contexts in application registry
def test_convert_gwp(test_df, context, current_species, current_expr, to_expr,
exp, exp_factor):
"""Units and GHG species can be converted."""
# Handle parameters
current = current_expr.format(current_species)
to = to_expr.format('CO2e')
if context is not None:
# pyam-style context
context = f'gwp_{context}'

# Prepare input data
df = test_df.copy()
df['variable'] = [i.replace('Primary Energy', 'Emissions|CH4')
for i in df['variable']]
current = current.format('CH4')
df['unit'] = current
to = to.format('CO2e')

# test conversion for multiple contexts
for (c, v) in [('AR5GWP100', 28), ('AR4GWP100', 25), ('SARGWP100', 21)]:
exp = test_df.data.value * v
assert_converted_units(df.copy(), current, to, exp,
exp_factor=exp_factor,
context=f'gwp_{c}')
# Expected values
exp_values = test_df.data.value * exp * exp_factor

assert_converted_units(df.copy(), current, to, exp_values, context=context)


def test_convert_unit_bad_args(test_pd_df):
@@ -124,7 +147,7 @@ def test_convert_unit_bad_args(test_pd_df):
idf.convert_unit('Mt CH4', 'CO2e', registry=object())

# Conversion fails without context; exception provides a usage hint
match = 'Must provide IamDataFrame.convert_unit'
match = r'GWP conversion with IamDataFrame.convert_unit\(\) requires...'
with pytest.raises(pint.UndefinedUnitError, match=match):
idf.convert_unit('Mt CH4', 'CO2e')

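
As a rough illustration of how the stacked parametrization of `test_convert_gwp` above combines its parameter sets, the sketch below builds the same kind of product with `itertools.product` and computes the expected value as input value × GWP factor × unit factor. The GWP factors and expression pairs are copied from the test parameters; `expected_cases` is a hypothetical helper that is not in the test suite, and neither pytest nor pyam is needed to run it.

```python
from itertools import product

# CH4 -> CO2e factors per metric, as listed in the test parameters.
GWP_CH4 = {'AR5GWP100': 28, 'AR4GWP100': 25, 'SARGWP100': 21}

# (current expression, target expression, factor from the unit change),
# two of the pairs listed in the test parameters.
EXPRESSIONS = [
    ('Mt {} / yr', 'Mt {} / yr', 1),
    ('t {} / year', 'kt {} / year', 1e-3),
]


def expected_cases(value):
    """Return (context, current, to, expected value) for every combination."""
    cases = []
    for (metric, gwp), (cur, to, unit_factor) in product(
            GWP_CH4.items(), EXPRESSIONS):
        cases.append((
            f'gwp_{metric}',          # pyam-style context name
            cur.format('CH4'),        # e.g. 'Mt CH4 / yr'
            to.format('CO2e'),        # e.g. 'Mt CO2e / yr'
            value * gwp * unit_factor,
        ))
    return cases


cases = expected_cases(1.0)
for case in cases:
    print(case)
```

Three metrics and two expression pairs give six cases, which mirrors how each additional `parametrize` decorator multiplies the number of generated tests.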