Make AffineScalarFunc hashable for Pandas, Pint and Pint-Pandas #170

Closed
5 changes: 3 additions & 2 deletions appveyor.yml
@@ -47,7 +47,7 @@ install:
 - conda init cmd.exe
 - conda info -a
 # Create a conda virtual environement
-- "conda create -n uncty-env numpy nose python=%PYTHON_VERSION%"
+- "conda create -n uncty-env future numpy nose python=%PYTHON_VERSION%"
 - activate uncty-env


@@ -56,4 +56,5 @@ test_script:
 - "cd C:\\projects\\uncertainties"
 - activate uncty-env # Activate the virtual environment
 - python setup.py egg_info
-- python setup.py nosetests -sv
+# Try to work around nose-1.3.7 not working with modern setuptools (> Python 3.5)
+- nosetests -sv uncertainties/
23 changes: 22 additions & 1 deletion uncertainties/core.py
@@ -1849,6 +1849,15 @@ def std_dev(self):
     # Abbreviation (for formulas, etc.):
     s = std_dev

+    def __hash__(self):
+        if not self._linear_part.expanded():
+            self._linear_part.expand()
+        combo = tuple(iter(self._linear_part.linear_combo.items()))
+        if len(combo) > 1 or combo[0][1] != 1.0:
+            return hash(combo)
+        # The unique value that comes from a unique variable (which it also hashes to)
+        return id(combo[0][0])
+
     def __repr__(self):
         # Not putting spaces around "+/-" helps with arrays of
         # Variable, as each value with an uncertainty is a
@@ -2821,7 +2830,19 @@ def __hash__(self):
         # variables, so they never compare equal; therefore, their
         # id() are allowed to differ
         # (http://docs.python.org/reference/datamodel.html#object.__hash__):
-        return id(self)
+
+        # Also, since the _linear_part of a variable is based on self, we can use
+        # that as a hash (uniqueness of self), which allows us to also
+        # preserve the invariance that x == y implies hash(x) == hash(y)
+        if hasattr(self, '_linear_part'):
+            if (
+                hasattr(self._linear_part, 'linear_combo')
+                and self in iter(self._linear_part.linear_combo.keys())
+            ):
+                return id(tuple(iter(self._linear_part.linear_combo.keys()))[0])
+            return hash(self._linear_part)
+        else:
+            return id(self)

     def __copy__(self):
         """
18 changes: 18 additions & 0 deletions uncertainties/test_uncertainties.py
@@ -2404,3 +2404,21 @@ def test_correlated_values_correlation_mat():
     assert arrays_close(
         numpy.array(cov_mat),
         numpy.array(uncert_core.covariance_matrix([x2, y2, z2])))
+
+def test_hash():

Are there any NaN-likes for which x != x? If you want those to be considered "matching" for e.g. dict lookups (and pandas Index lookups), they will need matching hashes.
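
To make the concern concrete, here is a minimal sketch using plain Python floats rather than uncertainties objects. It only illustrates how dict lookups treat a key that does not equal itself; the variable names are illustrative and nothing here is taken from the PR.

```python
# Two distinct NaN objects (plain floats, not ufloats).
nan_a = float("nan")
nan_b = float("nan")

# NaN does not compare equal to anything, itself included.
self_equal = nan_a == nan_a

table = {nan_a: "first"}

# Lookup with the very same object succeeds, because dict lookups
# check identity before falling back to ==.
same_object_found = nan_a in table

# A different NaN object is not found: identity differs and == is False.
other_nan_found = nan_b in table

print(self_equal, same_object_found, other_nan_found)
```

So a NaN-like key is only "matching" when the lookup uses the identical object, whatever its hash is.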

Author

@MichaelTiemannOSC MichaelTiemannOSC Jul 8, 2023

My understanding is that there is no canonical NaN in the uncertainties package:

import numpy as np
import uncertainties
from uncertainties import ufloat

xx = ufloat(np.nan, 0)
yy = ufloat(np.nan, 1)

# nan+/-0
print(xx + xx)
# nan+/-2
print(yy + yy)

# False, because nan != nan
print(xx == xx)

The uncertainties package has unumpy.isnan, which works for both scalars and arrays. Nothing does a dictionary comparison against a NaN. The PintArray type uses unp.isnan to check whether a value is NaN, and the pandas changes all use PintArray.isna to test NA-ness (including NaN-ness). But if I'm missing something, please let me know!
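
The idea of testing NaN-ness with a predicate instead of equality or dict lookup can be sketched without the library. In the sketch below, `Measured`, `is_nan`, and `nan_mask` are hypothetical stand-ins invented for illustration; they are not the API of uncertainties, Pint, or Pint-Pandas.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Measured:
    """Hypothetical stand-in for a value with an uncertainty."""
    nominal: float
    std_dev: float


def is_nan(value):
    """Return True when a float or a Measured has a NaN nominal value."""
    nominal = value.nominal if isinstance(value, Measured) else value
    return math.isnan(nominal)


def nan_mask(values):
    """Element-wise NaN mask, playing the role of an isna-style check."""
    return [is_nan(v) for v in values]


data = [Measured(1.0, 0.1), Measured(float("nan"), 1.0), 2.5, float("nan")]
mask = nan_mask(data)
print(mask)
```

Because the mask comes from a predicate, it does not matter that a NaN-valued entry never compares equal to itself.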

Author

I've now updated the Pint-Pandas changes to use np.nan instead of pd.NA. I was able to get everything working with pd.NA, but the code was cleaner with np.nan as the NA value for uncertain values.

+    '''
+    Tests the invariance that if x==y, then hash(x)==hash(y)
+    '''
+
+    x = ufloat(1.23, 2.34)
+    y = ufloat(1.23, 2.34)
+    # nominal values and std_dev terms are equal, but...
+    assert x.n==y.n and x.s==y.s
+    # ...x and y are independent variables, therefore not equal as uncertain numbers
+    assert x != y
+    assert hash(x) != hash(y)
+
+    # the equation (2x+x)/3 is equal to the variable x, so...
+    assert ((2*x+x)/3)==x
+    # ...hash of the equation and the variable should be equal
+    assert hash((2*x+x)/3)==hash(x)
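
The invariant this test exercises can be modelled without the library. The following is a simplified sketch under stated assumptions: `Source` and `Affine` are hypothetical names, not classes from uncertainties, and only the linear-combination bookkeeping is modelled. Unlike the diff, the sketch hashes a frozenset of terms rather than a tuple, so the result does not depend on the order in which terms were accumulated.

```python
class Source:
    """Hypothetical token for one independent source of uncertainty.

    It keeps the default identity-based __eq__ and __hash__.
    """


class Affine:
    """Toy affine expression: a nominal value plus {Source: coefficient}."""

    def __init__(self, nominal, terms):
        self.nominal = nominal
        self.terms = dict(terms)

    @classmethod
    def variable(cls, nominal):
        # An independent variable is exactly 1.0 times its own source.
        return cls(nominal, {Source(): 1.0})

    def __add__(self, other):
        terms = dict(self.terms)
        for source, coeff in other.terms.items():
            terms[source] = terms.get(source, 0.0) + coeff
        return Affine(self.nominal + other.nominal, terms)

    def __rmul__(self, factor):
        scaled = {s: factor * c for s, c in self.terms.items()}
        return Affine(factor * self.nominal, scaled)

    def __truediv__(self, divisor):
        scaled = {s: c / divisor for s, c in self.terms.items()}
        return Affine(self.nominal / divisor, scaled)

    def __eq__(self, other):
        if not isinstance(other, Affine):
            return NotImplemented
        return self.nominal == other.nominal and self.terms == other.terms

    def __hash__(self):
        items = tuple(self.terms.items())
        # An expression that reduces to a single variable hashes like it.
        if len(items) == 1 and items[0][1] == 1.0:
            return hash(items[0][0])
        # Assumption of this sketch: an order-independent hash of the terms.
        return hash(frozenset(items))


x = Affine.variable(1.5)
y = Affine.variable(1.5)
z = (2 * x + x) / 3

print(x == y, z == x, hash(z) == hash(x))
```

The nominal value 1.5 is chosen so that the floating-point arithmetic in `(2 * x + x) / 3` is exact. Equal objects always share the same terms, so leaving the nominal value out of the hash keeps x == y implying hash(x) == hash(y).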