use numba rather than C++ for fast polymer calculations #103
Conversation
If you merge this, you should consider rewriting reflmodule with numba as well, in light of #100.
Thanks for pinging me on this!
refl1d/polymer.py
Outdated
LAMBDA_ARRAY = np.array([LAMBDA_1, LAMBDA_0, LAMBDA_1])
MINLAT = 25
MINBULK = 5
SQRT_PI = sqrt(pi)

def correlate_same(a, b):
    return np.correlate(a, b, 'same')
Looking up this correlation mode was absurdly expensive in the pure Python mode back in the day; that's why I was selecting it with the weird internal int. See numpy/numpy#4999 and consider using multiarray.correlate based on timings.
I didn't notice significant timing differences when I chose one versus the other, so I went with the newer interface.
Yes, the multiarray.correlate function is faster when I test it alone:
In [187]: %timeit np.convolve(x, y, 'same')
5.4 µs ± 32.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [188]: %timeit np.correlate(x, y, 'same')
3.05 µs ± 132 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [189]: %timeit np.core.multiarray.correlate(x, y, 1)
1.5 µs ± 27 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Trying again, I now see the times dropping significantly (5.7s down to 4.0s), so I'll revert to the deprecated function.
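For reference, a small guard can keep the faster entry point without giving up a working fallback. This is only a sketch: it assumes an older NumPy where numpy.core.multiarray.correlate is still importable, and correlate_same mirrors the helper in the diff above.

import numpy as np

try:
    # Deprecated low-level routine that skips the mode-string lookup;
    # the integer mode 1 corresponds to 'same'.
    from numpy.core.multiarray import correlate as _fast_correlate

    def correlate_same(a, b):
        return _fast_correlate(a, b, 1)
except (ImportError, AttributeError):
    # Newer NumPy may not expose numpy.core.multiarray, so fall back
    # to the public API.
    def correlate_same(a, b):
        return np.correlate(a, b, 'same')

Callers only ever see correlate_same, so the choice of backend stays an implementation detail.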
refl1d/polymer.py
Outdated
except ImportError:
    USE_NUMBA = False

USE_NUMBA = True  # Uncomment when doing timing tests
I guess this line should be commented out?
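For clarity, the suggestion amounts to something like the following, with the override left commented out so the flag only reflects whether numba actually imported (the try block is paraphrased from the diff rather than quoted):

try:
    from numba import njit
    USE_NUMBA = True
except ImportError:
    USE_NUMBA = False
# USE_NUMBA = True  # Uncomment when doing timing tests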
refl1d/polymer.py
Outdated
for r in range(1, segments):
    g_zs[:, r] = pg_zs = correlate(pg_zs, LAMBDA_ARRAY, 1) * g_z

try:
    from numba import njit, prange
I'm interested to know if prange helps for large numbers of segments. It seems like it could lead to cache contention in most cases, though.
prange made no significant difference in my tests, so I stopped using it.
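For context, a sketch of the variant that was tried: only the per-layer update over lattice sites is a prange candidate, because the loop over segments carries the recurrence and must stay serial. Names and transition weights below are illustrative, not the refl1d code.

import numpy as np
from numba import njit, prange

LAMBDA_0 = 4.0 / 6.0  # illustrative lattice transition weights
LAMBDA_1 = 1.0 / 6.0

@njit(parallel=True)
def propagate_prange(g_z, segments):
    # The segment loop is serial (each layer depends on the previous one);
    # prange parallelizes the update across lattice sites within a layer.
    n = g_z.size
    g_zs = np.empty((n, segments))
    g_zs[:, 0] = g_z
    for r in range(1, segments):
        for i in prange(n):
            # 3-point stencil equivalent to a 'same'-mode correlation with
            # [LAMBDA_1, LAMBDA_0, LAMBDA_1], zero-padded at the edges.
            acc = LAMBDA_0 * g_zs[i, r - 1]
            if i > 0:
                acc += LAMBDA_1 * g_zs[i - 1, r - 1]
            if i < n - 1:
                acc += LAMBDA_1 * g_zs[i + 1, r - 1]
            g_zs[i, r] = acc * g_z[i]
    return g_zs

With a stencil this short, the per-iteration work is tiny, which is consistent with prange showing no measurable benefit except perhaps for very wide lattices.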
Note: similar to code from https://github.com/richardsheridan/sf_nscf/blob/3e72e5f3777928e8eb844f93bd48549bca8c9041/util.py#L185
The numba version of compose made no difference to the speed of the tests, so I did not include it.
This version drops the C++ code entirely, falling back to the numpy version if numba is not available.
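As a rough sketch of that structure (names and weights are illustrative, not the refl1d API): the module picks an implementation at import time, and the numba path trims a full convolution, since numba's np.convolve/np.correlate take no mode argument and the kernel is symmetric anyway.

import numpy as np

try:
    from numba import njit
    USE_NUMBA = True
except ImportError:
    USE_NUMBA = False

LAMBDA_0 = 4.0 / 6.0  # illustrative transition weights
LAMBDA_1 = 1.0 / 6.0
LAMBDA_ARRAY = np.array([LAMBDA_1, LAMBDA_0, LAMBDA_1])

def _propagate_numpy(g_z, segments):
    # Pure-numpy fallback: 'same'-mode correlation with the 3-point kernel.
    g_zs = np.empty((g_z.size, segments))
    g_zs[:, 0] = pg_zs = g_z
    for r in range(1, segments):
        g_zs[:, r] = pg_zs = np.correlate(pg_zs, LAMBDA_ARRAY, 'same') * g_z
    return g_zs

if USE_NUMBA:
    @njit(cache=True)
    def propagate(g_z, segments):
        # Same recurrence compiled by numba; trimming one element from each
        # end of the full convolution reproduces 'same' mode for a length-3
        # kernel.
        g_zs = np.empty((g_z.size, segments))
        pg_zs = g_z
        g_zs[:, 0] = pg_zs
        for r in range(1, segments):
            pg_zs = np.convolve(pg_zs, LAMBDA_ARRAY)[1:-1] * g_z
            g_zs[:, r] = pg_zs
        return g_zs
else:
    propagate = _propagate_numpy

Callers import propagate and never need to know which backend was selected, which is what lets the C++ code go away without changing the calling code.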