-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dft for chunked data along the axis of FT #51
Conversation
Codecov Report
@@ Coverage Diff @@
## master #51 +/- ##
=========================================
Coverage ? 88.92%
=========================================
Files ? 1
Lines ? 271
Branches ? 90
=========================================
Hits ? 241
Misses ? 15
Partials ? 15
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing start Takaya. You have become so fast!
I have a few comments on the api and the implementation.
xrft/xrft.py
Outdated
Whether the data is chunked along the axis to take FFT. | ||
kwargs : dict | ||
Need to provide the information of chunk length (`seglen`: int) | ||
when `chunks_to_segment=True`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a) we shouldn't hide arguments in kwargs
unnecessarily
b) we shouldn't need to specify seglen
at all! It should be determined by dask chunks themselves.
xrft/xrft.py
Outdated
if len(dim)>1: | ||
raise NotImplementedError("Currently, only a one-dimensional " | ||
"DFT is supported when the dimension " | ||
"to FT is chunked.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why we need this limitation. It should be possible to to 2d dft on 2d chunks (or any number of dimensions).
One needs to check each dimension in dim
to see if reshaping is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current major issue with this is that the 2D, 3D detrending won't work if the axes are chunked. I'll see if I can change this easily.
xrft/xrft.py
Outdated
newdims.append(d + suffix) | ||
newdims.append(d) | ||
n = len(da[d]) | ||
seglen = kwargs['seglen'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we should instead figure out the chunks by examining the dask chunks.
An error should be raised if the chunks are not uniform.
xrft/xrft.py
Outdated
newdims.append(d) | ||
newshape.append(len(da[d])) | ||
newcoords[d] = da[d].data | ||
da = xr.DataArray(data.reshape(newshape), dims=newdims, coords=newcoords) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend putting the logic in the above ~20 lines into a standalone private function
def _stack_chunks(da, dim, suffix='_segment):
"""Reshape a DataArray so there is only one chunk along dimension `dim`
...
"""
xrft/xrft.py
Outdated
Whether the data is chunked along the axis to take FFT. | ||
kwargs : dict | ||
Need to provide the information of chunk length (`seglen`: int) | ||
when `chunks_to_segment=True`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use kwargs
this way.
You should review the zen of python. Line 2: "Explicit is better than implicit."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to change this doc string to match the function signature.
xrft/tests/test_xrft.py
Outdated
chunks_to_segments=True, seglen=8) | ||
data = da.chunk({'time':8}).data.reshape((2,8,16,16,16)) | ||
npt.assert_almost_equal(ft.values, dsar.fft.fftn(data, axes=[1]).compute(), | ||
decimal=7) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than absorbing this into an existing test, I would prefer to have a new test function for this feature.
@rabernat I incorporated everything except for the linear detrending for chunked axes. That's gonna take a bit longer so can we merge this for now and I'll make a new pull request for the chunked linear detrending? |
Nevermind, got linear detrending for 2D cases to work. |
…ys have a chunk length of one
As far as I understand it, you should first transform the data to have reshaped dimensions, and then all the old windowing / detrending stuff should just work. It's really a pre-processing step. |
Yea, so everything is working fine now will all your reviews incorporated :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Almost there! Just a few more questions.
xrft/tests/test_xrft.py
Outdated
# ) | ||
# da = da5d.chunk({'time':1}) | ||
# with pytest.raises(ValueError): | ||
# func(da.data).compute() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current code should be able to take data that has more than 5 dimensions now so I commented it out.
xrft/tests/test_xrft.py
Outdated
# with pytest.raises(NotImplementedError): | ||
# xrft.dft(da, detrend='linear') | ||
# with pytest.raises(NotImplementedError): | ||
# xrft.dft(da, dim=['time','y','x'], detrend='linear') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we have commented-out code here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just delete this as it is implemented.
xrft/tests/test_xrft.py
Outdated
|
||
# with pytest.raises(NotImplementedError): | ||
# xrft.dft(da.chunk({'time':16}), dim=['time','y'], detrend='linear', | ||
# chunks_to_segments=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also implemented so I'll just delete it.
xrft/xrft.py
Outdated
raise ValueError("Data has too many dimensions " | ||
"and/or too many axes to detrend over.") | ||
if len(axes) > 3: | ||
raise ValueError("Data has too many axes to detrend over.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's be more explicit with this error, e.g. "Detrending is only possible over 3 dimensions or less, but input had 5 dimensions."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works now with data that has more than 4 dimensions. Detrending itself is only supported up to 3 dimensions at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the error message more explicit to: "Detrending is only supported up to 3 dimensions."
xrft/xrft.py
Outdated
newshape.append(len(da[d])) | ||
newcoords[d] = da[d].data | ||
|
||
return xr.DataArray(data.reshape(newshape), dims=newdims, coords=newcoords) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice function! 👍
@@ -234,6 +270,9 @@ def dft(da, spacing_tol=1e-3, dim=None, shift=True, detrend=None, window=False): | |||
if detrend == 'constant': | |||
da = da - da.mean(dim=dim) | |||
elif detrend == 'linear': | |||
for d in da.dims: | |||
if d not in dim: | |||
da = da.chunk({d:1}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me understand why you need to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linear algebra for detrending currently can only take one chunk for each node/processor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why does dask have to be involved? Are we using map_blocks to apply the detrending?
I'm worried about this. It is generally not a good idea to silently rechunk the user's data. What if this accidentally creates a billion chunks. I would prefer to raise an error and make the user do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we use map_blocks.
Ok, I'll make it raise an error but modify it so that it'll still automatically chunk the _segment
dimensions.
xrft/xrft.py
Outdated
Whether the data is chunked along the axis to take FFT. | ||
kwargs : dict | ||
Need to provide the information of chunk length (`seglen`: int) | ||
when `chunks_to_segment=True`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to change this doc string to match the function signature.
I guess we probably eventually want to add the |
Adding |
if d not in dim and da.chunks[a_n][0]>1: | ||
raise ValueError("Linear detrending utilizes the `dask.map_blocks` " | ||
"API so the dimensions not being detrended " | ||
"must have the chunk length of 1.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rabernat I added this to raise an error if the user didn't provide data chunked with lengths of one for dimensions that won't be detrended.
@@ -283,17 +273,10 @@ def dft(da, spacing_tol=1e-3, dim=None, shift=True, detrend=None, window=False, | |||
if detrend == 'constant': | |||
da = da - da.mean(dim=dim) | |||
elif detrend == 'linear': | |||
for d in da.dims: | |||
if d not in dim: | |||
da = da.chunk({d:1}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this may be a bit redundant, I kept it to make sure the chunks have unit length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the future, I recommend we use xarray.apply_ufunc to apply detrending, rather than map blocks. But that can wait for a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so... we can merge this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final review I promise! 😉
xrft/tests/test_xrft.py
Outdated
da_prime = xrft.detrendn(da[:,0].values, [0,1,2]) # cubic detrend over time, y, and x | ||
npt.assert_almost_equal(xrft.dft(da[:,0].drop('z'), | ||
dim=['time','y','x'], | ||
shift=False, detrend='linear' | ||
).values, | ||
np.fft.fftn(da_prime)) | ||
|
||
def test_bartlett(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just give this test a more informative name. Like test_chunks_to_segments()
?
xrft/xrft.py
Outdated
Whether to shift the fft output. | ||
detrend : str (optional) | ||
If `constant`, the mean across the transform dimensions will be | ||
subtracted before calculating the Fourier transform (FT). | ||
If `linear`, the linear least-square fit will be subtracted before | ||
the FT. | ||
window : bool (optional) | ||
window : bool (default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you change these from optional
to default
? That doesn't match the numpydoc spec. In fact, looking closely at the spec, it should be
window: bool, optional
xrft/tests/test_xrft.py
Outdated
npt.assert_almost_equal(ps.values, | ||
(ft*np.conj(ft)).real.values, | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no test for cross_spectrum
.
Ok, the tests are passing so I'm gonna merge this. |
@rabernat Something like this for Issue #49 ?