Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dft for chunked data along the axis of FT #51

Merged
merged 15 commits into from
Aug 30, 2018
Merged

dft for chunked data along the axis of FT #51

merged 15 commits into from
Aug 30, 2018

Conversation

roxyboy
Copy link
Member

@roxyboy roxyboy commented Aug 28, 2018

@rabernat Something like this for Issue #49 ?

@roxyboy roxyboy requested a review from rabernat August 28, 2018 18:39
@codecov-io
Copy link

codecov-io commented Aug 28, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@0c8833d). Click here to learn what that means.
The diff coverage is 95.55%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master      #51   +/-   ##
=========================================
  Coverage          ?   88.92%           
=========================================
  Files             ?        1           
  Lines             ?      271           
  Branches          ?       90           
=========================================
  Hits              ?      241           
  Misses            ?       15           
  Partials          ?       15
Impacted Files Coverage Δ
xrft/xrft.py 88.92% <95.55%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c8833d...86a71da. Read the comment docs.

Copy link
Collaborator

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing start Takaya. You have become so fast!

I have a few comments on the api and the implementation.

xrft/xrft.py Outdated
Whether the data is chunked along the axis to take FFT.
kwargs : dict
Need to provide the information of chunk length (`seglen`: int)
when `chunks_to_segment=True`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a) we shouldn't hide arguments in kwargs unnecessarily

b) we shouldn't need to specify seglen at all! It should be determined by dask chunks themselves.

xrft/xrft.py Outdated
if len(dim)>1:
raise NotImplementedError("Currently, only a one-dimensional "
"DFT is supported when the dimension "
"to FT is chunked.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why we need this limitation. It should be possible to to 2d dft on 2d chunks (or any number of dimensions).

One needs to check each dimension in dim to see if reshaping is needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current major issue with this is that the 2D, 3D detrending won't work if the axes are chunked. I'll see if I can change this easily.

xrft/xrft.py Outdated
newdims.append(d + suffix)
newdims.append(d)
n = len(da[d])
seglen = kwargs['seglen']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should instead figure out the chunks by examining the dask chunks.

An error should be raised if the chunks are not uniform.

xrft/xrft.py Outdated
newdims.append(d)
newshape.append(len(da[d]))
newcoords[d] = da[d].data
da = xr.DataArray(data.reshape(newshape), dims=newdims, coords=newcoords)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend putting the logic in the above ~20 lines into a standalone private function

def _stack_chunks(da, dim, suffix='_segment):
    """Reshape a DataArray so there is only one chunk along dimension `dim`
    ...
    """

xrft/xrft.py Outdated
Whether the data is chunked along the axis to take FFT.
kwargs : dict
Need to provide the information of chunk length (`seglen`: int)
when `chunks_to_segment=True`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use kwargs this way.

You should review the zen of python. Line 2: "Explicit is better than implicit."

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to change this doc string to match the function signature.

chunks_to_segments=True, seglen=8)
data = da.chunk({'time':8}).data.reshape((2,8,16,16,16))
npt.assert_almost_equal(ft.values, dsar.fft.fftn(data, axes=[1]).compute(),
decimal=7)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than absorbing this into an existing test, I would prefer to have a new test function for this feature.

@roxyboy
Copy link
Member Author

roxyboy commented Aug 29, 2018

@rabernat I incorporated everything except for the linear detrending for chunked axes. That's gonna take a bit longer so can we merge this for now and I'll make a new pull request for the chunked linear detrending?

@roxyboy
Copy link
Member Author

roxyboy commented Aug 29, 2018

Nevermind, got linear detrending for 2D cases to work.

@rabernat
Copy link
Collaborator

As far as I understand it, you should first transform the data to have reshaped dimensions, and then all the old windowing / detrending stuff should just work. It's really a pre-processing step.

@roxyboy
Copy link
Member Author

roxyboy commented Aug 29, 2018

Yea, so everything is working fine now will all your reviews incorporated :)

Copy link
Collaborator

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Almost there! Just a few more questions.

# )
# da = da5d.chunk({'time':1})
# with pytest.raises(ValueError):
# func(da.data).compute()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commented out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current code should be able to take data that has more than 5 dimensions now so I commented it out.

# with pytest.raises(NotImplementedError):
# xrft.dft(da, detrend='linear')
# with pytest.raises(NotImplementedError):
# xrft.dft(da, dim=['time','y','x'], detrend='linear')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have commented-out code here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just delete this as it is implemented.


# with pytest.raises(NotImplementedError):
# xrft.dft(da.chunk({'time':16}), dim=['time','y'], detrend='linear',
# chunks_to_segments=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also implemented so I'll just delete it.

xrft/tests/test_xrft.py Outdated Show resolved Hide resolved
xrft/xrft.py Outdated
raise ValueError("Data has too many dimensions "
"and/or too many axes to detrend over.")
if len(axes) > 3:
raise ValueError("Data has too many axes to detrend over.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be more explicit with this error, e.g. "Detrending is only possible over 3 dimensions or less, but input had 5 dimensions."

Copy link
Member Author

@roxyboy roxyboy Aug 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It works now with data that has more than 4 dimensions. Detrending itself is only supported up to 3 dimensions at the moment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the error message more explicit to: "Detrending is only supported up to 3 dimensions."

xrft/xrft.py Outdated
newshape.append(len(da[d]))
newcoords[d] = da[d].data

return xr.DataArray(data.reshape(newshape), dims=newdims, coords=newcoords)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice function! 👍

xrft/xrft.py Outdated Show resolved Hide resolved
@@ -234,6 +270,9 @@ def dft(da, spacing_tol=1e-3, dim=None, shift=True, detrend=None, window=False):
if detrend == 'constant':
da = da - da.mean(dim=dim)
elif detrend == 'linear':
for d in da.dims:
if d not in dim:
da = da.chunk({d:1})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand why you need to do this?

Copy link
Member Author

@roxyboy roxyboy Aug 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linear algebra for detrending currently can only take one chunk for each node/processor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why does dask have to be involved? Are we using map_blocks to apply the detrending?

I'm worried about this. It is generally not a good idea to silently rechunk the user's data. What if this accidentally creates a billion chunks. I would prefer to raise an error and make the user do it.

Copy link
Member Author

@roxyboy roxyboy Aug 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we use map_blocks.

Ok, I'll make it raise an error but modify it so that it'll still automatically chunk the _segment dimensions.

xrft/xrft.py Outdated
Whether the data is chunked along the axis to take FFT.
kwargs : dict
Need to provide the information of chunk length (`seglen`: int)
when `chunks_to_segment=True`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to change this doc string to match the function signature.

xrft/xrft.py Outdated Show resolved Hide resolved
@rabernat
Copy link
Collaborator

I guess we probably eventually want to add the chunks_to_segments options to cross_spectrum and isotropic_power_spectrum too, no?

@roxyboy
Copy link
Member Author

roxyboy commented Aug 29, 2018

Adding chunks_to_segments to isotropic_power_spectrum would fail currently due to the same reasonings as in issue #9 . I haven't been able to parallelize the algorithm for azimuthal averaging.

if d not in dim and da.chunks[a_n][0]>1:
raise ValueError("Linear detrending utilizes the `dask.map_blocks` "
"API so the dimensions not being detrended "
"must have the chunk length of 1.")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rabernat I added this to raise an error if the user didn't provide data chunked with lengths of one for dimensions that won't be detrended.

@@ -283,17 +273,10 @@ def dft(da, spacing_tol=1e-3, dim=None, shift=True, detrend=None, window=False,
if detrend == 'constant':
da = da - da.mean(dim=dim)
elif detrend == 'linear':
for d in da.dims:
if d not in dim:
da = da.chunk({d:1})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this may be a bit redundant, I kept it to make sure the chunks have unit length.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the future, I recommend we use xarray.apply_ufunc to apply detrending, rather than map blocks. But that can wait for a future PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so... we can merge this?

Copy link
Collaborator

@rabernat rabernat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final review I promise! 😉

da_prime = xrft.detrendn(da[:,0].values, [0,1,2]) # cubic detrend over time, y, and x
npt.assert_almost_equal(xrft.dft(da[:,0].drop('z'),
dim=['time','y','x'],
shift=False, detrend='linear'
).values,
np.fft.fftn(da_prime))

def test_bartlett():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just give this test a more informative name. Like test_chunks_to_segments()?

xrft/xrft.py Outdated
Whether to shift the fft output.
detrend : str (optional)
If `constant`, the mean across the transform dimensions will be
subtracted before calculating the Fourier transform (FT).
If `linear`, the linear least-square fit will be subtracted before
the FT.
window : bool (optional)
window : bool (default)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change these from optional to default? That doesn't match the numpydoc spec. In fact, looking closely at the spec, it should be

window: bool, optional

npt.assert_almost_equal(ps.values,
(ft*np.conj(ft)).real.values,
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no test for cross_spectrum.

@roxyboy
Copy link
Member Author

roxyboy commented Aug 30, 2018

Ok, the tests are passing so I'm gonna merge this.

@roxyboy roxyboy merged commit 1ca9de9 into xgcm:master Aug 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants