Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

operating on chunked data in segments #49

Closed
rabernat opened this issue Aug 28, 2018 · 0 comments
Closed

operating on chunked data in segments #49

rabernat opened this issue Aug 28, 2018 · 0 comments
Assignees

Comments

@rabernat
Copy link
Collaborator

rabernat commented Aug 28, 2018

Imagine if you are trying to do the power spectrum of a really long timeseries. So long that you don't want to put it all in memory. Currently xgcm / dask fail when you try to take an fft over a chunked dimension:

import numpy as np
import dask.array as dsa
import xarray as xr
import xrft
da = xr.DataArray(np.random.rand(1000), dims=['x']).chunk({'x': 100})
xrft.power_spectrum(da, dim='x')
ValueError: Dask array only supports taking an FFT along an axis that 
has a single chunk. An FFT operation was tried on axis 0 
which has chunks (100, 100, 100, 100, 100, 100, 100, 100, 100, 100). To change the array's chunks use dask.Array.rechunk.

This makes sense if you care about the full Fourier transform. But often with long timeseries, we want to do some sort of tapering in which we average over multiple segments in order to get a less biased estimate of the power spectrum (like in Bartlett's method). In this case, the chunks provide a natural way to split up the full interval.

With the example above, we can do the following:

# get raw data
data = da.data
# reshape so there is one chunk for each item on axis 0
data_rs = data.reshape((10, 100))
# transform along the last axis
dsa.fft.fft(data_rs, axis=1)

this works and returns

dask.array<fft, shape=(10, 100), dtype=complex128, chunksize=(1, 100)>

You could then apply Bartlett's method by averaging over axis 0.

In xrft, we could try to accommodate this by adding a new keyword

xrft.power_spectrum(da, dim='x', chunks_to_segments=True)

which would then return something like

<xarray.DataArray '...' (x_segment: 10, freq_x: 100)>
dask.array<shape=(10, 100), dtype=complex128, chunksize=(1, 100)>
Coordinates:
  * freq_x          (freq_x) float64 -0.5 -0.499 -0.498 -0.497 -0.496 -0.495 ...
Dimensions without coordinates: x_segment

I think this would be highly useful!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants