operating on chunked data in segments #49

rabernat · 2018-08-28T01:51:30Z

Imagine if you are trying to do the power spectrum of a really long timeseries. So long that you don't want to put it all in memory. Currently xgcm / dask fail when you try to take an fft over a chunked dimension:

import numpy as np
import dask.array as dsa
import xarray as xr
import xrft
da = xr.DataArray(np.random.rand(1000), dims=['x']).chunk({'x': 100})
xrft.power_spectrum(da, dim='x')

ValueError: Dask array only supports taking an FFT along an axis that 
has a single chunk. An FFT operation was tried on axis 0 
which has chunks (100, 100, 100, 100, 100, 100, 100, 100, 100, 100). To change the array's chunks use dask.Array.rechunk.

This makes sense if you care about the full Fourier transform. But often with long timeseries, we want to do some sort of tapering in which we average over multiple segments in order to get a less biased estimate of the power spectrum (like in Bartlett's method). In this case, the chunks provide a natural way to split up the full interval.

With the example above, we can do the following:

# get raw data
data = da.data
# reshape so there is one chunk for each item on axis 0
data_rs = data.reshape((10, 100))
# transform along the last axis
dsa.fft.fft(data_rs, axis=1)

this works and returns

dask.array<fft, shape=(10, 100), dtype=complex128, chunksize=(1, 100)>

You could then apply Bartlett's method by averaging over axis 0.

In xrft, we could try to accommodate this by adding a new keyword

xrft.power_spectrum(da, dim='x', chunks_to_segments=True)

which would then return something like

<xarray.DataArray '...' (x_segment: 10, freq_x: 100)>
dask.array<shape=(10, 100), dtype=complex128, chunksize=(1, 100)>
Coordinates:
  * freq_x          (freq_x) float64 -0.5 -0.499 -0.498 -0.497 -0.496 -0.495 ...
Dimensions without coordinates: x_segment

I think this would be highly useful!

The text was updated successfully, but these errors were encountered:

rabernat added the enhancement label Aug 28, 2018

roxyboy self-assigned this Aug 28, 2018

roxyboy mentioned this issue Aug 28, 2018

dft for chunked data along the axis of FT #51

Merged

roxyboy closed this as completed Sep 5, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

operating on chunked data in segments #49

operating on chunked data in segments #49

rabernat commented Aug 28, 2018 •

edited

Loading

operating on chunked data in segments #49

operating on chunked data in segments #49

Comments

rabernat commented Aug 28, 2018 • edited Loading

rabernat commented Aug 28, 2018 •

edited

Loading