How to compute the average over spatial dimensions? #226
I would like to compute the average over the data I'm loading from a small bbox, to reduce noise and get a smoother signal. It is what I did in Python in User Story 2 using the Xarray method
.mean(['x','y'], skipna=True)
How can we perform this task with openEO processes? I considered various options:
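For context, a minimal self-contained version of that Xarray call might look like this (the file name and cube layout are assumptions, not from the original post):

```python
# Minimal sketch, assuming a netCDF cube with dimensions t, bands, x, y.
import xarray as xr

ds = xr.open_dataset("sentinel2_timeseries.nc")  # hypothetical input file

# Average out both spatial dimensions, ignoring NaN pixels;
# the result keeps one value per timestamp and band.
smoothed = ds.mean(["x", "y"], skipna=True)
```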
Did you try something like the example below already? In principle, if you use aggregate_spatial, you can leave out other spatial filtering. So you only need to specify your small bbox once, which is most easily done with the shapely 'box' function.
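The original example is not preserved in this copy; a sketch in its spirit with the openEO Python client might look as follows (backend URL, collection id, bands and bbox coordinates are placeholder assumptions):

```python
# Sketch only: specify the small bbox once via shapely's box() and let
# aggregate_spatial (wrapped by polygonal_mean_timeseries) handle filtering.
import openeo
from shapely.geometry import box

connection = openeo.connect("https://openeo.vito.be")  # assumed backend

bbox = box(5.05, 51.21, 5.10, 51.23)  # minx, miny, maxx, maxy (placeholders)

cube = connection.load_collection(
    "TERRASCOPE_S2_TOC_V2",  # placeholder collection id
    temporal_extent=["2020-01-01", "2020-12-31"],
    bands=["B04", "B08"],
)

# No separate filter_bbox/filter_spatial is needed: the geometry passed
# here both constrains and aggregates the data.
timeseries = cube.polygonal_mean_timeseries(bbox)
timeseries.download("timeseries.json", format="JSON")
```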
Thanks @jdries. I've just tried it; the result as netCDF is below. I'm showing the output both without applying polygonal_mean_timeseries and with it:
Using aggregate_spatial (polygonal_mean_timeseries)
The differences I see are:
I usually use the JSON output format after aggregating the spatial dimension, but we can indeed improve the netCDF.
Yes, it would fit what I am looking for without the need to create or modify other processes!
What is polygonal_mean_timeseries using internally? That is not a standard process...
This seems to be a recurring question; it is …
Is the geometry taken from load_collection, either as the geometry itself or computed from the box?
It's the behaviour we discussed here: |
Oh, yes, I did not see the parameter being set in the function. Of course, in back-ends without these optimizations, it might lead to loading data for the whole world. So basically, this is not really a general solution for the issue, but it works at VITO. In other cases you may need to specify the geometries twice (which also shouldn't be a major issue?)
A question that is still open is what could be done to make spatial dimensions better usable in functions like reduce. Setting the dimension to either x or y is usually not very meaningful, and two reduces after each other don't work well for all processes (median? see the illustration below).
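To make the median caveat concrete, a tiny numpy illustration (not from the thread) of why reducing x and then y differs from one joint spatial reduce:

```python
# Median is not decomposable: reducing one axis at a time gives a
# different result than a single reduce over both axes.
import numpy as np

a = np.array([[1, 2, 9],
              [3, 4, 5]])

print(np.median(a))                     # 3.5 -> joint median over both axes
print(np.median(np.median(a, axis=1)))  # 3.0 -> median of per-row medians
```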
I also had the same feeling about loading too much data if you don't specify the bbox in load_collection, but this is back-end specific. EDIT: the API states that aggregate_spatial should return 3 dimensions, but in the proposed case it's returning 4: two spatial, time and bands. I'm actually fine with having the bands as well, but maybe it could be confusing? https://processes.openeo.org/#aggregate_spatial
But didn't we specify in #101 that backends have to support this?
No, we did not. We just specified that aggregate_spatial applies filter_spatial just before the execution of the process (emphasis mine):
I agree that things should be as simple as possible, but things can get tricky. If you run load_collection once and then filter twice, for, let's say, Belgium and Germany, and then run aggregate_spatial for Germany only, you could run into issues with no data for Belgium being loaded? I feel like, in theory, reduce_dimension should have something like a special value "spatial" for "dimension", which would solve the issue (see the sketch below). I mean, there's also the issue about multiple dimensions #126 with aggregate_spatial. aggregate_spatial was basically meant to run on multiple geometries, and for a reduction over all data it's reduce_dimension.
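Purely to illustrate that idea (a "spatial" dimension value does not exist in the openEO specification), such a hypothetical reduce_dimension node in a process graph could look like:

```python
# Hypothetical reduce_dimension node; "spatial" as a dimension value
# is NOT valid openEO, it only sketches the proposed special value.
node = {
    "process_id": "reduce_dimension",
    "arguments": {
        "data": {"from_node": "loadcollection1"},
        "dimension": "spatial",  # hypothetical special value
        "reducer": {
            "process_graph": {
                "mean1": {
                    "process_id": "mean",
                    "arguments": {"data": {"from_parameter": "data"}},
                    "result": True,
                }
            }
        },
    },
}
```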
I guess I could implement in the Web Editor that you can re-use the geometries/bboxes. For programmatic use (Python, JS, R) you can always just use variables, as in the sketch below. So this could just be a client issue, too.
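For example, a sketch of the "just use variables" point with the Python client, reusing the connection from the earlier sketch (collection id and coordinates are placeholders):

```python
# Define the geometry once, then reuse it for filtering and aggregation.
from shapely.geometry import box

geom = box(5.05, 51.21, 5.10, 51.23)  # placeholder bbox
west, south, east, north = geom.bounds

cube = connection.load_collection(
    "SENTINEL2_L2A",  # placeholder collection id
    spatial_extent={"west": west, "south": south, "east": east, "north": north},
)
timeseries = cube.aggregate_spatial(geometries=geom, reducer="mean")
```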
Getting back to this issue: just wondering, would it make sense to allow setting the geometries to null in aggregate_spatial, and thus allow the process to aggregate over x and y at the same time without a spatial constraint like a geometry? If that's too much of a stretch for the very geometries-focused process, we could also consider defining a reduce_spatial process that allows reducing over the two spatial dimensions x and y at the same time (or is a generic reduce_dimensions better?)
While that sounds feasible to do in the VITO backend, I wonder whether it would create problems on the output side. Vector cubes are not fully defined, but the …
Yes, indeed; the return value I had not really taken into account. The question is what you want to get back. If it's still a "raster-cube", it has to be reduce_spatial (or a variant, the specifics are unclear); if it's a vector-cube, it's aggregate_spatial (or a variant). So we may even need both?
I think both use cases (raster cube or vector cube output) are indeed valid.
My take on the subject:
FWIW, Google Earth Engine uses the null value for "unbounded" aggregation (or let's say bounded to the footprint of the actual data), see their reduceRegion function. So I don't see a big difference between the two. Bringing in the CRS is actually a good point and leads us back to thinking about how to handle that dimension anyway, see #251. For this process, you may want to actually require reducing the CRS dimension first, but there's no way to do that right now. I think I basically agree with all other points, but we indeed need to think about what we want to support. I don't fully understand the use case for a reduce over multiple dimensions on time + bands, e.g. why would a cloudless mosaic require a reduce over two dimensions at the same time?
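For reference, a sketch of the Earth Engine behaviour mentioned above (the asset id is a placeholder; leaving geometry unset makes reduceRegion fall back to the image footprint):

```python
# Earth Engine sketch: omitting `geometry` makes reduceRegion aggregate
# over the image's own footprint ("unbounded" in the sense above).
import ee

ee.Initialize()
img = ee.Image("COPERNICUS/S2_SR_EXAMPLE")  # placeholder asset id
stats = img.reduceRegion(reducer=ee.Reducer.mean(), scale=10)
print(stats.getInfo())
```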
If …
Now imagine that …
I think it might be better for performance reasons to reduce CRS last (there's a chance you could avoid doing reprojection of rasters in this case), but as you say it seems a bit early to consider all aspects of crs-as-a-spatial-dimension right now.
For the typical cloudless mosaic, if there isn't a separate cloudiness band, the reducer needs all bands to determine cloudiness for a particular timestamp, and then uses that to weigh the timestamps for producing output (either select one, or pick the median, or similar). See e.g. https://custom-scripts.sentinel-hub.com/custom-scripts/sentinel-2/cloudless_mosaic/# . Of course the same could be represented with separate steps and some extra memory to hold the temporary data (reduce on bands first to produce per-timestamp cloudiness, then combine by matching indexes with a time-reducer to compute the output; see the sketch below), but to me it feels more natural to think of it as a multi-dimensional reduction: "take all bands for all timestamps at that location, do some crunching and some statistics on the values, and produce an output". I may be biased, because reducing by time + bands to produce an output pixel is the most common thing that people do in Sentinel Hub.
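A toy numpy sketch (not openEO, and using mean reflectance as a crude stand-in for a real cloudiness score) of that two-step alternative:

```python
# Two-step version: reduce bands to a per-timestamp cloudiness score,
# then reduce time by picking the least cloudy observation per pixel.
import numpy as np

cube = np.random.rand(10, 4, 64, 64)  # hypothetical (time, bands, y, x) cube

# Step 1: band reduce -> per-timestamp "cloudiness" (crude proxy: mean).
cloudiness = cube.mean(axis=1)        # shape (time, y, x)

# Step 2: time reduce -> index of the least cloudy timestamp per pixel,
# then gather all bands of that timestamp.
best_t = cloudiness.argmin(axis=0)    # shape (y, x)
mosaic = np.take_along_axis(cube, best_t[None, None, :, :], axis=0)
mosaic = mosaic.squeeze(axis=0)       # final shape (bands, y, x)
```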
I see; although openEO processes would throw an error, null is not used to indicate errors. So this would be purely a client issue, which could also be the case for "false" (depending on how functions "return" error states). Also, an additional parameter would not work here, as the geometries parameter is required: you'd still need to pass a geometry just to disable it afterwards in the new parameter, which is also weird. So short-term (i.e. non-breaking), the only solution is false/null (or some weird string), or defining a new process.
That's valuable input, but we need to think about how that could work with the openEO processes. For example, what would be passed to a "callback" in reduce_multiple_dimensions, and how could the processes make use of it (keeping in mind that array handling support in openEO is not as full-fledged as in normal programming languages). I'll have to dig into this later...
I agree with the objection against a … Conceptually it's probably cleaner to define a general … So in that sense … If there would be need for … To come back to the question about the "output" of the two approaches …
We seem to conclude that having a process … One thing that came to mind here is that for aggregate_spatial and reduce_dimension we have the binary variants (e.g. …).
Indeed, in the VITO backend we don't support the binary variants of these processes, and no user ever asked for them as far as I know. I guess most use cases are covered with the classics … I agree that we should consider moving …
The binary variants are already in the experimental proposals, but I'm thinking about removing them completely.
Yes, I'm afraid that UDFs in a "binary" approach would be horrible for performance in most backend implementations.
@clausmichele A new process "reduce_spatial" has just been merged. Let's hope some back-ends will implement it soon.
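A sketch of how this could look from the Python client, assuming a client version that exposes the new process as DataCube.reduce_spatial (backend URL, collection id and extents are placeholders):

```python
# Sketch: reduce both spatial dimensions at once with the new
# reduce_spatial process, yielding one mean value per timestamp and band.
import openeo

connection = openeo.connect("https://openeo.vito.be")  # assumed backend

cube = connection.load_collection(
    "SENTINEL2_L2A",  # placeholder collection id
    spatial_extent={"west": 5.05, "south": 51.21,
                    "east": 5.10, "north": 51.23},
    temporal_extent=["2020-01-01", "2020-12-31"],
)

result = cube.reduce_spatial(reducer="mean")
result.download("smoothed_timeseries.nc", format="netCDF")
```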
I'll implement it in our back-end soon and then support EODC in having it there as well!