Add array_apply #72
As a computer scientist and not an EO expert, I understand that the common approach in OpenEO involves defining an area of interest (AOI, described by a few polygons) and performing time series analysis, where the primary focus is tracking changes over time.

A common use case for our users involves monitoring forested areas or agricultural fields to detect changes such as tree cover loss or crop stress. Each day, we aim to calculate a value representing, for example, how green an area is, creating a time series to track changes. The result may then be communicated to the owner of that area, e.g. when trees are cut down or when crops show signs of stress. This is a simplified example, but it illustrates the typical scenario: processing 100,000 to 200,000 polygons daily, with each polygon representing a specific area of interest, and storing and analyzing the results on a per-polygon basis.

In contrast to OpenEO's main scenario, which often focuses on time series analysis for tracking changes across a few polygons, our use case processes a large number of polygons at a single time step, with the option to consider time as a secondary dimension later. Given the sheer number of AOIs requiring individual calculations, we assume that an intermediate step, focusing on polygons first before considering temporal changes, is necessary. Creating individual jobs for every polygon seems inefficient, and looping through 100,000 polygons may not be optimal. Could one solution be to sort polygons by locality, returning chunks of arrays to achieve parallelism, and then applying the per-polygon computation to each chunk?

TL;DR: With this in mind, should we explore the array_apply approach under these conditions, or move towards a more generic process, as in aggregate_spatial, which might require implementing user-defined functions instead of relying on built-in reducers like mean?
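For context, a minimal sketch of how this use case might look with aggregate_spatial in the openEO Python client. The backend URL, collection id, band names and file names are illustrative assumptions, not taken from this issue:

```python
# Hypothetical sketch using the openeo Python client; all identifiers below
# (backend URL, collection id, bands, file names) are placeholders.
import json
import openeo

connection = openeo.connect("https://openeo.example.org").authenticate_oidc()

# One day of data over the full extent covering all areas of interest.
cube = connection.load_collection(
    "SENTINEL2_L2A",
    temporal_extent=["2024-06-01", "2024-06-02"],
    bands=["B04", "B08"],
)

# Per-pixel "greenness" (NDVI) before aggregating per polygon.
ndvi = cube.ndvi(red="B04", nir="B08")

# All polygons in one FeatureCollection; the backend aggregates each
# geometry separately, so there is no client-side loop over 100k polygons.
with open("polygons.geojson") as f:
    geometries = json.load(f)

result = ndvi.aggregate_spatial(geometries=geometries, reducer="mean")
result.download("greenness_per_polygon.json")
```

Whether a single aggregate_spatial call over that many geometries is practical depends on the backend, which is essentially the scalability question raised above.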
I think what you describe is not directly related to the array_apply process, but makes more sense in combination with aggregate_spatial. In our case, we came to the conclusion that we can run some of the computation in parallel and sort the polygons somehow, but we still needed a for loop... I tried to explain what I did here: https://github.com/Open-EO/openeo-processes-dask/blob/main/docs/scalability/aggregate-large-spatial-extents.md. Not sure if this applies to your use case :)
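A rough sketch of the "sort by locality, then chunk" idea in Python; the centroid-based ordering, chunk count and file names are illustrative assumptions rather than what the linked document actually does:

```python
# Minimal sketch: order polygons so each chunk covers a compact region,
# then split into chunks that can be processed as separate (parallel) jobs.
import geopandas as gpd
import numpy as np

gdf = gpd.read_file("polygons.geojson")

# Crude locality sort: order polygons by centroid coordinates so that each
# chunk covers a compact area and a backend job loads less data per chunk.
centroids = gdf.geometry.centroid
gdf = gdf.iloc[np.lexsort((centroids.y.values, centroids.x.values))]

# Split into chunks small enough for one batch job each.
chunks = np.array_split(gdf, 100)

for i, chunk in enumerate(chunks):
    # Each chunk could then be submitted as one aggregate_spatial batch job
    # (see the client sketch above); jobs can run in parallel within the
    # backend's concurrency limits, but a loop over chunks remains.
    chunk.to_file(f"chunk_{i}.geojson", driver="GeoJSON")
```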
My previous comments on where I'd start with this: