
Large area processing: Dynamically/adaptively start sub-jobs? #37

Open
soxofaan opened this issue Feb 4, 2022 · 5 comments

Comments

@soxofaan
Member

soxofaan commented Feb 4, 2022

In the current "partitioned jobs" implementation, all sub-jobs are started at once when the main job is started by the user.

To better take back-end load and availability into account, it might be necessary to start sub-jobs in a more on-the-fly, adaptive fashion.
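A minimal sketch of what such adaptive starting could look like (all names here, like `SubJob` and `schedule_step`, are illustrative assumptions, not actual openeo-aggregator API): keep at most `max_running` sub-jobs active and start more as earlier ones finish.

```python
from dataclasses import dataclass


@dataclass
class SubJob:
    """Hypothetical stand-in for a sub-job of a partitioned job."""
    job_id: str
    status: str = "created"  # created -> queued/running -> finished/error


def schedule_step(sub_jobs, max_running=5):
    """Advance the partitioned job one polling step: start additional
    sub-jobs only while the concurrency budget allows it."""
    running = [j for j in sub_jobs if j.status in ("queued", "running")]
    to_start = [j for j in sub_jobs if j.status == "created"]
    budget = max(max_running - len(running), 0)
    started = []
    for job in to_start[:budget]:
        # In reality this would trigger the job on the upstream back-end,
        # e.g. POST /jobs/{id}/results; here we just flip the status.
        job.status = "queued"
        started.append(job.job_id)
    return started
```

The aggregator's job tracker could call a step like this on every polling cycle, so back-end load feeds back into how fast the remaining sub-jobs are released.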

@soxofaan
Member Author

soxofaan commented Aug 3, 2022

Also to cover here: retrying of jobs (e.g. on the same back-end, or on another back-end?)
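One hedged way to think about the retry question above (function and parameter names are hypothetical): a small policy that caps the number of attempts and rotates over candidate back-ends, so a failing sub-job can be retried elsewhere before giving up.

```python
def next_attempt(attempts, backends, max_attempts=3):
    """Pick the back-end for the next retry attempt.

    attempts: how many attempts have already been made (0 = first try).
    backends: ordered list of candidate back-end ids.
    Returns a back-end id, or None when the retry budget is exhausted.
    """
    if attempts >= max_attempts:
        return None
    # Round-robin over the candidates: first try the preferred back-end,
    # then fall over to the alternatives on subsequent attempts.
    return backends[attempts % len(backends)]
```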

@jdries
Contributor

jdries commented Dec 13, 2022

Could a more generic version of this issue be to simply support the case where a user submits a lot of jobs (e.g. 10k), which then need to be queued adequately?

@soxofaan
Member Author

> Could a more generic version of this issue be to simply support the case where a user submits a lot of jobs (e.g. 10k), which then need to be queued adequately?

You mean that the aggregator would then act as an intermediate scheduler, to avoid overloading the upstream back-ends? That would indeed be a more generic solution to the original idea of this issue.

@jdries
Contributor

jdries commented Dec 13, 2022

Exactly, and it would be a useful intermediate step towards supporting large scale processing if I can do splitting myself, and let tracking be handled by the aggregator.
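The workflow described above (user does the splitting, aggregator does queuing and tracking) could be sketched as a bounded dispatch queue. This is only an assumed shape, not the aggregator's actual design: the user submits many jobs, but the aggregator forwards at most `capacity` of them upstream at a time.

```python
from collections import deque


class JobQueue:
    """Sketch of an aggregator-side job queue (hypothetical class)."""

    def __init__(self, capacity=100):
        self.capacity = capacity   # max jobs active upstream at once
        self.pending = deque()     # submitted but not yet forwarded
        self.active = set()        # currently running on the back-end

    def submit(self, job_id):
        """User submits a job; it is only forwarded when a slot is free."""
        self.pending.append(job_id)
        self._dispatch()

    def on_done(self, job_id):
        """Tracker noticed a job finished; free its slot and dispatch more."""
        self.active.discard(job_id)
        self._dispatch()

    def _dispatch(self):
        # Forward pending jobs (FIFO) while staying under the capacity.
        while self.pending and len(self.active) < self.capacity:
            self.active.add(self.pending.popleft())
```

With a structure like this, a 10k-job submission just fills `pending`, and the aggregator's existing status polling drives `on_done` to drain it at a pace the back-ends can absorb.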

@soxofaan
Member Author

soxofaan commented Apr 4, 2023

Another thing to consider in this context: in cross-back-end processing use cases, some jobs can only be created once their dependency jobs have been created or have successfully finished, because a valid job id/metadata URL is necessary in a load_result/load_stac node.
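One way to handle the dependency constraint just described (assuming the aggregator can extract, per sub-job, which other sub-jobs its load_result/load_stac nodes refer to) is a topological ordering over that dependency graph, which yields a valid creation order. Python's standard-library `graphlib` already provides this:

```python
from graphlib import TopologicalSorter


def creation_order(dependencies):
    """Return a valid order in which to create/start sub-jobs.

    dependencies: mapping of sub-job name -> set of sub-job names whose
    results it loads (via load_result/load_stac). Each job appears only
    after all of its dependencies.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

For example, a "merge" job that loads the results of sub-jobs "a" and "b" would be created last; `TopologicalSorter` also raises a `CycleError` if the graph accidentally contains a cycle, which is a useful validation step before starting anything.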
