Alternative for managing task array status in Google Batch #5723
This PR proposes an alternative way to process task status in task arrays.
Currently, every time `getTaskState` is called, we make two calls to the Google Batch client: one to retrieve the list of tasks (`listTasks`) and another to get the task state (`getTaskStatus`). The second call is problematic because it triggers the `NotFoundException`, and it is also redundant because the first call already returns the task descriptions, including their states.
With this change, `BatchClient` keeps a `HashMap` as a cache of array task statuses. A new method retrieves the status of a task belonging to a task array: instead of calling the Google Batch API, it looks up the status in the cache. When the status is not in the cache or is outdated, `listTasks` is called to update the statuses of all the tasks in the array, so the remaining array tasks do not need to query the Google Batch API again.
When no task status is available, the method returns `null` and we fall back to `getJobStatus`.
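A minimal sketch of the caching idea, assuming a few things that are not in the PR: the class name `ArrayTaskStatusCache` is hypothetical, the `lister` function stands in for a `listTasks` call and returns a plain `taskId -> state` map (the real client returns its own task types), and the clock is injected so staleness can be tested. It is not the PR's actual code; it only shows one list call refreshing every task of an array, with `null` signalling "no status, fall back to the job".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch, not the code from this PR: caches array task
// statuses per job and refreshes them with a single list call.
final class ArrayTaskStatusCache {

    // Assumed stand-in for listTasks: jobId -> (taskId -> state).
    private final Function<String, Map<String, String>> lister;
    // Invalidation time in milliseconds (1000 for the 1-second value).
    private final long ttlMillis;
    // Injectable clock (e.g. System::currentTimeMillis) so staleness can be tested.
    private final LongSupplier clock;

    // jobId -> (taskId -> state)
    private final Map<String, Map<String, String>> statuses = new HashMap<>();
    // jobId -> time of the last refresh, in clock units
    private final Map<String, Long> refreshedAt = new HashMap<>();

    ArrayTaskStatusCache(Function<String, Map<String, String>> lister,
                         long ttlMillis, LongSupplier clock) {
        this.lister = lister;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached state of taskId, or null when the job listing has no
    // entry for it (the caller then falls back to the job status).
    synchronized String getTaskStatus(String jobId, String taskId) {
        long now = clock.getAsLong();
        Map<String, String> jobTasks = statuses.get(jobId);
        Long last = refreshedAt.get(jobId);
        boolean stale = last == null || now - last >= ttlMillis;
        if (jobTasks == null || stale || !jobTasks.containsKey(taskId)) {
            // One list call refreshes the status of every task in the array,
            // so sibling tasks are served from the cache until it goes stale.
            Map<String, String> fresh = lister.apply(jobId);
            jobTasks = new HashMap<>();
            if (fresh != null) {
                jobTasks.putAll(fresh);
            }
            statuses.put(jobId, jobTasks);
            refreshedAt.put(jobId, now);
        }
        return jobTasks.get(taskId);
    }
}
```

In production the clock would simply be `System::currentTimeMillis`. The method is `synchronized` as a conservative choice, on the assumption that status polling may happen from more than one thread.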
The same fallback was already used when no tasks were retrieved from the job or a `NotFoundException` was raised.
The invalidation time is 1 second, matching the one used in `GoogleBatchTaskHandler`. Another option is to set it to the polling interval.
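On the caller side, the fallback to `getJobStatus` and the choice of invalidation time can be sketched as follows. This is a hypothetical illustration, not the PR's code: `TaskStateResolver` and `invalidationTime` are invented names, and the two functions passed to the constructor are assumed stand-ins for the cached array task lookup and for `getJobStatus`.

```java
import java.time.Duration;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch of the caller side, not the PR's code.
final class TaskStateResolver {

    // 1 second, the value the PR takes from GoogleBatchTaskHandler.
    static final Duration DEFAULT_INVALIDATION = Duration.ofSeconds(1);

    // (jobId, taskId) -> cached array task state, or null when unknown.
    private final BiFunction<String, String, String> cachedTaskStatus;
    // Assumed stand-in for getJobStatus: jobId -> job state.
    private final Function<String, String> jobStatus;

    TaskStateResolver(BiFunction<String, String, String> cachedTaskStatus,
                      Function<String, String> jobStatus) {
        this.cachedTaskStatus = cachedTaskStatus;
        this.jobStatus = jobStatus;
    }

    // Prefer the cached task state; fall back to the job state when it is null.
    String getTaskState(String jobId, String taskId) {
        String state = cachedTaskStatus.apply(jobId, taskId);
        return state != null ? state : jobStatus.apply(jobId);
    }

    // Picks the cache invalidation time: the 1-second default, or the
    // polling interval (the alternative mentioned in the PR).
    static Duration invalidationTime(Duration pollInterval, boolean matchPollInterval) {
        return matchPollInterval ? pollInterval : DEFAULT_INVALIDATION;
    }
}
```

Matching the polling interval would roughly bound the listing to one call per job per polling cycle, at the cost of statuses that may be up to one interval old. The 1-second value keeps statuses fresher but can trigger extra list calls if the tasks of an array are polled more than a second apart.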