When submitting documents for indexing, we split batches on the documents' unique keys, so that any exception from the service side can be mapped back to the right document and our document submission retry mechanism stays stable. When designing a solution, we should aim to:

1. Try to fill the entire batch
2. Maintain the order of document submission
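To make the two goals concrete, here is a minimal sketch (in Python, purely illustrative; `split_into_batches` and its shape are hypothetical, not the SDK's actual implementation) of a pass that fills batches greedily in submission order and cuts a batch early only when the next action would repeat a key already present in it:

```python
def split_into_batches(actions, batch_size):
    """Greedily fill batches in submission order, starting a new batch
    only when it is full or when adding the next action would repeat a
    key already in it (duplicate keys in one batch make per-document
    error mapping ambiguous)."""
    batches = []
    current, keys_in_batch = [], set()
    for key, payload in actions:
        if len(current) == batch_size or key in keys_in_batch:
            batches.append(current)
            current, keys_in_batch = [], set()
        current.append((key, payload))
        keys_in_batch.add(key)
    if current:
        batches.append(current)
    return batches

# A duplicate key ("a") forces an early cut, but order is preserved:
batches = split_into_batches([("a", 1), ("b", 2), ("a", 3), ("c", 4)], 3)
# → [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
```

Note the tension this sketch exposes: every early cut for a duplicate key leaves the first batch under-filled, which is exactly why goal 1 (fill the batch) and goal 2 (preserve order) pull against each other.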
For .NET, we modified the batching algorithm via Azure/azure-sdk-for-net#18469. This helps with 2 above (maintain the order of document submission), but we can do better on 1 (fill the batch to the maximum extent).
I tried another round of improvement with Azure/azure-sdk-for-net#18603, but there were concerns about a semantic change to the operation.
That change adds a "flush duplicate actions immediately after the current batch, regardless of size" behavior, which could result in sending many extra batches. Ideally we'd just update the existing pending/retry queues in place and keep the rest of the logic unchanged (it is already a lot to wrap your head around), but that's probably a nontrivial ask with .NET's Queue&lt;T&gt;; we may need to switch from Queue to a different data structure.
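One possible direction (a hypothetical sketch, not the SDK's design; the `PendingActions` name and methods are invented for illustration) is to replace the FIFO queue with an order-preserving structure that supports keyed updates. Then a re-submitted action for a still-pending key can update the pending entry in place, instead of forcing an early "duplicate flush" batch:

```python
from collections import OrderedDict


class PendingActions:
    """Order-preserving pending queue with keyed, in-place updates.

    Unlike a plain FIFO queue, enqueueing an action for a key that is
    already pending replaces the pending entry while keeping its
    original submission position, so no extra flush batch is needed.
    """

    def __init__(self):
        self._pending = OrderedDict()  # key -> action payload

    def enqueue(self, key, action):
        # Assigning to an existing key keeps its original insertion
        # position, so submission order is preserved.
        self._pending[key] = action

    def take_batch(self, batch_size):
        # Drain up to batch_size entries in FIFO (submission) order.
        batch = []
        while self._pending and len(batch) < batch_size:
            batch.append(self._pending.popitem(last=False))
        return batch
```

A caveat worth flagging: replacing a pending entry in place means the later action wins at the earlier position, which is itself a semantic choice that would need the same scrutiny the earlier PR received.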
The remaining work is to modify the batching algorithm further so that these concerns are addressed.
For the other languages, we need to evaluate where each implementation stands relative to .NET and finish the remaining work.