[RFC]: Batch API for inference job #182
Labels: kind/feature, priority/important-soon
### Summary
This proposal exposes a batch API to users so that they can submit a batch job and retrieve the job's status and results at any time after submission. However, current inference engines such as vLLM do not support such a batch feature. This design fills the gap between them.
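To make the intended user-facing flow concrete, here is a minimal in-memory sketch of the submit-then-retrieve lifecycle. All class, method, and field names here are hypothetical illustrations, not the proposed API surface.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class BatchJob:
    """Illustrative record for one submitted batch job."""
    job_id: str
    requests: list
    status: str = "validating"   # initial state on submission
    results: list = field(default_factory=list)

class BatchClient:
    """Hypothetical client: submit a batch, then poll it anytime later."""
    def __init__(self):
        self._jobs = {}

    def create_batch(self, requests):
        job = BatchJob(job_id=f"batch_{uuid.uuid4().hex[:8]}", requests=requests)
        self._jobs[job.job_id] = job
        return job.job_id

    def retrieve(self, job_id):
        # Status/results remain retrievable at any time after submission.
        return self._jobs[job_id]

client = BatchClient()
jid = client.create_batch([{"prompt": "hello"}])
```

The real system would back this with persistent storage rather than a dict, which is exactly what the components below provide.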
### Motivation
To support a batch API for users running batch inference jobs, our inference system needs to handle batch job input and output and perform time-based scheduling, neither of which is within the scope of the inference engine. The motivation divides into two parts: one covers fundamental capabilities, the other covers optimizations for better performance.

**Part 1: Fundamental capabilities.** This part lists the essential components required to make E2E batch inference work.

**Part 2: Optimization.** Once all basic capabilities are ready, this part focuses on performance improvement.
### Proposed Change
For the first part, this proposal builds several fundamental components to support the OpenAI batch API.

**1. Input/output storage**
(a) Store each job's input and output. This acts as persistent storage serving users' retrieval requests.
(b) Provide read/write interfaces for request input, request output, and job metadata.
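The storage component above could be sketched as follows. SQLite stands in here for whichever backing store is actually chosen, and all table, column, and method names are assumptions for illustration.

```python
import json
import sqlite3

class JobStore:
    """Sketch of component (1): persistent job input/output/metadata storage."""
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS jobs "
            "(job_id TEXT PRIMARY KEY, input TEXT, output TEXT, metadata TEXT)"
        )

    def write_input(self, job_id, requests):
        # Persist the job's input so it survives restarts.
        self.db.execute(
            "INSERT OR REPLACE INTO jobs (job_id, input) VALUES (?, ?)",
            (job_id, json.dumps(requests)),
        )

    def write_output(self, job_id, responses):
        self.db.execute(
            "UPDATE jobs SET output = ? WHERE job_id = ?",
            (json.dumps(responses), job_id),
        )

    def read(self, job_id):
        # Serves users' retrieval requests at any time after submission.
        row = self.db.execute(
            "SELECT input, output FROM jobs WHERE job_id = ?", (job_id,)
        ).fetchone()
        return {
            "input": json.loads(row[0]),
            "output": json.loads(row[1]) if row[1] else None,
        }

store = JobStore()
store.write_input("batch_1", [{"prompt": "hi"}])
store.write_output("batch_1", [{"text": "hello"}])
```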
**2. Job metadata management**
(a) Handle job state transitions. This should clearly outline the transition diagram among the different states.
(b) Manage job status, including job creation time, current status, scheduled resources, and so on.
(c) Persist state to storage as checkpoints. With this, users can retrieve job status consistently.
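A possible shape for the state transitions in (a) is sketched below. The state names follow the OpenAI Batch API's statuses, but the exact set of states and allowed edges here are assumptions; pinning them down is precisely what the transition diagram this RFC calls for would do.

```python
# Assumed transition table: each state maps to the states it may move to.
ALLOWED_TRANSITIONS = {
    "validating":  {"in_progress", "failed"},
    "in_progress": {"finalizing", "cancelled", "expired"},
    "finalizing":  {"completed", "failed"},
    # terminal states (completed, failed, cancelled, expired) have no edges
}

def transition(current, new):
    """Move a job to a new state, rejecting edges outside the diagram."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new

state = "validating"
state = transition(state, "in_progress")
state = transition(state, "finalizing")
state = transition(state, "completed")
```

Checkpointing (c) would then persist the current state alongside the job metadata so retrieval stays consistent across restarts.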
**3. Job scheduler**
(a) Maintain a time-based sliding window of jobs. Based on job creation time, the window slides every minute.
(b) Perform FIFO job scheduling and issue request queries to the inference engine. This prepares all necessary input for the inference engine.
(c) Sync job status. When a response is received from the inference engine, propagate it to the job window and metadata management.
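The scheduler's sliding window and FIFO dispatch might look like the sketch below. The one-minute window, the interpretation of "slide" (advance the window end and dispatch jobs created inside it), and the dispatch callback are all assumptions drawn from the description above.

```python
import time
from collections import deque

WINDOW_SECONDS = 60  # assumed one-minute slide interval

class Scheduler:
    """Sketch of component (3): sliding window + FIFO dispatch to the engine."""
    def __init__(self, dispatch, window_start=0.0):
        self.queue = deque()        # ordered by creation time, hence FIFO
        self.dispatch = dispatch    # callable sending prepared requests to the engine
        self.window_end = window_start

    def submit(self, created_at, job_id, requests):
        self.queue.append((created_at, job_id, requests))

    def slide(self):
        # Advance the window by one minute and dispatch, FIFO, every job
        # whose creation time now falls inside it.
        self.window_end += WINDOW_SECONDS
        dispatched = []
        while self.queue and self.queue[0][0] <= self.window_end:
            _, job_id, requests = self.queue.popleft()
            self.dispatch(job_id, requests)
            dispatched.append(job_id)
        return dispatched

sent = []
sched = Scheduler(lambda jid, reqs: sent.append(jid))
sched.submit(10.0, "job-a", [{"prompt": "x"}])
sched.submit(70.0, "job-b", [{"prompt": "y"}])
first = sched.slide()   # window now covers up to t=60: dispatches job-a
second = sched.slide()  # window now covers up to t=120: dispatches job-b
```

Step (c) would be the inverse path: the engine's responses flow back through a callback that updates the window and the metadata manager.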
The changes belonging to the second part of the motivation are deferred for now. Once we have a clear outline of the fundamentals, we will have a better understanding of the optimization tasks.
### Alternatives Considered
No response