Parallel sampling with threadpool #1252

Merged

mzegla
Collaborator

@mzegla mzegla commented Nov 25, 2024

This PR implements the same functionality as #1233, but in a different manner. Only one of them should be merged.

Because the pipeline logic runs on a single thread, CPU usage is low whenever the pipeline is not executing inference but doing other work, such as sampling, which can take a large fraction of the time. Currently, after inference completes, we sample from each sequence group in a loop on a single thread. This becomes a bottleneck when the sampling parameters significantly extend the sampling time for a single sequence group.

This PR extracts the sampling logic for a single sequence group into a separate method that can be executed independently of any other sequence group. It includes a generic thread pool implementation that spawns a set number of threads, which run the sampling logic for different sequence groups in parallel.
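
For readers unfamiliar with the pattern, here is a minimal sketch of the idea, not the code from this PR. `ThreadPool`, `SequenceGroup`, `sample_group`, and `sample_all` are hypothetical names chosen for illustration, and greedy argmax stands in for the real sampling logic. It assumes C++17 and that each task touches only its own sequence group.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <iterator>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical minimal pool; NOT the class added in src/cpp/src/threadpool.hpp.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t num_threads) {
        for (std::size_t i = 0; i < num_threads; ++i)
            m_workers.emplace_back([this] { worker_loop(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_cv.notify_all();
        for (auto& worker : m_workers) worker.join();
    }
    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queue a callable and return a future for its result.
    template <typename F>
    auto submit(F&& f) -> std::future<std::invoke_result_t<F>> {
        using R = std::invoke_result_t<F>;
        auto task = std::make_shared<std::packaged_task<R()>>(std::forward<F>(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.emplace([task] { (*task)(); });
        }
        m_cv.notify_one();
        return result;
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_cv.wait(lock, [this] { return m_stop || !m_tasks.empty(); });
                if (m_stop && m_tasks.empty()) return;  // drain the queue before exiting
                task = std::move(m_tasks.front());
                m_tasks.pop();
            }
            task();  // run outside the lock so workers do not serialize
        }
    }

    std::vector<std::thread> m_workers;
    std::queue<std::function<void()>> m_tasks;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_stop = false;
};

// Hypothetical stand-in for a sequence group: only the logits of its next token.
struct SequenceGroup {
    std::vector<float> logits;
};

// Stand-in for per-group sampling (greedy argmax); reads only its own group.
std::size_t sample_group(const SequenceGroup& group) {
    auto best = std::max_element(group.logits.begin(), group.logits.end());
    return static_cast<std::size_t>(std::distance(group.logits.begin(), best));
}

// Submit one task per group, then collect results in the original order.
std::vector<std::size_t> sample_all(ThreadPool& pool,
                                    const std::vector<SequenceGroup>& groups) {
    std::vector<std::future<std::size_t>> pending;
    pending.reserve(groups.size());
    for (const auto& group : groups)
        pending.push_back(pool.submit([&group] { return sample_group(group); }));

    std::vector<std::size_t> tokens;
    tokens.reserve(groups.size());
    for (auto& result : pending) tokens.push_back(result.get());
    return tokens;
}
```

Collecting the futures in submission order keeps the output aligned with the input groups even though the tasks may finish in any order. Any state shared between groups would need its own synchronization, which this sketch sidesteps by keeping every task read-only on its own group.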

Performance measurements confirm an improvement, especially for non-greedy sampling and at high concurrency: the more sequence groups are scheduled for inference, the greater the benefit from parallel sampling.

CVS-157230

@github-actions github-actions bot added category: continuous batching Continuous batching category: sampling Sampling / Decoding algorithms no-match-files labels Nov 25, 2024
@ilya-lavrenov ilya-lavrenov self-assigned this Nov 26, 2024
@ilya-lavrenov ilya-lavrenov added this to the 2025.0 milestone Dec 4, 2024
@andrei-kochin andrei-kochin modified the milestones: 2025.0, 2025.1 Jan 13, 2025
@mzegla mzegla force-pushed the parallel_sampling_threadpool branch from 0c26c92 to 8b7a92e Compare January 14, 2025 13:02
@github-actions github-actions bot added category: visual language Visual language pipeline category: GHA CI based on Github actions labels Jan 14, 2025
@iefode iefode self-requested a review January 16, 2025 08:24
@mzegla mzegla requested a review from ilya-lavrenov January 16, 2025 10:53
.github/workflows/causal_lm_cpp.yml
src/cpp/src/sampler.cpp (outdated)
src/cpp/src/sampler.cpp (outdated)
src/cpp/src/sampler.hpp
Contributor

@iefode iefode left a comment

In general, LGTM. Please check a couple of comments.

src/cpp/src/sampler.cpp (outdated)
src/cpp/src/threadpool.hpp (outdated)
src/cpp/src/sampler.cpp
src/cpp/src/sampler.hpp (outdated)
src/cpp/src/sampler.hpp (outdated)
@ilya-lavrenov
Contributor

@mzegla, please address the comments and resolve the merge conflicts.
After that, we can merge the PR.

@mzegla mzegla force-pushed the parallel_sampling_threadpool branch from 648c730 to bc3fab5 Compare January 29, 2025 11:52
Contributor

@iefode iefode left a comment

LGTM

commit 648c730
Merge: a3b8404 8aeb714
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date:   Mon Jan 20 13:38:35 2025 +0100

    Merge branch 'master' into parallel_sampling_threadpool

commit a3b8404
Author: mzegla <milosz.zeglarski@intel.com>
Date:   Mon Jan 20 13:38:08 2025 +0100

    review

commit 6da6de8
Merge: b9034b2 fe6311d
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date:   Mon Jan 20 09:35:09 2025 +0100

    Merge branch 'master' into parallel_sampling_threadpool

commit b9034b2
Merge: ce8053c 1b3c68d
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date:   Fri Jan 17 14:16:49 2025 +0100

    Merge branch 'master' into parallel_sampling_threadpool

commit ce8053c
Merge: 4eb8696 bb6138e
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date:   Fri Jan 17 09:30:25 2025 +0100

    Merge branch 'master' into parallel_sampling_threadpool

commit 4eb8696
Merge: 6b0e034 eed81fe
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date:   Thu Jan 16 14:04:56 2025 +0100

    Merge branch 'master' into parallel_sampling_threadpool

commit 6b0e034
Merge: c7ca805 36b88ad
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date:   Thu Jan 16 13:22:51 2025 +0100

    Merge branch 'master' into parallel_sampling_threadpool

commit c7ca805
Merge: e82f429 d8f2f0b
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date:   Thu Jan 16 11:46:44 2025 +0100

    Merge branch 'master' into parallel_sampling_threadpool

commit e82f429
Merge: 06bdc78 8ea1414
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date:   Thu Jan 16 09:03:24 2025 +0100

    Merge branch 'master' into parallel_sampling_threadpool

commit 06bdc78
Author: mzegla <milosz.zeglarski@intel.com>
Date:   Wed Jan 15 17:40:27 2025 +0100

    bring back timer comment

commit 030f093
Author: mzegla <milosz.zeglarski@intel.com>
Date:   Wed Jan 15 17:21:17 2025 +0100

    CI echo prediction for SD

commit 013fbf1
Author: mzegla <milosz.zeglarski@intel.com>
Date:   Wed Jan 15 16:02:02 2025 +0100

    include internal state update for non-sampled requests

commit 77b62e3
Author: mzegla <milosz.zeglarski@intel.com>
Date:   Wed Jan 15 14:36:27 2025 +0100

    tmp CI verbose

commit 09c894f
Merge: 1ef8716 ce5cda6
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date:   Wed Jan 15 13:31:05 2025 +0100

    Merge branch 'master' into parallel_sampling_threadpool

commit 1ef8716
Author: mzegla <milosz.zeglarski@intel.com>
Date:   Tue Jan 14 16:33:17 2025 +0100

    fix

commit 8b7a92e
Author: mzegla <milosz.zeglarski@intel.com>
Date:   Tue Nov 5 15:08:26 2024 +0100

    extract sampling for single sequence group and call it asynchronously

    post rebase adjustments

    fix finish iteration

    move currently_processed_tokens update

    switch to async

    experimental threadpool

    remove access to shared struct in parallelized code

    synchronize beam search part

    refactor

    extended timers

    style
@mzegla mzegla force-pushed the parallel_sampling_threadpool branch from 9fa7c8a to 1c25200 Compare January 30, 2025 12:26
@ilya-lavrenov ilya-lavrenov added this pull request to the merge queue Jan 30, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 30, 2025
@mzegla mzegla added this pull request to the merge queue Jan 30, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 30, 2025
@ilya-lavrenov ilya-lavrenov added this pull request to the merge queue Jan 30, 2025
Merged via the queue into openvinotoolkit:master with commit 38ab055 Jan 30, 2025
62 checks passed
Labels
category: continuous batching Continuous batching category: GHA CI based on Github actions category: sampling Sampling / Decoding algorithms category: visual language Visual language pipeline no-match-files
4 participants