Parallel sampling with threadpool #1252
Merged: ilya-lavrenov merged 6 commits into openvinotoolkit:master from mzegla:parallel_sampling_threadpool on Jan 30, 2025
+336 −174 lines changed
Branch updated from 0c26c92 to 8b7a92e
iefode reviewed on Jan 20, 2025
iefode reviewed on Jan 22, 2025
In general, LGTM. Please check a couple of comments.
@mzegla, please address the comments and resolve the merge conflicts.
Branch updated from 648c730 to bc3fab5
ilya-lavrenov approved these changes on Jan 29, 2025
iefode approved these changes on Jan 29, 2025
LGTM
commit 648c730
Merge: a3b8404 8aeb714
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date: Mon Jan 20 13:38:35 2025 +0100
    Merge branch 'master' into parallel_sampling_threadpool

commit a3b8404
Author: mzegla <milosz.zeglarski@intel.com>
Date: Mon Jan 20 13:38:08 2025 +0100
    review

commit 6da6de8
Merge: b9034b2 fe6311d
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date: Mon Jan 20 09:35:09 2025 +0100
    Merge branch 'master' into parallel_sampling_threadpool

commit b9034b2
Merge: ce8053c 1b3c68d
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date: Fri Jan 17 14:16:49 2025 +0100
    Merge branch 'master' into parallel_sampling_threadpool

commit ce8053c
Merge: 4eb8696 bb6138e
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date: Fri Jan 17 09:30:25 2025 +0100
    Merge branch 'master' into parallel_sampling_threadpool

commit 4eb8696
Merge: 6b0e034 eed81fe
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date: Thu Jan 16 14:04:56 2025 +0100
    Merge branch 'master' into parallel_sampling_threadpool

commit 6b0e034
Merge: c7ca805 36b88ad
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date: Thu Jan 16 13:22:51 2025 +0100
    Merge branch 'master' into parallel_sampling_threadpool

commit c7ca805
Merge: e82f429 d8f2f0b
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date: Thu Jan 16 11:46:44 2025 +0100
    Merge branch 'master' into parallel_sampling_threadpool

commit e82f429
Merge: 06bdc78 8ea1414
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date: Thu Jan 16 09:03:24 2025 +0100
    Merge branch 'master' into parallel_sampling_threadpool

commit 06bdc78
Author: mzegla <milosz.zeglarski@intel.com>
Date: Wed Jan 15 17:40:27 2025 +0100
    bring back timer comment

commit 030f093
Author: mzegla <milosz.zeglarski@intel.com>
Date: Wed Jan 15 17:21:17 2025 +0100
    CI echo prediction for SD

commit 013fbf1
Author: mzegla <milosz.zeglarski@intel.com>
Date: Wed Jan 15 16:02:02 2025 +0100
    include internal state update for non-sampled requests

commit 77b62e3
Author: mzegla <milosz.zeglarski@intel.com>
Date: Wed Jan 15 14:36:27 2025 +0100
    tmp CI verbose

commit 09c894f
Merge: 1ef8716 ce5cda6
Author: Miłosz Żeglarski <milosz.zeglarski@intel.com>
Date: Wed Jan 15 13:31:05 2025 +0100
    Merge branch 'master' into parallel_sampling_threadpool

commit 1ef8716
Author: mzegla <milosz.zeglarski@intel.com>
Date: Tue Jan 14 16:33:17 2025 +0100
    fix

commit 8b7a92e
Author: mzegla <milosz.zeglarski@intel.com>
Date: Tue Nov 5 15:08:26 2024 +0100
    extract sampling for single sequence group and call it asynchronously
    post rebase adjustments fix finish iteration move currently_processed_tokens update switch to async experimental threadpool remove access to shared struct in parallelized code synchronize beam search part refactor extended timers style
Branch updated from 9fa7c8a to 1c25200
ilya-lavrenov approved these changes on Jan 30, 2025
Labels:
category: continuous batching (Continuous batching)
category: GHA (CI based on Github actions)
category: sampling (Sampling / Decoding algorithms)
category: visual language (Visual language pipeline)
no-match-files
This PR implements the same functionality as #1233, but in a different way. Only one of them should be merged.
Because the pipeline logic runs on a single thread, CPU usage drops whenever the pipeline is not running inference but executing other logic, such as sampling, which can take a significant fraction of the total time. Currently, after inference finishes, we sample from each sequence group in a loop on that single thread. This becomes a bottleneck with sampling parameters that significantly extend the sampling time for a single sequence group.
This PR extracts the sampling logic for a single sequence group into a separate method that can run independently of any other sequence group. It also includes a generic thread pool implementation that spawns a set number of threads, which are used to run the sampling logic for different sequence groups in parallel.
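The approach can be illustrated with a minimal sketch, assuming C++17: a generic pool of worker threads pulls tasks from a shared queue, one sampling task is submitted per sequence group, and the caller waits on all results before the next inference step. `ThreadPool`, `submit`, `SequenceGroup`, `sample_sequence_group`, and `sample_all` are hypothetical names for illustration, not the OpenVINO GenAI API, and greedy argmax stands in for the real sampling logic.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical generic thread pool: a fixed set of workers pulls tasks from a shared queue.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t num_threads) {
        for (std::size_t i = 0; i < num_threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Enqueue a callable; the returned future becomes ready once a worker has run it.
    template <typename F>
    auto submit(F&& f) -> std::future<std::invoke_result_t<std::decay_t<F>>> {
        using R = std::invoke_result_t<std::decay_t<F>>;
        auto task = std::make_shared<std::packaged_task<R()>>(std::forward<F>(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.emplace([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (stop_ && tasks_.empty()) return;  // drain queued tasks before exiting
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue concurrently
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> tasks_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;
};

// Hypothetical stand-in for a sequence group; the real pipeline type is far richer.
struct SequenceGroup {
    int id;
    std::vector<float> logits;  // logits produced by inference for this group
};

// Hypothetical per-group sampling step. It reads only its own group (no shared state),
// so different groups can be sampled concurrently. Greedy argmax keeps the sketch short.
std::size_t sample_sequence_group(const SequenceGroup& group) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < group.logits.size(); ++i)
        if (group.logits[i] > group.logits[best]) best = i;
    return best;
}

// Submit one task per sequence group, then wait for all of them (a barrier before the
// next inference step). Results are returned in the original group order.
std::vector<std::size_t> sample_all(ThreadPool& pool, const std::vector<SequenceGroup>& groups) {
    std::vector<std::future<std::size_t>> futures;
    futures.reserve(groups.size());
    for (const auto& group : groups)
        futures.push_back(pool.submit([&group] { return sample_sequence_group(group); }));
    std::vector<std::size_t> tokens;
    tokens.reserve(groups.size());
    for (auto& f : futures) tokens.push_back(f.get());
    return tokens;
}
```

Because each task touches only its own sequence group, the sampling code itself needs no locking; the only synchronization is the pool's queue and the `future::get()` barrier. This matches the direction suggested by the commit messages ("remove access to shared struct in parallelized code", "synchronize beam search part"), though the actual implementation in this PR may differ, for example in how beam search state is shared.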
Performance measurements confirm an improvement, especially for non-greedy sampling and high concurrency: the more sequence groups are scheduled for inference, the more parallel sampling helps.
CVS-157230