Use of logits_processors has become very slow in v0.3.2 #3087
Comments
This seems to be due to the |
Hmm, I see, we should probably make it so that the logits processors are exempt from the deepcopy (unless #2819 already fixes that)
Ah, yes sorry about this. I can open a PR to do what @Yard1 suggests.
@Yard1 @simon-mo @saattrupdan fix is in #3099
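For readers following along, the idea of exempting logits processors from the deepcopy can be sketched in plain Python. This is only an illustration, not the code from #3099: `Params` and `HeavyLogitsProcessor` are hypothetical stand-ins (not vLLM or outlines classes), and the sketch assumes the processors are safe to share by reference between copies.

```python
import copy


class HeavyLogitsProcessor:
    """Hypothetical processor that holds a large precomputed table."""

    def __init__(self, n_states=1000):
        # Stand-in for expensive state (e.g. a compiled grammar or FSM index).
        self.table = {i: list(range(50)) for i in range(n_states)}

    def __call__(self, token_ids, logits):
        # No-op for the purposes of this sketch.
        return logits


class Params:
    """Hypothetical sampling-parameters object (not vLLM's actual class)."""

    def __init__(self, temperature=1.0, stop=None, logits_processors=None):
        self.temperature = temperature
        self.stop = stop or []
        self.logits_processors = logits_processors

    def __deepcopy__(self, memo):
        # Build a new instance without calling __init__.
        cls = self.__class__
        new = cls.__new__(cls)
        memo[id(self)] = new
        for name, value in self.__dict__.items():
            if name == "logits_processors" and value is not None:
                # New list, but the processor objects themselves are shared,
                # so their large internal state is never copied.
                setattr(new, name, list(value))
            else:
                setattr(new, name, copy.deepcopy(value, memo))
        return new


proc = HeavyLogitsProcessor()
params = Params(stop=["</s>"], logits_processors=[proc])
clone = copy.deepcopy(params)
```

With this override, `copy.deepcopy(params)` still duplicates ordinary fields such as `stop`, while every copy points at the same processor instance. The trade-off is that a processor which mutates its own state would then be shared across requests.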
I am using vLLM together with outlines for structured generation. After having upgraded to v0.3.2, generation became very slow, and the RAM usage now leads to OOM crashes.
Here is a minimal example:
When I run the above with vllm==0.3.1, the generation takes 58 seconds and uses ~6GB of memory, but if I upgrade vllm to v0.3.2 (and none of the other packages are changed), then suddenly the generation takes 418 seconds and uses ~18GB of memory. Almost all of the time is spent stalling, not generating anything, but slowly using more and more memory, until it finally begins to generate.
I tried installing a forked version of outlines to see if the stalling was due to the internals of the JSONLogitsProcessor, but it is only called after the "stalling process" is done, so it seems like this is a vLLM issue.