
[Bugfix] Fix beam search logits processor #3454

Conversation

maximzubkov (Contributor):

This PR addresses the bug described in the following issue. Please refer to the issue for more details on the bug.

{
    id(lp): lp
    for lp in self.logits_processors
}
for lp in self.logits_processors if not hasattr(lp[0], "fsm")
maximzubkov (Contributor Author):

This affects the following line, as described in the following issue and partially addressed by the following PR. Unfortunately, in this case, the copy of logits_processors is inevitable, and it does indeed seem to slow down inference. However, I'm using relatively old GPUs (4x NVIDIA RTX A4000, 16 GB), so I would need someone with better hardware to benchmark the speed. Refer to the issue for code snippets to reproduce the tests.
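To illustrate why the per-sequence copy is hard to avoid, here is a minimal hypothetical sketch (StatefulProcessor is not a vLLM class, just a stand-in for an FSM-backed logits processor): when two beams share one stateful processor, its internal state advances once per beam per step and neither beam sees a consistent state, while a deep copy per sequence keeps each beam independent.

```python
import copy

# Hypothetical minimal stand-in for an FSM-backed logits processor:
# it mutates internal state on every call, like guided-decoding guides do.
class StatefulProcessor:
    def __init__(self):
        self.state = 0

    def __call__(self, input_ids, scores):
        self.state += 1  # advance per-sequence state by one token
        return scores

# Two beams sharing ONE processor: after 3 steps each, the shared state
# has advanced 6 times, so neither beam's state matches its own length.
shared = StatefulProcessor()
for _ in range(3):
    shared([], None)  # beam 1
    shared([], None)  # beam 2
print(shared.state)  # 6

# Deep-copying per sequence (what this PR enables) keeps each beam's
# state independent and consistent with its own token count.
a = StatefulProcessor()
b = copy.deepcopy(a)
for _ in range(3):
    a([], None)
    b([], None)
print(a.state, b.state)  # 3 3
```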

@@ -99,8 +120,16 @@ def __call__(self, input_ids: List[int],

return scores

def __deepcopy__(self, memo):
maximzubkov (Contributor Author):

Note that by implementing the __deepcopy__ method here, it is now possible to copy.deepcopy an entire BaseGuidedLogitsProcessor object. These lines are needed since a naive copy of outlines.fsm.guide.CFGGuide fails:

Code to reproduce:

import copy
from outlines.fsm.guide import CFGGuide
from transformers import AutoTokenizer

model = "microsoft/phi-1"

tokenizer = AutoTokenizer.from_pretrained(model)

simple_sql_grammar = """
start: select_statement

select_statement: "SELECT" column "from" table "where" condition

column: "col_1" | "col_2"
table: "table_1" | "table_2"
condition: column "=" number

number: "1" | "2"
"""

fsm = CFGGuide(simple_sql_grammar, tokenizer)
copy.deepcopy(fsm)

Error message:

Traceback (most recent call last):
  File "/Users/maximzubkov/tmp.py", line 22, in <module>
    copy.deepcopy(fsm)
  File "/Users/maximzubkov/.pyenv/versions/3.10.2/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/Users/maximzubkov/.pyenv/versions/3.10.2/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/Users/maximzubkov/.pyenv/versions/3.10.2/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/Users/maximzubkov/.pyenv/versions/3.10.2/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  [the deepcopy/_reconstruct/_deepcopy_dict frames above repeat several times while recursing into nested state]
  File "/Users/maximzubkov/.pyenv/versions/3.10.2/lib/python3.10/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle 'module' object
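The workaround can be sketched as follows. GuideLike below is a hypothetical stand-in for CFGGuide, assuming (as the TypeError suggests) that the unpicklable attribute is module-typed: a custom __deepcopy__ shares module attributes and deep-copies everything else.

```python
import copy
import types

# Hypothetical sketch: GuideLike stands in for CFGGuide. Modules cannot be
# pickled or deep-copied, so __deepcopy__ shares them and copies the rest.
class GuideLike:
    def __init__(self):
        self.parser_module = types  # stand-in for a module held inside CFGGuide
        self.state = {"pos": 0}

    def __deepcopy__(self, memo):
        cls = self.__class__
        new = cls.__new__(cls)
        memo[id(self)] = new  # register early to handle reference cycles
        for key, value in self.__dict__.items():
            if isinstance(value, types.ModuleType):
                new.__dict__[key] = value  # share the module, don't copy it
            else:
                new.__dict__[key] = copy.deepcopy(value, memo)
        return new

g = GuideLike()
g2 = copy.deepcopy(g)
g2.state["pos"] = 5
print(g.state["pos"], g2.state["pos"])  # 0 5 -- mutable state is independent
```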

@maximzubkov (Contributor Author) left a comment:

I also left some comments and suggestions.

@maximzubkov
Copy link
Contributor Author

I was quite surprised to see that so many CI tests failed, so I tried to reproduce them locally on my machine, starting with the error from tests/entrypoints/test_openai_server.py::test_guided_json_completion:

E           jsonschema.exceptions.ValidationError: 'work history' is a required property
E           
E           Failed validating 'required' in schema:
E               {'properties': {'age': {'type': 'integer'},
E                               'name': {'type': 'string'},
E                               'skills': {'items': {'maxLength': 10, 'type': 'string'},
E                                          'minItems': 3,
E                                          'type': 'array'},
E                               'work history': {'items': {'properties': {'company': {'type': 'string'},
E                                                                         'duration': {'type': 'string'},
E                                                                         'position': {'type': 'string'}},
E                                                          'required': ['company',
E                                                                       'position'],
E                                                          'type': 'object'},
E                                                'type': 'array'}},
E                'required': ['name', 'age', 'skills', 'work history'],
E                'type': 'object'}
E           
E           On instance:
E               {'age': 25,
E                'name': 'John Doe',
E                'skills': ['JavaScript', 'React', 'CSS'],
E                'workHistory': [{'company': 'Company A',
E                                 'duration': '2 years',
E                                 'position': 'Developer'},
E                                {'company': 'Company B',
E                                 'duration': '3 years',
E                                 'position': 'Team Lead'}]}

The bug here is that work history was for some reason renamed to the camel-case workHistory. Unfortunately, I only have access to an NVIDIA RTX A4000 with 16 GB, so I was not able to run the tests with HuggingFaceH4/zephyr-7b-beta. Instead, I used microsoft/phi-1 (though it might actually be a more suitable model for JSON-formatted outputs). Here is a tiny code snippet based on the above test:

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams
from vllm.model_executor.guided_logits_processors import JSONLogitsProcessor


model = "microsoft/phi-1"
tokenizer = AutoTokenizer.from_pretrained(model)

schema = {
    "type": "object",
    "properties": {
        "name": {
            "type": "string"
        },
        "age": {
            "type": "integer"
        },
        "skills": {
            "type": "array",
            "items": {
                "type": "string",
                "maxLength": 10
            },
            "minItems": 3
        },
        "work history": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "company": {
                        "type": "string"
                    },
                    "duration": {
                        "type": "string"
                    },
                    "position": {
                        "type": "string"
                    }
                },
                "required": ["company", "position"]
            }
        }
    },
    "required": ["name", "age", "skills", "work history"]
}

prompt = f"Give an example JSON for an employee profile that fits this schema: {schema}"

sampling_params = SamplingParams(
    temperature=1.0,
    n=3,
    max_tokens=500,
    logits_processors=[JSONLogitsProcessor(schema, tokenizer)],
)
llm = LLM(model=model, dtype="auto")
outputs = llm.generate([prompt], sampling_params)
print([output_.text for output_ in outputs[0].outputs])

And my local outputs were:

[
     '{"name": "John Doe", "age": 30, "skills": ["Python", "Java", "C++"], "work history": [{"company": "Apple", "duration": "2h", "position": "Developer"}, {"company": "Microsoft", "duration": "4h", "position": "Manager"}]}', 
     '{"name": "Sarah", "age": 35, "skills": ["Python", "Data Lab", "Machine Al"],"work history": [{"company": "Apple", "duration": "7 days", "position": "Actor"}]}', 
     '{"name": "John Doe", "age": 30, "skills": ["Python", "Java", "JavaScript"], "work history": [{"company": "Acme", "duration": "6 months", "position": "Manager"}, {"company": "ACM", "duration": "7 days", "position": "Associate"}, {"company": "LLC", "duration": "1 month", "position": "AM Analyst"}]}'
]

Here, for some reason, everything works fine, and work history is formatted as expected. At this point I definitely need a review; calling for help @simon-mo
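The key-name check above can be mirrored with a small stdlib-only helper (REQUIRED and check are illustrative names, not part of the test suite): every generated output must contain the exact required keys, including "work history" with a space rather than the camel-cased "workHistory".

```python
import json

# Stdlib-only check mirroring the failing CI assertion: the output JSON
# must contain every required key verbatim, including "work history".
REQUIRED = ["name", "age", "skills", "work history"]

def check(output: str) -> bool:
    data = json.loads(output)
    return all(key in data for key in REQUIRED)

good = '{"name": "J", "age": 1, "skills": [], "work history": []}'
bad = '{"name": "J", "age": 1, "skills": [], "workHistory": []}'
print(check(good), check(bad))  # True False
```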

@simon-mo simon-mo self-requested a review March 17, 2024 22:32
@simon-mo simon-mo self-assigned this Mar 17, 2024
@simon-mo (Collaborator) left a comment:

Sorry about the delay. I think changing the model to phi is fine, but you will need to move the guided-decoding-related tests from test_openai_server.py to a separate file, because the existing tests rely on LoRA models on Zephyr.

@@ -694,6 +694,7 @@ async def test_guided_grammar(server, client: openai.AsyncOpenAI):
prompt=("Generate a sql state that select col_1 from "
"table_1 where it is equals to 1"),
temperature=1.0,
n=3,
simon-mo (Collaborator) commented on the diff:

Since n=3 here, let's also verify all the outputs. You might also need to add use_beam_search to extra_body.

github-actions bot commented:

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale label Oct 29, 2024

mergify bot commented Oct 29, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. @maximzubkov please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
