
Automatically apply chat template in non-chat scenarios #1533

Merged · 2 commits · Jan 29, 2025

Conversation

@sbalandi (Contributor) commented Jan 13, 2025

@github-actions bot added labels: category: visual language, category: LLM, no-match-files (Jan 13, 2025)
Review threads (resolved): src/README.md, src/cpp/src/icontinuous_batching.cpp, README.md
@github-actions bot added labels: category: GHA, category: tokenizers, category: GenAI C++ API (Jan 13, 2025)
@sbalandi force-pushed the chat_templ branch 2 times, most recently from f1ece12 to e5fa889 (January 13, 2025 21:42)
@github-actions bot added the category: samples label (Jan 13, 2025)
@AlexKoff88 (Collaborator) commented:

If applying a chat template is supposed to be the default behavior of the .generate() method, this is not aligned with the HF Transformers library. We should at least turn it off in the tools (both WWB and LLM-Bench).

@ilya-lavrenov (Contributor) commented Jan 14, 2025:

What about the HF e2e pipeline? Do they apply chat_template by default?

@eaidova

@AlexKoff88 (Collaborator) commented:

The text2text-generation pipeline does not use a chat template by default, from what I know.

@ilya-lavrenov (Contributor) commented:

What if it's an instruction model?

@AlexKoff88 (Collaborator) commented:

Double-checked, and it seems like HF changed the behaviour at some point for the text-generation pipeline. Details. But the input should be formatted appropriately to trigger chat template usage. If the user just passes string data, no chat template is applied.
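A minimal sketch of the HF behaviour described above, assuming a recent transformers release; the model id is chosen purely for illustration and is not part of this discussion. The chat template is applied only when the input is given in message format:

```python
from transformers import pipeline

# Illustrative model id; any instruction-tuned model that ships a chat template works.
pipe = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

# Plain string input: the prompt is tokenized as-is, no chat template is applied.
plain_out = pipe("Why is the Sun yellow?", max_new_tokens=40)

# Message-formatted input: the pipeline applies the model's chat template before generation.
messages = [{"role": "user", "content": "Why is the Sun yellow?"}]
chat_out = pipe(messages, max_new_tokens=40)
```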

@ilya-lavrenov (Contributor) commented:

Do you think it's better to add an explicit flag, then?

pipe.generate(prompt, apply_chat_template=True, max_new_tokens=40)
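For context, a minimal sketch of how the proposed flag might be used end to end with the OpenVINO GenAI Python API; apply_chat_template is the behaviour under discussion in this PR rather than a pre-existing option, and the model path is a placeholder:

```python
import openvino_genai

models_path = "TinyLlama-1.1B-Chat-v1.0-ov"  # placeholder path to an exported OpenVINO model
pipe = openvino_genai.LLMPipeline(models_path, "CPU")

# Apply the model's chat template to the raw prompt before generation.
print(pipe.generate("Why is the Sun yellow?", apply_chat_template=True, max_new_tokens=40))

# Pass the prompt through verbatim, without the chat template.
print(pipe.generate("Why is the Sun yellow?", apply_chat_template=False, max_new_tokens=40))
```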

@AlexKoff88 (Collaborator) commented:

This option looks good to me, but for a drop-in replacement of the HF API with OV GenAI it is better to follow the HF approach with the message format. Anyway, they should have more experience and user feedback.

@sbalandi (Contributor, Author) commented:

Should both ways be added: the possibility to pass messages to the generate() function (apply the chat_template if the input is messages, and leave it as is if it's a string), and also add apply_chat_template as an input parameter for generate()? See the sketch below.
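A purely hypothetical sketch of what that dual behaviour could look like from the caller's side; neither the message-list overload nor the final flag semantics were settled at this point, so the signatures and the model directory below are illustrative only:

```python
import openvino_genai

pipe = openvino_genai.LLMPipeline("model-dir-placeholder", "CPU")  # placeholder model directory

# 1) Plain string: passed through as-is unless the flag asks for the template.
pipe.generate("Why is the Sun yellow?", max_new_tokens=40)
pipe.generate("Why is the Sun yellow?", apply_chat_template=True, max_new_tokens=40)

# 2) Message format (hypothetical overload): the chat template would be applied automatically.
messages = [{"role": "user", "content": "Why is the Sun yellow?"}]
pipe.generate(messages, max_new_tokens=40)
```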

@github-actions bot added labels: category: sampling, category: Python API (Jan 17, 2025)
@sbalandi force-pushed the chat_templ branch 2 times, most recently from 34e5dfd to 4d3783f (January 17, 2025 19:08)
@sbalandi force-pushed the chat_templ branch 4 times, most recently from d7ab914 to bb6a577 (January 28, 2025 19:28)
@sbalandi force-pushed the chat_templ branch 2 times, most recently from d60caa3 to d55e1e0 (January 28, 2025 22:07)
@sbalandi (Contributor, Author) commented:

To discuss:

  1. The test tests/python_tests/test_sampling.py::test_greedy with use_cb=True, model_id katuni4ka/tiny-random-phi3, prompt "I have an interview about product speccing with the company Weekend Health. Give me an example of a question they might ask with regards about a new feature", and max_new_tokens=300 fails on Linux, so apply_chat_template=False was added.
  2. The test tests/python_tests/test_llm_pipeline.py::test_string_inputs with beam search args and prompt 'Alan Turing was a' fails on macOS; the prompt was changed to 'Why is the Sun yellow?'.

@ilya-lavrenov added this pull request to the merge queue (Jan 29, 2025)
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks (Jan 29, 2025)
@ilya-lavrenov added this pull request to the merge queue (Jan 29, 2025)
@ilya-lavrenov removed this pull request from the merge queue due to a manual request (Jan 29, 2025)
@sbalandi force-pushed the chat_templ branch 2 times, most recently from faf1043 to 71b8c52 (January 29, 2025 11:53)
@Wovchena enabled auto-merge (January 29, 2025 13:11)
@Wovchena added this pull request to the merge queue (Jan 29, 2025)
Merged via the queue into openvinotoolkit:master with commit 020bdab (Jan 29, 2025)
62 checks passed
@sbalandi (Contributor, Author) commented Jan 29, 2025:

For the two "to discuss" items above, created bugs: CVS161498, CVS161499.

Labels: category: GenAI C++ API, category: GHA, category: llm_bench, category: LLM, category: Python API, category: samples, category: sampling, category: tokenizers, category: visual language, category: whisper, category: WWB, no-match-files