
Automatically apply chat template in non-chat scenarios #1533

Merged · 2 commits · Jan 29, 2025

Conversation

@sbalandi (Contributor) commented Jan 13, 2025

@github-actions bot added labels: category: visual language, category: LLM, no-match-files (Jan 13, 2025)
Review threads (resolved): src/README.md, src/cpp/src/icontinuous_batching.cpp, README.md
@github-actions bot added labels: category: GHA, category: tokenizers, category: GenAI C++ API (Jan 13, 2025)
@sbalandi force-pushed the chat_templ branch 2 times, most recently from f1ece12 to e5fa889 (January 13, 2025 21:42)
@github-actions bot added the category: samples label (Jan 13, 2025)
@AlexKoff88 (Collaborator) commented:

If applying a chat template is supposed to be the default behavior of the .generate() method, this is not aligned with the HF Transformers library. We should at least turn it off in the tools (both WWB and LLM-Bench).

@ilya-lavrenov (Contributor) commented Jan 14, 2025:

What about the HF e2e pipeline? Do they apply chat_template by default?

@eaidova

@AlexKoff88 (Collaborator) commented:

The text2text-generation pipeline does not use a chat template by default, from what I know.

@ilya-lavrenov (Contributor) commented:

What if it's an instruction model?

@AlexKoff88 (Collaborator) commented:

Double-checked, and it seems like HF changed the behaviour at some point for the text-generation pipeline. Details. But the input should be formatted appropriately to trigger chat template usage. If the user just passes string data, no chat template is applied.
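A minimal sketch of the HF behaviour described above, assuming a recent transformers release; the model id is chosen purely for illustration and is not part of this discussion. The chat template is applied only when the input is given in message format:

```python
from transformers import pipeline

# Illustrative model id; any instruction-tuned model that ships a chat template works.
pipe = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")

# Plain string input: the prompt is tokenized as-is, no chat template is applied.
plain_out = pipe("Why is the Sun yellow?", max_new_tokens=40)

# Message-formatted input: the pipeline applies the model's chat template before generation.
messages = [{"role": "user", "content": "Why is the Sun yellow?"}]
chat_out = pipe(messages, max_new_tokens=40)
```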

@ilya-lavrenov (Contributor) commented:

Do you think it's better to add an explicit flag, then?

pipe.generate(prompt, apply_chat_template=True, max_new_tokens=40)
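For context, a minimal sketch of how the proposed flag might be used end to end with the OpenVINO GenAI Python API; apply_chat_template is the behaviour under discussion in this PR rather than a pre-existing option, and the model path is a placeholder:

```python
import openvino_genai

models_path = "TinyLlama-1.1B-Chat-v1.0-ov"  # placeholder path to an exported OpenVINO model
pipe = openvino_genai.LLMPipeline(models_path, "CPU")

# Apply the model's chat template to the raw prompt before generation.
print(pipe.generate("Why is the Sun yellow?", apply_chat_template=True, max_new_tokens=40))

# Pass the prompt through verbatim, without the chat template.
print(pipe.generate("Why is the Sun yellow?", apply_chat_template=False, max_new_tokens=40))
```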

@AlexKoff88 (Collaborator) commented:

This option looks good to me, but for a drop-in replacement of the HF API with OV GenAI it is better to follow the HF approach with the message format. Anyway, they should have more experience and user feedback.

@sbalandi (Contributor, Author) commented:

Should both ways be added: the possibility to pass messages to the generate() function (apply the chat_template if the input is messages, and leave it as is if it's a string), and also add apply_chat_template as an input parameter for generate()? See the sketch below.
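A purely hypothetical sketch of what that dual behaviour could look like from the caller's side; neither the message-list overload nor the final flag semantics were settled at this point, so the signatures and the model directory below are illustrative only:

```python
import openvino_genai

pipe = openvino_genai.LLMPipeline("model-dir-placeholder", "CPU")  # placeholder model directory

# 1) Plain string: passed through as-is unless the flag asks for the template.
pipe.generate("Why is the Sun yellow?", max_new_tokens=40)
pipe.generate("Why is the Sun yellow?", apply_chat_template=True, max_new_tokens=40)

# 2) Message format (hypothetical overload): the chat template would be applied automatically.
messages = [{"role": "user", "content": "Why is the Sun yellow?"}]
pipe.generate(messages, max_new_tokens=40)
```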

@github-actions bot added labels: category: sampling, category: Python API (Jan 17, 2025)
@sbalandi force-pushed the chat_templ branch 2 times, most recently from 34e5dfd to 4d3783f (January 17, 2025 19:08)
@sbalandi force-pushed the chat_templ branch 4 times, most recently from d7ab914 to bb6a577 (January 28, 2025 19:28)
@sbalandi force-pushed the chat_templ branch 2 times, most recently from d60caa3 to d55e1e0 (January 28, 2025 22:07)
@sbalandi (Contributor, Author) commented:

To discuss:

  1. The test tests/python_tests/test_sampling.py::test_greedy with use_cb=True, model_id katuni4ka/tiny-random-phi3, prompt "I have an interview about product speccing with the company Weekend Health. Give me an example of a question they might ask with regards about a new feature", and max_new_tokens=300 fails on Linux, so apply_chat_template=False was added.
  2. The test tests/python_tests/test_llm_pipeline.py::test_string_inputs with beam search args and prompt 'Alan Turing was a' fails on macOS; the prompt was changed to 'Why is the Sun yellow?'.

@ilya-lavrenov added this pull request to the merge queue (Jan 29, 2025)
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks (Jan 29, 2025)
@ilya-lavrenov added this pull request to the merge queue (Jan 29, 2025)
@ilya-lavrenov removed this pull request from the merge queue due to a manual request (Jan 29, 2025)
@sbalandi force-pushed the chat_templ branch 2 times, most recently from faf1043 to 71b8c52 (January 29, 2025 11:53)
@Wovchena enabled auto-merge (January 29, 2025 13:11)
@Wovchena added this pull request to the merge queue (Jan 29, 2025)
Merged via the queue into openvinotoolkit:master with commit 020bdab (Jan 29, 2025)
62 checks passed
@sbalandi (Contributor, Author) commented Jan 29, 2025:

For the two "to discuss" items above, created bugs: CVS161498, CVS161499.

Labels: category: GenAI C++ API, category: GHA, category: llm_bench, category: LLM, category: Python API, category: samples, category: sampling, category: tokenizers, category: visual language, category: whisper, category: WWB, no-match-files