Render chat template tojson filter as unicode #31041

CISC · 2024-05-26T09:21:58Z

What does this PR do?

Rant about the lack of special-capability templates...

I wish more models with extra capabilities, e.g. function calling, would start using the power available in chat templates instead of implementing their own inference mini-stacks (Mistral, Berkeley, et al.; I'm looking at you) that you have to add as model-specific dependencies to use those features "as intended".

Shout out to Cohere for actually making an effort here; however, it's a shame they did not follow through and also include the final tool role, which enables the full round trip with a natural-language response to the function call (see my Mistral GGUF for a fully functional example of this).

I've been using the jinja2 tojson filter in a couple of chat templates to render function calling instructions, which works very well; however, it sometimes introduces escaped unicode characters (e.g. \u00e9 instead of é), which is undesirable.
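To illustrate the problem: jinja2's tojson filter uses Python's json.dumps by default, and json.dumps escapes every non-ASCII character unless told otherwise. The following is a minimal sketch using only the standard library; the tool definition is made up for illustration and does not come from any model's template.

```python
import json

# Hypothetical tool definition with non-ASCII text in the description.
tool = {"name": "get_weather", "description": "Hämta vädret för en stad"}

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(tool)

# With ensure_ascii=False the characters are written out as-is.
readable = json.dumps(tool, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings decode to the same data; only the rendered text differs.
assert json.loads(escaped) == json.loads(readable) == tool
```

Both forms are valid JSON, but only the second one puts the original characters into the prompt that the model actually sees.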

I'm mainly using GGUFs, so I have not used tojson with transformers yet, but I imagine this could be a useful feature for many recent models if only they started using chat templates properly (see rant).

This PR simply changes the default parameters of tojson so that non-ASCII characters in the JSON are rendered as unicode instead of as escape sequences.
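For readers who want to try the effect outside transformers, here is a rough sketch using a plain jinja2 Environment. Jinja2 passes the `json.dumps_kwargs` policy to the tojson filter, so setting `ensure_ascii` there changes how the filter renders. The template string and tool dict are illustrative assumptions, and this is not necessarily the exact line changed in this PR.

```python
from jinja2 import Environment

# Illustrative template and data, not taken from any real chat template.
TEMPLATE = "{{ tool | tojson }}"
tool = {"name": "météo", "description": "天気"}

default_env = Environment()
unicode_env = Environment()

# Assign a fresh dict instead of mutating the existing one in place:
# Environment copies its default policies shallowly, so an in-place
# change could leak into other environments.
unicode_env.policies["json.dumps_kwargs"] = {
    "ensure_ascii": False,
    "sort_keys": True,  # jinja2's own default, kept here
}

escaped = default_env.from_string(TEMPLATE).render(tool=tool)
readable = unicode_env.from_string(TEMPLATE).render(tool=tool)

print(escaped)   # non-ASCII characters appear as \uXXXX escapes
print(readable)  # non-ASCII characters appear as-is
```

Note that tojson still escapes `<`, `>`, `&` and `'` for HTML safety regardless of this setting; only the non-ASCII handling changes.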

Examples of (GGUF) models using tojson:

I submitted a similar PR to llama-cpp-python: abetlen/llama-cpp-python#1486

Who can review?

@ArthurZucker

CISC · 2024-05-26T09:38:15Z

The failing tests seem to be caused by an unrelated issue on the main branch, not by this PR.

Rocketknight1

LGTM!

Rocketknight1 · 2024-05-28T13:14:46Z

Also @CISC and @junrae6454, since you're both interested in tool use with chat templates, we're actually planning an overhaul of that, and several of the model templates as well. Please see PR #30621, and let me know if you have any feedback!

CISC · 2024-05-28T13:28:05Z

@Rocketknight1 Oh, hey, that might impact some of my ongoing work in llama-cpp-python, thanks for the heads up!

HuggingFaceDocBuilderDev · 2024-05-28T13:33:21Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Rocketknight1 · 2024-05-28T14:02:47Z

Merging without core maintainer review because it's a one-line fix that only affects the tojson filter in Jinja, so it shouldn't have any other side-effects on the library. cc @amyeroberts and @ArthurZucker for visibility anyway!

ArthurZucker

Thanks and LGTM

CISC added 2 commits May 26, 2024 10:41

Render chat template tojson filter as unicode

aecfdd2

ruff--

c9c1bce

CISC mentioned this pull request May 28, 2024

Set ensure_ascii=False in JSON dump within apply_chat_template #31079

Closed

5 tasks

Merge branch 'huggingface:main' into tojson-unicode

03007f6

Rocketknight1 approved these changes May 28, 2024

View reviewed changes

Rocketknight1 merged commit 22dab24 into huggingface:main May 28, 2024
21 checks passed

ArthurZucker reviewed May 28, 2024

View reviewed changes

CISC deleted the tojson-unicode branch May 28, 2024 14:32

junrae6454 mentioned this pull request May 29, 2024

Change JSON serialization to custom json.dumps #31100

Merged

5 tasks
