Clarifying max_tokens Usage and Limits in OpenAI API (Issue #3195 Update) #3208
Algorithm5838
started this conversation in General
- It will break if there are lots of messages in the chat, because we had this and this to prevent hallucination.
-
After reviewing OpenAI's documentation, I now have a clearer understanding of the `max_tokens` parameter. Initially, when I opened issue #3195, I had misunderstood its function. I've also noticed that there may be a similar misunderstanding reflected in the description of `max_tokens` in the settings.

TLDR
Terminology:

- `input` = `input tokens` = `prompt` = `message` — these terms are synonymous and refer to the user's prompt.
- `output` = `generated tokens` = `completion` — these terms are synonymous and refer to the AI's response.
- `context window` = `context length` — these terms are synonymous and refer to the model's context.
Clarifications:

- `max_tokens` pertains only to the `output`.
- For `gpt-4-1106-preview`, `gpt-4-vision-preview`, and `gpt-3.5-turbo-1106` (source), `max_tokens`, as of this writing, should not exceed `4096`.
- The sum of `input` and `max_tokens` must not surpass the `context window`, or an error will occur (refer to the first link for details); see the sketch after this list.
- You can reduce `max_tokens` to allocate more tokens for `input`, with the trade-off being a shorter AI response, as it will be confined to the set number of `max_tokens`.
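To make the constraint concrete, here is a minimal sketch assuming the official `openai` Python package (v1+); the `context_window` and `input_tokens` values are illustrative placeholders I chose, not values returned by the API:

```python
from openai import OpenAI  # assumes the official openai package, v1 or later

client = OpenAI()  # reads OPENAI_API_KEY from the environment

context_window = 8192  # illustrative value for gpt-4; check the model's documentation
input_tokens = 3000    # illustrative count of tokens already used by the messages

# max_tokens limits only the generated output; the request is rejected if
# input tokens + max_tokens exceed the model's context window.
max_tokens = min(4096, context_window - input_tokens)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize the max_tokens rules."}],
    max_tokens=max_tokens,
)
print(response.choices[0].message.content)
```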
With this understanding, we can discern the causes of the errors in my previous issue.
Error 1, with model `gpt-3.5-turbo-1106`:

The error arose because our requested `max_tokens` of `8192` exceeded the model's maximum allowed `max_tokens` of `4096`, as noted above.

Error 2, with model `gpt-4`:

This error occurred because the sum of the `max_tokens` of `8192` and the input tokens requested exceeded the model's maximum context length of `8192`, as explained previously.
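As a rough illustration, a hypothetical pre-flight check like the one below would have caught both errors; the limits are hard-coded from the values discussed above (the `16385` context window for `gpt-3.5-turbo-1106` is from OpenAI's docs at the time), and `validate_request` is a name I made up for this sketch:

```python
MODEL_LIMITS = {
    # model: (context_window, max_output_tokens)
    "gpt-3.5-turbo-1106": (16385, 4096),
    "gpt-4": (8192, 8192),
}

def validate_request(model: str, input_tokens: int, max_tokens: int) -> None:
    context_window, max_output = MODEL_LIMITS[model]
    if max_tokens > max_output:
        raise ValueError(
            f"max_tokens={max_tokens} exceeds the model's output cap of {max_output} (Error 1)"
        )
    if input_tokens + max_tokens > context_window:
        raise ValueError(
            f"input ({input_tokens}) + max_tokens ({max_tokens}) exceeds the "
            f"context window of {context_window} (Error 2)"
        )

# validate_request("gpt-3.5-turbo-1106", 500, 8192)  # raises: 8192 > 4096 output cap
# validate_request("gpt-4", 1000, 8192)              # raises: 1000 + 8192 > 8192 context
```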
Conclusion:

In implementation, the maximum value for `max_tokens` should not be higher than `context window` - 1, for example `4095`. There are two ways to go about it, in my opinion, for the general user, but I don't know how to code it (a rough sketch of both follows the list):
1. Set `max_tokens` for the users by dividing `context window` by 2. For example: if `context window` was `4096`, `max_tokens` would be `2048`, and so on. It should not exceed this; the user should only be able to decrease it.
2. Set `max_tokens` for the users by using this formula: `max_tokens` = `context window` - `input tokens`. We already know the value of `context window`, since it is chosen by the user; we just need to calculate `input tokens` and do the subtraction.

I prefer the first option.
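A minimal sketch of what the two options could look like; `context_window` is the user-selected value, `input_tokens` would have to come from a tokenizer (see the counting sketch further below), and the function names are hypothetical:

```python
def option1_default_max_tokens(context_window: int) -> int:
    # Option 1: default max_tokens to half the context window; the UI would let
    # the user lower this value but never raise it above the default.
    return context_window // 2

def option2_max_tokens(context_window: int, input_tokens: int) -> int:
    # Option 2: give the output whatever the input does not use, never below zero.
    return max(context_window - input_tokens, 0)

print(option1_default_max_tokens(4096))   # 2048
print(option2_max_tokens(4096, 1500))     # 2596
```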
Note: An additional good approach for the first option would be to introduce a checkbox with a warning, allowing informed users to opt in to increasing the `max_tokens` value if they understand the implications. The warning could be something like:

Either way, I find it useful if the user could see the token count of their prompt, both while writing it and afterward, as well as the token count of the AI's response.
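Counting the prompt's tokens for display could be done with the `tiktoken` package; here is a rough sketch, where `count_prompt_tokens` is a hypothetical helper (the per-message overhead varies by model, so treat the result as an estimate):

```python
import tiktoken  # assumes the tiktoken package is installed

def count_prompt_tokens(messages: list[dict], model: str = "gpt-3.5-turbo") -> int:
    """Approximate the number of input tokens a chat prompt will consume."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")  # reasonable fallback
    total = 0
    for message in messages:
        # Each message adds a few tokens of chat formatting on top of its content;
        # 4 per message plus 3 for reply priming is the commonly cited approximation.
        total += 4
        total += len(encoding.encode(message.get("content", "")))
    return total + 3

messages = [{"role": "user", "content": "How do max_tokens and the context window interact?"}]
print(count_prompt_tokens(messages))
```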
All of this should be considered if you want to control the length of the AI's response, or if there are models that require it.