Clarifying max_tokens Usage and Limits in OpenAI API (Issue #3195 Update) #3208
Algorithm5838
started this conversation in General
- It will break if there are lots of messages in the chat, because we had this and this to prevent hallucination.
-
After reviewing OpenAI's documentation, I now have a clearer understanding of the `max_tokens` parameter. Initially, when I opened issue #3195, I had misunderstood its function. I've also noticed that there may be a similar misunderstanding reflected in the description of `max_tokens` in the settings.

TLDR
Terminology:

- `input` = `input tokens` = `prompt` = `message` — these terms are synonymous and refer to the user's prompt.
- `output` = `generated tokens` = `completion` — these terms are synonymous and refer to the AI's response.
- `context window` = `context length` — these terms are synonymous and refer to the model's context.
Clarifications:

- `max_tokens` pertains only to the `output`.
- For `gpt-4-1106-preview`, `gpt-4-vision-preview`, and `gpt-3.5-turbo-1106` (source), `max_tokens`, as of this writing, should not exceed `4096`.
- The sum of `input` and `max_tokens` must not surpass the `context window`, or an error will occur (refer to the first link for details); see the sketch after this list.
- You can reduce `max_tokens` to allocate more tokens for `input`, with the trade-off being a shorter AI response, as it will be confined to the set number of `max_tokens`.
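To make the constraint concrete, here is a minimal sketch assuming the official `openai` Python package (v1+); the `context_window` and `input_tokens` values are illustrative placeholders I chose, not values returned by the API:

```python
from openai import OpenAI  # assumes the official openai package, v1 or later

client = OpenAI()  # reads OPENAI_API_KEY from the environment

context_window = 8192  # illustrative value for gpt-4; check the model's documentation
input_tokens = 3000    # illustrative count of tokens already used by the messages

# max_tokens limits only the generated output; the request is rejected if
# input tokens + max_tokens exceed the model's context window.
max_tokens = min(4096, context_window - input_tokens)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize the max_tokens rules."}],
    max_tokens=max_tokens,
)
print(response.choices[0].message.content)
```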
With this understanding, we can discern the causes of the errors in my previous issue.
Error 1, with model `gpt-3.5-turbo-1106`:

The error arose because our requested `max_tokens` of `8192` exceeded the model's maximum allowed `max_tokens` of `4096`, as noted above.

Error 2, with model `gpt-4`:

This error occurred because the sum of the `max_tokens` of `8192` and the input tokens requested exceeded the model's maximum context length of `8192`, as explained previously.
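As a rough illustration, a hypothetical pre-flight check like the one below would have caught both errors; the limits are hard-coded from the values discussed above (the `16385` context window for `gpt-3.5-turbo-1106` is from OpenAI's docs at the time), and `validate_request` is a name I made up for this sketch:

```python
MODEL_LIMITS = {
    # model: (context_window, max_output_tokens)
    "gpt-3.5-turbo-1106": (16385, 4096),
    "gpt-4": (8192, 8192),
}

def validate_request(model: str, input_tokens: int, max_tokens: int) -> None:
    context_window, max_output = MODEL_LIMITS[model]
    if max_tokens > max_output:
        raise ValueError(
            f"max_tokens={max_tokens} exceeds the model's output cap of {max_output} (Error 1)"
        )
    if input_tokens + max_tokens > context_window:
        raise ValueError(
            f"input ({input_tokens}) + max_tokens ({max_tokens}) exceeds the "
            f"context window of {context_window} (Error 2)"
        )

# validate_request("gpt-3.5-turbo-1106", 500, 8192)  # raises: 8192 > 4096 output cap
# validate_request("gpt-4", 1000, 8192)              # raises: 1000 + 8192 > 8192 context
```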
Conclusion:

In implementation, the maximum value for `max_tokens` should not be higher than `context window` - 1, for example `4095`. There are two ways to go about it, in my opinion, for the general user, but I don't know how to code it (a rough sketch of both follows the list):
1. Set `max_tokens` for the users by dividing `context window` by 2. For example: if `context window` was `4096`, `max_tokens` would be `2048`, and so on. It should not exceed this; the user should only be able to decrease it.
2. Set `max_tokens` for the users by using this formula: `max_tokens` = `context window` - `input tokens`. We already know the value of `context window`, since it is chosen by the user; we just need to calculate `input tokens` and do the subtraction.

I prefer the first option.
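A minimal sketch of what the two options could look like; `context_window` is the user-selected value, `input_tokens` would have to come from a tokenizer (see the counting sketch further below), and the function names are hypothetical:

```python
def option1_default_max_tokens(context_window: int) -> int:
    # Option 1: default max_tokens to half the context window; the UI would let
    # the user lower this value but never raise it above the default.
    return context_window // 2

def option2_max_tokens(context_window: int, input_tokens: int) -> int:
    # Option 2: give the output whatever the input does not use, never below zero.
    return max(context_window - input_tokens, 0)

print(option1_default_max_tokens(4096))   # 2048
print(option2_max_tokens(4096, 1500))     # 2596
```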
Note: An additional good approach for the first option would be to introduce a checkbox with a warning, allowing informed users to opt in to increasing the `max_tokens` value if they understand the implications. The warning could be something like:

Either way, I find it useful if the user could see the token count of their prompt, both while writing it and afterward, as well as the token count of the AI's response.
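Counting the prompt's tokens for display could be done with the `tiktoken` package; here is a rough sketch, where `count_prompt_tokens` is a hypothetical helper (the per-message overhead varies by model, so treat the result as an estimate):

```python
import tiktoken  # assumes the tiktoken package is installed

def count_prompt_tokens(messages: list[dict], model: str = "gpt-3.5-turbo") -> int:
    """Approximate the number of input tokens a chat prompt will consume."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")  # reasonable fallback
    total = 0
    for message in messages:
        # Each message adds a few tokens of chat formatting on top of its content;
        # 4 per message plus 3 for reply priming is the commonly cited approximation.
        total += 4
        total += len(encoding.encode(message.get("content", "")))
    return total + 3

messages = [{"role": "user", "content": "How do max_tokens and the context window interact?"}]
print(count_prompt_tokens(messages))
```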
All of this should be considered if you want to control the length of the AI's response, or if there are models that require it.