Increased latency when using structuredOutputs on first use #3656
-
Not sure if this is related to ai-sdk; my guess is that it's on OpenAI's side, but I'm finding it hard to figure out where to discuss it. I've been using OpenAI's Structured Outputs feature with the gpt-4o-2024-08-06 model, alongside Langfuse for tracing. While this feature ensures that model-generated outputs conform precisely to specified JSON schemas, I've run into notable latency issues when using tools.

Key observations:

- **Initial schema processing latency:** the first time I use a new schema, OpenAI appears to process it as a context-free grammar (CFG), which introduces roughly 2 to 60 seconds of latency depending on the schema's complexity. But the latency happens even on simple schemas.
- **Subsequent requests:** after the initial processing, the schema is cached and subsequent requests proceed at regular inference speeds. However, even repeat prompts show increased latency compared to plain JSON outputs.
- **Scope and duration of caching:** the exact scope (e.g., API-key-specific or global) and duration of schema caching remain unclear.
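For context, this is roughly how I'm calling it (a minimal sketch: the schema and prompt are placeholders, and I'm using the `structuredOutputs` model setting from `@ai-sdk/openai` as I understand it):

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Placeholder schema; my real schemas are larger, but the first-call
// latency shows up even on simple ones like this.
const schema = z.object({
  name: z.string(),
  tags: z.array(z.string()),
});

const { object } = await generateObject({
  // structuredOutputs opts into OpenAI's strict JSON-schema mode
  // instead of the looser JSON mode.
  model: openai('gpt-4o-2024-08-06', { structuredOutputs: true }),
  schema,
  prompt: 'Extract the entity name and tags from the text above.',
});
```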
**What have I tried**

I will try to put together a repro case, but I just wanted to check whether anyone else has experienced this. (An example of an insane latency that occurred was captured in my Langfuse tracing.)
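Roughly, the repro I have in mind looks like this (a sketch, not yet tested: it times a first call against an immediately repeated call, randomizing a field name so the schema is genuinely new to OpenAI on each run):

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const model = openai('gpt-4o-2024-08-06', { structuredOutputs: true });

// Randomize a field name so OpenAI treats the schema as unseen on
// every run; reusing a schema it has already cached would hide the
// cold-start cost we are trying to measure.
const nonce = Math.random().toString(36).slice(2, 8);
const schema = z.object({ [`answer_${nonce}`]: z.string() });

async function timedCall(label: string) {
  const start = Date.now();
  await generateObject({ model, schema, prompt: 'Reply with one short sentence.' });
  console.log(`${label}: ${Date.now() - start}ms`);
}

await timedCall('first call (schema uncached)');
await timedCall('second call (schema cached)');
```

If the CFG/schema-processing theory is right, the first number should be seconds to tens of seconds larger than the second.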
-
This is a known OpenAI limitation that they announced when they released Structured Outputs: the first request with a new schema incurs extra latency while the schema is processed, after which it is cached and subsequent requests with the same schema avoid that penalty. See https://platform.openai.com/docs/guides/structured-outputs#how-to-use
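If the cold-schema latency matters in production, one workaround (a common pattern, not something the docs prescribe) is to fire a cheap warm-up request for each schema at startup, so the one-time processing cost is paid before real traffic arrives. A minimal sketch, assuming the same ai-sdk setup as above and a hypothetical `warmSchemas` helper:

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const model = openai('gpt-4o-2024-08-06', { structuredOutputs: true });

// Send one throwaway request per schema so OpenAI processes and
// caches each of them before user-facing requests need them.
async function warmSchemas(schemas: z.ZodTypeAny[]) {
  const results = await Promise.allSettled(
    schemas.map((schema) => generateObject({ model, schema, prompt: 'ping' })),
  );
  for (const result of results) {
    // Warm-up failures are logged, not fatal: this is only a latency
    // optimization, not a correctness requirement.
    if (result.status === 'rejected') {
      console.warn('schema warm-up failed:', result.reason);
    }
  }
}
```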