Increased latency when using structuredOutputs on first use #3656
-
Not sure if this is related to ai-sdk; my guess is that it's on OpenAI's side, but I'm finding it hard to figure out where to discuss it. I've been using OpenAI's Structured Outputs feature with the gpt-4o-2024-08-06 model, alongside Langfuse for tracing. While this feature ensures that model-generated outputs conform precisely to specified JSON schemas, I've run into notable latency issues when using tools.

Key observations:

- **Initial schema processing latency:** the first time I use a new schema, OpenAI appears to process it as a context-free grammar (CFG), which introduces roughly 2 to 60 seconds of latency depending on the schema's complexity. But the latency happens even on simple schemas.
- **Subsequent requests:** after the initial processing, the schema is cached and subsequent requests proceed at regular inference speeds. However, even repeat prompts show increased latency compared to plain JSON outputs.
- **Scope and duration of caching:** the exact scope (e.g., API-key-specific or global) and duration of schema caching remain unclear.
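For context, this is roughly how I'm calling it (a minimal sketch: the schema and prompt are placeholders, and I'm using the `structuredOutputs` model setting from `@ai-sdk/openai` as I understand it):

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Placeholder schema; my real schemas are larger, but the first-call
// latency shows up even on simple ones like this.
const schema = z.object({
  name: z.string(),
  tags: z.array(z.string()),
});

const { object } = await generateObject({
  // structuredOutputs opts into OpenAI's strict JSON-schema mode
  // instead of the looser JSON mode.
  model: openai('gpt-4o-2024-08-06', { structuredOutputs: true }),
  schema,
  prompt: 'Extract the entity name and tags from the text above.',
});
```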
**What have I tried**

I will try to put together a repro case, but I just wanted to check whether anyone else has experienced this. (An example of an insane latency that occurred was captured in my Langfuse tracing.)
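Roughly, the repro I have in mind looks like this (a sketch, not yet tested: it times a first call against an immediately repeated call, randomizing a field name so the schema is genuinely new to OpenAI on each run):

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const model = openai('gpt-4o-2024-08-06', { structuredOutputs: true });

// Randomize a field name so OpenAI treats the schema as unseen on
// every run; reusing a schema it has already cached would hide the
// cold-start cost we are trying to measure.
const nonce = Math.random().toString(36).slice(2, 8);
const schema = z.object({ [`answer_${nonce}`]: z.string() });

async function timedCall(label: string) {
  const start = Date.now();
  await generateObject({ model, schema, prompt: 'Reply with one short sentence.' });
  console.log(`${label}: ${Date.now() - start}ms`);
}

await timedCall('first call (schema uncached)');
await timedCall('second call (schema cached)');
```

If the CFG/schema-processing theory is right, the first number should be seconds to tens of seconds larger than the second.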
-
This is a known OpenAI limitation that they announced when they released Structured Outputs: the first request with a new schema incurs extra latency while the schema is processed, after which it is cached and subsequent requests with the same schema avoid that penalty. See https://platform.openai.com/docs/guides/structured-outputs#how-to-use
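If the cold-schema latency matters in production, one workaround (a common pattern, not something the docs prescribe) is to fire a cheap warm-up request for each schema at startup, so the one-time processing cost is paid before real traffic arrives. A minimal sketch, assuming the same ai-sdk setup as above and a hypothetical `warmSchemas` helper:

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const model = openai('gpt-4o-2024-08-06', { structuredOutputs: true });

// Send one throwaway request per schema so OpenAI processes and
// caches each of them before user-facing requests need them.
async function warmSchemas(schemas: z.ZodTypeAny[]) {
  const results = await Promise.allSettled(
    schemas.map((schema) => generateObject({ model, schema, prompt: 'ping' })),
  );
  for (const result of results) {
    // Warm-up failures are logged, not fatal: this is only a latency
    // optimization, not a correctness requirement.
    if (result.status === 'rejected') {
      console.warn('schema warm-up failed:', result.reason);
    }
  }
}
```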