WebSockets: do not yield chunks unless request ID matches #94
Currently, a new request can be made over a connected WebSocket before a previous request completes. For example, a user interrupts a voice agent, and the agent has to abandon an in-progress TTS request and start a different one. Because the client does not associate chunks with a specific request, it may receive chunks from the previous request and treat them as belonging to the new one.
Here we watch for the "start" message instead of blindly yielding every audio chunk that arrives. We then make sure that any "end" message carries the same request ID as that "start".
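A minimal sketch of the start/end matching idea, written as a Python generator. The message shapes (`{"type": "start" | "chunk" | "end", ...}` dicts with `request_id` and `data` keys) and the function name `stream_chunks` are hypothetical, not the client's real wire format or API. It also assumes chunks carry no request ID of their own, so they are attributed to the response whose "start" has been seen.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def stream_chunks(messages: Iterable[Dict[str, Any]]) -> Iterator[bytes]:
    """Yield audio chunks for one response, bracketed by "start" and "end".

    Hypothetical message shapes (illustrative only):
      {"type": "start", "request_id": str}
      {"type": "chunk", "data": bytes}
      {"type": "end", "request_id": str}
    """
    request_id: Optional[str] = None  # set once a "start" has been seen
    for msg in messages:
        kind = msg.get("type")
        if kind == "start":
            # Latch onto the request ID announced by the first "start".
            if request_id is None:
                request_id = msg["request_id"]
        elif kind == "chunk":
            # Chunks arriving before any "start" are left over from an
            # earlier request, so drop them instead of yielding them.
            if request_id is not None:
                yield msg["data"]
        elif kind == "end":
            # Only stop when the "end" belongs to the same request.
            if request_id is not None and msg.get("request_id") == request_id:
                return
```

Note that this version latches onto the first "start" it sees, so on its own it does not handle several requests being sent back to back.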
We also keep the number of sent requests and received responses in sync. This avoids an edge case where we send several requests in quick succession and get back a burst of "start" messages: if we just made the nth request, we only pay attention to the nth response.
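A rough sketch of the request/response counting, assuming the server answers requests on a socket in the order they were sent and that chunks belong to the most recent "start". `ResponseTracker`, `note_request_sent`, and `stream_latest` are hypothetical names for illustration, and the message dicts use the same made-up shapes as above. The counters live on the tracker rather than in a single stream, so "start" messages consumed while streaming an abandoned response are still counted.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


class ResponseTracker:
    """Pair the nth request sent on a socket with the nth "start" received.

    Hypothetical helper; assumes in-order responses and message dicts like
    {"type": "start"|"chunk"|"end", "request_id": str, "data": bytes}.
    """

    def __init__(self) -> None:
        self.requests_sent = 0  # requests written to the socket so far
        self.starts_seen = 0    # "start" messages read from the socket so far

    def note_request_sent(self) -> None:
        # Call this every time a request is written to the socket.
        self.requests_sent += 1

    def stream_latest(self, messages: Iterable[Dict[str, Any]]) -> Iterator[bytes]:
        # Capture n now, so a request sent later does not change the target.
        return self._stream(messages, self.requests_sent)

    def _stream(self, messages: Iterable[Dict[str, Any]], target: int) -> Iterator[bytes]:
        request_id: Optional[str] = None  # ID of the nth response, once seen
        for msg in messages:
            kind = msg.get("type")
            if kind == "start":
                self.starts_seen += 1
                # Only the nth "start" is ours; earlier ones are stale.
                if self.starts_seen == target:
                    request_id = msg["request_id"]
            elif kind == "chunk":
                if request_id is not None:
                    yield msg["data"]
            elif kind == "end":
                if request_id is not None and msg.get("request_id") == request_id:
                    return
```

For example, after `note_request_sent()` is called three times in a row, `stream_latest(...)` skips the chunks that follow the first two "start" messages and yields only the chunks of the third response.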