perf(query): Use long running async view for queries with poll #27758
PR Summary
This PR replaces the polling mechanism with EventSource for handling queries, aiming to improve performance and reduce latency, particularly for long-running queries.
- Added new eventsource endpoint in /posthog/api/query.py using Django's StreamingHttpResponse for real-time query updates
- Implemented feature flag 'query-eventsource' in /frontend/src/queries/query.ts to control EventSource functionality
- Added queryEventSource and queryEventSourceUrl methods in /frontend/src/lib/api.ts for server-sent events handling
- Modified execution modes to ensure proper blocking behavior for eventsource requests
- Implemented connection abort handling and error management for EventSource connections
3 file(s) reviewed, 4 comment(s)
frontend/src/lib/api.ts (outdated)
    queryEventSourceUrl(): string {
        return new ApiRequest().queryEventSource().assembleFullUrl()
    },
style: consider adding error handling and connection management for EventSource
Size Change: +251 B (+0.02%). Total Size: 1.16 MB.
📸 UI snapshots have been updated: 1 snapshot change in total (0 added, 1 modified, 0 deleted).
714d562 to 72478d3
That's a great idea – the wins here definitely are for queries under 60s, and the difference above that wouldn't matter as much. However, I think we can much more simply build this into the existing QueryViewSet.create method, instead of adding a separate hand-rolled request handler outside of DRF. Let's just add this polling-in-the-backend approach to that viewset method.
posthog/api/query.py (outdated)
    ASYNC_FALLBACK_TO_POLLING_TIMEOUT = 50


    async def query_async(request: Request, *args, **kwargs) -> HttpResponse:
Let's not overload the term async so much, because async queries already are a thing in the existing query endpoint. This is rather a UX optimization to async querying.
Yeah, fair. I'm just going to roll it out, then I'll make this the default way for queries to happen once we've rolled it out.
@Twixes not sure what you mean by adding it to the existing viewset. DRF doesn't support async, and I suspect the performance impact of having a ton of open connections would be high.
Ohh, so annoying DRF doesn't support async. Looks like they don't even really have plans. So in terms of duplicated logic, proposing a non-DRF but still simpler approach below:
It looks like the code of |
6435168 to 3f264a3
3f264a3 to f930a4b
frontend/src/lib/api.ts (outdated)
    public queryAsync(teamId?: TeamType['id']): ApiRequest {
        return this.environmentsDetail(teamId).addPathComponent('query_async')
    }
Still I'm gonna say it's a bad idea to use "query_async" for something on top of existing async querying. More appropriate would be queryStream / query_stream.
Or not "stream" anymore, since we're not streaming 🤔 More like query_awaiting … query_v2?
Pushed query_awaiting, steppin on toes
8ca8e43 to 8ff5ca3
Looks good, let's try it out. One thing to note: this only helps with ad-hoc queries (primarily the insight editor), but not insight/dashboard loading (since those endpoints have querying built in, precisely to minimize round trips).
9f75598 to fa0ac07
Co-authored-by: github-actions <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michael Matloka <michael@matloka.com>
Problem
Polling is slow, doubly so when the browser is super busy fetching a bunch of other things
Changes
Use an async view that checks Celery to see whether the response is done. If it's not done after 50 seconds, fall back to polling instead. This saves us from needing long timeouts in the load balancer etc. Because the view is async, this should not increase load on Django too much (in fact, it will probably reduce it, as we'll be making far fewer requests).
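The wait-then-fall-back behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `await_query`, `fetch_status`, the check interval, and the 200/202 status codes are hypothetical choices for the example, and the Celery lookup is replaced by a fake in-memory status function. Only the 50-second constant comes from the PR.

```python
import asyncio
from typing import Awaitable, Callable

ASYNC_FALLBACK_TO_POLLING_TIMEOUT = 50  # seconds, the value used in the PR


async def await_query(
    fetch_status: Callable[[], Awaitable[dict]],
    timeout: float = ASYNC_FALLBACK_TO_POLLING_TIMEOUT,
    interval: float = 0.1,
) -> dict:
    """Wait server-side for a query result, giving up after `timeout` seconds.

    `fetch_status` stands in for the lookup of the task's status (assumed here
    to return a dict with a "complete" key). While waiting, the coroutine
    yields to the event loop instead of blocking a worker thread.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await fetch_status()
        if status.get("complete"):
            # Result is ready: answer in the same request (assumed 200).
            return {"http_status": 200, "body": status}
        if loop.time() >= deadline:
            # Not ready in time: hand back the pending status so the client
            # can continue with regular polling (assumed 202).
            return {"http_status": 202, "body": status}
        await asyncio.sleep(interval)


def make_fake_status(ready_after: int) -> Callable[[], Awaitable[dict]]:
    """Build a fake status function that completes on the Nth check."""
    calls = 0

    async def fetch_status() -> dict:
        nonlocal calls
        calls += 1
        return {"complete": calls >= ready_after}

    return fetch_status


# A query that finishes on the third check, and one that never finishes
# within a short timeout.
done = asyncio.run(await_query(make_fake_status(3), timeout=1.0, interval=0.01))
timed_out = asyncio.run(
    await_query(make_fake_status(10_000), timeout=0.05, interval=0.01)
)
```

Keeping the timeout below a typical 60-second load balancer limit is what lets the request return a normal response in both cases, rather than being cut off mid-wait.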
Does this work well for both Cloud and self-hosted?
How did you test this code?