perf: WIP Cache parsing of static hogql queries #27778
base: master
Conversation
🔍 Existing Issues For Review: your pull request is modifying functions with the following pre-existing issues. 📄 File: posthog/hogql/parser.py
This looks very magical:

```python
def is_constant_in_current_stack(value: str):
    """Determine if a value is a static string literal anywhere in the current stack."""
    for frame_info in inspect.stack():
        frame = frame_info.frame
        # Get all constants from the code object in the frame
        code_context = frame.f_code.co_consts
        if value in code_context:
            return True
    return False
```

... and like something that will break with every Python version upgrade. Overall, yes, I think some caching layer makes sense, though I'm always wary of caching anything that is not JSON-able. Perhaps an AST<->text (de)serialization layer is something to look into?
It works at least as far back as 3.8, which is from 2020; that's the earliest version I could install on my ARM MacBook without finding an Intel machine. ChatGPT reckons any 3.x, but I wouldn't believe that without trying it. I only use it if ... The part that decides whether or not it's a static query to be cached is whether we call parse_X_static vs parse_X; the ...
This makes sense to me; I reckon this is needed to get this mergeable.
This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the ...
For what it's worth, as part of "HogQL in Hog" I had implemented some kind of Json->Hog deserialization: https://github.com/PostHog/posthog/pull/26084/files#diff-18c9a45fc6de807def345b67d214c2ad237f1677862ba06ea47f6f3696dcaa18R24 — I never got around to verifying if this is rock solid. The serialization side is partially done in the Hog compiler: https://github.com/PostHog/posthog/pull/26084/files#diff-8390a5c579a0dc2c4c112c00db08c157475a005ed0c27f0b7f098018532c05e5R855-R871 — and then HogVM evaluates that and creates the objects that get sent to the deserializer. I'll definitely need something like that there. Perhaps worth merging approaches?
The "hx_ast" deserializer is now merged. The serializer is currently built into the Hog bytecode compiler, though it can be easily extracted. All it does is convert an AST node (Python dataclasses) into ... If you end up building some caching mechanism, you can probably reuse this.
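For illustration, a minimal sketch of a tagged JSON round-trip for dataclass AST nodes. The node classes, the registry, and the `__hx_ast` tag handling here are simplified stand-ins for this sketch, not the actual hx_ast implementation:

```python
import dataclasses
import json
from dataclasses import dataclass, field


# Hypothetical AST node types, standing in for posthog.hogql.ast classes.
@dataclass
class Field:
    chain: list


@dataclass
class SelectQuery:
    select: list = field(default_factory=list)


NODE_TYPES = {"Field": Field, "SelectQuery": SelectQuery}


def to_json(node):
    """Serialize a dataclass AST node into a JSON-able dict, tagging its type."""
    if dataclasses.is_dataclass(node):
        data = {"__hx_ast": type(node).__name__}
        for f in dataclasses.fields(node):
            data[f.name] = to_json(getattr(node, f.name))
        return data
    if isinstance(node, list):
        return [to_json(item) for item in node]
    return node


def from_json(data):
    """Rebuild dataclass AST nodes from the tagged dict structure."""
    if isinstance(data, dict) and "__hx_ast" in data:
        cls = NODE_TYPES[data["__hx_ast"]]
        kwargs = {k: from_json(v) for k, v in data.items() if k != "__hx_ast"}
        return cls(**kwargs)
    if isinstance(data, list):
        return [from_json(item) for item in data]
    return data


query = SelectQuery(select=[Field(chain=["event"])])
blob = json.dumps(to_json(query))
assert from_json(json.loads(blob)) == query
```

A JSON-able form like this would also address the earlier review concern about caching objects that are not JSON-able: cache the serialized blob and deserialize per caller, so no live AST objects are shared.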
WIP WIP WIP WIP WIP WIP
Problem
We have a lot of static query strings that we parse repeatedly, and the results could be cached. Some of these query strings take hundreds of ms to parse.
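The approach can be sketched with `functools.lru_cache`. Here `parse_select` is a purely illustrative stand-in for the real (expensive) HogQL parser entry point, and `parse_select_static` mirrors the parse_X_static naming discussed in the review thread:

```python
from functools import lru_cache


def parse_select(query: str) -> dict:
    """Stand-in for the real parser; imagine hundreds of ms of ANTLR work here."""
    return {"kind": "SelectQuery", "source": query}


@lru_cache(maxsize=1024)
def parse_select_static(query: str) -> dict:
    # Only safe because callers promise `query` is a static string
    # literal: the parse result never varies between calls.
    return parse_select(query)


a = parse_select_static("select count() from events")
b = parse_select_static("select count() from events")
assert a is b  # second call served from the cache
```

One caveat: `lru_cache` hands every caller the same object, so cached ASTs must be treated as immutable (or deep-copied before mutation), which is part of what motivates the JSON-able serialization idea raised in the review thread.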
Changes
WIP WIP WIP WIP WIP WIP
This is a rough proof of concept; I just wanted to use a PR to demonstrate the approach and get feedback.
Using this in prod would involve writing some tests and migrating many more of our static queries to use it.
Does this work well for both Cloud and self-hosted?
Yes
How did you test this code?
WIP WIP WIP WIP WIP WIP