Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: WIP Cache parsing of static hogql queries #27778

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

robbie-c
Copy link
Member

WIP WIP WIP WIP WIP WIP

Problem

We have a lot of static query strings that we parse, which could be cached. Some of these queries strings take hundreds of ms to parse.

Changes

WIP WIP WIP WIP WIP WIP

This is a rough proof of concept, I just wanted to use a PR to demonstrate the approach and get feedback.

To use this in prod would involve writing some tests, and changing many more of our static queries to use it.

Does this work well for both Cloud and self-hosted?

Yes

How did you test this code?

WIP WIP WIP WIP WIP WIP

Copy link

sentry-io bot commented Jan 22, 2025

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: posthog/hogql/parser.py

Function Unhandled Issue
parse_expr SyntaxError: mismatched input 'distinct_id' expecting posthog.tasks.calculate_cohort.calculat...
Event Count: 2

Did you find this useful? React with a 👍 or 👎

@robbie-c robbie-c requested review from mariusandra and timgl January 22, 2025 13:49
@mariusandra
Copy link
Collaborator

This looks very magical

def is_constant_in_current_stack(value: str):
    """Determine if a value is a static string literal anywhere in the current stack."""
    for frame_info in inspect.stack():
        frame = frame_info.frame
        # Get all constants from the code object in the frame
        code_context = frame.f_code.co_consts
        if value in code_context:
            return True
    return False

... and like something that will break with every python version upgrade.

Overall, yes, I think some caching layer makes sense, though I'm always weary of caching anything that is not JSON-able. Perhaps an AST<->text (de)serialization layer is something to look into?

@robbie-c
Copy link
Member Author

robbie-c commented Jan 22, 2025

... and like something that will break with every python version upgrade.

It works at least as far back as 3.8, which is from 2020, that's the earliest version I could install on my arm macbook without finding an intel machine. ChatGPT reckons any 3.X but I wouldn't believe that without trying it.

I only use it if TEST is true anyway, so it wouldn't be the worst thing to just delete it if it ever broke.

The part the decides whether or not it's a static query to be cached is whether we call parse_X_static vs parse_X, the TEST-only check for staticness is just to keep us honest

@robbie-c
Copy link
Member Author

Perhaps an AST<->text (de)serialization layer is something to look into?

This makes sense to me, I reckon this is needed to get this mergeable.

@posthog-bot
Copy link
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week. If you want to permanentely keep it open, use the waiting label.

@mariusandra
Copy link
Collaborator

mariusandra commented Jan 30, 2025

For what it's worth, as part of "HogQL in Hog" I had implemented some kind of Json->Hog deserialization: https://github.com/PostHog/posthog/pull/26084/files#diff-18c9a45fc6de807def345b67d214c2ad237f1677862ba06ea47f6f3696dcaa18R24

I never got around to verifying if this is rock solid. The serialization side is partially done in the Hog compiler: https://github.com/PostHog/posthog/pull/26084/files#diff-8390a5c579a0dc2c4c112c00db08c157475a005ed0c27f0b7f098018532c05e5R855-R871 - and then HogVM evaluates that and creates the objects that get sent to the deserializer.

I'll definitely need something like that there. Perhaps worth merging approaches?

@posthog-bot posthog-bot removed the stale label Jan 31, 2025
@mariusandra
Copy link
Collaborator

mariusandra commented Feb 3, 2025

The "hx_ast" deserializer is now merged.

The serializer is currently built into the Hog bytecode compiler, though can be easily extracted. All it does is convert a AST node (python dataclasses) into {__hx_ast: 'SelectQuery', ... other fields ...}

If you end up building some caching mechanism, you can probably reuse this.

@posthog-bot
Copy link
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week. If you want to permanentely keep it open, use the waiting label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants