perf: WIP Cache parsing of static hogql queries #27778

robbie-c · 2025-01-22T13:49:44Z

WIP WIP WIP WIP WIP WIP

Problem

We have a lot of static query strings that we parse on every call, and the parsed result could be cached. Some of these query strings take hundreds of ms to parse.

Changes

WIP WIP WIP WIP WIP WIP

This is a rough proof of concept, I just wanted to use a PR to demonstrate the approach and get feedback.

To use this in prod would involve writing some tests, and changing many more of our static queries to use it.

Does this work well for both Cloud and self-hosted?

Yes

How did you test this code?

WIP WIP WIP WIP WIP WIP

sentry-io · 2025-01-22T13:49:54Z

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: posthog/hogql/parser.py

Function	Unhandled Issue
`parse_expr`	SyntaxError: mismatched input 'distinct_id' expecting posthog.tasks.calculate_cohort.calculat... `Event Count:` 2

Did you find this useful? React with a 👍 or 👎

mariusandra · 2025-01-22T13:57:30Z

This looks very magical

import inspect

def is_constant_in_current_stack(value: str):
    """Determine if a value is a static string literal anywhere in the current stack."""
    for frame_info in inspect.stack():
        frame = frame_info.frame
        # Get all constants from the code object in the frame
        code_context = frame.f_code.co_consts
        if value in code_context:
            return True
    return False

... and like something that will break with every python version upgrade.

Overall, yes, I think some caching layer makes sense, though I'm always wary of caching anything that is not JSON-able. Perhaps an AST<->text (de)serialization layer is something to look into?

robbie-c · 2025-01-22T14:46:26Z

> ... and like something that will break with every python version upgrade.

It works at least as far back as 3.8, which is from 2020. That's the earliest version I could install on my ARM MacBook without finding an Intel machine. ChatGPT reckons any 3.x, but I wouldn't believe that without trying it.
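For reference, here is a minimal sketch of what the check relies on: each frame's code object exposes its compile-time constants through `co_consts`, and string literals land there. The two caller functions and the query strings are hypothetical, made up for illustration only.

```python
import inspect


def is_constant_in_current_stack(value: str) -> bool:
    """Return True if `value` equals a constant compiled into any frame on the stack."""
    for frame_info in inspect.stack():
        if value in frame_info.frame.f_code.co_consts:
            return True
    return False


def call_with_literal() -> bool:
    # Hypothetical caller: the literal below is stored in this function's
    # co_consts when the function is compiled.
    return is_constant_in_current_stack("select 1 from events")


def call_with_runtime_string(table: str) -> bool:
    # Hypothetical caller: the f-string is assembled at call time, so only
    # the fragment "select 1 from " is a compiled constant, not the result.
    return is_constant_in_current_stack(f"select 1 from {table}")
```

Note that the check compares by value, so a runtime string that happens to equal some literal elsewhere on the stack would also pass.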

I only use it if TEST is true anyway, so it wouldn't be the worst thing to just delete it if it ever broke.

The part that decides whether or not it's a static query to be cached is whether we call parse_X_static vs parse_X. The TEST-only check for staticness is just to keep us honest.

robbie-c · 2025-01-22T15:24:13Z

> Perhaps an AST<->text (de)serialization layer is something to look into?

This makes sense to me, I reckon this is needed to get this mergeable.

posthog-bot · 2025-01-30T07:30:59Z

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week. If you want to permanently keep it open, use the waiting label.

mariusandra · 2025-01-30T09:30:39Z

For what it's worth, as part of "HogQL in Hog" I had implemented some kind of Json->Hog deserialization: https://github.com/PostHog/posthog/pull/26084/files#diff-18c9a45fc6de807def345b67d214c2ad237f1677862ba06ea47f6f3696dcaa18R24

I never got around to verifying if this is rock solid. The serialization side is partially done in the Hog compiler: https://github.com/PostHog/posthog/pull/26084/files#diff-8390a5c579a0dc2c4c112c00db08c157475a005ed0c27f0b7f098018532c05e5R855-R871 - and then HogVM evaluates that and creates the objects that get sent to the deserializer.

I'll definitely need something like that there. Perhaps worth merging approaches?

mariusandra · 2025-02-03T11:25:33Z

The "hx_ast" deserializer is now merged.

The serializer is currently built into the Hog bytecode compiler, though it can be easily extracted. All it does is convert an AST node (Python dataclasses) into {__hx_ast: 'SelectQuery', ... other fields ...}

If you end up building some caching mechanism, you can probably reuse this.
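To make the `{__hx_ast: ...}` shape concrete, here is a minimal round-trip sketch. The three dataclasses and the `NODE_TYPES` registry are toy stand-ins invented for illustration; the real serializer and the merged deserializer may handle more node types and edge cases than this.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Optional


@dataclass
class Constant:
    value: Any


@dataclass
class Field:
    chain: list


@dataclass
class SelectQuery:
    select: list
    where: Optional[Any] = None


# Hypothetical registry mapping the "__hx_ast" tag back to a class.
NODE_TYPES = {cls.__name__: cls for cls in (Constant, Field, SelectQuery)}


def serialize(node: Any) -> Any:
    """Turn dataclass nodes into plain dicts tagged with their class name."""
    if is_dataclass(node) and not isinstance(node, type):
        out = {"__hx_ast": type(node).__name__}
        for f in fields(node):
            out[f.name] = serialize(getattr(node, f.name))
        return out
    if isinstance(node, list):
        return [serialize(item) for item in node]
    return node


def deserialize(data: Any) -> Any:
    """Rebuild dataclass nodes from tagged dicts."""
    if isinstance(data, dict) and "__hx_ast" in data:
        cls = NODE_TYPES[data["__hx_ast"]]
        kwargs = {k: deserialize(v) for k, v in data.items() if k != "__hx_ast"}
        return cls(**kwargs)
    if isinstance(data, list):
        return [deserialize(item) for item in data]
    return data
```

Because the intermediate form is plain dicts and lists, it can pass through `json.dumps` and `json.loads`, which is what would make a cached AST JSON-able.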

posthog-bot · 2025-02-14T07:31:09Z

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week. If you want to permanently keep it open, use the waiting label.

Cache parsing of static hogql queries

cafad73

robbie-c requested review from mariusandra and timgl January 22, 2025 13:49

robbie-c added 2 commits January 22, 2025 13:55

Comments

ed844eb

Revert env file change

ed4e5de

Remove unused

7369b24

robbie-c mentioned this pull request Jan 22, 2025

Sprint - Jan 27 - Feb 7 #27540

Closed

posthog-bot added the stale label Jan 30, 2025

posthog-bot removed the stale label Jan 31, 2025

posthog-bot added the stale label Feb 14, 2025
