Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor exportEvents buffer #8573

Merged
merged 10 commits into from
Feb 16, 2022
Merged

Refactor exportEvents buffer #8573

merged 10 commits into from
Feb 16, 2022

Conversation

yakkomajuri
Copy link
Contributor

Changes

We've established the buffer implementation used in exportEvents (and provided to users via plugin-contrib) can be problematic. See #7755 & #8477.

So this PR refactors the buffer used by exportEvents in two key ways:

  1. Adds a promise manager layer (currently only used by the buffer, but later to be used across the board) that makes sure we don't go above n voided buffer promises at any given point, so that we hold back on adding too much load to worker event loops.
  2. Passes in the old buffer to flush rather than reading from state. The previous implementation voided the flush promise meaning when we got a message that went above the buffer limit by itself, we would actually trigger 2 flushes that could handle the same buffer.

This is part of the work for #8477 and the promise manager was scoped down to only target the internal exportEvents buffer for now, to be expanded later.

How did you test this code?

Added tests.

@@ -82,7 +82,8 @@ There's a multitude of settings you can use to control the plugin server. Use th
| CAPTURE_INTERNAL_METRICS | whether to capture internal metrics for posthog in posthog | `false` |
| PISCINA_USE_ATOMICS | corresponds to the piscina useAtomics config option (https://github.com/piscinajs/piscina#constructor-new-piscinaoptions) | `true` |
| PISCINA_ATOMICS_TIMEOUT | (advanced) corresponds to the length of time (in ms) a piscina worker should block for when looking for tasks - instances with high volumes (100+ events/sec) might benefit from setting this to a lower value | `5000` |
| HEALTHCHECK_MAX_STALE_SECONDS | 'maximum number of seconds the plugin server can go without ingesting events before the healthcheck fails' |
| HEALTHCHECK_MAX_STALE_SECONDS | 'maximum number of seconds the plugin server can go without ingesting events before the healthcheck fails' | `7200` |
| MAX_PENDING_PROMISES_PER_WORKER | (advanced) maximum number of promises that a worker can have running at once in the background. currently only targets the exportEvents buffer. | `100` |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently arbitrary. I'm improving metrics so we can maybe get to a better number.

Currently this is the equivalent of 10x TASKS_PER_WORKER, which is quite high

Copy link
Collaborator

@mariusandra mariusandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Just one suggestion regarding void inline, otherwise looks good to me.

plugin-server/src/worker/vm/promise-manager.ts Outdated Show resolved Hide resolved
@yakkomajuri
Copy link
Contributor Author

Oh, no Promise.any in Node 14.

Guess it's soon time to pick up the task of getting us over to 16. Or 17 if we're feeling edgy.


public async awaitPromisesIfNeeded(): Promise<void> {
while (this.pendingPromises.size > this.config.MAX_PENDING_PROMISES_PER_WORKER) {
await Promise.race(this.pendingPromises)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#8573 (comment)

I think Promise.race is actually even better suited here.

cc @mariusandra

@yakkomajuri yakkomajuri merged commit c7810cb into master Feb 16, 2022
@yakkomajuri yakkomajuri deleted the new-buffer branch February 16, 2022 15:07
EDsCODE added a commit that referenced this pull request Feb 16, 2022
* master: (48 commits)
  refactor update public jobs query (#8596)
  revert use atomics flag (#8575)
  fix max retries (#8647)
  Remove weekly email code (#8643)
  Set default value not value (#8644)
  Add ECS plugin server healthcheck (#8642)
  Query person_distinct_id2 not person_distinct_id (#8358)
  increase isNewPerson TTL to 4 hours (#8637)
  Refactor exportEvents buffer (#8573)
  Remove dead actions endpoint code (#8625)
  drop unused functions if instances have them (#8565)
  Disable browsable API outside of DEBUG (#8635)
  update dlq metrics order (#8636)
  Make self capture robust to errors (#8629)
  Delete old events-model adjacent code (#8623)
  Increase viewport size (#8632)
  Revert "Use insight short ID in `insightLogic` key also on the insight page (#8613)" (#8631)
  detect response error status of 0 as an error too (#8603)
  Allow disabling a plugin's logs (#8519)
  fix dlq logic for pagination (#8627)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants