Add ECS plugin server healthcheck #8642

Merged
merged 2 commits into from
Feb 16, 2022

Conversation

yakkomajuri
Contributor

Changes

I'm unsure if I did this correctly, but I'm about to jump on a call and want to get this in soon.

Essentially, one of our problems now is that ECS will see the new plugin server tasks as healthy too soon, when the VMs might not be ready to process events.

Adding this should keep us processing events for longer on new deploys and avoid the backpressure spikes we currently have.
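
For context, ECS decides whether a task is healthy from the container `healthCheck` block in the task definition. A minimal sketch of building such a block in Python might look like the following. The port, the `/_health` path, and all timing values are assumptions for illustration and are not taken from this PR's diff; only the field names (`command`, `interval`, `timeout`, `retries`, `startPeriod`) and their allowed ranges come from the ECS task definition format.

```python
def build_health_check(
    port: int = 6738,           # hypothetical port, not from this PR
    path: str = "/_health",     # hypothetical endpoint, not from this PR
    interval: int = 30,         # seconds between checks (ECS allows 5-300)
    timeout: int = 5,           # seconds before a check counts as failed (2-60)
    retries: int = 3,           # consecutive failures before "unhealthy" (1-10)
    start_period: int = 60,     # grace period after start, in seconds (0-300)
) -> dict:
    """Return a dict shaped like an ECS container `healthCheck` block."""
    limits = {
        "interval": (interval, 5, 300),
        "timeout": (timeout, 2, 60),
        "retries": (retries, 1, 10),
        "startPeriod": (start_period, 0, 300),
    }
    for name, (value, low, high) in limits.items():
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")

    return {
        # CMD-SHELL runs the string through the container's default shell;
        # a non-zero exit code marks the check as failed.
        "command": ["CMD-SHELL", f"curl -f http://localhost:{port}{path} || exit 1"],
        "interval": interval,
        "timeout": timeout,
        "retries": retries,
        "startPeriod": start_period,
    }
```

The `startPeriod` value is the part most relevant to the problem described above: failed checks during that window do not count toward `retries`, which gives a slow-starting container time to become ready before ECS judges it.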

Maybe @guidoiaquinti and @fuziontech can check whether I'm doing this correctly; I have a bunch of calls right now.

Contributor

@guidoiaquinti guidoiaquinti left a comment

LGTM, but I don't think we have a good way to test this atm. Let's validate that tasks are healthy once this goes out.

@yakkomajuri yakkomajuri enabled auto-merge (squash) February 16, 2022 15:15
@yakkomajuri yakkomajuri disabled auto-merge February 16, 2022 15:25
@yakkomajuri yakkomajuri enabled auto-merge (squash) February 16, 2022 15:30
@yakkomajuri yakkomajuri merged commit 0949680 into master Feb 16, 2022
@yakkomajuri yakkomajuri deleted the ecs-plugins-health branch February 16, 2022 15:37
EDsCODE added a commit that referenced this pull request Feb 16, 2022
* master: (48 commits)
  refactor update public jobs query (#8596)
  revert use atomics flag (#8575)
  fix max retries (#8647)
  Remove weekly email code (#8643)
  Set default value not value (#8644)
  Add ECS plugin server healthcheck (#8642)
  Query person_distinct_id2 not person_distinct_id (#8358)
  increase isNewPerson TTL to 4 hours (#8637)
  Refactor exportEvents buffer (#8573)
  Remove dead actions endpoint code (#8625)
  drop unused functions if instances have them (#8565)
  Disable browsable API outside of DEBUG (#8635)
  update dlq metrics order (#8636)
  Make self capture robust to errors (#8629)
  Delete old events-model adjacent code (#8623)
  Increase viewport size (#8632)
  Revert "Use insight short ID in `insightLogic` key also on the insight page (#8613)" (#8631)
  detect response error status of 0 as an error too (#8603)
  Allow disabling a plugin's logs (#8519)
  fix dlq logic for pagination (#8627)
  ...
Labels
None yet
Projects
None yet
2 participants