# RFC: shared webhooks (#110)
# Summary

Decouple webhooks from individual resources, allowing Concourse to process a webhook payload and queue a `check` for the appropriate resource(s) (and eventually [var sources]).

Resources backed by [prototypes] subscribe to webhooks automatically based on their `source` configuration.


# Motivation

Concourse's primary method of detecting new resource versions is through periodically running `check`. The default polling frequency for each resource is 1 minute.

Polling puts a lot of pressure on the services on the other end of the `check` requests. This can lead to rate limiting or, worse, your CI/CD stack DDoSing your own internal services.

To reduce polling pressure, it is typical to configure webhooks by setting a `webhook_token` on a resource and either setting a higher `check_every` or configuring `--resource-with-webhook-checking-interval` cluster-wide. This token value is then given to the external service as part of a per-resource URL.
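Today's per-resource setup looks roughly like this (a hedged sketch using the existing [`resource.webhook_token`] and `check_every` fields; the token value is illustrative):

```yaml
resources:
- name: my-repo
  type: git
  source:
    uri: git@github.com:concourse/concourse.git
    branch: master
  # opts this resource into webhook-triggered checks
  webhook_token: some-token
  # poll rarely; the webhook does the heavy lifting
  check_every: 24h
```

The external service must then be pointed at a per-resource URL embedding that token, along the lines of `https://ci.example.com/api/v1/teams/main/pipelines/my-pipeline/resources/my-repo/check/webhook?webhook_token=some-token`, and this has to be repeated for every resource.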

This is a tedious operation and has severe limitations, as some external services (e.g. GitHub) have a limit on how many webhooks can be configured on a repo/org. This means you can only configure webhooks for up to 20 resources (minus whatever other hooks you have), which is a painful limitation when there's a common resource used in many pipelines.

Additionally, especially when operating a large Concourse deployment spanning many different teams, it's difficult to get users to configure webhooks since it requires per-resource configuration (both in and out of Concourse). It would be much simpler to define webhooks globally and have them automatically apply to matching resources.

It is also infeasible to use webhooks with dynamic groups of [instanced pipelines]. For instance, if you set an instanced pipeline for each GitHub pull request, you would need to somehow update your webhook configuration within GitHub to point to the correct resources.

Relevant issues/discussions:
* https://github.com/concourse/concourse/issues/2230
* https://github.com/concourse/concourse/discussions/6869#discussioncomment-949051


# Proposal

I propose adding a new webhook entity that can be managed via `fly`:

```sh
$ fly set-webhook --name github --type github --token abc123def456 (--team my-team | --global)
url: https://ci.concourse-ci.org/api/v1/teams/my-team/webhooks/github?token=abc123def456
```

...and a `webhooks` field to be added to the [prototype info response] that tells Concourse what webhook payload(s) should trigger a check for the resource. For instance, the resource:
> **Inline review thread** (attached to this paragraph):
>
> **@aoldershaw (author):** @schmurfy thanks for your comment; I'm going to move the conversation to a thread so it's hopefully easier to follow.
>
> **@schmurfy:** Although I am not familiar enough with the way prototypes work, it seems overly complicated. This may not be the type of feedback you wanted, but since prototypes have been in development since I started using Concourse (~1.5 years ago), I imagine that following through with this proposal means the current webhook system is going to be around for quite a while.
>
> **@aoldershaw:** We definitely appreciate that the work on prototypes has been slow moving. We're only just starting to prioritize the prototypes work, so hopefully we'll be able to make some headway soon.
>
> One nice thing about prototypes is how much flexibility they enable. This proposal just builds upon the existing prototype concepts in a way that is fully backward compatible, so they provide the ability to add new behaviour as new use cases come up; in other words, prototypes don't need to be blocked on this webhooks proposal.
>
> There's also the possibility of implementing some form of shared webhooks without needing to wait for prototypes to be ready; see #110 (comment).
>
> **@schmurfy:** I am currently using concourse-webhook-broadcaster because there is no real way to use webhooks otherwise with dynamically generated pipelines (for pull requests, releases, deployments).
>
> **@aoldershaw:** This is an interesting point, and something I looked at for inspiration when writing the proposal. Having external tools that sit between Concourse and external services is an interesting idea, but ultimately I think it would be much nicer to have built-in support for webhooks, without needing to manage yet another deployment and without needing to keep these tools in sync with changes to Concourse (I just noticed that you're the author of sapcc/webhook-broadcaster#10).
>
> **@schmurfy:** I kinda missed your answer xD. We currently use concourse-webhook-broadcaster not because we really want to, but because in its current state Concourse can't provide the feature; as soon as Concourse supports it natively we will probably make the switch. In the meantime it allows us to set up one webhook per repository for a decently large number of pipelines running concurrently.

```yaml
resources:
- name: my-repo
  type: git # note: this is the git prototype, not resource_type
  source:
    uri: git@github.com:concourse/concourse.git
    branch: master
```

may produce the following `info` response:

```json
{
  "interface_version": "1.0",
  "icon": "mdi:github-circle",
  "messages": ["check"],
  "webhooks": [{
    "type": "github",
    "payload_filter": {
      "repository": {"full_name": "concourse/concourse"},
      "ref": "refs/heads/master"
    }
  }]
}
```

When a webhook payload arrives at the webhook endpoint, the handler will find all of the resources that satisfy the payload (and the [webhook type](#webhook-types)) and queue a new check for each of them. Resources that have been checked via a webhook will have their [`check` frequency reduced](#reducing-check-frequency) significantly.

Webhooks can either be global or scoped to a team. Configuring a webhook globally allows Concourse operators to define a single webhook that can e.g. be configured as an organization-level GitHub webhook, meaning you can handle many repositories with a single webhook.

The webhook endpoint for a global webhook is `/api/v1/webhooks/some-webhook`; for a team-scoped webhook it is `/api/v1/teams/some-team/webhooks/some-webhook`.


## Info Response

Prototypes can optionally specify a `webhooks` configuration in the [prototype info response] - an array of JSON objects containing:

* `type`: the [type](#webhook-types) of the webhook
* `payload_filter`: a JSON object used to filter the incoming webhook payloads. Webhook payloads are filtered by [containment] (i.e. a payload satisfies this checkable entity if it "contains" the `payload_filter`)

If a webhook payload comes in satisfying *any* of the `webhooks` (i.e. it is for a webhook of the same `type`, and the payload contains the `payload_filter`), a `check` will be created.
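To make the matching rule concrete, here is a minimal Python sketch of the containment semantics described above (the function names and structure are illustrative, not the actual implementation; modelled on Postgres-style JSON containment):

```python
def contains(payload, flt):
    """True if `payload` contains `flt`, in the JSON-containment sense:
    every key/value in a filter object must be contained in the payload
    object, every element of a filter array must be contained in some
    element of the payload array, and scalars must match exactly."""
    if isinstance(flt, dict):
        return (isinstance(payload, dict)
                and all(k in payload and contains(payload[k], v)
                        for k, v in flt.items()))
    if isinstance(flt, list):
        return (isinstance(payload, list)
                and all(any(contains(elem, f) for elem in payload)
                        for f in flt))
    return payload == flt


def matching_webhooks(payload, webhooks, webhook_type):
    """Select the `webhooks` entries (from an info response) that a
    payload arriving on a webhook of `webhook_type` satisfies."""
    return [w for w in webhooks
            if w["type"] == webhook_type
            and contains(payload, w.get("payload_filter", {}))]
```

Under these semantics, a GitHub push payload like `{"repository": {"full_name": "concourse/concourse"}, "ref": "refs/heads/master", "commits": [...]}` satisfies the `payload_filter` from the earlier example; a check is created as soon as *any* entry matches.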

For instance, the following example filters on any open or close events for merge requests to the GitLab repository `git@example.com:mike/diaspora.git`:

```json
{
  "interface_version": "1.0",
  "icon": "mdi:github-circle",
  "messages": ["check"],
  "webhooks": [
    {
      "type": "gitlab",
      "payload_filter": {
        "object_kind": "merge_request",
        "repository": {"url": "git@example.com:mike/diaspora.git"},
        "object_attributes": {"action": "open"}
      }
    },
    {
      "type": "gitlab",
      "payload_filter": {
        "object_kind": "merge_request",
        "repository": {"url": "git@example.com:mike/diaspora.git"},
        "object_attributes": {"action": "close"}
      }
    }
  ]
}
```


## Webhook Types

Every webhook needs to be configured with a `type` that defines the external service that will be publishing to it. This serves two purposes:

1. It namespaces webhooks to the type of external service, since the `payload_filter` alone may be ambiguous (if two external services use the same payload field names).
2. It supports services that require the webhook endpoint to behave in a certain way. For instance, Slack requires the webhook endpoint to respond to a [`url_verification` request](https://api.slack.com/events/url_verification) when the webhook is first configured. Some external services may also provide special methods of proving that a webhook payload originates from a valid source (e.g. a special header that the endpoint must validate). Custom `types` may be implemented to support this behaviour.

Note that the implementation of these webhook `types` must be built into Concourse. Different types *may* accept different configuration options in `fly set-webhook`.

However, many external services don't require any special behaviour. To avoid having to implement support for every external service in Concourse, the `--type` flag accepts an arbitrary string. If the `type` is one of the explicitly implemented types (e.g. `slack`, to support the `url_verification` request flow), the webhook endpoint adopts that behaviour; otherwise, the type is used solely to disambiguate webhook payloads.

When the type is not explicitly handled, Concourse will still need to validate the identity of the caller, but we can typically just use tokens as we do now (see [`resource.webhook_token`]).
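For illustration, a built-in `slack` type would need roughly the following endpoint behaviour for Slack's [`url_verification` handshake](https://api.slack.com/events/url_verification). This is a hedged sketch, not Concourse code; the handler name and `queue_checks` callback are made up:

```python
import json

def handle_slack_webhook(body, queue_checks):
    """Sketch of a `slack`-type webhook endpoint. `body` is the raw
    request body; `queue_checks` is a callback that dispatches the
    payload to matching resources. Returns (status, response_body)."""
    payload = json.loads(body)
    # Slack sends this once, when the webhook URL is first configured,
    # and expects the `challenge` value echoed back.
    if payload.get("type") == "url_verification":
        return 200, json.dumps({"challenge": payload["challenge"]})
    # All other payloads are regular events: fan them out as checks.
    queue_checks(payload)
    return 200, ""
```

A type with no such special casing would skip straight to the fan-out step, relying only on token validation.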


## Reducing `check` Frequency

In order to reduce the polling pressure of resources that are backed by a webhook, we can keep track of when the last webhook check for a resource config completed.

When we try to queue a check in Lidar, we check whether the resource has recently been checked via a webhook. If so, we could set the default check interval to something much larger (e.g. `24h`). The intent of configuring a high interval, rather than disabling polling entirely, is to be resilient to missed events.
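As a sketch of that scheduling rule (the function, the 1 minute default, and the 24h fallback window are illustrative assumptions, not settled design):

```python
from datetime import datetime, timedelta

DEFAULT_INTERVAL = timedelta(minutes=1)   # normal polling rate
WEBHOOK_INTERVAL = timedelta(hours=24)    # fallback poll, not a replacement

def effective_check_interval(last_webhook_check, now):
    """If the resource config was recently checked via a webhook, fall
    back to a much longer polling interval; otherwise poll at the
    default rate. `last_webhook_check` may be None (never triggered)."""
    if (last_webhook_check is not None
            and now - last_webhook_check < WEBHOOK_INTERVAL):
        return WEBHOOK_INTERVAL
    return DEFAULT_INTERVAL
```

If webhook deliveries stop (say, the hook is deleted on the external service), `last_webhook_check` ages out and the resource quietly reverts to normal polling.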


# Open Questions

### Can we avoid needing to implement `types` within Concourse?

> **Inline review thread** (attached to this question):
>
> **Commenter:** One idea here would be to simply register webhooks on a team- or Concourse-install level, and then have a `webhook_ref` field under `resource` to register that resource with the global webhook handler.
>
> Pro: doesn't require waiting for prototypes.
> Con: not as "clean" as automatic registration with prototypes.
>
> **@aoldershaw (author):** Yeah, that's what I was thinking originally. However, I feel the benefits of automatic registration are super valuable in large-scale deployments with many independent teams, where orchestrating a change to hundreds or thousands of pipelines may not be so easy.
>
> That said, maybe we could support both automatic registration and manual registration via the pipeline config. That way, we can still see some benefits without needing to wait for prototype support.
>
> BTW, even configuring the webhook manually via the pipeline config may still require implementing some types within Concourse (at least with how I'm envisioning it); e.g. a `slack` webhook still needs special logic to respond to the webhook challenge.
>
> **Commenter:** Right, obviously there are many benefits to a prototype-driven model.
>
> Honestly, my perspective right now is clouded by a pipeline I'm currently working on: a templated Terraform configuration that is currently instanced 4x, with a current need to scale it to 100x and plans to scale it to 400-500x and beyond within a year, with the per-instance configuration plus the template stored in the same repository. I want a job per instance, so that individual instances can be triggered when the configuration for an individual instance changes, without triggering the rest of the instance jobs unnecessarily. That would require different resources per instance, which doesn't scale with webhooks (I will max out the webhook count on the repository when I hit about 20 instances), so currently I have a single resource for all of the instances which triggers every instance job. It's rather inefficient.
>
> **@aoldershaw (Aug 11, 2021):** Sounds like a cool use-case, and exactly the kind that this RFC is motivated by! Some clarifying questions:
>
> * Do all of these instances have configuration within the same repository?
> * Are you looking to use a single webhook on the repository for all of these instances?
> * Is each resource that's responding to these webhooks just tracking the config file for the single instance?
>
> Assuming the answer to all of these questions is yes for the time being, I was thinking it could look something like this:
>
> ```yaml
> resources:
> - name: config
>   type: git
>   source:
>     uri: git@github.com:my-org/my-repo.git
>     paths: [config/abc.yml]
>   check_every: 24h
>   webhooks:
>   - type: github
>     filter:
>       repository: {full_name: my-org/my-repo}
>       commits:
>       - modified: [config/abc.yml]
> ```
>
> (I imagine a filter is important here, since otherwise every commit would trigger 400-500 checks.)
>
> There's some redundancy here that would be avoided with the prototypes approach, but this feels like a decent compromise. WDYT?
>
> **Commenter:** Yes, yes, and no (but basically yes). Actually, we were inspired by what https://github.com/concourse/governance did with keeping most of the data in YAML, so we have a layout like:
>
> ```
> data/
>   templates/
>     instance.yaml
>     role.yaml
>   roles/
>     chef.yaml
>     programmer.yaml
>     teacher.yaml
>   instance/
>     foo.yaml
>     bar.yaml
> tf/
>   locals.tf
>   example.tf
> ```
>
> Where, for instance, `foo` (as defined inside the YAML) might be a chef and a programmer, and `bar` might be a programmer and a teacher. Terraform compiles what the different roles are supposed to mean in terms of specific Terraform resources, and creating a new instance is as simple as `cp data/templates/instance.yaml data/instance/baz.yaml` and then re-generating the pipeline (probably I will eventually move it to an instanced pipeline, especially if this PR succeeds). A specific instance of Terraform is isolated by running e.g. `terraform workspace select foo && terraform apply -var instance_id=foo`, which allows Terraform to scale arbitrarily while staying performant by not keeping too many instances in the same state.
>
> **@aoldershaw:** I took a crack at implementing the manual form of this RFC here: concourse/concourse#7445
>
> It only supports team-scoped webhooks, but it would be easy to extend to global. Curious to hear your thoughts, and whether you feel it would suit your needs.
>
> **Commenter:** @aoldershaw Nice! I needed to go back and forth a little bit to see how it would filter the webhook events with `commits.modified` (it's not inherently obvious, and would much benefit from being documented as an example), but it looks like it fits the bill. Can't wait for a release!


One thought here is to allow [prototypes] to define a webhook handler (rather than requiring each prototype to define its `webhooks` configuration in the [prototype info response]). Webhooks would still be linked to a type, but this type would reference a prototype (rather than an arbitrary string that is occasionally special cased). When a webhook payload comes in, we could invoke the webhook message on the linked prototype.

This has the benefit of being much more flexible and decoupling Concourse from the specific types of webhooks.

However, it comes at the cost of needing to run a container for each webhook payload, which counteracts many of the benefits of reducing check frequency in the first place.

Given that there probably aren't *that* many external services that mandate special behaviour, I suspect implementing `types` within Concourse is the way to go.


### Is filtering based on webhook payload [containment] flexible enough?

Will we ever be in a situation where we need something more flexible than specifying a subset of the fields of the webhook payload?

Perhaps we may want to filter by something other than strict equality of the fields (e.g. substring search).

### Are there any use-cases for [`resource.webhook_token`] under this proposal?

### Is there a risk of `types` becoming fragmented?

Since `types` are arbitrary, is it likely that different prototypes would use different types for the same external service? This could make it difficult to define a single webhook to handle this service.


# Answered Questions


# New Implications

* A Concourse operator can reduce check frequency across the cluster by adding a single webhook (without requiring changes to any pipelines!)
* Webhooks for instanced pipelines are now much easier to implement, allowing proper webhook support with multi-branch workflows



[var sources]: https://github.com/concourse/rfcs/pull/39
[instanced pipelines]: https://github.com/concourse/rfcs/pull/34
[`resource.webhook_token`]: https://concourse-ci.org/resources.html#schema.resource.webhook_token
[prototypes]: https://github.com/concourse/rfcs/pull/37
[prototype info response]: https://github.com/concourse/rfcs/blob/master/037-prototypes/proposal.md#prototype-info
[containment]: https://www.postgresql.org/docs/9.6/datatype-json.html#JSON-CONTAINMENT