presubmits were not triggering for kubernetes/kubernetes #21090

Closed
BenTheElder opened this issue Mar 1, 2021 · 16 comments · Fixed by #21100
Labels
area/prow Issues or PRs related to prow kind/bug Categorizes issue or PR as related to a bug. kind/oncall-hotlist Categorizes issue or PR as tracked by test-infra oncall.

Comments

@BenTheElder
Member

What happened:

When a PR is pushed to or opened in kubernetes/kubernetes, jobs are not triggered. We only see the automatic GitHub statuses for required jobs, like:

pull-kubernetes-conformance-kind-ga-only-parallel Expected — Waiting for status to be reported

If you comment /test all manually, jobs are triggered and run as expected.

What you expected to happen:

Tests should start when PRs that do not need ok-to-test are opened or pushed.

How to reproduce it (as minimally and precisely as possible):

Push to or open a PR in github.com/kubernetes/kubernetes

Please provide links to example occurrences, if any:

kubernetes/kubernetes#96968 (comment)

Anything else we need to know?:

Seems to be happening to all new PRs in this repo at least.
/area prow

@BenTheElder BenTheElder added the kind/bug Categorizes issue or PR as related to a bug. label Mar 1, 2021
@k8s-ci-robot k8s-ci-robot added the area/prow Issues or PRs related to prow label Mar 1, 2021
@spiffxp
Member

spiffxp commented Mar 1, 2021

another example: kubernetes/kubernetes#99609

@BenTheElder
Member Author

2021-03-01 12:30:58.351 PST
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x90 pc=0x782120]

goroutine 2773 [running]:
regexp.(*Regexp).doExecute(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc0035a85d6, 0x6, 0x0, 0x0, ...)
    GOROOT/src/regexp/exec.go:527 +0x560
regexp.(*Regexp).doMatch(...)
    GOROOT/src/regexp/exec.go:514
regexp.(*Regexp).MatchString(...)
    GOROOT/src/regexp/regexp.go:525
k8s.io/test-infra/prow/plugins/blockade.compileApplicableBlockades(0xc0035a8630, 0xa, 0xc0035a8620, 0xa, 0xc0035a85d6, 0x6, 0xc00214d260, 0xc001968400, 0x8, 0x9, ...)
    prow/plugins/blockade/blockade.go:221 +0xb5f
k8s.io/test-infra/prow/plugins/blockade.handle(0x7fa94aa1a628, 0xc0038dee10, 0xc00214d260, 0xc001968400, 0x8, 0x9, 0x22b8680, 0xc00214d500, 0x20b85f8, 0xc003d42fd8, ...)
    prow/plugins/blockade/blockade.go:172 +0x1d5
k8s.io/test-infra/prow/plugins/blockade.handlePullRequest(0x233d3c0, 0xc0038dee10, 0x231d1c0, 0xc002f88780, 0x23364c0, 0xc002f5c6e0, 0xc002f8e450, 0x22f9560, 0xc00000f0e8, 0xc002f88800, ...)
    prow/plugins/blockade/blockade.go:126 +0x105
k8s.io/test-infra/prow/hook.(*Server).handlePullRequestEvent.func1(0xc0015b15e0, 0xc00000e950, 0xc002ef2a00, 0xc00438b290, 0x8, 0x20b8608)
    prow/hook/events.go:202 +0x3c8
created by k8s.io/test-infra/prow/hook.(*Server).handlePullRequestEvent
    prow/hook/events.go:192 +0x612

@BenTheElder
Member Author

we just had some PRs to blockade, looks like we introduced an NPE
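
For anyone reading along: the trace shows regexp.(*Regexp).MatchString being called with a nil receiver (the first argument to doExecute is 0x0). A minimal sketch of that failure mode, using only the standard library, might look like the following. The helper matchPanics and the variable masterRe are made up for illustration and are not part of the blockade plugin.

```go
package main

import (
	"fmt"
	"regexp"
)

// masterRe is a hypothetical, properly compiled pattern used for comparison.
var masterRe = regexp.MustCompile(`^master$`)

// matchPanics reports whether calling MatchString on re panics.
// It recovers the panic so the demonstration itself keeps running.
func matchPanics(re *regexp.Regexp, s string) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	re.MatchString(s) // dereferences re internally, so a nil re panics here
	return false
}

func main() {
	var unset *regexp.Regexp // declared but never compiled, so still nil

	fmt.Println("nil regexp panics:", matchPanics(unset, "master"))
	fmt.Println("compiled regexp panics:", matchPanics(masterRe, "master"))
}
```

In the real handler there was no recover around the call, so the panic escaped the goroutine and crashed the process.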

@spiffxp
Member

spiffxp commented Mar 1, 2021

https://github.com/organizations/kubernetes/settings/hooks/10485935 - hooks are being delivered

EDIT: sorry, this link probably isn't visible to most
[Screenshot: Screen Shot 2021-03-01 at 3 33 44 PM]

@BenTheElder
Member Author

#21021 was pretty recent
started using it in #21082 15 hours ago

@BenTheElder BenTheElder added the kind/oncall-hotlist Categorizes issue or PR as tracked by test-infra oncall. label Mar 1, 2021
@spiffxp
Member

spiffxp commented Mar 1, 2021

revert #21092

@spiffxp
Member

spiffxp commented Mar 1, 2021

revert deployed

a /retest worked on a stuck PR: kubernetes/kubernetes#99609 (comment)

@spiffxp
Member

spiffxp commented Mar 1, 2021

#21093 - ben has a PR open to fix, but may not make it into today's autobump pr

@BenTheElder
Member Author

I think we should probably take another pass over this plugin before enabling this feature again, since I still haven't had a chance to fully trace back how we got to the NPE. #21093 at least fixes gating on nil at the call site where we NPE'd.
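
As a rough illustration of what "gating on nil at the call site" can look like (this is a sketch, not the code from #21093): the type rule, its fields, and the choice to treat a missing pattern as "applies to every branch" are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a hypothetical stand-in for a compiled blockade entry. The field
// names are assumptions for illustration, not the plugin's real types.
type rule struct {
	name     string
	branchRe *regexp.Regexp // nil when no branch pattern was configured
}

// releaseOnly is an example rule with a compiled branch pattern.
var releaseOnly = rule{name: "release-only", branchRe: regexp.MustCompile(`^release-`)}

// appliesTo reports whether the rule covers the given branch.
// A nil pattern is treated as "match every branch", which is an assumed
// policy; the important part is checking for nil before calling MatchString.
func (r rule) appliesTo(branch string) bool {
	if r.branchRe == nil {
		return true
	}
	return r.branchRe.MatchString(branch)
}

func main() {
	rules := []rule{{name: "no-pattern"}, releaseOnly}
	for _, r := range rules {
		fmt.Printf("%s applies to master: %v\n", r.name, r.appliesTo("master"))
	}
}
```

Whether a missing pattern should match everything or nothing is a design decision for the plugin; either way the nil check keeps the handler from panicking.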

@alvaroaleman also had a suggestion around ensuring hook recovers panics from plugins.

@BenTheElder
Member Author

#21098 for the latter
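
To make the panic-recovery idea concrete, here is a minimal sketch of a dispatcher that wraps each plugin handler in a deferred recover. It is not the implementation in #21098; the handler type, the safely helper, and the plugin names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// handler is a hypothetical plugin entry point.
type handler func(event string) error

// safely runs h and converts a panic into an error, so one misbehaving
// plugin cannot crash the process that dispatches events to all plugins.
func safely(name string, h handler, event string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("plugin %q panicked: %v", name, r)
		}
	}()
	return h(event)
}

func main() {
	plugins := map[string]handler{
		// "broken" writes to a nil map, which panics at runtime.
		"broken": func(string) error {
			var m map[string]int
			m["x"] = 1
			return nil
		},
		"trigger": func(e string) error {
			fmt.Println("trigger handled", e)
			return nil
		},
	}

	var wg sync.WaitGroup
	for name, h := range plugins {
		wg.Add(1)
		// Each plugin runs in its own goroutine, and the recover lives
		// inside that goroutine because panics do not cross goroutines.
		go func(name string, h handler) {
			defer wg.Done()
			if err := safely(name, h, "pull_request"); err != nil {
				fmt.Println("recovered:", err)
			}
		}(name, h)
	}
	wg.Wait()
}
```

With this shape, the broken plugin's failure is logged while the other plugin still handles the event.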

@spiffxp
Member

spiffxp commented Mar 1, 2021

/retitle presubmits were not triggering for kubernetes/kubernetes

@k8s-ci-robot k8s-ci-robot changed the title from "presubmits are not triggering (or reporting at least?) on PR pushes in kubernetes" to "presubmits were not triggering for kubernetes/kubernetes" Mar 1, 2021
@spiffxp
Member

spiffxp commented Mar 3, 2021

Pulling out of slack

tl;dr: I think we should set up a log-based metric in Stackdriver, set up Prometheus to ingest metrics exported by Stackdriver, and keep alerting in Prow's monitoring stack.

@alvaroaleman do y'all have something like this (or anything really) set up to detect panics in Prow components?

I think this should be a follow-up issue, but I'm AFK.

@alvaroaleman
Member

> @alvaroaleman do y'all have something like this (or anything really) setup to detect panics in prow components?

We don't have something specifically for panics, but we have a Slack alert for Prow pods crashlooping which I believe would have been triggered by this.

@chaodaiG
Contributor

chaodaiG commented Mar 3, 2021

> We don't have something specifically for panics, but we have a Slack alert for Prow pods crashlooping which I believe would have been triggered by this.

@alvaroaleman, can we have this upstreamed? Or can you share where the config is located? I'd be happy to do the legwork.

@alvaroaleman
Member

It's here @chaodaiG : https://github.com/openshift/release/blob/ac1b4f17255011592a2fb104d121668fd6b85ef5/clusters/app.ci/prow-monitoring/mixins/_prometheus/prow_alerts.libsonnet#L9

That alert is a fairly standard thing but requires kube-state-metrics to be set up: https://github.com/kubernetes/kube-state-metrics

@chaodaiG
Contributor

Looping back here: the Prometheus alert was set up in #21394
