simple flakefinder implementation and job configuration #146

Closed
wants to merge 21 commits into from

@dhiller dhiller commented Jul 24, 2019

Note: Work originally started by rmohr

Closes #142

flakefinder

flakefinder does the following:

  1. Report creation
    1. Fetching all merged PRs within a given time period
    2. Getting the last commit of each merged PR
    3. Correlating each PR with all prowjobs that ran against this last commit
  2. Creating or updating HTML documents in the GCS bucket under /reports/flakefinder/
    1. flakefinder-$date.html - extracts skipped/failed/successful tests from the junit results and renders them as an HTML table
    2. index.html
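
The extraction step in the list above (counting skipped, failed and successful tests from a junit result) could be sketched roughly as follows. This is an illustration only, not the code from this PR: the type names, the `summarize` helper and the sample XML are assumptions, and only the small subset of the JUnit format needed for counting is modelled.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// testCase models the subset of a JUnit <testcase> element used here.
// A non-nil Failure or Skipped pointer means the child element was present.
type testCase struct {
	Name    string    `xml:"name,attr"`
	Failure *struct{} `xml:"failure"`
	Skipped *struct{} `xml:"skipped"`
}

// testSuite models a JUnit <testsuite> root element (hypothetical minimal shape).
type testSuite struct {
	Cases []testCase `xml:"testcase"`
}

// counts holds the number of test cases per outcome.
type counts struct {
	Succeeded, Failed, Skipped int
}

// summarize parses one JUnit XML document and counts the outcomes.
func summarize(junitXML []byte) (counts, error) {
	var suite testSuite
	if err := xml.Unmarshal(junitXML, &suite); err != nil {
		return counts{}, err
	}
	var c counts
	for _, tc := range suite.Cases {
		switch {
		case tc.Failure != nil:
			c.Failed++
		case tc.Skipped != nil:
			c.Skipped++
		default:
			c.Succeeded++
		}
	}
	return c, nil
}

func main() {
	// Made-up sample input, not taken from a real prow job.
	sample := []byte(`<testsuite>
  <testcase name="starts a VMI"/>
  <testcase name="migrates a VMI"><failure message="timed out"/></testcase>
  <testcase name="windows VMI"><skipped/></testcase>
</testsuite>`)
	c, err := summarize(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", c)
}
```

Real junit files may wrap several suites in a `<testsuites>` element or report errors separately from failures; the sketch ignores both.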

Output is published and visible here:
https://storage.googleapis.com/kubevirt-prow/reports/flakefinder/index.html

Selecting the right builds:

  • Filter out PRs that were not merged
  • Filter out identical prow jobs that appear on multiple PRs (this can happen because of the merge pool)
  • Filter out jobs which don't have a junit result
  • Only show a test's results across all lanes if that test failed at least once on one of the found lanes
  • Only take prow jobs into account which were run on the commit which got merged
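
As a rough illustration of the selection rules above, the following sketch filters a list of job runs. It assumes a hypothetical `build` record and a hypothetical `selectBuilds` helper; the actual flakefinder data model is not shown in this conversation. The rule about only showing tests that failed at least once applies at report level and is not covered here.

```go
package main

import "fmt"

// build is a hypothetical record of one prow job run found for a PR.
type build struct {
	PR        int
	Merged    bool   // whether the PR got merged
	Commit    string // commit the job ran against
	MergedSHA string // last commit of the PR at merge time
	Job       string
	BuildID   string
	HasJUnit  bool // whether a junit result was found for the run
}

// selectBuilds keeps only runs of merged PRs that ran against the merged
// commit and produced a junit result. The same job run showing up on
// several PRs is counted once.
func selectBuilds(in []build) []build {
	seen := map[string]bool{}
	var out []build
	for _, b := range in {
		if !b.Merged || !b.HasJUnit || b.Commit != b.MergedSHA {
			continue
		}
		key := b.Job + "/" + b.BuildID
		if seen[key] {
			continue // identical prow job already taken from another PR
		}
		seen[key] = true
		out = append(out, b)
	}
	return out
}

func main() {
	// Made-up sample data.
	builds := []build{
		{PR: 1, Merged: true, Commit: "abc", MergedSHA: "abc", Job: "e2e", BuildID: "10", HasJUnit: true},
		{PR: 2, Merged: true, Commit: "abc", MergedSHA: "abc", Job: "e2e", BuildID: "10", HasJUnit: true},
		{PR: 3, Merged: false, Commit: "def", MergedSHA: "def", Job: "e2e", BuildID: "11", HasJUnit: true},
	}
	for _, b := range selectBuilds(builds) {
		fmt.Println(b.PR, b.Job, b.BuildID)
	}
}
```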

job configuration

The job is currently configured to run every week. The configuration was put in github/ci/prow/files/jobs/kubevirt/kubevirt-periodics.yaml

/robots/flakefinder/BUILD.bazel

The build file for flakefinder now contains steps to create and push a docker image to the registry

@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 24, 2019
@kubevirt-bot kubevirt-bot requested review from rmohr and slintes July 24, 2019 18:03
@dhiller
Contributor Author

dhiller commented Jul 24, 2019

This first version is working already, although a lot of cleanup is needed.

@dhiller dhiller changed the title [wip] simple flakefinder implementation (hijacked from rmohr) [wip] simple flakefinder implementation Jul 25, 2019
@dhiller
Contributor Author

dhiller commented Aug 15, 2019

I'm trying to get the bazel build working, running into compile errors that I do not understand:

compile: error running compiler: exit status 2
src/main/tools/linux-sandbox-pid1.cc:437: waitpid returned 2
src/main/tools/linux-sandbox-pid1.cc:457: child exited with code 1
src/main/tools/linux-sandbox.cc:204: child exited normally with exitcode 1
[1,014 / 1,028] 5 actions running
    GoCompile .../kubernetes/typed/apps/v1beta2/linux_amd64_stripped/go_default_library%/kubevirt.io/project-infra/vendor/k8s.io/client-go/kubernetes/typed/apps/v1beta2.a; 0s linux-sandbox
    GoCompile .../typed/batch/v2alpha1/linux_amd64_stripped/go_default_library%/kubevirt.io/project-infra/vendor/k8s.io/client-go/kubernetes/typed/batch/v2alpha1.a; 0s linux-sandbox
/home/dhiller/.cache/bazel/_bazel_dhiller/a0ca392453d5f464a8fd5a27e3e5fe84/sandbox/linux-sandbox/2029/execroot/__main__/vendor/google.golang.org/api/option/option.go:163:54: cannot use "kubevirt.io/project-infra/vendor/google.golang.org/api/internal".NewPoolResolver(int(w), o) (type *"kubevirt.io/project-infra/vendor/google.golang.org/api/internal".PoolResolver) as type "google.golang.org/grpc/naming".Resolver in argument to grpc.RoundRobin:
        *"kubevirt.io/project-infra/vendor/google.golang.org/api/internal".PoolResolver does not implement "google.golang.org/grpc/naming".Resolver (wrong type for Resolve method)
                have Resolve(string) ("kubevirt.io/project-infra/vendor/google.golang.org/grpc/naming".Watcher, error)
                want Resolve(string) ("google.golang.org/grpc/naming".Watcher, error)
[1,014 / 1,028] 5 actions running
    GoCompile .../kubernetes/typed/apps/v1beta2/linux_amd64_stripped/go_default_library%/kubevirt.io/project-infra/vendor/k8s.io/client-go/kubernetes/typed/apps/v1beta2.a; 0s linux-sandbox
    GoCompile .../typed/batch/v2alpha1/linux_amd64_stripped/go_default_library%/kubevirt.io/project-infra/vendor/k8s.io/client-go/kubernetes/typed/batch/v2alpha1.a; 0s linux-sandbox
Target //robots/flakefinder:flakefinder failed to build
ERROR: /home/dhiller/go/src/kubevirt.io/project-infra/robots/flakefinder/BUILD.bazel:49:1 GoCompile vendor/google.golang.org/api/option/linux_amd64_stripped/go_default_library%/kubevirt.io/project-infra/vendor/google.golang.org/api/option.a failed (Exit 1) linux-sandbox failed: error executing command
  (cd /home/dhiller/.cache/bazel/_bazel_dhiller/a0ca392453d5f464a8fd5a27e3e5fe84/sandbox/linux-sandbox/2029/execroot/__main__ && \
  exec env - \
    CGO_ENABLED=1 \
    GOARCH=amd64 \
    GOOS=linux \
    GOROOT=external/go_sdk \
    GOROOT_FINAL=GOROOT \
    PATH=/usr/bin:/bin \
    TMPDIR=/tmp \
  /home/dhiller/.cache/bazel/_bazel_dhiller/install/647aeaa1f1905cc4d2153f6a94607228/_embedded_binaries/linux-sandbox -t 15 -w /home/dhiller/.cache/bazel/_bazel_dhiller/a0ca392453d5f464a8fd5a27e3e5fe84/sandbox/linux-sandbox/2029/execroot/__main__ -w /tmp -w /dev/shm -D -- bazel-out/host/bin/external/go_sdk/builder compile -sdk external/go_sdk -installsuffix linux_amd64 -src vendor/google.golang.org/api/option/credentials_go19.go -src vendor/google.golang.org/api/option/credentials_notgo19.go -src vendor/google.golang.org/api/option/option.go -arc 'golang.org/x/oauth2=kubevirt.io/project-infra/vendor/golang.org/x/oauth2=bazel-out/k8-fastbuild/bin/vendor/golang.org/x/oauth2/linux_amd64_stripped/go_default_library%/kubevirt.io/project-infra/vendor/golang.org/x/oauth2.a=' -arc 'golang.org/x/oauth2/google=kubevirt.io/project-infra/vendor/golang.org/x/oauth2/google=bazel-out/k8-fastbuild/bin/vendor/golang.org/x/oauth2/google/linux_amd64_stripped/go_default_library%/kubevirt.io/project-infra/vendor/golang.org/x/oauth2/google.a=' -arc 'google.golang.org/api/internal=kubevirt.io/project-infra/vendor/google.golang.org/api/internal=bazel-out/k8-fastbuild/bin/vendor/google.golang.org/api/internal/linux_amd64_stripped/go_default_library%/kubevirt.io/project-infra/vendor/google.golang.org/api/internal.a=' -arc 'google.golang.org/grpc=google.golang.org/grpc=bazel-out/k8-fastbuild/bin/external/org_golang_google_grpc/linux_amd64_stripped/go_default_library%/google.golang.org/grpc.a=' -o bazel-out/k8-fastbuild/bin/vendor/google.golang.org/api/option/linux_amd64_stripped/go_default_library%/kubevirt.io/project-infra/vendor/google.golang.org/api/option.a -package_list bazel-out/host/bin/external/go_sdk/packages.txt -p kubevirt.io/project-infra/vendor/google.golang.org/api/option -- -trimpath .)
INFO: Elapsed time: 81.630s, Critical Path: 17.78s
INFO: 1002 processes: 1002 linux-sandbox.

Using go build is working fine locally.

@slintes can you help me understand what is the problem here?
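
For readers following the error above: the compiler reports that a method returning the vendored `naming.Watcher` does not satisfy an interface that expects the non-vendored `naming.Watcher`. Go treats two packages with different import paths as unrelated, so their types are distinct even when the source is identical. The sketch below reproduces that effect in a single file. All type names are hypothetical stand-ins, not the real grpc or google-api types, and reflection is used only so that the mismatch can be observed at run time instead of failing the build.

```go
package main

import (
	"fmt"
	"reflect"
)

// externalWatcher stands in for the Watcher type of the non-vendored package.
type externalWatcher interface{ Close() }

// vendoredWatcher stands in for the Watcher type of the vendored copy.
// It has the same methods but is a different named type.
type vendoredWatcher interface{ Close() }

// resolver stands in for the interface the caller expects.
type resolver interface {
	Resolve(target string) (externalWatcher, error)
}

// vendoredPoolResolver returns the vendored type, as in the failing build.
type vendoredPoolResolver struct{}

func (vendoredPoolResolver) Resolve(target string) (vendoredWatcher, error) {
	return nil, nil
}

// externalPoolResolver returns the type the interface asks for.
type externalPoolResolver struct{}

func (externalPoolResolver) Resolve(target string) (externalWatcher, error) {
	return nil, nil
}

// implementsResolver reports whether v's type satisfies resolver.
func implementsResolver(v interface{}) bool {
	want := reflect.TypeOf((*resolver)(nil)).Elem()
	return reflect.TypeOf(v).Implements(want)
}

func main() {
	fmt.Println("vendored return type:", implementsResolver(vendoredPoolResolver{}))
	fmt.Println("external return type:", implementsResolver(externalPoolResolver{}))
}
```

Assigning `vendoredPoolResolver{}` to a variable of type `resolver` directly would be rejected by the compiler with a "wrong type for method" message similar to the one in the log.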

github.com/onsi/ginkgo v1.7.0 // indirect
github.com/onsi/gomega v1.4.3 // indirect
github.com/sirupsen/logrus v1.4.2
google.golang.org/api v0.5.0

I believe I have seen this issue before; it was related to a version mismatch.
This can happen now that we have switched to Go modules.

I would propose adding

google.golang.org/grpc v1.19.0

here and seeing whether it helps. It should help go mod figure out the minimal version required.

@petrkotas petrkotas Aug 19, 2019

Also please make sure all bazel caches are clean.

Contributor Author

Thanks for your help, @petrkotas!

I suppose I should just add the line to go.mod, right?

I understood from reading the Go modules manual that you would normally run go get google.golang.org/grpc@v1.19.0, but that produced a lot of other entries. Anyway, I will try it out and report how it goes. Thanks again!

Contributor

Not sure; maybe it has to go into the replace section, since it is not required directly by us but should be overridden in our dependencies, IIUC?

@petrkotas petrkotas Aug 19, 2019

AFAIK, when you do not specify the version directly, go mod scans the code's dependencies, finds that grpc is required, selects the minimum usable version, and includes it.

If you include the minimum version yourself, you are forcing go mod to use the one you require.

Yes, just place it somewhere in between the commented lines.

Contributor Author

Tried it, didn't help.

I got feedback from the bazel mailing list pointing me towards rules_go - Avoiding conflicts (proto related paragraph) and this related GitHub issue. I am still trying to understand and integrate this and could still use some help.

Contributor Author

Just for reference, the solution has been described here.

@@ -13,7 +13,7 @@ go_library(
 deps = [
 "//vendor/golang.org/x/oauth2:go_default_library",
 "//vendor/golang.org/x/oauth2/google:go_default_library",
-"//vendor/google.golang.org/grpc/naming:go_default_library",
+"@org_golang_google_grpc//naming:go_default_library",

Contributor

tbh, I don't think editing files in the vendor dir is a good idea. I expect that it will be overwritten sooner or later, e.g. when running go mod vendor?

Contributor Author

I thought so too but that was the only thing that helped.

IIUC the correct solution to this would be to make use of the go_proto_library rules? Do you have an idea on how to do this? I couldn't figure this out myself...

Contributor Author

You are right. Actually the combination of

go mod vendor
bazel run //:gazelle

overrides the changes in vendor/google.golang.org/api/internal/BUILD.bazel again :(

Contributor

I still have no quick idea; I will try to find some time to have a look tomorrow.

Contributor Author

I'm currently reading up on gazelle dependency resolution; trying out the # gazelle:resolve directive might be a way.

Contributor Author

@dhiller dhiller Aug 21, 2019

Looks like this fixed it. Will need to read up why gazelle added some crlf to the yaml.

Contributor Author

Noticed it's not gazelle, the file is retrieved like this when using go mod vendor, so leaving it like this for the moment.

Contributor Author

AFAIU this is a file that is needed for appveyor under Windows, AFAIR the crlf is default there, so I guess this is fine.

Contributor Author

@dhiller dhiller left a comment

:(

@dhiller dhiller force-pushed the flakefinder branch 2 times, most recently from b8c7958 to da43481 Compare August 21, 2019 10:28
@dhiller
Contributor Author

dhiller commented Aug 21, 2019

/test periodic-publish-flakefinder-reports

@dhiller
Contributor Author

dhiller commented Aug 21, 2019

/test periodic-publish-flakefinder-reports

@dhiller
Contributor Author

dhiller commented Aug 21, 2019

/retest

@dhiller
Contributor Author

dhiller commented Aug 21, 2019

/retest

@dhiller
Contributor Author

dhiller commented Aug 21, 2019

/test periodic-publish-flakefinder-reports

@dhiller
Contributor Author

dhiller commented Aug 22, 2019

/test periodic-publish-flakefinder-reports

@dhiller
Contributor Author

dhiller commented Aug 22, 2019

/test periodic-publish-flakefinder-reports

rmohr and others added 10 commits August 22, 2019 12:34
bazel go module support is not yet that great
Write report to kubevirt bucket under
/reports/flakefinder/flakefinder-$isodate.html, then create index.html
as wrapping page. This page contains the links to the last 50 reports.
NOTE: this is not yet working, `bazel build //robots/flakefinder`
yields a compile error that I need support with to understand and
fix.
- Pin image id
- Remove unnecessary build steps
- Add reasonable title attribute to html report files
- Add details to README
@dhiller
Contributor Author

dhiller commented Aug 22, 2019

I just noticed that the complete set has not been implemented. This PR currently only supports creating weekly reports.

I think we should first go with the weekly report which is ready now, then I will work on creating the other ones.

Opinions?

/cc @davidvossel @cynepco3hahue @stu-gott

@dhiller
Contributor Author

dhiller commented Aug 23, 2019

Regarding my comment above, I opened #168 to continue working on this.

</head>
<body>

<table>

could you format that HTML? It's quite hard to read.

Contributor Author

Done.

@@ -0,0 +1,197 @@
/*
Copyright 2018 The Kubernetes Authors.

@mfranczy mfranczy Aug 23, 2019

The Kubernetes Authors? Is this code a copy from some k8s repo? If not then it should be Kubevirt :)

EDIT: or This file is part of the KubeVirt project

Contributor Author

Done.

@mfranczy

So far the PR looks good to me, though I haven't finished the detailed review yet (I will finish it on Monday). Would you mind adding unit tests for flakefinder? I think they would be useful.

@dhiller
Contributor Author

dhiller commented Aug 23, 2019

@mfranczy I totally agree that I should have written tests. I'm doing this in the new PR that extends the functionality as planned. Maybe we can live with this one as it is for now?

@mfranczy

@dhiller I am okay with your plan.

@mfranczy mfranczy left a comment

I found only two small things, other than that looks good!


#### Run the job locally using [`phaino`](https://github.com/kubernetes/test-infra/tree/master/prow/cmd/phaino)

phaino --privileged /tmp/prowjob.yaml

let's use generic examples, please replace /home/dhiller by $HOME

Contributor Author

Oops, good point...

merged time.Duration
}

type client interface {

Could you format that interface so that all methods have the same indentation?

Contributor Author

Thanks!

Contributor Author

Ah interesting, looks like a tabs vs. spaces issue. Will correct this.

- Fix formatting
- remove personal path
@mfranczy

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 26, 2019
@dhiller
Contributor Author

dhiller commented Aug 26, 2019

I just noticed that the complete set has not been implemented. This PR currently only supports creating weekly reports.

I think we should first go with the weekly report which is ready now, then I will work on creating the other ones.

FTR the further work has been done in #168 which is feature complete right now. I'm working on cleaning up right now.

@dhiller
Contributor Author

dhiller commented Aug 27, 2019

Just noticed one issue: the 24h report is empty, will have a look.

@dhiller
Contributor Author

dhiller commented Aug 27, 2019

Another one is that the monthly report has very few entries.

@dhiller
Contributor Author

dhiller commented Aug 27, 2019

The reason for the 24h report not having any entries is that the junit xml file can't be found.

@slintes
Contributor

slintes commented Aug 29, 2019

superseded by #168

/close

@kubevirt-bot
Contributor

@slintes: Closed this PR.

In response to this:

superseded by #168

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

gabrielecerami pushed a commit to gabrielecerami/project-infra that referenced this pull request Oct 7, 2020
* hack: add simple pre-flight check to cluster-up

It still happens nowadays that we run kubevirtci on wrongly provisioned
boxes with incorrect kvm setup.
This patch adds a simple pre-flight check script to catch the most
common issues, hopefully preventing more time wasted in debugging
cluster running on these wrong setups.

Signed-off-by: Francesco Romani <fromani@redhat.com>

* pre-flight check: address reviewer comments

Signed-off-by: Francesco Romani <fromani@redhat.com>
Labels
lgtm Indicates that a PR is ready to be merged. size/XXL

6 participants