Helm jobs broke on last commit #504

Closed
dfarrell07 opened this issue Apr 14, 2021 · 2 comments · Fixed by #519
Labels: bug (Something isn't working)

Comments

dfarrell07 (Member) commented Apr 14, 2021

What happened:

It seems like the most recently merged PR broke the Lighthouse+Helm jobs.

In the flake finder, the jobs from 5 or more days ago were all passing:

https://github.com/submariner-io/lighthouse/actions/workflows/flake_finder.yml

https://github.com/submariner-io/lighthouse/actions/runs/731556462

The jobs from 4 days ago and more recently are all failing in the same way:

https://github.com/submariner-io/lighthouse/actions/runs/734764600

In the PR-triggered E2E, the PR before the one in question passed:

#501

The PR-triggered E2E on the PR in question failed, but the PR was merged:

#502

Reverting the PR fixes the Helm jobs:

dfarrell07#1

By comparison, the Helm jobs fail on PRs with the same base that ran at about the same time:

#503

There were no PRs merged to the Helm repo in the relevant timeframe.

From what I can see of the logs, the nginx connectivity tests pass and the first E2E test fails.

2021-04-09T10:07:13.5731683Z [e2e]$ go test -v -timeout 30m -args -ginkgo.v -ginkgo.randomizeAllSpecs -ginkgo.trace -submariner-namespace submariner-operator -dp-context cluster1 -dp-context cluster2 -dp-context cluster3 -ginkgo.reportPassed -test.timeout 15m -ginkgo.reportFile /go/src/github.com/submariner-io/lighthouse/output/e2e-junit.xml
2021-04-09T10:07:13.5746162Z [e2e]$ tee /go/src/github.com/submariner-io/lighthouse/output/e2e-tests.log
2021-04-09T10:07:13.5769569Z [e2e]$ generate_context_flags
2021-04-09T10:07:13.5783287Z [e2e]$ generate_context_flags
2021-04-09T10:07:13.5795957Z [e2e]$ [cluster1] printf  -dp-context cluster1
2021-04-09T10:07:13.5807342Z [e2e]$ [cluster2] printf  -dp-context cluster2
2021-04-09T10:07:13.5818055Z [e2e]$ [cluster3] printf  -dp-context cluster3
2021-04-09T10:08:31.3962901Z === RUN   TestE2E
2021-04-09T10:08:31.4031540Z Running Suite: Submariner E2E suite
2021-04-09T10:08:31.4037156Z ===================================
2021-04-09T10:08:31.4039199Z Random Seed: 1617962911 - Will randomize all specs
2021-04-09T10:08:31.4040041Z Will run 15 of 15 specs
2021-04-09T10:08:31.4040353Z 
2021-04-09T10:08:31.4061802Z STEP: Creating kubernetes clients
2021-04-09T10:08:31.4745593Z STEP: Creating lighthouse clients
2021-04-09T10:08:31.4938688Z [discovery] Test Service Discovery Across Clusters when a pod tries to resolve a service in a specific remote cluster by its cluster name
2021-04-09T10:08:31.4940034Z   should resolve the service on the specified cluster
2021-04-09T10:08:31.4941170Z   /go/src/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:75
2021-04-09T10:08:31.4942264Z STEP: Creating namespace objects with basename "discovery"
2021-04-09T10:08:31.5035065Z STEP: Generated namespace "e2e-tests-discovery-splzk" in cluster "cluster1" to execute the tests in
2021-04-09T10:08:31.5036467Z STEP: Creating namespace "e2e-tests-discovery-splzk" in cluster "cluster2"
2021-04-09T10:08:31.5276311Z STEP: Creating namespace "e2e-tests-discovery-splzk" in cluster "cluster3"
2021-04-09T10:08:31.6137826Z STEP: Creating an Nginx Deployment on "cluster1"
2021-04-09T10:08:36.7363539Z STEP: Creating a Nginx Service on "cluster1"
2021-04-09T10:08:36.7701456Z STEP: Creating serviceExport nginx-demo.e2e-tests-discovery-splzk on "cluster1"
2021-04-09T10:08:36.8030588Z STEP: Creating an Nginx Deployment on "cluster2"
2021-04-09T10:08:41.8156114Z STEP: Creating a Nginx Service on "cluster2"
2021-04-09T10:08:41.8281696Z STEP: Creating serviceExport nginx-demo.e2e-tests-discovery-splzk on "cluster2"
2021-04-09T10:08:41.8841811Z STEP: Retrieving ServiceExport nginx-demo.e2e-tests-discovery-splzk on "cluster2"
2021-04-09T10:11:51.8995875Z STEP: Deleting namespace "e2e-tests-discovery-splzk" on cluster "cluster1"
2021-04-09T10:11:51.9242669Z STEP: Deleting namespace "e2e-tests-discovery-splzk" on cluster "cluster2"
2021-04-09T10:11:51.9307508Z STEP: Deleting namespace "e2e-tests-discovery-splzk" on cluster "cluster3"
2021-04-09T10:11:51.9530563Z STEP: Retrieving EndpointSlices for "" in ns "e2e-tests-discovery-splzk" on "cluster2"
2021-04-09T10:11:51.9589337Z STEP: Retrieving EndpointSlices for "" in ns "e2e-tests-discovery-splzk" on "cluster1"
2021-04-09T10:11:51.9733184Z 
2021-04-09T10:11:51.9769178Z • Failure [200.479 seconds]
2021-04-09T10:11:51.9769861Z [discovery] Test Service Discovery Across Clusters
2021-04-09T10:11:51.9771580Z /go/src/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:40
2021-04-09T10:11:51.9772710Z   when a pod tries to resolve a service in a specific remote cluster by its cluster name
2021-04-09T10:11:51.9773922Z   /go/src/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:74
2021-04-09T10:11:51.9775008Z     should resolve the service on the specified cluster [It]
2021-04-09T10:11:51.9776101Z     /go/src/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:75
2021-04-09T10:11:51.9776691Z 
2021-04-09T10:11:51.9777515Z     Failed to retrieve ServiceExport. No ServiceExportConditions
2021-04-09T10:11:51.9778253Z     Unexpected error:
2021-04-09T10:11:51.9778825Z         <*errors.errorString | 0xc00039c0f0>: {
2021-04-09T10:11:51.9779434Z             s: "timed out waiting for the condition",
2021-04-09T10:11:51.9779855Z         }
2021-04-09T10:11:51.9780281Z         timed out waiting for the condition
2021-04-09T10:11:51.9780889Z     occurred
2021-04-09T10:11:51.9781148Z 
2021-04-09T10:11:51.9783221Z     /go/src/github.com/submariner-io/lighthouse/vendor/github.com/submariner-io/shipyard/test/e2e/framework/framework.go:488
2021-04-09T10:11:51.9783986Z 
2021-04-09T10:11:51.9784612Z     Full Stack Trace
2021-04-09T10:11:51.9785966Z     github.com/submariner-io/shipyard/test/e2e/framework.AwaitUntil(0x1553d7c, 0x16, 0xc000521098, 0x15e3408, 0x0, 0xc00069e370)
2021-04-09T10:11:51.9788058Z     	/go/src/github.com/submariner-io/lighthouse/vendor/github.com/submariner-io/shipyard/test/e2e/framework/framework.go:488 +0x1c6
2021-04-09T10:11:51.9789970Z     github.com/submariner-io/lighthouse/test/e2e/framework.(*Framework).AwaitServiceExportedStatusCondition(0xc00011edc8, 0x1, 0xc0006a0740, 0xa, 0xc000695800, 0x19)
2021-04-09T10:11:51.9791806Z     	/go/src/github.com/submariner-io/lighthouse/test/e2e/framework/framework.go:128 +0x25e
2021-04-09T10:11:51.9793638Z     github.com/submariner-io/lighthouse/test/e2e/discovery.RunServiceDiscoveryClusterNameTest(0xc00011edc8)
2021-04-09T10:11:51.9795501Z     	/go/src/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:371 +0x490
2021-04-09T10:11:51.9796763Z     github.com/submariner-io/lighthouse/test/e2e/discovery.glob..func2.6.1()
2021-04-09T10:11:51.9798019Z     	/go/src/github.com/submariner-io/lighthouse/test/e2e/discovery/service_discovery.go:76 +0x2a
2021-04-09T10:11:51.9799223Z     github.com/submariner-io/shipyard/test/e2e.RunE2ETests(0xc000347980, 0xc8d328797b)
2021-04-09T10:11:51.9800493Z     	/go/src/github.com/submariner-io/lighthouse/vendor/github.com/submariner-io/shipyard/test/e2e/e2e.go:92 +0x125
2021-04-09T10:11:51.9801884Z     github.com/submariner-io/lighthouse/test/e2e.TestE2E(0xc000347980)
2021-04-09T10:11:51.9802976Z     	/go/src/github.com/submariner-io/lighthouse/test/e2e/e2e_test.go:26 +0x2b
2021-04-09T10:11:51.9803709Z     testing.tRunner(0xc000347980, 0x15e33e0)
2021-04-09T10:11:51.9804295Z     	/usr/lib/golang/src/testing/testing.go:1123 +0xef
2021-04-09T10:11:51.9804827Z     created by testing.(*T).Run
2021-04-09T10:11:51.9805369Z     	/usr/lib/golang/src/testing/testing.go:1168 +0x2b3

https://pastebin.com/5Tbf5Xb9
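
To make the "timed out waiting for the condition" error concrete, here is a small Python sketch that measures the gap between the "Retrieving ServiceExport" step and the start of cleanup, using two timestamps copied from the log above. The helper name `parse_log_ts` is made up for this illustration. The gap works out to roughly 190 seconds, which fits a poll that ran until a timeout. The log does not show the timeout value that was actually configured.

```python
from datetime import datetime


def parse_log_ts(ts: str) -> datetime:
    """Parse a log timestamp such as 2021-04-09T10:08:41.8841811Z.

    The log carries 7 fractional digits; datetime stores 6, so the last one is dropped.
    """
    head, frac = ts.rstrip("Z").split(".")
    return datetime.strptime(f"{head}.{frac[:6]}", "%Y-%m-%dT%H:%M:%S.%f")


# Timestamps copied from the log lines above.
wait_started = parse_log_ts("2021-04-09T10:08:41.8841811Z")     # Retrieving ServiceExport on "cluster2"
cleanup_started = parse_log_ts("2021-04-09T10:11:51.8995875Z")  # Deleting namespace on "cluster1"

waited = (cleanup_started - wait_started).total_seconds()
print(f"waited about {waited:.0f} s before cleanup began")
```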

Environment:

Lighthouse CI

dfarrell07 added the bug label on Apr 14, 2021
dfarrell07 linked a pull request on Apr 15, 2021 that will close this issue
dfarrell07 (Member, Author) commented:

It seems this is exposing a deeper issue: without this PR, the Lighthouse jobs actually run with subctl, not Helm.

tpantelis (Contributor) commented:

The problem is that the helm jobs aren't deploying the LH components. Looking at the helm install command executed by the jobs:

[lighthouse]$ [cluster2] helm --kube-context cluster2 install submariner-operator submariner-latest/submariner-operator --create-namespace --namespace submariner-operator ...
--set broker.globalnet=false --set submariner.serviceDiscovery=false --set submariner.cableDriver=libreswan --set submariner.clusterId=cluster2 --set submariner.clusterCidr=10.2.0.0/16 --set submariner.serviceCidr=100.2.0.0/16 --set submariner.globalCidr= --set serviceAccounts.globalnet.create=false --set serviceAccounts.lighthouseAgent.create=false --set serviceAccounts.lighthouseCoreDns.create=false ... --set submariner.serviceDiscovery=true,lighthouse.image.repository=localhost:5000/lighthouse-agent,lighthouse.image.tag=local,lighthouseCoredns.image.repository=localhost:5000/lighthouse-coredns,lighthouseCoredns.image.tag=local,serviceAccounts.lighthouse.create=true

we see that submariner.serviceDiscovery is first set to false and then to true. The LH service account create flags are also set to false (serviceAccounts.lighthouse.create is set to true, but that key is invalid). The cause is that the deploy_helm lib in shipyard uses ${service_discovery}, parsed from the command line, to set these params, but the LH Makefile doesn't pass it. Instead, the Makefile sets submariner.serviceDiscovery=true via --deploytool_submariner_args, which doesn't set the correct serviceAccounts.* flags. The Makefile should pass --service_discovery to the shipyard script.
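
As a rough illustration of why the later `--set` does not rescue the deployment, the Python sketch below models repeated `--set` arguments as "last assignment wins", which is how Helm documents their precedence. This is a simplified model, not Helm's parser (no escaping, lists, or nested maps), and `effective_values` is a hypothetical helper. The arguments are a subset of those quoted in the command above.

```python
def effective_values(set_args):
    """Simplified model of repeated `helm --set`: later assignments override earlier ones.

    Assumption: plain key=value pairs separated by commas, with no escaping or lists.
    """
    values = {}
    for arg in set_args:
        for pair in arg.split(","):
            key, _, value = pair.partition("=")
            values[key] = value
    return values


# Subset of the --set arguments from the quoted command, in command-line order.
set_args = [
    "submariner.serviceDiscovery=false",
    "serviceAccounts.lighthouseAgent.create=false",
    "serviceAccounts.lighthouseCoreDns.create=false",
    "submariner.serviceDiscovery=true,serviceAccounts.lighthouse.create=true",
]

values = effective_values(set_args)
for key in sorted(values):
    print(f"{key} = {values[key]}")
```

In this model, submariner.serviceDiscovery ends up true, but serviceAccounts.lighthouseAgent.create and serviceAccounts.lighthouseCoreDns.create stay false because nothing later on the command line overrides them.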
