Add SubM operator, tests, e2e integration #151

Merged: 1 commit, merged Oct 8, 2019

dfarrell07
Member

Signed-off-by: Daniel Farrell <dfarrell@redhat.com>

@mangelajo
Contributor

Something happened with CI:

[submariner]$ PACKAGES=*.go operators pkg
Incorrect formatting, please run goimports
[submariner]$ exit 1
[submariner]$ ./scripts/ci keep 1.14.2 false false helm
FATA[0216] exit status 1
Makefile:17: recipe for target 'ci' failed
make: *** [ci] Error 1
The command "make ci e2e status=keep" exited with 2.
Done. Your build exited with 1.

Contributor

@mangelajo mangelajo left a comment

Initial comments, I will keep reviewing tomorrow.

@dfarrell07
Member Author

It was an issue with formatting. I fixed those issues in the latest changes.
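The CI step above fails whenever goimports would rewrite a file. A minimal sketch of running the same kind of check locally before pushing, assuming goimports is on PATH; check_goimports is a hypothetical helper, not something taken from the Submariner CI scripts:

```shell
#!/usr/bin/env bash
# Hypothetical local pre-push check, not taken from the Submariner CI scripts.
# Fails if goimports would rewrite any of the given files or directories.
check_goimports() {
  local unformatted
  # goimports -l only lists files whose formatting differs; it changes nothing.
  unformatted=$(goimports -l "$@") || return 1
  if [ -n "$unformatted" ]; then
    echo "Incorrect formatting, please run goimports on:" >&2
    printf '%s\n' "$unformatted" >&2
    return 1
  fi
}

# e.g. check_goimports operators pkg   (then goimports -w <file> to fix)
```

Listing with -l rather than rewriting with -w keeps the check read-only, which matches what CI does: it reports, and the developer decides when to reformat.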

@dfarrell07
Copy link
Member Author

I am seeing other linting errors still, via the extra checks enabled in CI (I was just running e2e locally):

WARN [runner/megacheck] Can't run megacheck because of compilation errors in packages [github.com/submariner-io/submariner/operators/go]: operators/go/routeagent_controller.go:1: : found packages routeagent (routeagent_controller.go) and submariner (submariner_controller.go) in /go/src/github.com/submariner-io/submariner/operators/go and 16 more errors: run `golangci-lint run --no-config --disable-all -E typecheck` to see all errors 
panic: interface conversion: types.Type is nil, not *types.Struct
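The typecheck error above follows from Go's rule that all non-test .go files in one directory must declare the same package, while operators/go held both package routeagent and package submariner. A small sketch for spotting this before running golangci-lint; go_packages_in_dir is a hypothetical helper and only looks at the first package clause in each file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the distinct package names declared by the
# non-test .go files directly inside a directory. A buildable Go directory
# prints exactly one name.
go_packages_in_dir() {
  local dir=$1 f
  for f in "$dir"/*.go; do
    [ -e "$f" ] || continue                 # glob matched nothing
    case $f in *_test.go) continue ;; esac  # external _test packages are allowed
    # Take the name from the first line that starts with "package ".
    sed -n 's/^package[[:space:]]\{1,\}\([A-Za-z_][A-Za-z0-9_]*\).*/\1/p' "$f" | head -n 1
  done | sort -u
}

# e.g. go_packages_in_dir operators/go
```

Going by the typecheck error, running this on operators/go at that point should have listed both routeagent and submariner.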

@dfarrell07
Copy link
Member Author

@manosnoam @skitt - Can you guys try the latest here and report what you get? I'm currently hitting this unexpected error and trying to track down the problem:

$ kubectl logs submariner-pod -n submariner

<snip>
+ sysctl -w net.ipv4.conf.all.send_redirects=0
sysctl: setting key "net.ipv4.conf.all.send_redirects": Read-only file system

@skitt
Member

skitt commented Sep 17, 2019

Same for me.

@skitt
Member

skitt commented Sep 17, 2019

Did you change the container’s security context recently?

@dfarrell07
Member Author

dfarrell07 commented Sep 17, 2019

No, I was trying to keep things mostly stable as we finalize everything this week.

I think it's something that changed under us when I rebased. If I make a new branch from 5cfe40d and then cherry-pick this commit on top (with a little which vs command -v change to account for container base image change), I don't see this error (my Engine pod deploys fine and all tests pass).
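The which vs command -v change matters because which is a separate program that a minimal container base image may not ship, while command -v is built into POSIX shells. A minimal sketch of the portable check; have_cmd and require_cmd are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: detect commands with the shell's built-in
# "command -v" instead of relying on an external "which" binary.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Abort with a clear message when a required tool is missing.
require_cmd() {
  if ! have_cmd "$1"; then
    echo "required command not found: $1" >&2
    return 1
  fi
}

# e.g. require_cmd iptables || exit 1
```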

@skitt
Member

skitt commented Sep 18, 2019

Right, that’s very plausible. I’ll try bisecting.

@skitt
Member

skitt commented Sep 18, 2019

And for some unfathomable reason the Submariner pods now deploy correctly again...

@manosnoam
Contributor

@manosnoam @skitt - Can you guys try the latest here and report what you get? I'm currently hitting this unexpected error and trying to track down the problem:

For me it fails earlier on ./gen_subm_operator.sh:
go get k8s.io/kube-state-metrics/pkg/collector: no matching versions for query "latest"

Contributor

@mkolesnik mkolesnik left a comment

Partial review

cp $controller_file_src $controller_file_dst

popd
}
Contributor

Most of this function is very similar to the other one (add_subm_engine_to_operator), so it would be nicer to DRY it up and have just one function with parameters.
It would also be very useful for future operators (admiral, coastguard, lighthouse, etc.).

Member Author

Yeah, good idea. I raised a Jira to track this: https://jira.coreos.com/browse/HYCLD-251

The general theme of a lot of answers to other review comments applies here - "yes, good idea, I'm just trying to keep it as simple as possible for now". There are definitely opportunities to make things more DRY and easier to extend for future controllers.
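A minimal sketch of the single parameterized function suggested above, assuming (hypothetically) that the per-controller differences come down to the kind name and file paths, and that each hand-written <kind>_controller.go is copied into the operator-sdk layout pkg/controller/<kind>/. Only the copy step visible in the review is modelled; the function name and parameter order are illustrative, not those of gen_subm_operator.sh:

```shell
#!/usr/bin/env bash
# Hypothetical replacement for the near-duplicate add_*_to_operator functions.
# Only the controller copy step seen in the review is modelled here.
add_controller_to_operator() {
  local kind=$1     # lowercase kind, e.g. submariner or routeagent
  local src_dir=$2  # directory holding the hand-written <kind>_controller.go
  local op_dir=$3   # generated operator project
  local controller_file_src="$src_dir/${kind}_controller.go"
  local controller_dir_dst="$op_dir/pkg/controller/$kind"

  mkdir -p "$controller_dir_dst" || return 1
  # Quoted paths keep the copy safe if a directory name contains spaces.
  cp "$controller_file_src" "$controller_dir_dst/${kind}_controller.go"
}

# Adding another controller later is then one more loop entry, e.g.:
# for kind in submariner routeagent; do
#   add_controller_to_operator "$kind" "$PWD" "$op_dir"
# done
```

Passing directories explicitly instead of relying on pushd/popd also means a failed copy cannot leave the caller in the wrong working directory.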

@dfarrell07 dfarrell07 force-pushed the operator branch 2 times, most recently from 10d0627 to a3a2da9 on September 18, 2019 20:31
@dfarrell07
Member Author

Yeah, it's a strange one. I'm trying to bisect as well. So far I'm pretty sure I can consistently reproduce it when I base this commit on the latest from master, and that the same Operator code fully passes for me when it's based on 3b46ebd or b0fccd7.

@dfarrell07
Member Author

I think the go get k8s.io/kube-state-metrics/pkg/collector failure that @manosnoam hit in ./gen_subm_operator.sh was resolved on https://jira.coreos.com/browse/HYCLD-250.

@dfarrell07
Member Author

@skitt I'm pretty confident I've bisected it to be between 5306204 (working) and 4b7def5 (failing). I've verified both of those "bookends" of the bisect twice and seen the same results. It is possible, however, that 4b7def5 is failing for a different reason (so that end of the search could be wrong), since it's not the same error we reported above on latest:

F0918 21:48:18.559787 1 main.go:99] Error running route controller: createIPTableChains returned error. Unable to create SUBMARINER-POSTROUTING chain in iptables: running [/usr/sbin/iptables -t nat -S]: exit status 125: chroot: cannot change root directory to '/host': No such file or directory

I've also just completed a run with 28f1b1d that passed.
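For reference, the manual good/bad search described above can be automated with git bisect run. A minimal sketch, assuming a check command that exits 0 on a good commit and 1 on a bad one (exit code 125 tells git to skip a commit that cannot be tested); bisect_first_bad and ./check-e2e.sh are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first bad commit between a known-good and a
# known-bad revision, using "git bisect run" with the given check command.
bisect_first_bad() {
  local good=$1 bad=$2
  shift 2
  git bisect start "$bad" "$good" >/dev/null || return 1
  # git checks out midpoints and runs "$@": exit 0 = good, 125 = skip,
  # any other code from 1 to 127 = bad.
  if ! git bisect run "$@" >/dev/null; then
    git bisect reset >/dev/null 2>&1
    return 1
  fi
  local first_bad
  # When bisection finishes, refs/bisect/bad points at the first bad commit.
  first_bad=$(git rev-parse refs/bisect/bad)
  git bisect reset >/dev/null 2>&1
  printf '%s\n' "$first_bad"
}

# e.g. bisect_first_bad 5306204 4b7def5 ./check-e2e.sh
```

As noted above, a bisect is only as good as its check: if a commit fails for an unrelated reason, the check script should exit 125 for it so git skips it rather than marking it bad.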

@mangelajo
Contributor

Still not fully tested because I'm having issues with git.apache.org (a dependency of the operator SDK),

but consider this for the scripts directory:

$ cat scripts/gen-operator
#!/bin/sh
# Stop at the first failing command instead of generating from a broken setup.
set -e

mkdir -p /root/go/bin
export PATH=$PATH:/root/go/bin
export GOFLAGS=

# Remove any leftover kind cluster; ignore the error if none exists.
kind delete cluster || true

RELEASE_VERSION=v0.10.0
SDK_BIN=operator-sdk-${RELEASE_VERSION}-x86_64-linux-gnu
# -f makes curl fail on HTTP errors instead of saving an error page as the binary.
curl -fLO "https://github.com/operator-framework/operator-sdk/releases/download/${RELEASE_VERSION}/${SDK_BIN}"

chmod +x "${SDK_BIN}"
mv "${SDK_BIN}" /root/go/bin/operator-sdk

cd operators/go
./gen_subm_operator.sh

That way we can just generate with
make gen-operator

Contributor

@mkolesnik mkolesnik left a comment

More food for thought (will continue reviewing on Sun)


clusterCidr_cluster2=10.245.0.0/16
clusterCidr_cluster3=10.246.0.0/16
serviceCidr_cluster2=100.95.0.0/16
Contributor

Same here.


# These all need to end up in pod container/environment vars
sed -i "/spec:/a \ \ submariner_namespace: $subm_ns" $cr_file
if [[ $context = cluster2 ]]; then
Contributor

Get it from the "hashtable"
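The "hashtable" suggestion amounts to replacing the per-cluster if [[ $context = cluster2 ]] branches and the clusterCidr_cluster2-style variables with bash associative arrays keyed by context. A minimal sketch, assuming bash 4.3 or later; the array and function names are hypothetical, and only the CIDR values visible in the review snippet are filled in (no cluster3 service CIDR appears there, so none is invented):

```shell
#!/usr/bin/env bash
# Hypothetical per-cluster settings keyed by kubectl context (needs bash >= 4.3
# for the nameref below). Only values visible in the review are filled in.
declare -A cluster_cidrs=(
  [cluster2]=10.245.0.0/16
  [cluster3]=10.246.0.0/16
)
declare -A service_cidrs=(
  [cluster2]=100.95.0.0/16
)

# Print table[key], failing loudly when the key is missing instead of
# silently expanding to an empty string.
lookup() {
  local -n table=$1
  local key=$2
  if [[ -z ${table[$key]+set} ]]; then
    echo "no entry for '$key' in $1" >&2
    return 1
  fi
  printf '%s\n' "${table[$key]}"
}

# e.g. instead of: if [[ $context = cluster2 ]]; then ...; fi
# clusterCidr=$(lookup cluster_cidrs "$context")
```

Adding a cluster then means adding one entry per table rather than another if branch in every function.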

kubectl get submariner $deployment_name --namespace=$subm_ns -o jsonpath='{.spec.submariner_debug}' | grep $subm_debug
kubectl get submariner $deployment_name --namespace=$subm_ns -o jsonpath='{.spec.submariner_namespace}' | grep $subm_ns
kubectl get submariner $deployment_name --namespace=$subm_ns -o jsonpath='{.spec.submariner_natenabled}' | grep $natEnabled
if [[ $context = cluster2 ]]; then
Contributor

Get it from the "hashtable"
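The same idea applies to the verification blocks: keep the expected spec values in one associative array and loop over it instead of repeating a kubectl get submariner ... | grep line per field. A sketch, assuming bash 4.3 or later; get_spec_field and verify_spec_fields are hypothetical names, with get_spec_field wrapping the jsonpath query shown in the review:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the kubectl jsonpath queries in the review.
get_spec_field() {
  kubectl get submariner "$deployment_name" --namespace="$subm_ns" \
    -o jsonpath="{.spec.$1}"
}

# Compare each expected spec field exactly (grep would also accept a
# substring match) and report every mismatch, not just the first.
# $1 names an associative array of field -> expected value (bash >= 4.3).
verify_spec_fields() {
  local -n expected_fields=$1
  local field actual rc=0
  for field in "${!expected_fields[@]}"; do
    actual=$(get_spec_field "$field")
    if [[ $actual != "${expected_fields[$field]}" ]]; then
      echo "spec.$field: expected '${expected_fields[$field]}', got '$actual'" >&2
      rc=1
    fi
  done
  return "$rc"
}

# e.g.
# declare -A expected_spec=(
#   [submariner_debug]=$subm_debug
#   [submariner_namespace]=$subm_ns
#   [submariner_natenabled]=$natEnabled
# )
# verify_spec_fields expected_spec
```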

kubectl get pod $subm_engine_pod_name --namespace=$subm_ns -o jsonpath='{.spec.containers..command}' | grep submariner.sh
kubectl get pod $subm_engine_pod_name --namespace=$subm_ns -o jsonpath='{.spec.containers..env}'
kubectl get pod $subm_engine_pod_name --namespace=$subm_ns -o jsonpath='{.spec.containers..env}' | grep "name:SUBMARINER_NAMESPACE value:$subm_ns"
if [[ $context = cluster2 ]]; then
Contributor

Get it from the "hashtable"

kubectl exec -it $subm_engine_pod_name --namespace=$subm_ns -- env | grep "SUBMARINER_DEBUG=$subm_debug"
kubectl exec -it $subm_engine_pod_name --namespace=$subm_ns -- env | grep "BROKER_K8S_APISERVERTOKEN=$SUBMARINER_BROKER_TOKEN"
kubectl exec -it $subm_engine_pod_name --namespace=$subm_ns -- env | grep "BROKER_K8S_REMOTENAMESPACE=$SUBMARINER_BROKER_NS"
if [[ $context = cluster2 ]]; then
Contributor

Get it from the "hashtable"
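For the env checks, two details may be worth tightening. First, -it asks kubectl for an interactive TTY, which CI jobs usually lack; plain kubectl exec ... -- env is enough to read the environment. Second, grep with an unanchored regex accepts substrings and treats characters such as . or + in tokens as regex syntax. A sketch of an exact-line check; env_has_all is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "env" output on stdin and check that each
# KEY=VALUE argument appears as an exact, whole line.
#   -F: fixed string (tokens may contain regex characters such as . or +)
#   -x: whole-line match, so a prefix of the real value does not pass
#   -q: exit status only
env_has_all() {
  local env_out pair rc=0
  env_out=$(cat)
  for pair in "$@"; do
    if ! grep -qxF -- "$pair" <<<"$env_out"; then
      # Print only the key so secrets such as tokens do not end up in CI logs.
      echo "missing or different: ${pair%%=*}" >&2
      rc=1
    fi
  done
  return "$rc"
}

# e.g. without -it, since no terminal is attached in CI:
# kubectl exec "$subm_engine_pod_name" --namespace="$subm_ns" -- env |
#   env_has_all "SUBMARINER_DEBUG=$subm_debug" \
#               "BROKER_K8S_REMOTENAMESPACE=$SUBMARINER_BROKER_NS"
```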

@dfarrell07 dfarrell07 force-pushed the operator branch 2 times, most recently from 91e7bf7 to c05f538 on September 20, 2019 02:25
@dfarrell07
Member Author

@skitt I haven't found a commit more recent than the one I'm currently basing the PR on (5306204) that fully passes. I've seen everything pass with this base three times. I see the Read-only file system failure on submariner-pod on commits from the HEAD of master back to e40d95c (0919214 and 9aad038, for example). There are 11 commits between the working 5306204 and the failing e40d95c, all part of a larger change by @sridhargaddam. I see different results testing them, but I imagine that's because I'm consuming only part of a larger change. On 79a29bf (the commit right after the known-working base) I see everything deploy, but the connection tests fail. On 4b7def5 and abe5b01 (2 more of the 11 between the known-working and current-failure commits) I see cannot change root directory to '/host': No such file or directory on submariner-routeagent-pod.

Does it consistently pass for you with this (5306204) as the base? Have you seen this same operator code pass on any more recent commit as the base?

@sridhargaddam
Member

Yes @dfarrell07, the VxLAN support patch is a big change, and it depends on a couple of patches in submariner-charts (as mentioned in the commit message of 79a29bf).
The error that you pointed out, "cannot change root directory to '/host': No such file or directory", will be seen if the route-agent-ds.yaml file is not updated.
Basically, you will need this fix: https://github.com/submariner-io/submariner-charts/pull/3/files

Can you please check whether all three dependent patches mentioned in the commit message (79a29bf) are in your build?

@skitt
Member

skitt commented Sep 20, 2019

I get the impression the operator currently doesn’t handle running in a namespace other than the target namespace (submariner), does it? It would seem that this is a requirement for catalog-deployable operators. What currently happens is that the operator is deployed along with its service account, in a namespace managed by the catalog (marketplace, but it should be operators), and the service account and role bindings get created alongside it. Then when it tries to deploy pods in the target namespace, it fails...

@skitt
Member

skitt commented Sep 20, 2019

Why doesn’t the operator include all the CRDs used in the end-to-end tests? Are they supposed to be deployed in some other way?

@dfarrell07
Member Author

dfarrell07 commented Sep 20, 2019

@skitt - I made some tweaks and I'm now able to deploy/pass all tests in any namespace (configured by the $subm_ns var in lib_operator_deploy_subm.sh). Details on the Jira I raised to track this: https://jira.coreos.com/browse/HYCLD-260

$ kubectl get pods --all-namespaces=true
NAMESPACE     NAME                                             READY     STATUS    RESTARTS   AGE
operators     submariner-operator-7bfbd96b5-9nmjt              1/1       Running   0          19m
operators     submariner-pod                                   1/1       Running   0          19m
operators     submariner-routeagent-pod                        1/1       Running   0          19m

@dfarrell07
Member Author

@skitt - Sorry, I might not understand what you mean about the operator not including all the CRDs used in the end-to-end tests - what CRDs are we talking about?

With latest on PR/151, I see these CRDs.

$ kubectl get crds
NAME                        CREATED AT
clusters.submariner.io      2019-09-20T19:06:29Z
endpoints.submariner.io     2019-09-20T19:06:25Z
routeagents.submariner.io   2019-09-20T19:06:31Z
submariners.submariner.io   2019-09-20T19:06:34Z

With the latest from master (0919214), I see these CRDs:

$ kubectl config use-context cluster2
Switched to context "cluster2".
$ kubectl get crds
NAME                      CREATED AT
clusters.submariner.io    2019-09-20T19:44:24Z
endpoints.submariner.io   2019-09-20T19:44:25Z
$ kubectl config use-context cluster1
Switched to context "cluster1".
$ kubectl get crds
NAME                      CREATED AT
clusters.submariner.io    2019-09-20T19:44:22Z
endpoints.submariner.io   2019-09-20T19:44:22Z

@skitt
Member

skitt commented Sep 23, 2019

Sorry I wasn’t very clear @dfarrell07! When I run the operator generator, I get two CRDs in the deploy directory, RouteAgent and Submariner; but the e2e deployment scripts handle Cluster and Endpoint too. If the operator needs those, then we need to declare them in the operator CSV and make them available when we build the operator package. It’s nothing difficult; I was only wondering if there was a reason the generator doesn’t produce them.
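Whichever way Cluster and Endpoint end up being shipped, a quick pre-test check that a cluster has all four CRDs listed in this thread might look like the sketch below. missing_crds is a hypothetical helper; it reads names one per line, either bare or in the customresourcedefinition.apiextensions.k8s.io/<name> form that kubectl get crds -o name prints:

```shell
#!/usr/bin/env bash
# CRDs seen in this thread (two from the generator, two from the e2e scripts).
required_crds=(
  clusters.submariner.io
  endpoints.submariner.io
  routeagents.submariner.io
  submariners.submariner.io
)

# Hypothetical helper: read CRD names on stdin and print every required CRD
# that is absent. Any "<group>/" prefix from "kubectl get crds -o name" is
# stripped before comparing.
missing_crds() {
  local line crd found
  local -a present=()
  while IFS= read -r line; do
    [ -n "$line" ] && present+=("${line##*/}")
  done
  for crd in "${required_crds[@]}"; do
    found=0
    for line in "${present[@]}"; do
      [ "$line" = "$crd" ] && { found=1; break; }
    done
    [ "$found" -eq 1 ] || printf '%s\n' "$crd"
  done
}

# e.g. kubectl get crds -o name | missing_crds
```

Against the cluster1/cluster2 listings from master shown above, this would print routeagents.submariner.io and submariners.submariner.io.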

popd
}

function create_subm_endpoints_crd() {
Contributor

It's not a nifty bash thing, it's a basic programming principle.
Since you're already using functions, at least try to minimize the code duplication where it's easy; it makes for better readability and maintainability.

clusters_crd_file=deploy/crds/submariner_clusters_crd.yaml

# TODO: Can/should we create this with Op-SDK?
cat <<EOF > $clusters_crd_file
Contributor

It doesn't make sense to agree to disagree here, actually; I explained why it makes much more sense to keep the CRD outside the script than embedded inside it.

@mkolesnik
Contributor

FYI I'm working on getting e2e to run

@dfarrell07
Member Author

@mangelajo I rebased this on master as we discussed. I verified (a number of times) that we consistently see only the new expected-failing unit tests I mentioned.

@mangelajo
Contributor

mangelajo commented Oct 2, 2019 via email

Contributor

@mkolesnik mkolesnik left a comment

Some more comments

@dfarrell07
Member Author

CI using the Operator is timing out while still running correctly:

The job exceeded the maximum time limit for jobs, and has been terminated.

https://travis-ci.com/submariner-io/submariner/builds/130283937

@dfarrell07
Member Author

In the latest CI run the unit tests at the end were running/passing when the job timed out. That likely means things would pass, since all the Operator verifications have already passed.

https://travis-ci.com/submariner-io/submariner/builds/130431372

(have to scrollllll)

@dfarrell07
Member Author

It looks like the maximum Travis job time for public repos is 50m and that it can't be changed.

https://docs.travis-ci.com/user/customizing-the-build#build-timeouts

CI bat signal to signal @dimaunx

Contributor

@mangelajo mangelajo left a comment

We can merge this now and eventually move this all to its own repo, but it's good for an alpha release IMHO.

Signed-off-by: Daniel Farrell <dfarrell@redhat.com>
Signed-off-by: Miguel Angel Ajo Pelayo <majopela@redhat.com>
Signed-off-by: Stephen Kitt <skitt@redhat.com>
Signed-off-by: Mike Kolesnik <mkolesni@redhat.com>
Signed-off-by: Janki Chhatbar <jchhatba@redhat.com>
@tpantelis tpantelis merged commit ce49514 into submariner-io:master Oct 8, 2019
7 participants