ARM64 CI #188

Closed
BenTheElder opened this issue Dec 19, 2018 · 56 comments

@BenTheElder
Member

Per discussion in #kind slack, we should set up some CI with OpenLab to test kind on arm64. xref #166

@dims was able to get arm64 working, but we'll need to set this up to keep it working once that goes in, as the maintainers do not otherwise have access to suitable ARM machines to test on.

/assign
/kind feature
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Dec 19, 2018
@BenTheElder BenTheElder added this to the 1.0 milestone Dec 19, 2018
@dixudx
Member

dixudx commented Jan 11, 2019

@lubinsz Could you help with this?

@BenTheElder
Member Author

note that we will need to fix #166 first, however that is very doable. Dims previously made a quick patch that worked, but we haven't PRed anything yet.

@lubinsz

lubinsz commented Jan 11, 2019

@BenTheElder @dixudx
I see.
At a minimum, this involves a multi-arch image issue.
Let me first file an internal legal request for this project ...

@dims
Member

dims commented Jan 11, 2019

@lubinsz see my previous patch in #166 (comment)

@hh

hh commented Feb 18, 2019

https://github.com/WorksOnArm/cluster/issues/154 gives us access to Packet hardware.
How would we like it configured?

  • Running k8s?
    This would allow us to designate this cluster for running kind jobs and fit into sig-testing's usual approach to testing.

  • Running docker only?
    We would need to SSH in, run kind and the tests, then clean up.

/cc @devaii

@BenTheElder
Member Author

I think running docker / SSH only is the most well-understood path currently. We can treat these similarly to a node or cadvisor e2e job, and put credentials in Prow to access them.

Long term it might be interesting to be able to run prowjobs on these machines directly, but that will require more work to maintain the cluster and it will take figuring out how we want to handle distributing other credentials.
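
A minimal sketch of the docker / SSH-only flow discussed above (SSH in, create a cluster, run the tests, clean up). The helper names, the host string, and the test command are hypothetical and are not taken from any existing Prow or OpenLab job. The only real commands assumed are `ssh`, `kind create cluster`, and `kind delete cluster`.

```python
import subprocess


def ssh_command(host, remote_cmd):
    """Build the argv that runs one shell command on host over SSH."""
    # BatchMode makes ssh fail instead of prompting for a password in CI.
    return ["ssh", "-o", "BatchMode=yes", host, remote_cmd]


def run_job(host, test_cmd, runner=subprocess.call):
    """Create a cluster, run the tests, and always delete the cluster.

    `runner` takes an argv list and returns an exit code; it is injectable
    so the sequencing can be checked without a real machine.
    """
    try:
        rc = runner(ssh_command(host, "kind create cluster"))
        if rc == 0:
            rc = runner(ssh_command(host, test_cmd))
        return rc
    finally:
        # Cleanup runs even when cluster creation or the tests fail.
        runner(ssh_command(host, "kind delete cluster"))
```

Calling `run_job("ci@arm64-host", "make test")` would run the three steps in order. The `finally` block is the part that matters on a shared machine, since a failed run would otherwise leave a cluster behind.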

@hh

hh commented Feb 18, 2019

When trying to kind build, we note that docker-ce and friends are not available directly from the same repos:

E: Version '18.06.*' for 'docker-ce' was not found
The command '/bin/sh -c curl -fsSL "https://download.docker.com/linux/$(. /etc/os-release; echo "$ID")/gpg" | apt-key add -     && apt-key fingerprint 0EBFCD88     && ARCH="${ARCH}" add-apt-repository         "deb [arch=${ARCH}] https://download.docker.com/linux/$(. /etc/os-release; echo "$ID") $(lsb_release -cs) stable"     && clean-install "docker-ce=${DOCKER_VERSION}"' returned a non-zero code: 100
ERRO[22:13:41] Docker build Failed! exit status 100         
Error: build failed: exit status 100

Probably need some debugging:

root@kind:~# kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.13.3) 🖼 
ERRO[22:14:59] machine-id-setup error: exit status 1        
 ✗ [control-plane] Creating node container 📦 
Error: failed to create cluster: machine-id-setup error: exit status 1

Version check:

root@kind:~# docker version
Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.1
 Git commit:        e68fc7a
 Built:             Fri Jan 25 14:35:17 2019
 OS/Arch:           linux/arm64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.1
  Git commit:       e68fc7a
  Built:            Thu Jan 24 10:49:48 2019
  OS/Arch:          linux/arm64
  Experimental:     false

@devaips

devaips commented Feb 18, 2019

It looks like the issue is due to the ARCH variable in the base-image being hard-coded to AMD64.

See: https://github.com/kubernetes-sigs/kind/blob/master/images/base/Dockerfile#L29
It looks like the ARCH variable is used later for the CNI plugin tarball, as well.

I am working on a patch.

@BenTheElder
Member Author

Yeah, there are a bunch of places marked TODO for handling this, because I wasn't sure where / how to plumb it through. I think using runtime.GOARCH should be fine. Dims's previous patch is here: #188 (comment)
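
To illustrate the idea of deriving ARCH from the host instead of hard-coding it: kind itself is written in Go, where runtime.GOARCH already yields names such as amd64 and arm64. The Python sketch below is only an analogy for scripts around the build. The function names and the alias table are assumptions, and the repository line mirrors the format in the failing Dockerfile command quoted earlier.

```python
import platform

# Kernel machine names mapped to the architecture names Docker and Go use.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def docker_arch(machine=None):
    """Return the Docker-style architecture for a machine name.

    Defaults to the machine this interpreter is running on.
    """
    machine = (machine or platform.machine()).lower()
    try:
        return _ARCH_ALIASES[machine]
    except KeyError:
        raise ValueError("unsupported machine type: %r" % machine)


def docker_repo_line(distro, codename, machine=None):
    """Build the apt repository line with the detected architecture."""
    return "deb [arch={}] https://download.docker.com/linux/{} {} stable".format(
        docker_arch(machine), distro, codename
    )
```

On an arm64 host, `docker_repo_line("ubuntu", "xenial")` would then request the arm64 package index rather than the amd64 one.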

@dims
Member

dims commented Feb 19, 2019

@BenTheElder unfortunately the paste with patch expired

@dims
Member

dims commented Mar 12, 2019

we still need CI, #358 works well!

@dims
Member

dims commented Apr 12, 2019

Thanks to @ZhengZhenyu and other awesome folks at OpenLab (https://github.com/theopenlab), we now have a functional KinD on ARM CI!!!

Please see:
http://status.openlabtesting.org/builds?job_name=kind-integration-test-arm64

@mrhillsman

Now that the jobs have been running successfully for a few days, we would like to know how the kind community would like to handle the testgrid reporting. I believe there are two options:

  • push to openlab owned gcs bucket
  • push to kind owned gcs bucket

I believe the second option is possible, but it would require setting up a user/auth account for OpenLab. The first option would be for OpenLab to resolve, either by using the existing bucket we use for cloud-provider-openstack or by setting up a new one.

I could be wrong but we are ready to get the reporting to the proper place so the community can work as expected on any issues surfaced.

@dims
Member

dims commented Apr 16, 2019

cc @BenTheElder - please see the question from Melvin ^^

@BenTheElder
Member Author

Either works! See also https://github.com/kubernetes/test-infra/tree/master/testgrid/conformance. We can set up a GCS bucket if we don't want to use any existing ones.

@BenTheElder
Member Author

[sorry for the huge delay, this slipped through my inbox :(]

@mrhillsman

No problem @BenTheElder totally understand. Will create a new one just to keep things separated since that is possible

@BenTheElder
Member Author

Re-Reading through this..
I created gs://k8s-conformance-kind-arm64-openlab if we need it, @mrhillsman @dims shoot me an email if we need that and I'll coordinate the service account credentials there 😅

@mrhillsman

ack @BenTheElder
/cc @dims

@kiwik

kiwik commented May 13, 2019

Hi @mrhillsman, @dims and @BenTheElder, I added an issue on the OpenLab side to track this job: theopenlab/openlab#257

@ZhengZhenyu

@aojea Hmm, are you sure? 0.26 does not seem to work:

+ kind build node-image --base-image kindest/base:latest --type=bazel --kube-root=/home/zuul/src/k8s.io/kubernetes
2019-08-07 03:04:48.421646 | ubuntu-xenial-arm64 | Starting local Bazel server and connecting to it...
2019-08-07 03:04:58.516265 | ubuntu-xenial-arm64 | Loading:
2019-08-07 03:04:58.527968 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:04:59.535464 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:05:00.545164 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:05:02.538017 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:05:03.539170 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:05:04.539531 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:05:05.579609 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:05:06.849602 | ubuntu-xenial-arm64 | Loading: 0 packages loaded
2019-08-07 03:05:06.850068 | ubuntu-xenial-arm64 | currently loading: build ... (4 packages)
2019-08-07 03:05:07.866957 | ubuntu-xenial-arm64 | Analyzing: 4 targets (4 packages loaded, 0 targets configured)
2019-08-07 03:05:09.435783 | ubuntu-xenial-arm64 | Analyzing: 4 targets (17 packages loaded, 31 targets configured)
2019-08-07 03:05:11.236828 | ubuntu-xenial-arm64 | Analyzing: 4 targets (18 packages loaded, 31 targets configured)
2019-08-07 03:05:13.445792 | ubuntu-xenial-arm64 | Analyzing: 4 targets (18 packages loaded, 31 targets configured)
2019-08-07 03:05:15.845550 | ubuntu-xenial-arm64 | Analyzing: 4 targets (18 packages loaded, 31 targets configured)
2019-08-07 03:05:18.655987 | ubuntu-xenial-arm64 | Analyzing: 4 targets (18 packages loaded, 31 targets configured)
2019-08-07 03:05:28.225680 | ubuntu-xenial-arm64 | Analyzing: 4 targets (18 packages loaded, 31 targets configured)
2019-08-07 03:05:32.819114 | ubuntu-xenial-arm64 | Analyzing: 4 targets (176 packages loaded, 2572 targets configured)
2019-08-07 03:05:34.893333 | ubuntu-xenial-arm64 | INFO: SHA256 (https://codeload.github.com/golang/tools/zip/bf090417da8b6150dcfe96795325f5aa78fff718) = 11629171a39a1cb4d426760005be6f7cb9b4182e4cb2756b7f1c5c2b6ae869fe
2019-08-07 03:05:34.980855 | ubuntu-xenial-arm64 | DEBUG: Rule 'debian-iptables-arm64' indicated that a canonical reproducible form can be obtained by modifying arguments digest = "sha256:1a63fdd216fe7b84561d40ab1ebaa0daae1fc73e4232a6caffbd8353d9a14cea"
2019-08-07 03:05:35.093497 | ubuntu-xenial-arm64 | DEBUG: Rule 'debian-base-arm64' indicated that a canonical reproducible form can be obtained by modifying arguments digest = "sha256:17be039c7035bd0897d954c51914ad41cd7e2b0b7c170b3d89ed021833df2fb1"
2019-08-07 03:05:38.109812 | ubuntu-xenial-arm64 | Analyzing: 4 targets (825 packages loaded, 8718 targets configured)
2019-08-07 03:05:40.007761 | ubuntu-xenial-arm64 | DEBUG: Rule 'org_golang_x_tools' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "11629171a39a1cb4d426760005be6f7cb9b4182e4cb2756b7f1c5c2b6ae869fe"
2019-08-07 03:05:44.975087 | ubuntu-xenial-arm64 | Analyzing: 4 targets (1619 packages loaded, 13612 targets configured)
2019-08-07 03:05:50.481029 | ubuntu-xenial-arm64 | ERROR: /home/zuul/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/sets/BUILD:25:1: in _go_genrule rule //staging/src/k8s.io/apimachinery/pkg/util/sets:set-gen:
2019-08-07 03:05:50.481413 | ubuntu-xenial-arm64 | Traceback (most recent call last):
2019-08-07 03:05:50.482013 | ubuntu-xenial-arm64 | File "/home/zuul/src/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/sets/BUILD", line 25
2019-08-07 03:05:50.482273 | ubuntu-xenial-arm64 | _go_genrule(name = 'set-gen')
2019-08-07 03:05:50.483085 | ubuntu-xenial-arm64 | File "/root/.cache/bazel/_bazel_root/e96b51b7d54f19bc30c74f61e708a712/external/io_k8s_repo_infra/defs/go.bzl", line 37, in _go_genrule_impl
2019-08-07 03:05:50.483316 | ubuntu-xenial-arm64 | all_srcs += dep.files
2019-08-07 03:05:50.484437 | ubuntu-xenial-arm64 | + operator on a depset is forbidden. See https://docs.bazel.build/versions/master/skylark/depsets.html for recommendations. Use --incompatible_depset_union=false to temporarily disable this check.
2019-08-07 03:05:50.782848 | ubuntu-xenial-arm64 | ERROR: Analysis of target '//build:docker-artifacts' failed; build aborted: Analysis of target '//staging/src/k8s.io/apimachinery/pkg/util/sets:set-gen' failed; build aborted
2019-08-07 03:05:50.828674 | ubuntu-xenial-arm64 | INFO: Elapsed time: 62.451s
2019-08-07 03:05:50.829031 | ubuntu-xenial-arm64 | INFO: 0 processes.
2019-08-07 03:05:50.836660 | ubuntu-xenial-arm64 | FAILED: Build did NOT complete successfully (1979 packages loaded, 17048 targets configured)
2019-08-07 03:05:50.845262 | ubuntu-xenial-arm64 | FAILED: Build did NOT complete successfully (1979 packages loaded, 17048 targets configured)
2019-08-07 03:05:50.857241 | ubuntu-xenial-arm64 | time="03:05:50" level=error msg="Failed to build Kubernetes: exit status 1"
2019-08-07 03:05:50.857738 | ubuntu-xenial-arm64 | Error: error building node image: failed to build kubernetes: exit status 1

@ZhengZhenyu

I'm building again with 0.24.

@ZhengZhenyu

@aojea Hi, sorry for the delay. I've tried several versions, and finally rolled back to 0.23.2 and tested manually (no log will be updated). You should be able to see the results after the next periodic run.

@aojea
Contributor

aojea commented Aug 8, 2019

@ZhengZhenyu you did it, it is now building the cluster.
However, it seems that the testgrid config has changed and I can't find the dashboard to check the errors. I will try to check tomorrow.

@aojea
Contributor

aojea commented Aug 13, 2019

@ZhengZhenyu the e2e tests are running, but the job is failing to upload the results because the script seems to need Python 3.6 or later and the node has Python 3.5.

2019-08-12 20:36:02.654884 | ubuntu-xenial-arm64 |   File "/usr/lib/python3.5/subprocess.py", line 693, in run
2019-08-12 20:36:02.655245 | ubuntu-xenial-arm64 |     with Popen(*popenargs, **kwargs) as process:
2019-08-12 20:36:02.655688 | ubuntu-xenial-arm64 | TypeError: __init__() got an unexpected keyword argument 'encoding'

The encoding argument is not present in python 3.5

Changed in version 3.6: Added encoding and errors parameters

Is it possible to use Python 3.6 or later?
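
If upgrading the node were not an option, one possible workaround is to avoid the `encoding` keyword altogether. The sketch below is an assumption about how such a call could look, not the code in the upload script; `run_text` is a hypothetical helper. It relies on `universal_newlines=True`, which `subprocess.run` already accepts on Python 3.5 and which returns `str` output decoded with the locale's default encoding.

```python
import subprocess
import sys


def run_text(cmd):
    """Run cmd and return its stdout as str on Python 3.5 and later."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode without the 3.6-only `encoding`
        check=True,               # raise CalledProcessError on non-zero exit
    )
    return proc.stdout


if __name__ == "__main__":
    # Use the current interpreter as a stand-in for the real command.
    print(run_text([sys.executable, "-c", "print('ok')"]), end="")
```

The trade-off is that the decoding then follows the node's locale instead of an explicitly chosen encoding.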

@ZhengZhenyu

@aojea sure, I will try

@aojea
Contributor

aojea commented Aug 13, 2019

We hit another problem: it seems the account is no longer valid.

2019-08-13 20:35:25.925443 | ubuntu-xenial-arm64 | WARNING: [kind-arm64-openlab-logs@k8s-federated-conformance.iam.gserviceaccount.com] appears to be a service account. Service account tokens cannot be revoked, but they will expire automatically. To prevent use of the service account token earlier than the expiration, revoke the parent service account or service account key.
2019-08-13 20:35:25.930826 | ubuntu-xenial-arm64 | Revoked credentials:
2019-08-13 20:35:25.931313 | ubuntu-xenial-arm64 |  - kind-arm64-openlab-logs@k8s-federated-conformance.iam.gserviceaccount.com
2019-08-13 20:35:26.083693 | ubuntu-xenial-arm64 | Traceback (most recent call last):
2019-08-13 20:35:26.084662 | ubuntu-xenial-arm64 |   File "upload_e2e.py", line 328, in <module>
2019-08-13 20:35:26.085125 | ubuntu-xenial-arm64 |     main(sys.argv[1:])
2019-08-13 20:35:26.085801 | ubuntu-xenial-arm64 |   File "upload_e2e.py", line 318, in main
2019-08-13 20:35:26.086858 | ubuntu-xenial-arm64 |     upload_string(gcs_dir+'/started.json', started_json, args.dry_run)
2019-08-13 20:35:26.087320 | ubuntu-xenial-arm64 |   File "upload_e2e.py", line 175, in upload_string
2019-08-13 20:35:26.087583 | ubuntu-xenial-arm64 |     proc.communicate(input=text)
2019-08-13 20:35:26.088014 | ubuntu-xenial-arm64 |   File "/usr/lib/python3.6/subprocess.py", line 848, in communicate
2019-08-13 20:35:26.088260 | ubuntu-xenial-arm64 |     self._stdin_write(input)
2019-08-13 20:35:26.088730 | ubuntu-xenial-arm64 |   File "/usr/lib/python3.6/subprocess.py", line 801, in _stdin_write
2019-08-13 20:35:26.088972 | ubuntu-xenial-arm64 |     self.stdin.write(input)
2019-08-13 20:35:26.089336 | ubuntu-xenial-arm64 | TypeError: a bytes-like object is required, not 'str'
2019-08-13 20:35:26.089645 | ubuntu-xenial-arm64 | Run: ['gcloud', 'auth', 'revoke']

@aojea
Contributor

aojea commented Aug 14, 2019

@dims @ZhengZhenyu do you have an idea on what can be the problem with the service account? ^^

These are the logs https://logs.openlabtesting.org/logs/periodic-6/18/github.com/kubernetes-sigs/kind/master/kind-integration-test-arm64/230250b/

@ZhengZhenyu

@aojea Hi, sorry for the delay. We had a similar problem in the cloud-provider-openstack job, and my colleague looked into it yesterday. It turns out to be an incorrect use of subprocess; the stdin usage should probably be removed.
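
For context on the `TypeError: a bytes-like object is required, not 'str'` in the traceback above: pipes opened by `subprocess.Popen` are binary unless text mode is requested, so passing a `str` to `communicate(input=...)` fails. As an alternative to dropping stdin, the text can be encoded before it is written. This is a sketch under that assumption, not the fix that was actually applied; `pipe_text` is a hypothetical helper, and a small Python child process stands in for the real upload command.

```python
import subprocess
import sys


def pipe_text(cmd, text):
    """Send text to cmd's stdin and return its stdout as str."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # The pipes are binary by default: encode on the way in, decode on the way out.
    out, _ = proc.communicate(input=text.encode("utf-8"))
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return out.decode("utf-8")


if __name__ == "__main__":
    # A child that copies stdin to stdout, used here instead of a real uploader.
    echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    print(pipe_text(echo, '{"timestamp": 1}'))
```

This form behaves the same on Python 3.5 and 3.6, so it would not depend on which interpreter the node provides.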

@ZhengZhenyu

@aojea Hi, it seems the job succeeded once on Aug 20, and I can see the results on both testgrid and OpenLab:
https://logs.openlabtesting.org/logs/periodic-6/18/github.com/kubernetes-sigs/kind/master/kind-integration-test-arm64/4631b06/
https://k8s-testgrid.appspot.com/conformance-kind#kind,%20v1.14%20(dev,%20ARM64)
And the tests seem to have actually run for the first time.

But then the job starts to fail again.

@ZhengZhenyu

@aojea Hmm, it seems something is going wrong while setting up the env again, so the tests did not run and there is nothing to upload.

@aojea
Contributor

aojea commented Aug 26, 2019

@ZhengZhenyu the e2e.sh script changed around that time, but I don't know if one of those changes broke the OpenLab CI. It seems that the containers for the Kubernetes components are not able to start, i.e. the kubelet fails and there are no logs for the kube-apiserver, etc.

97b044c#diff-d9fa0450190d60ba133fb92282a94725

I've sent a PR to try to align the CI job with the new changes to e2e.sh, and we can iterate from there.

theopenlab/openlab-zuul-jobs#625

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 24, 2019
@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 24, 2019
@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

This was referenced Feb 21, 2020
@BenTheElder BenTheElder removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 24, 2021
@BenTheElder
Member Author

KIND supports arm64 out of the box now, but getting CI for every possible configuration is not super maintainable for us. We rely on upstream Kubernetes working on ARM through whatever ARM supporters want to do, and kind then avoids doing anything architecture-specific, sticking to portable languages and tools.

@kubernetes-sigs kubernetes-sigs deleted a comment from fejta-bot Jun 24, 2021
@kubernetes-sigs kubernetes-sigs deleted a comment from fejta-bot Jun 24, 2021
@kubernetes-sigs kubernetes-sigs deleted a comment from fejta-bot Jun 24, 2021
yankay pushed a commit to yankay/kind that referenced this issue Mar 17, 2022
hack: README - add steps for single E2E test
stg-0 added a commit to stg-0/kind that referenced this issue Jul 3, 2023