Add wait-timeout flag to start command and refactor util/kubernetes #5121

Merged
medyagh merged 7 commits into kubernetes:master from refactor_util_kube on Aug 20, 2019

Member

@medyagh medyagh commented Aug 18, 2019

  • Removes a number of unused functions, such as:

    • NewPodStore, StartPods, WaitForPodDelete, WaitForEvent, WaitForServiceEndpointsNum, VersionedExtraOption
  • Moves funcs out of pkg/util into pkg/kube.

  • Moves the ExtraOptions type and its funcs from pkg/util to the pkg/minikube/config package, and renames func .ContainsString to .ContainsParam.

  • Chooses better timeouts based on Kubernetes constants.

  • Adds logs showing how long each component and k8s-app took to come up (to be used later to fine-tune our default wait time). They appear in the logs like this:

     kube.go:103] duration metric: took 1m17.514019465s to wait for component=etcd ...
     kube.go:103] duration metric: took 7.289066ms to wait for component=kube-scheduler ...
    
  • Added a new flag for the start cmd, wait-timeout, that specifies the max wait per component.

  • Reduced the default wait per component from 5 minutes to 3 minutes for end users.

  • Increased the default wait per component from 5 minutes to 13 minutes for parallel integration tests.

  • Parameterized the integration tests to accept wait-timeout.

  • Fixed the test setup for the gvisor test, which was placed after the e2e test run, by moving it up in the script.

Closes #5122
and hopefully reduces some test flakes caused by timeouts.


Topics I would like the reviewer's opinion on:

  • name of the package, "kube" vs "kubernetes"
  • path of package, current: "k8s.io/minikube/pkg/kube" vs "k8s.io/minikube/pkg/util/kube"

Golang parallel test logging gotcha:

I was hoping to see the duration metric logs in the tests:

	elapsed := time.Since(start)
	glog.Infof("duration metric: took %s to wait for %s ...", elapsed, label)

However, sometimes nothing is logged at all, and sometimes at most two sets of these lines appear.

That makes me believe Go only prints output from tests that are not PAUSED, and that paused tests (whose VMs are still running and which our wait-for-running func is still working on) do not write to the logs. I created an issue in golang to track this: golang/go#33706

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: medyagh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 18, 2019
@medyagh medyagh changed the title refactor util/kuberentes and ExtraOptions Refactor util/kuberentes and ExtraOptions Aug 18, 2019
@medyagh medyagh requested a review from tstromberg August 18, 2019 08:44
@medyagh medyagh changed the title Refactor util/kuberentes and ExtraOptions Refactor util/kuberentes and ExtraOptions and add wait-timeout flag to start cmd Aug 18, 2019
@medyagh medyagh changed the title Refactor util/kuberentes and ExtraOptions and add wait-timeout flag to start cmd Add wait-timeout flag to start cmd and refactor util/kuberentes and ExtraOptions and Aug 18, 2019
@medyagh medyagh changed the title Add wait-timeout flag to start cmd and refactor util/kuberentes and ExtraOptions and Add wait-timeout flag to start command and refactor util/kubernetes into a package Aug 18, 2019
@medyagh
Member Author

medyagh commented Aug 18, 2019

The flaky test is related to the virtualbox corruption issue #5083.
I also created an issue for making waitCluster smarter afterwards: #5125

@medyagh medyagh changed the title Add wait-timeout flag to start command and refactor util/kubernetes into a package Add wait-timeout flag to start command and refactor util/kubernetes Aug 18, 2019
Contributor

@tstromberg tstromberg left a comment

Thank you for taking on the much needed refactor.

@@ -241,6 +241,11 @@ export MINIKUBE_HOME="${TEST_HOME}/.minikube"
export MINIKUBE_WANTREPORTERRORPROMPT=False
export KUBECONFIG="${TEST_HOME}/kubeconfig"

# Build the gvisor image. This will be copied into minikube and loaded by ctr.
# Used by TestContainerd for Gvisor Test.
docker build -t gcr.io/k8s-minikube/gvisor-addon:latest -f testdata/gvisor-addon-Dockerfile out
Contributor

Unrelated?

Member Author

This was added by Priya to fix the gvisor test (it was meant to run before the integration tests), but it was placed after the minikube cleanup, so the test was not using it.

Member Author

This change only moves the command up in the script.

@@ -148,6 +150,7 @@ func initMinikubeFlags() {
startCmd.Flags().String(networkPlugin, "", "The name of the network plugin.")
startCmd.Flags().Bool(enableDefaultCNI, false, "Enable the default CNI plugin (/etc/cni/net.d/k8s.conf). Used in conjunction with \"--network-plugin=cni\".")
startCmd.Flags().Bool(waitUntilHealthy, true, "Wait until Kubernetes core services are healthy before exiting.")
startCmd.Flags().Duration(waitTimeout, 3*time.Minute, "max time to wait for Kubernetes core services to be healthy.")
Contributor

This default seems quite short for certain environments: previously, it was 5 minutes per pod. Perhaps 5 minutes overall?

Member Author

I should have added "per Kubernetes service" to the description (this timeout is not for all services combined). It used to be 5 minutes per component.

@@ -32,7 +32,7 @@ func TestMain(m *testing.M) {
os.Exit(m.Run())
}

-var startTimeout = flag.Int("timeout", 25, "number of minutes to wait for minikube start")
+var startTimeout = flag.Duration("timeout", 25*time.Minute, "max duration to wait for a full minikube start")
Contributor

8x the default we give to users is crazy. I can understand 2x, but more than that I feel like we are making ourselves reliable at the expense of users.

Member Author

Sorry for the confusion: --wait-timeout is per component, not for the whole start.

Before this PR the total wait was 5 x per component.
This PR reduces it to 3 minutes per component for the end user.

So it is not much more than what we do for the end user (if we count 5 x per component).

I also intend to open another cleanup PR for the integration tests, to get rid of all the start retrying in parallel tests once all the flakes (certs, corruptions, ...) are fixed.

@medyagh
Member Author

medyagh commented Aug 19, 2019

@tstromberg I believe I have addressed all the comments.

@medyagh
Member Author

medyagh commented Aug 20, 2019

/retest this please

@medyagh medyagh merged commit c3cfedf into kubernetes:master Aug 20, 2019
@medyagh medyagh deleted the refactor_util_kube branch August 20, 2019 05:11
Development

Successfully merging this pull request may close these issues.

Make wait for start time duration configurable
3 participants