
Unexpected "Error creating: Unauthorized" #8972

Closed
aran opened this issue Jul 26, 2023 · 11 comments · Fixed by #9015

@aran
Contributor

aran commented Jul 26, 2023

Expected behavior

skaffold dev -p dev reliably deploys a Helm-based Skaffold configuration to Kubernetes running in Colima.

Actual behavior

I’m seeing skaffold dev fail with “Failed to create Pod for Deployment <…>: Error creating: Unauthorized”.
However, with --keep-running-on-failures and --tolerate-failures-until-deadline, I can see with kubectl get deployment -A that the deployment has in fact succeeded. Running with --verbosity='trace' does not show any other obvious clues.

Information

  • Skaffold version: v2.6.2 (ConfigVersion: skaffold/v4beta6, GitCommit: 0d544d26766058c5a49ba80907edf438a01c90d4, BuildDate: 2023-07-18T19:56:24Z, GoVersion: go1.20.6, Platform: darwin/arm64)
  • Operating system: macOS
  • Installed via: Homebrew
  • Contents of skaffold.yaml:
---
"apiVersion": "skaffold/v4beta5"
"build":
  "artifacts":
  - "bazel":
      "target": "//foundation/cloudnative-pg:cloudnative-pg.tar"
    "image": "cloudnative-pg"
  - "bazel":
      "target": "//foundation/cloudnative-pg:busybox.tar"
    "image": "busybox"
  "tagPolicy":
    "sha256": {}
"kind": "Config"
"metadata":
  "name": "cloudnative-pg"
"profiles":
- "deploy":
    "helm":
      "releases":
      - "createNamespace": true
        "name": "cloudnative-pg-release"
        "namespace": "cnpg-system"
        "remoteChart": "cloudnative-pg"
        "repo": "https://cloudnative-pg.io/charts/"
        "setValueTemplates":
          "image.repository": "{{.IMAGE_REPO_cloudnative_pg}}"
          "image.tag": "{{.IMAGE_TAG_cloudnative_pg}}@{{.IMAGE_DIGEST_cloudnative_pg}}"
          "test.image.repository": "{{.IMAGE_REPO_busybox}}"
          "test.image.tag": "{{.IMAGE_TAG_busybox}}@{{.IMAGE_DIGEST_busybox}}"
        "setValues":
          "image.pullPolicy": "IfNotPresent"
          "serviceAccount.create": "true"
          "serviceAccount.name": "cloudnative-pg-release"
          "test.image.pullPolicy": "IfNotPresent"
  "name": "dev"
- "deploy":
    "helm":
      "releases":
      - "createNamespace": false
        "name": "cloudnative-pg-release"
        "namespace": "cnpg-system"
        "remoteChart": "cloudnative-pg"
        "repo": "https://cloudnative-pg.io/charts/"
        "setValueTemplates":
          "image.repository": "{{.IMAGE_REPO_cloudnative_pg}}"
          "image.tag": "{{.IMAGE_TAG_cloudnative_pg}}@{{.IMAGE_DIGEST_cloudnative_pg}}"
          "test.image.repository": "{{.IMAGE_REPO_busybox}}"
          "test.image.tag": "{{.IMAGE_TAG_busybox}}@{{.IMAGE_DIGEST_busybox}}"
        "setValues":
          "image.pullPolicy": "IfNotPresent"
          "serviceAccount.create": "false"
          "serviceAccount.name": "cloudnative-pg-release"
          "test.image.pullPolicy": "IfNotPresent"
  "manifests":
    "rawYaml":
    - "namespace.yaml"
    - "sa.yaml"
  "name": "prod"
...

Steps to reproduce the behavior

skaffold dev -p dev in a folder with the above skaffold.yaml file.

On another Mac with the same Skaffold, Colima, and Kubernetes versions, the same issue does not occur.

Arguably there is a logging enhancement implicit in this issue, since it is hard to diagnose what is happening from the existing logs.

skaffold_debug2.txt

ericzzzzzzz added the needs-reproduction label (needs reproduction from the maintainers to validate the issue is truly a skaffold bug) on Jul 26, 2023
@ocni-dtu

ocni-dtu commented Jul 28, 2023

I'm experiencing the same error.
The Skaffold deployment fails, but if I let the deployment/pods keep running after Skaffold's failure, the pods start successfully.

@davidedmondsMPG

I'm getting a similar error installing Agones via Helm, in both the render phase (subsequently deployed with kubectl using the --server-side flag) and the deploy phase:

(anonymized)

apiVersion: skaffold/v4beta1
kind: Config
metadata:
  name: service

build:
  artifacts:
    - image: service
      ko:
        main: ./cmd/service

manifests:
  kustomize:
    paths:
      - # two paths for kustomization resources
  helm:
    releases:
      - name: agones
        namespace: agones-system
        createNamespace: true
        remoteChart: agones
        repo: https://agones.dev/chart/stable
        version: 1.33.0
        setValues:
          agones.ping.install: false

deploy:
  kubectl:
    flags:
      apply:
        - --server-side
        - --force-conflicts

portForward:
  - # three deployment port forwards

Log output (anonymised):

- priorityclass.scheduling.k8s.io/agones-system serverside-applied
# snip
Waiting for deployments to stabilize...
# snip
 - agones-system:deployment/agones-extensions: Failed to create Pod for Deployment agones-extensions: Error creating: pods "agones-extensions-5cd94689cb-" is forbidden: no PriorityClass with name agones-system was found
 - agones-system:deployment/agones-extensions failed. Error: Failed to create Pod for Deployment agones-extensions: Error creating: pods "agones-extensions-5cd94689cb-" is forbidden: no PriorityClass with name agones-system was found.
 - agones-system:deployment/agones-controller: Failed to create Pod for Deployment agones-controller: Error creating: pods "agones-controller-6d55775b94-" is forbidden: no PriorityClass with name agones-system was found
 - agones-system:deployment/agones-controller failed. Error: Failed to create Pod for Deployment agones-controller: Error creating: pods "agones-controller-6d55775b94-" is forbidden: no PriorityClass with name agones-system was found.

Running the Helm chart outside of Skaffold results in working pods. I don't see them going into a failure state when watching with kubectl, but they definitely pass through that state in the event log:

0s          Normal    NoPods                   poddisruptionbudget/agones-controller-pdb   No matching pods found
0s          Normal    NoPods                   poddisruptionbudget/agones-extensions-pdb   No matching pods found
0s          Normal    NoPods                   poddisruptionbudget/agones-controller-pdb   No matching pods found
0s          Normal    NoPods                   poddisruptionbudget/agones-extensions-pdb   No matching pods found
0s          Normal    EnsuringLoadBalancer     service/agones-allocator                    Ensuring load balancer
0s          Normal    ScalingReplicaSet        deployment/agones-extensions                Scaled up replica set agones-extensions-69d554d97 to 2
0s          Normal    AppliedDaemonSet         service/agones-allocator                    Applied LoadBalancer DaemonSet kube-system/svclb-agones-allocator-ca3aa78a
0s          Normal    ScalingReplicaSet        deployment/agones-controller                Scaled up replica set agones-controller-bf5d58f54 to 2
0s          Warning   FailedCreate             replicaset/agones-controller-bf5d58f54      Error creating: pods "agones-controller-bf5d58f54-" is forbidden: no PriorityClass with name agones-system was found
0s          Normal    ScalingReplicaSet        deployment/agones-allocator                 Scaled up replica set agones-allocator-88cffb566 to 3
0s          Warning   FailedCreate             replicaset/agones-extensions-69d554d97      Error creating: pods "agones-extensions-69d554d97-" is forbidden: no PriorityClass with name agones-system was found
0s          Normal    Scheduled                pod/agones-allocator-88cffb566-lrcdw        Successfully assigned agones-system/agones-allocator-88cffb566-lrcdw to colima
1s          Normal    SuccessfulCreate         replicaset/agones-allocator-88cffb566       Created pod: agones-allocator-88cffb566-lrcdw
0s          Normal    Scheduled                pod/agones-extensions-69d554d97-jhm9w       Successfully assigned agones-system/agones-extensions-69d554d97-jhm9w to colima
0s          Normal    Scheduled                pod/agones-allocator-88cffb566-5cgkm        Successfully assigned agones-system/agones-allocator-88cffb566-5cgkm to colima
0s          Normal    Scheduled                pod/agones-allocator-88cffb566-krjht        Successfully assigned agones-system/agones-allocator-88cffb566-krjht to colima
0s          Normal    Scheduled                pod/agones-controller-bf5d58f54-6g75g       Successfully assigned agones-system/agones-controller-bf5d58f54-6g75g to colima
1s          Normal    SuccessfulCreate         replicaset/agones-allocator-88cffb566       Created pod: agones-allocator-88cffb566-krjht
0s          Normal    Scheduled                pod/agones-controller-bf5d58f54-snpj2       Successfully assigned agones-system/agones-controller-bf5d58f54-snpj2 to colima
0s          Normal    Scheduled                pod/agones-extensions-69d554d97-n4kgf       Successfully assigned agones-system/agones-extensions-69d554d97-n4kgf to colima
1s          Normal    SuccessfulCreate         replicaset/agones-allocator-88cffb566       Created pod: agones-allocator-88cffb566-5cgkm
1s          Normal    SuccessfulCreate         replicaset/agones-extensions-69d554d97      Created pod: agones-extensions-69d554d97-jhm9w
1s          Normal    SuccessfulCreate         replicaset/agones-controller-bf5d58f54      Created pod: agones-controller-bf5d58f54-6g75g
1s          Normal    SuccessfulCreate         replicaset/agones-controller-bf5d58f54      Created pod: agones-controller-bf5d58f54-snpj2
1s          Normal    SuccessfulCreate         replicaset/agones-extensions-69d554d97      Created pod: agones-extensions-69d554d97-n4kgf
0s          Warning   FailedMount              pod/agones-allocator-88cffb566-lrcdw        MountVolume.SetUp failed for volume "client-ca" : failed to sync secret cache: timed out waiting for the condition
0s          Normal    Pulled                   pod/agones-extensions-69d554d97-jhm9w       Container image "us-docker.pkg.dev/agones-images/release/agones-extensions:1.33.0" already present on machine
0s          Normal    Created                  pod/agones-extensions-69d554d97-jhm9w       Created container agones-extensions
0s          Normal    Pulled                   pod/agones-controller-bf5d58f54-snpj2       Container image "us-docker.pkg.dev/agones-images/release/agones-controller:1.33.0" already present on machine
0s          Normal    Created                  pod/agones-controller-bf5d58f54-snpj2       Created container agones-controller
0s          Normal    Started                  pod/agones-extensions-69d554d97-jhm9w       Started container agones-extensions
0s          Normal    Started                  pod/agones-controller-bf5d58f54-snpj2       Started container agones-controller
0s          Normal    Pulled                   pod/agones-allocator-88cffb566-5cgkm        Container image "us-docker.pkg.dev/agones-images/release/agones-allocator:1.33.0" already present on machine
0s          Normal    Pulled                   pod/agones-controller-bf5d58f54-6g75g       Container image "us-docker.pkg.dev/agones-images/release/agones-controller:1.33.0" already present on machine
0s          Normal    Created                  pod/agones-allocator-88cffb566-5cgkm        Created container agones-allocator
0s          Normal    Pulled                   pod/agones-allocator-88cffb566-krjht        Container image "us-docker.pkg.dev/agones-images/release/agones-allocator:1.33.0" already present on machine
0s          Normal    Created                  pod/agones-controller-bf5d58f54-6g75g       Created container agones-controller
0s          Normal    UpdatedLoadBalancer      service/agones-allocator                    Updated LoadBalancer with new IPs: [] -> [192.168.106.2]
0s          Normal    Pulled                   pod/agones-allocator-88cffb566-lrcdw        Container image "us-docker.pkg.dev/agones-images/release/agones-allocator:1.33.0" already present on machine
0s          Normal    Created                  pod/agones-allocator-88cffb566-krjht        Created container agones-allocator
1s          Normal    Created                  pod/agones-allocator-88cffb566-lrcdw        Created container agones-allocator
1s          Normal    Pulled                   pod/agones-extensions-69d554d97-n4kgf       Container image "us-docker.pkg.dev/agones-images/release/agones-extensions:1.33.0" already present on machine
1s          Normal    Created                  pod/agones-extensions-69d554d97-n4kgf       Created container agones-extensions
0s          Normal    Started                  pod/agones-allocator-88cffb566-5cgkm        Started container agones-allocator
0s          Normal    Started                  pod/agones-allocator-88cffb566-krjht        Started container agones-allocator
0s          Normal    Started                  pod/agones-controller-bf5d58f54-6g75g       Started container agones-controller
0s          Normal    Started                  pod/agones-extensions-69d554d97-n4kgf       Started container agones-extensions
0s          Normal    Started                  pod/agones-allocator-88cffb566-lrcdw        Started container agones-allocator

@aran
Contributor Author

aran commented Jul 28, 2023

Maybe this should be split into two issues (or kept as one issue with two distinct parts):

1 - (More actionable today) Add more logging around pod/deployment failure detection, especially for Helm.
2 - (Perhaps blocked on that) Actually reproduce and fix the underlying issue that is causing Skaffold's seemingly incorrect behavior.

@ericzzzzzzz - what do you think? If you want me to split it up, happy to do that.

@ericzzzzzzz
Contributor

Hi @davidedmondsMPG, sorry for the late reply.

The skaffold command fails at the status check stage, and that logic is the same across deployers. Skaffold checks Kubernetes events for FailedCreate events; if such an event exists, the status check returns an error. The error message "Failed to create Pod for Deployment agones-extensions: Error creating: pods "agones-extensions-5cd94689cb-" is forbidden: no PriorityClass with name agones-system was found" shows that the PriorityClass was not found when the pod was being created; since we later saw a successful creation, it seems the corresponding PriorityClass resource became available afterwards. For some failures, Skaffold cannot determine whether they will be fixed automatically at a later time. We have seen cases like this before: for example, a cluster may not have enough resources to schedule deployments, but it can auto-scale to provision another node, and the deployment will eventually succeed. It is also possible that the cluster has hit its cap, will not scale up more nodes, and the deployment will never succeed. That is why we introduced --tolerate-failures-until-deadline, to let users decide how to handle cases like this.

@ericzzzzzzz
Contributor

@aran Similar to davidedmondsMPG's case, your Kubernetes resources may rely on something that is only created while the release is being installed; that is why you also need the --tolerate-failures-until-deadline flag.

@aran
Contributor Author

aran commented Jul 31, 2023

@ericzzzzzzz We did run with --tolerate-failures-until-deadline, and Skaffold still did not seem to see the successful deployment later.

@pipaliyachirag

Any follow-up?

@BenjaminBenetti

I'm also suffering from this issue. Passing --tolerate-failures-until-deadline does prevent Skaffold from throwing an error; however, it also never recognizes the successful deployment, which prevents file sync from kicking in. My setup is pretty simple, and I've been using similar configurations successfully for a while. I'm running on a freshly created local cluster.

skaffold: v2.6.2
helm: v3.11
k8s: v1.27.3 
kind: 0.20.0

I've done some testing, and this issue appears in Skaffold v2.3.1; on v2.3.0 it deploys successfully. The reason seems to be a bunch of additional error-handling code that was added to deployment.go in v2.3.1.

The only notable clue is that this issue doesn't seem to show up right after creating the cluster; it only appears after the cluster has been running for a while.

For anyone with this issue: you can try downgrading to v2.3.0.

@ericzzzzzzz
Contributor

ericzzzzzzz commented Aug 8, 2023

Hi, thank you all for the discussion! I'm going to prioritize this. We'd like more information to better understand the issue; could you provide a minimal reproducible project so we can debug and identify the cause? Thanks!

ericzzzzzzz self-assigned this on Aug 8, 2023
ericzzzzzzz added the priority/p1 (High impact feature/bug) and area/status-check labels on Aug 8, 2023
@aran
Contributor Author

aran commented Aug 8, 2023

@ericzzzzzzz A key sub-issue here is that it doesn't reproduce reliably, and the logging doesn't indicate what happened, which makes constructing a reproduction difficult. Is it possible to increase logging at the 'debug' or 'trace' level around deployment status checks? By importing or copying the Kubernetes deployment status data structures directly, e.g. these ones, and logging them in full at the 'debug' level, it should be possible to eliminate STATUSCHECK_UNKNOWN. Then, with any luck, by 2.6.4 it will be more obvious what is driving the underlying issue.
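
A minimal sketch of the kind of debug logging being suggested, assuming a client-go clientset and logrus are available; the helper name logDeploymentStatus is illustrative and not Skaffold's actual code:

package statusdebug

import (
	"context"
	"fmt"

	"github.com/sirupsen/logrus"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// logDeploymentStatus (hypothetical helper) fetches a Deployment and dumps its
// full status (appsv1.DeploymentStatus: replica counts and conditions) at debug
// level, so a STATUSCHECK_UNKNOWN result can be correlated with what the API
// server actually reported.
func logDeploymentStatus(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	dep, err := client.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return fmt.Errorf("getting deployment %s/%s: %w", namespace, name, err)
	}
	s := dep.Status
	logrus.Debugf("deployment %s/%s status: replicas=%d updated=%d ready=%d available=%d unavailable=%d",
		namespace, name, s.Replicas, s.UpdatedReplicas, s.ReadyReplicas, s.AvailableReplicas, s.UnavailableReplicas)
	for _, c := range s.Conditions {
		logrus.Debugf("  condition %s=%s reason=%q message=%q", c.Type, c.Status, c.Reason, c.Message)
	}
	return nil
}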

@ericzzzzzzz
Contributor

This might be the cause:

// Create a watcher for events
eventList, err := client.CoreV1().Events(namespace).List(ctx, metav1.ListOptions{})
if err != nil {
	return fmt.Errorf("error attempting to list kubernetes events in namespace: %s, %w", namespace, err)
}
for _, event := range eventList.Items {
	if event.Reason == "FailedCreate" {
		if strings.HasPrefix(event.InvolvedObject.Name, deploymentName+"-") {
			errMsg := fmt.Sprintf("Failed to create Pod for Deployment %s: %s\n", deploymentName, event.Message)
			return fmt.Errorf(errMsg)
		}
	}
}
return nil

It seems that Skaffold lists all events in a namespace. It should use the Watch method with proper selectors to receive incoming events instead of blindly listing everything; even when --tolerate-failures-until-deadline is used, it will still pick up failures from a previous round 🤦. I'm going to work on a fix for this.
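
A minimal sketch of the Watch-based approach described above, assuming a field selector on reason=FailedCreate and a starting resourceVersion taken at the beginning of the current deploy so that stale events from earlier rounds are ignored; the helper name and the resourceVersion handling are illustrative, not necessarily what #9015 implements:

package statuscheck

import (
	"context"
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
)

// watchForFailedCreate watches only FailedCreate events in the namespace,
// starting from the resourceVersion observed when the deploy began, so
// failures left over from a previous round are not reported again.
func watchForFailedCreate(ctx context.Context, client kubernetes.Interface, namespace, deploymentName string) error {
	selector := fields.OneTermEqualSelector("reason", "FailedCreate").String()

	// List once only to obtain a starting resourceVersion for the watch.
	list, err := client.CoreV1().Events(namespace).List(ctx, metav1.ListOptions{FieldSelector: selector})
	if err != nil {
		return fmt.Errorf("listing events in namespace %s: %w", namespace, err)
	}

	w, err := client.CoreV1().Events(namespace).Watch(ctx, metav1.ListOptions{
		FieldSelector:   selector,
		ResourceVersion: list.ResourceVersion, // deliver only events newer than this point
	})
	if err != nil {
		return fmt.Errorf("watching events in namespace %s: %w", namespace, err)
	}
	defer w.Stop()

	for {
		select {
		case <-ctx.Done():
			return nil
		case we, ok := <-w.ResultChan():
			if !ok {
				return nil
			}
			event, isEvent := we.Object.(*corev1.Event)
			if !isEvent {
				continue
			}
			// Only fail for pods that belong to the deployment being checked.
			if strings.HasPrefix(event.InvolvedObject.Name, deploymentName+"-") {
				return fmt.Errorf("failed to create Pod for Deployment %s: %s", deploymentName, event.Message)
			}
		}
	}
}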

ericzzzzzzz removed the needs-reproduction label on Aug 11, 2023