
Unexpected "Error creating: Unauthorized" #8972

Closed
aran opened this issue Jul 26, 2023 · 11 comments · Fixed by #9015

@aran
Contributor

aran commented Jul 26, 2023

Expected behavior

skaffold dev -p dev reliably deploys a Helm-based Skaffold configuration to Kubernetes running in Colima.

Actual behavior

I’m seeing skaffold dev fail with “Failed to create Pod for Deployment <…>: Error creating: Unauthorized”.
However, with --keep-running-on-failures and --tolerate-failures-until-deadline, I can see with kubectl get deployment -A that the deployment has in fact succeeded. Running with --verbosity='trace' does not show any other obvious clues.

Information

  • Skaffold version: v2.6.2 (ConfigVersion: skaffold/v4beta6, GitCommit: 0d544d26766058c5a49ba80907edf438a01c90d4, BuildDate: 2023-07-18T19:56:24Z, GoVersion: go1.20.6, Platform: darwin/arm64)
  • Operating system: macOS
  • Installed via: Homebrew
  • Contents of skaffold.yaml:
---
"apiVersion": "skaffold/v4beta5"
"build":
  "artifacts":
  - "bazel":
      "target": "//foundation/cloudnative-pg:cloudnative-pg.tar"
    "image": "cloudnative-pg"
  - "bazel":
      "target": "//foundation/cloudnative-pg:busybox.tar"
    "image": "busybox"
  "tagPolicy":
    "sha256": {}
"kind": "Config"
"metadata":
  "name": "cloudnative-pg"
"profiles":
- "deploy":
    "helm":
      "releases":
      - "createNamespace": true
        "name": "cloudnative-pg-release"
        "namespace": "cnpg-system"
        "remoteChart": "cloudnative-pg"
        "repo": "https://cloudnative-pg.io/charts/"
        "setValueTemplates":
          "image.repository": "{{.IMAGE_REPO_cloudnative_pg}}"
          "image.tag": "{{.IMAGE_TAG_cloudnative_pg}}@{{.IMAGE_DIGEST_cloudnative_pg}}"
          "test.image.repository": "{{.IMAGE_REPO_busybox}}"
          "test.image.tag": "{{.IMAGE_TAG_busybox}}@{{.IMAGE_DIGEST_busybox}}"
        "setValues":
          "image.pullPolicy": "IfNotPresent"
          "serviceAccount.create": "true"
          "serviceAccount.name": "cloudnative-pg-release"
          "test.image.pullPolicy": "IfNotPresent"
  "name": "dev"
- "deploy":
    "helm":
      "releases":
      - "createNamespace": false
        "name": "cloudnative-pg-release"
        "namespace": "cnpg-system"
        "remoteChart": "cloudnative-pg"
        "repo": "https://cloudnative-pg.io/charts/"
        "setValueTemplates":
          "image.repository": "{{.IMAGE_REPO_cloudnative_pg}}"
          "image.tag": "{{.IMAGE_TAG_cloudnative_pg}}@{{.IMAGE_DIGEST_cloudnative_pg}}"
          "test.image.repository": "{{.IMAGE_REPO_busybox}}"
          "test.image.tag": "{{.IMAGE_TAG_busybox}}@{{.IMAGE_DIGEST_busybox}}"
        "setValues":
          "image.pullPolicy": "IfNotPresent"
          "serviceAccount.create": "false"
          "serviceAccount.name": "cloudnative-pg-release"
          "test.image.pullPolicy": "IfNotPresent"
  "manifests":
    "rawYaml":
    - "namespace.yaml"
    - "sa.yaml"
  "name": "prod"
...

Steps to reproduce the behavior

skaffold dev -p dev in a folder with the above skaffold.yaml file.

On another Mac with the same Skaffold, Colima, and Kubernetes versions, the same issue does not occur.

Arguably there is a logging enhancement implicit in this issue, since it is hard to diagnose what is happening from the existing logs.

skaffold_debug2.txt

ericzzzzzzz added the needs-reproduction label (needs reproduction from the maintainers to validate the issue is truly a skaffold bug) on Jul 26, 2023
@ocni-dtu

ocni-dtu commented Jul 28, 2023

I'm experiencing the same error.
The Skaffold deployment fails, but if I let the deployment/pods keep running after Skaffold's failure, the pods start successfully.

@davidedmondsMPG

I'm getting a similar error installing Agones via Helm, in both the render phase (subsequently deployed with kubectl using the --server-side flag) and the deploy phase:

(anonymized)

apiVersion: skaffold/v4beta1
kind: Config
metadata:
  name: service

build:
  artifacts:
    - image: service
      ko:
        main: ./cmd/service

manifests:
  kustomize:
    paths:
      - # two paths for kustomization resources
  helm:
    releases:
      - name: agones
        namespace: agones-system
        createNamespace: true
        remoteChart: agones
        repo: https://agones.dev/chart/stable
        version: 1.33.0
        setValues:
          agones.ping.install: false

deploy:
  kubectl:
    flags:
      apply:
        - --server-side
        - --force-conflicts

portForward:
  - # three deployment port forwards

Log output (anonymised):

- priorityclass.scheduling.k8s.io/agones-system serverside-applied
# snip
Waiting for deployments to stabilize...
# snip
 - agones-system:deployment/agones-extensions: Failed to create Pod for Deployment agones-extensions: Error creating: pods "agones-extensions-5cd94689cb-" is forbidden: no PriorityClass with name agones-system was found
 - agones-system:deployment/agones-extensions failed. Error: Failed to create Pod for Deployment agones-extensions: Error creating: pods "agones-extensions-5cd94689cb-" is forbidden: no PriorityClass with name agones-system was found.
 - agones-system:deployment/agones-controller: Failed to create Pod for Deployment agones-controller: Error creating: pods "agones-controller-6d55775b94-" is forbidden: no PriorityClass with name agones-system was found
 - agones-system:deployment/agones-controller failed. Error: Failed to create Pod for Deployment agones-controller: Error creating: pods "agones-controller-6d55775b94-" is forbidden: no PriorityClass with name agones-system was found.

Running the Helm chart outside of Skaffold results in working pods. I don't see them going into a failure state when watching with kubectl, but they definitely pass through that state in the event log:

0s          Normal    NoPods                   poddisruptionbudget/agones-controller-pdb   No matching pods found
0s          Normal    NoPods                   poddisruptionbudget/agones-extensions-pdb   No matching pods found
0s          Normal    NoPods                   poddisruptionbudget/agones-controller-pdb   No matching pods found
0s          Normal    NoPods                   poddisruptionbudget/agones-extensions-pdb   No matching pods found
0s          Normal    EnsuringLoadBalancer     service/agones-allocator                    Ensuring load balancer
0s          Normal    ScalingReplicaSet        deployment/agones-extensions                Scaled up replica set agones-extensions-69d554d97 to 2
0s          Normal    AppliedDaemonSet         service/agones-allocator                    Applied LoadBalancer DaemonSet kube-system/svclb-agones-allocator-ca3aa78a
0s          Normal    ScalingReplicaSet        deployment/agones-controller                Scaled up replica set agones-controller-bf5d58f54 to 2
0s          Warning   FailedCreate             replicaset/agones-controller-bf5d58f54      Error creating: pods "agones-controller-bf5d58f54-" is forbidden: no PriorityClass with name agones-system was found
0s          Normal    ScalingReplicaSet        deployment/agones-allocator                 Scaled up replica set agones-allocator-88cffb566 to 3
0s          Warning   FailedCreate             replicaset/agones-extensions-69d554d97      Error creating: pods "agones-extensions-69d554d97-" is forbidden: no PriorityClass with name agones-system was found
0s          Normal    Scheduled                pod/agones-allocator-88cffb566-lrcdw        Successfully assigned agones-system/agones-allocator-88cffb566-lrcdw to colima
1s          Normal    SuccessfulCreate         replicaset/agones-allocator-88cffb566       Created pod: agones-allocator-88cffb566-lrcdw
0s          Normal    Scheduled                pod/agones-extensions-69d554d97-jhm9w       Successfully assigned agones-system/agones-extensions-69d554d97-jhm9w to colima
0s          Normal    Scheduled                pod/agones-allocator-88cffb566-5cgkm        Successfully assigned agones-system/agones-allocator-88cffb566-5cgkm to colima
0s          Normal    Scheduled                pod/agones-allocator-88cffb566-krjht        Successfully assigned agones-system/agones-allocator-88cffb566-krjht to colima
0s          Normal    Scheduled                pod/agones-controller-bf5d58f54-6g75g       Successfully assigned agones-system/agones-controller-bf5d58f54-6g75g to colima
1s          Normal    SuccessfulCreate         replicaset/agones-allocator-88cffb566       Created pod: agones-allocator-88cffb566-krjht
0s          Normal    Scheduled                pod/agones-controller-bf5d58f54-snpj2       Successfully assigned agones-system/agones-controller-bf5d58f54-snpj2 to colima
0s          Normal    Scheduled                pod/agones-extensions-69d554d97-n4kgf       Successfully assigned agones-system/agones-extensions-69d554d97-n4kgf to colima
1s          Normal    SuccessfulCreate         replicaset/agones-allocator-88cffb566       Created pod: agones-allocator-88cffb566-5cgkm
1s          Normal    SuccessfulCreate         replicaset/agones-extensions-69d554d97      Created pod: agones-extensions-69d554d97-jhm9w
1s          Normal    SuccessfulCreate         replicaset/agones-controller-bf5d58f54      Created pod: agones-controller-bf5d58f54-6g75g
1s          Normal    SuccessfulCreate         replicaset/agones-controller-bf5d58f54      Created pod: agones-controller-bf5d58f54-snpj2
1s          Normal    SuccessfulCreate         replicaset/agones-extensions-69d554d97      Created pod: agones-extensions-69d554d97-n4kgf
0s          Warning   FailedMount              pod/agones-allocator-88cffb566-lrcdw        MountVolume.SetUp failed for volume "client-ca" : failed to sync secret cache: timed out waiting for the condition
0s          Normal    Pulled                   pod/agones-extensions-69d554d97-jhm9w       Container image "us-docker.pkg.dev/agones-images/release/agones-extensions:1.33.0" already present on machine
0s          Normal    Created                  pod/agones-extensions-69d554d97-jhm9w       Created container agones-extensions
0s          Normal    Pulled                   pod/agones-controller-bf5d58f54-snpj2       Container image "us-docker.pkg.dev/agones-images/release/agones-controller:1.33.0" already present on machine
0s          Normal    Created                  pod/agones-controller-bf5d58f54-snpj2       Created container agones-controller
0s          Normal    Started                  pod/agones-extensions-69d554d97-jhm9w       Started container agones-extensions
0s          Normal    Started                  pod/agones-controller-bf5d58f54-snpj2       Started container agones-controller
0s          Normal    Pulled                   pod/agones-allocator-88cffb566-5cgkm        Container image "us-docker.pkg.dev/agones-images/release/agones-allocator:1.33.0" already present on machine
0s          Normal    Pulled                   pod/agones-controller-bf5d58f54-6g75g       Container image "us-docker.pkg.dev/agones-images/release/agones-controller:1.33.0" already present on machine
0s          Normal    Created                  pod/agones-allocator-88cffb566-5cgkm        Created container agones-allocator
0s          Normal    Pulled                   pod/agones-allocator-88cffb566-krjht        Container image "us-docker.pkg.dev/agones-images/release/agones-allocator:1.33.0" already present on machine
0s          Normal    Created                  pod/agones-controller-bf5d58f54-6g75g       Created container agones-controller
0s          Normal    UpdatedLoadBalancer      service/agones-allocator                    Updated LoadBalancer with new IPs: [] -> [192.168.106.2]
0s          Normal    Pulled                   pod/agones-allocator-88cffb566-lrcdw        Container image "us-docker.pkg.dev/agones-images/release/agones-allocator:1.33.0" already present on machine
0s          Normal    Created                  pod/agones-allocator-88cffb566-krjht        Created container agones-allocator
1s          Normal    Created                  pod/agones-allocator-88cffb566-lrcdw        Created container agones-allocator
1s          Normal    Pulled                   pod/agones-extensions-69d554d97-n4kgf       Container image "us-docker.pkg.dev/agones-images/release/agones-extensions:1.33.0" already present on machine
1s          Normal    Created                  pod/agones-extensions-69d554d97-n4kgf       Created container agones-extensions
0s          Normal    Started                  pod/agones-allocator-88cffb566-5cgkm        Started container agones-allocator
0s          Normal    Started                  pod/agones-allocator-88cffb566-krjht        Started container agones-allocator
0s          Normal    Started                  pod/agones-controller-bf5d58f54-6g75g       Started container agones-controller
0s          Normal    Started                  pod/agones-extensions-69d554d97-n4kgf       Started container agones-extensions
0s          Normal    Started                  pod/agones-allocator-88cffb566-lrcdw        Started container agones-allocator

@aran
Contributor Author

aran commented Jul 28, 2023

Maybe this should be split into two issues (or kept as one issue with two distinct parts):

1 - (More actionable today) Add more logging around pod/deployment failure detection, especially for Helm.
2 - (Perhaps blocked on that) Actually reproduce and fix the underlying issue that is causing Skaffold's seemingly incorrect behavior.

@ericzzzzzzz - what do you think? If you want me to split it up, happy to do that.

@ericzzzzzzz
Contributor

Hi @davidedmondsMPG, sorry for the late reply.

The skaffold command fails at the status check stage, and that logic is the same across deployers. Skaffold checks Kubernetes events for FailedCreate events; if such an event exists, the status check returns an error. The error message "Failed to create Pod for Deployment agones-extensions: Error creating: pods "agones-extensions-5cd94689cb-" is forbidden: no PriorityClass with name agones-system was found" shows that the PriorityClass was not found when the pod was being created; since we later saw a successful creation, it seems the corresponding PriorityClass resource became available afterwards. For some failures, Skaffold cannot determine whether they will be fixed automatically at a later time. We have seen cases like this before: for example, a cluster may not have enough resources to schedule deployments, but it can auto-scale to provision another node, and the deployment will eventually succeed. It is also possible that the cluster has hit its cap, will not scale up more nodes, and the deployment will never succeed. That is why we introduced --tolerate-failures-until-deadline, to let users decide how to handle cases like this.

@ericzzzzzzz
Contributor

@aran Similar to davidedmondsMPG's case, your Kubernetes resources may rely on something that is only created while the release is being installed; that is why you also need the --tolerate-failures-until-deadline flag.

@aran
Contributor Author

aran commented Jul 31, 2023

@ericzzzzzzz We did run with --tolerate-failures-until-deadline, and Skaffold still did not seem to see the successful deployment later.

@pipaliyachirag

Any follow-up?

@BenjaminBenetti

I'm also suffering from this issue. Passing --tolerate-failures-until-deadline does prevent Skaffold from throwing an error; however, it also never recognizes the successful deployment, which prevents file sync from kicking in. My setup is pretty simple, and I've been using similar configurations successfully for a while. I'm running on a freshly created local cluster.

skaffold: v2.6.2
helm: v3.11
k8s: v1.27.3 
kind: 0.20.0

I've done some testing, and this issue appears in Skaffold v2.3.1; on v2.3.0 it deploys successfully. The reason seems to be a bunch of additional error-handling code that was added to deployment.go in v2.3.1.

The only notable clue is that this issue doesn't seem to show up right after creating the cluster; it only appears after the cluster has been running for a while.

For anyone with this issue: you can try downgrading to v2.3.0.

@ericzzzzzzz
Contributor

ericzzzzzzz commented Aug 8, 2023

Hi, thank you all for the discussion! I'm going to prioritize this. We'd like more information to better understand the issue; could you provide a minimal reproducible project so we can debug and identify the cause? Thanks!

ericzzzzzzz self-assigned this on Aug 8, 2023
ericzzzzzzz added the priority/p1 (High impact feature/bug) and area/status-check labels on Aug 8, 2023
@aran
Contributor Author

aran commented Aug 8, 2023

@ericzzzzzzz A key sub-issue here is that it doesn't reproduce reliably, and the logging doesn't indicate what happened, which makes constructing a reproduction difficult. Is it possible to increase logging at the 'debug' or 'trace' level around deployment status checks? By importing or copying the Kubernetes deployment status data structures directly, e.g. these ones, and logging them in full at the 'debug' level, it should be possible to eliminate STATUSCHECK_UNKNOWN. Then, with any luck, by 2.6.4 it will be more obvious what is driving the underlying issue.
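
A minimal sketch of the kind of debug logging being suggested, assuming a client-go clientset and logrus are available; the helper name logDeploymentStatus is illustrative and not Skaffold's actual code:

package statusdebug

import (
	"context"
	"fmt"

	"github.com/sirupsen/logrus"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// logDeploymentStatus (hypothetical helper) fetches a Deployment and dumps its
// full status (appsv1.DeploymentStatus: replica counts and conditions) at debug
// level, so a STATUSCHECK_UNKNOWN result can be correlated with what the API
// server actually reported.
func logDeploymentStatus(ctx context.Context, client kubernetes.Interface, namespace, name string) error {
	dep, err := client.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		return fmt.Errorf("getting deployment %s/%s: %w", namespace, name, err)
	}
	s := dep.Status
	logrus.Debugf("deployment %s/%s status: replicas=%d updated=%d ready=%d available=%d unavailable=%d",
		namespace, name, s.Replicas, s.UpdatedReplicas, s.ReadyReplicas, s.AvailableReplicas, s.UnavailableReplicas)
	for _, c := range s.Conditions {
		logrus.Debugf("  condition %s=%s reason=%q message=%q", c.Type, c.Status, c.Reason, c.Message)
	}
	return nil
}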

@ericzzzzzzz
Contributor

This might be the cause:

// Create a watcher for events
eventList, err := client.CoreV1().Events(namespace).List(ctx, metav1.ListOptions{})
if err != nil {
	return fmt.Errorf("error attempting to list kubernetes events in namespace: %s, %w", namespace, err)
}
for _, event := range eventList.Items {
	if event.Reason == "FailedCreate" {
		if strings.HasPrefix(event.InvolvedObject.Name, deploymentName+"-") {
			errMsg := fmt.Sprintf("Failed to create Pod for Deployment %s: %s\n", deploymentName, event.Message)
			return fmt.Errorf(errMsg)
		}
	}
}
return nil

It seems that Skaffold lists all events in a namespace. It should use the Watch method with proper selectors to receive incoming events instead of blindly listing everything; even when --tolerate-failures-until-deadline is used, it will still pick up failures from a previous round 🤦. I'm going to work on a fix for this.
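
A minimal sketch of the Watch-based approach described above, assuming a field selector on reason=FailedCreate and a starting resourceVersion taken at the beginning of the current deploy so that stale events from earlier rounds are ignored; the helper name and the resourceVersion handling are illustrative, not necessarily what #9015 implements:

package statuscheck

import (
	"context"
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
)

// watchForFailedCreate watches only FailedCreate events in the namespace,
// starting from the resourceVersion observed when the deploy began, so
// failures left over from a previous round are not reported again.
func watchForFailedCreate(ctx context.Context, client kubernetes.Interface, namespace, deploymentName string) error {
	selector := fields.OneTermEqualSelector("reason", "FailedCreate").String()

	// List once only to obtain a starting resourceVersion for the watch.
	list, err := client.CoreV1().Events(namespace).List(ctx, metav1.ListOptions{FieldSelector: selector})
	if err != nil {
		return fmt.Errorf("listing events in namespace %s: %w", namespace, err)
	}

	w, err := client.CoreV1().Events(namespace).Watch(ctx, metav1.ListOptions{
		FieldSelector:   selector,
		ResourceVersion: list.ResourceVersion, // deliver only events newer than this point
	})
	if err != nil {
		return fmt.Errorf("watching events in namespace %s: %w", namespace, err)
	}
	defer w.Stop()

	for {
		select {
		case <-ctx.Done():
			return nil
		case we, ok := <-w.ResultChan():
			if !ok {
				return nil
			}
			event, isEvent := we.Object.(*corev1.Event)
			if !isEvent {
				continue
			}
			// Only fail for pods that belong to the deployment being checked.
			if strings.HasPrefix(event.InvolvedObject.Name, deploymentName+"-") {
				return fmt.Errorf("failed to create Pod for Deployment %s: %s", deploymentName, event.Message)
			}
		}
	}
}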

ericzzzzzzz removed the needs-reproduction label on Aug 11, 2023