TaskRun fails during initialization when disable-home-env-overwrite=true #2165

Closed
chanseokoh opened this issue Mar 5, 2020 · 35 comments · Fixed by #2180
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@chanseokoh
Contributor

This is closely related to the ongoing Tekton $HOME issue (#2013 (comment)). I am testing disable-home-env-overwrite before it gets flipped.

This comment says

With this new flag Tekton will no longer interfere with HOME - it will be whatever you expect it to be when the container runs in a Pod.

Previously $HOME would have been set to /tekton/home but now it won't be. So I would expect $HOME/.docker/config.json to be written to /root/.docker/config.json if the user is root and the image doesn't specify its own HOME.

I don't think this is the case. I am testing gcr.io/cloud-builders/gradle, but Tekton fails as it tries to create a directory /.docker.

"level":"fatal",
"ts":1583431818.4164164,
"caller":"creds-init/main.go:41",
"msg":"Error initializing credentials: mkdir /.docker: permission denied",
"stacktrace":
main.main
    github.com/tektoncd/pipeline/cmd/creds-init/main.go:41
runtime.main
    runtime/proc.go:203

Note that the "permission denied" error itself is not the issue here. The issue is that the path is /.docker instead of /root/.docker.

@chanseokoh
Contributor Author

I tested this after defining HOME=/root on top of gcr.io/cloud-builders/gradle, but it doesn't seem to make any difference. It's still /.docker.

@chanseokoh
Contributor Author

Note that this error is from the credential-initializer init container. So credential-initializer needs to get the user home information from my main step.

But then, we will have to answer this question: what if I define multiple steps whose images have different user home directories (say, the home of image A in the first step is /root, while image B in the next step is /home/foo)? Where should credential-initializer put .docker/config.json? Perhaps both /root/.docker/config.json and /home/foo/.docker/config.json.

@ghost

ghost commented Mar 5, 2020

I think relying on creds-init might actually be the wrong approach here. You're right that multiple home directories in multiple steps make things complicated. You might consider passing a workspace into your pipelines instead. This workspace could be from a Secret or could be from a persistent volume. You could mount it at a known location in your Task (such as /home/foo) and then symlink or copy it explicitly as needed in each Step. Users of the Pipeline could then provide the credentials through this workspace.

To me, it seems like creds-init is simply not capable of injecting credentials in a reliable way for this use case and I'm not sure that copying these creds everywhere that they might be used is the best way to go. I'm still thinking about possible solutions but something about the way it works currently doesn't feel right to me.

@chanseokoh
Contributor Author

chanseokoh commented Mar 5, 2020

Thanks. I'll look into mounting a Secret, but I'm not sure if this will work nicely for me as a Tekton catalog writer (jib-gradle and jib-maven). This sounds like my task will add a special contract only applicable for my catalog for providing Docker credentials. I'd like a general solution in a documented way at the Tekton level, but I'll look into mounting a Secret anyways.

In any case, this error is blocking me from testing disable-home-env-overwrite at all. My cluster has some Docker Secrets by default (which I don't want to remove because I don't know what they are), and I think creds-init just crashes because of their presence. Or, maybe it's always crashing no matter what because $HOME is undefined?

@chanseokoh
Contributor Author

chanseokoh commented Mar 5, 2020

I'm not sure that copying these creds everywhere that they might be used is the best way to go.

BTW, isn't this the current behavior? The Docker credentials are exposed to every step at /tekton/home. Doesn't seem like it will make things worse.

@ghost

ghost commented Mar 6, 2020

I'm not sure that copying these creds everywhere that they might be used is the best way to go.

BTW, isn't this the current behavior? The Docker credentials are exposed to every step at /tekton/home. Doesn't seem like it will make things worse.

I meant more the copying of creds to every possible HOME directory that steps of a task might use. Having a single root-owned location seems different to me than having 10 copies with random owners. Maybe it's a distinction that doesn't matter after all, I dunno.

To me this feels like there should be an explicit "opt-in" - Step A declares that it will use creds X so we do copy X into that step's ~/.X directory. But Step B does not explicitly request any creds so we do not copy any into that container user's home.

@ghost

ghost commented Mar 6, 2020

This sounds like my task will add a special contract only applicable for my catalog for providing Docker credentials. I'd like a general solution in a document way at the Tekton level, but I'll look into mounting a Secret anyways.

Yeah that's a fair point and I understand not wanting to take this path if this isn't an approach that everyone uses. I'm wondering whether this should become a recommendation for catalog authors though - to expose (optional) workspaces for credentials to be mounted into. If everyone was doing it then it might not be bad?

@ghost ghost added the kind/bug Categorizes issue or PR as related to a bug. label Mar 6, 2020
@chanseokoh
Contributor Author

chanseokoh commented Mar 6, 2020

I'm wondering whether this should become a recommendation for catalog authors though - to expose (optional) workspaces for credentials

I can do this (as I commented in #2119 (comment)), and I will do this if this is the only option. However, I'd like to raise a couple of points for the sake of discussion.

In my case, the Docker config file is just one of many ways to get credentials for remote Docker registries. Taking the optional workspaces approach means having to declare 5+ optional workspaces in my Task. The Task needs to clone a git repo, pull from/push to remote Docker registries over HTTPS, and use Docker credentials (of type kubernetes.io/basic-auth, kubernetes.io/dockerconfigjson, kubernetes.io/dockercfg, etc.) for at least the base image and target image repositories. That's a lot of clutter.

Now, when you flip disable-home-env-overwrite, you certainly have to decide how the current Tekton auth support will work and do something about it. If you decide to get rid of the entire auth support, that's fair. The point is, you will make the auth support work in one way or another (as long as you don't get rid of it), and I wonder what it would look like and what your plan is. If it doesn't go away, I think most likely I will be able to make some use of it. Implementing my own optional workspaces might become a waste of time.

How about this: the current behavior is to create these credential files under /tekton/home no matter what. When flipping disable-home-env-overwrite, you basically do the same thing: create these files at a fixed location, be it /tekton/home, /workspace/credentials, /mnt/secrets, etc. The location doesn't matter much to me, as long as it is a pre-determined documented location. Maybe optionally a Task could configure this base directory. And whether you copy such files into each container or mount the secrets doesn't matter much to me either. Then, my task could just reference or copy such files.

@chanseokoh
Contributor Author

To me this feels like there should be an explicit "opt-in" - Step A declares that it will use creds X so we do copy X into that step's ~/.X directory. But Step B does not explicitly request any creds so we do not copy any into that container user's home.

Also note, I think copying a file into a "home" directory needs more thought as I explained in #2013 (comment). For example, OpenShift runs containers as a random UID like 1015300000.

@ghost

ghost commented Mar 6, 2020

Taking the optional workspaces approach means having to declare 5+ optional workspaces in my Task. The Task needs to clone a git repo, pull from/push to remote Docker registries over HTTPS, and use Docker credentials (of type kubernetes.io/basic-auth, kubernetes.io/dockerconfigjson, kubernetes.io/dockercfg, etc.) for at least the base image and target image repositories. That's a lot of clutter.

I think "clutter" is a mis-characterization. I view this as a Task Author explicitly declaring support for, and dependence on, credentials. It becomes obvious what the Task accepts and expects. With creds-init the end-user has to learn this odd system of annotations on Secrets with names like "tekton.dev/git-0" that are utterly removed from the Task they're trying to run. They have to digest auth.md in its entirety in order to make this work.

So in other words I feel like the burden should be on catalog Task authors to explicitly declare what creds they support in the Tasks they write. In my view this shouldn't be something that happens quietly behind the scenes. There may be much better ways for Tekton to support authors doing this but I definitely think explicit is better than implicit here.

So having written all this, creds-init as it stands today should 100% work as it's documented to. I'm still debugging the issue with the HOME var mentioned at the top of this post and will update as I figure out what's up or if I have more questions.

@ghost ghost self-assigned this Mar 6, 2020
@chanseokoh
Contributor Author

chanseokoh commented Mar 6, 2020

So in other words I feel like the burden should be on catalog Task authors to explicitly declare what creds they support in the Tasks they write.

Thanks. If Tekton recommends this approach, I am all for it. I hope the optional workspace feature is implemented soon. For now, I think I'm blocked on the issue to implement this approach.

So having written all this, creds-init as it stands today should 100% work as it's documented to.

Anyways, I'm still super curious how exactly it will work when disable-home-env-overwrite is flipped (including when running as non-root). I'd appreciate an update once you make a decision.

@ghost

ghost commented Mar 6, 2020

I've gone with the approach of placing the creds in a fixed location (/tekton/home) and made a PR here: #2180

@ghost

ghost commented Mar 9, 2020

Design doc for this problem, to be discussed in the WG on Wednesday: https://docs.google.com/document/d/1SVuDt-SXPHymz41dveSXFSPrx5Z-Wzb9eHliJAyYg4o

@ghost

ghost commented Mar 24, 2020

Just to reiterate from the Pull Request that closed this Issue:

  1. Credentials are now written to /tekton/creds when the disable-home-env-overwrite flag is "true".
  2. A new variable has been exposed, $(credentials.path) which points to the place where creds-init wrote the credentials.
  3. Our entrypoint binary will automatically copy credentials from $(credentials.path) to the Step's HOME. We find the HOME directory using go-homedir rather than relying on just the $HOME env var.

@chanseokoh once v0.11.0-rc3 is released this fix will be available to try out. Very keen to hear your feedback / experience with the changes!
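Point 3 in the list above says the home directory is looked up rather than read from $HOME alone. A rough shell approximation of that kind of lookup is sketched below: environment variable first, then the passwd database. The function name resolve_home is hypothetical, and the lookup order is an assumption about the general approach, not a transcription of go-homedir or of Tekton's entrypoint.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of Tekton or go-homedir.

# resolve_home
# Prints the current user's home directory, or returns 1 if none can be found.
resolve_home() {
  local entry dir

  # 1. An explicit, non-empty HOME wins.
  if [ -n "${HOME:-}" ]; then
    printf '%s\n' "$HOME"
    return 0
  fi

  # 2. Otherwise look up the passwd entry for the current UID.
  #    An arbitrary UID (for example one assigned by OpenShift) may have none.
  if command -v getent >/dev/null 2>&1 && entry="$(getent passwd "$(id -u)")"; then
    dir="$(printf '%s\n' "$entry" | cut -d: -f6)"
    if [ -n "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  fi

  # 3. Nothing usable was found; the caller has to choose a fallback.
  return 1
}
```

With HOME set to /workspace in a Step's env, a lookup like this returns /workspace without consulting the passwd database, which matters when the container runs as a UID that has no passwd entry.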

@ghost

ghost commented Mar 25, 2020

Pipelines Beta RC3 has been released and includes this fix https://github.com/tektoncd/pipeline/releases/tag/v0.11.0-rc3

@chanseokoh
Contributor Author

chanseokoh commented Apr 29, 2020

I'm testing this now and also the case where I run a pipeline as a non-root user while forcing $HOME to /workspace (which I think is a reasonable choice). One problem is that the non-existing parent directories of the volumes I mount for caching have ownership and permissions of drwxr-xr-x root root and my process can't write things into them. For example, in my task spec,

    env:
    - name: HOME
      value: /workspace
    volumeMounts:
    - name: $(inputs.params.CACHE)
      mountPath: /workspace/.gradle/caches
      subPath: gradle-caches
    - name: $(inputs.params.CACHE)
      mountPath: /workspace/.gradle/wrapper
      subPath: gradle-wrapper

And /workspace/.gradle is not writable by a non-root user. (But /workspace/.gradle/caches and /workspace/.gradle/wrapper are globally writable.)

[build-and-push] drwxr-xr-x 3 root root 4096 Apr 29 16:53 .gradle

What's the right solution to this?

(Assume that the k8s runtime will assign an arbitrary random user like in OpenShift.)

@chanseokoh
Contributor Author

chanseokoh commented Apr 29, 2020

FYI, I was setting securityContext: runAsUser: 1234 in a Task spec.

Now, if I set the following in a TaskRun spec,

    securityContext:
      runAsUser: 1111
      runAsGroup: 2222
      fsGroup: 3333

I get the following errors when disable-home-env-overwrite and disable-working-directory-overwrite are set to true.

[create-dir-image-ppl7m] 2020/04/29 17:17:03 unsuccessful cred copy: ".docker" from "/tekton/creds" to "/": unable to create destination directory: mkdir /.docker: permission denied
[create-dir-image-ppl7m] 2020/04/29 17:17:03 unsuccessful cred copy: ".gitconfig" from "/tekton/creds" to "/": unable to open destination: open /.gitconfig: permission denied
[create-dir-image-ppl7m] 2020/04/29 17:17:03 unsuccessful cred copy: ".git-credentials" from "/tekton/creds" to "/": unable to open destination: open /.git-credentials: permission denied
[create-dir-image-ppl7m] 2020/04/29 17:17:03 unsuccessful cred copy: ".ssh" from "/tekton/creds" to "/": unable to create destination directory: mkdir /.ssh: permission denied

[git-source-source-v75bx] 2020/04/29 17:17:03 unsuccessful cred copy: ".docker" from "/tekton/creds" to "/": unable to create destination directory: mkdir /.docker: permission denied
[git-source-source-v75bx] 2020/04/29 17:17:03 unsuccessful cred copy: ".gitconfig" from "/tekton/creds" to "/": unable to open destination: open /.gitconfig: permission denied
[git-source-source-v75bx] 2020/04/29 17:17:03 unsuccessful cred copy: ".git-credentials" from "/tekton/creds" to "/": unable to open destination: open /.git-credentials: permission denied
[git-source-source-v75bx] 2020/04/29 17:17:03 unsuccessful cred copy: ".ssh" from "/tekton/creds" to "/": unable to create destination directory: mkdir /.ssh: permission denied
[git-source-source-v75bx] {"level":"error","ts":1588180626.7862072,"caller":"git/git.go:41","msg":"Error running git [config --global http.sslVerify true]: exit status 255\nerror: could not lock config file //.gitconfig: Permission denied\n","stacktrace":"github.com/tektoncd/pipeline/pkg/git.run\n\tgithub.com/tektoncd/pipeline/pkg/git/git.go:41\ngithub.com/tektoncd/pipeline/pkg/git.Fetch\n\tgithub.com/tektoncd/pipeline/pkg/git/git.go:82\nmain.main\n\tgithub.com/tektoncd/pipeline/cmd/git-init/main.go:53\nruntime.main\n\truntime/proc.go:203"}
[git-source-source-v75bx] {"level":"warn","ts":1588180626.7865357,"caller":"git/git.go:83","msg":"Failed to set http.sslVerify in git config: exit status 255"}
[git-source-source-v75bx] {"level":"fatal","ts":1588180626.7866344,"caller":"git-init/main.go:54","msg":"Error fetching git repository: exit status 255","stacktrace":"main.main\n\tgithub.com/tektoncd/pipeline/cmd/git-init/main.go:54\nruntime.main\n\truntime/proc.go:203"}

container step-git-source-source-v75bx has failed  : [{"key":"StartedAt","value":"2020-04-29T17:17:06Z","resourceRef":{}}]

BTW, without the securityContext settings in a Task spec, I do see the directory /workspace/.docker. (As in my previous comment, I set HOME to /workspace.) So it's weird to see it trying to copy .docker into a different place (which is /) when securityContext is set:

unsuccessful cred copy: ".docker" from "/tekton/creds" to "/": unable to create destination directory: mkdir /.docker: permission denied

@chanseokoh
Contributor Author

One problem is that the volumes I mount for caching have ownership and permissions of drwxr-xr-x root root and my process can't write things into it.

Sorry, I need to correct this. It is the "non-existing parent directories of the volumes". For example, I am mounting /workspace/.gradle/caches where /workspace/.gradle doesn't exist. I see that /workspace/.gradle/caches is globally writable, but the parent directory /workspace/.gradle that did not exist before is not writable (having drwxr-xr-x root root). I wonder if Tekton should also make /workspace/.gradle writable? Otherwise, what would be the solution or a temporary workaround?

@ghost

ghost commented Apr 29, 2020

Volume mounts can be nested, so /workspace/.gradle can be a volumeMount even while /workspace/.gradle/caches and /workspace/.gradle/wrapper are as well. This results in all three directories being world-writable.

For the "unsuccessful cred copy" messages, can you confirm whether $HOME was set in your Step's env? If so I'm baffled how the cred copy didn't figure out what HOME was and why it decided to write to / instead. If HOME wasn't set though, what was the user's home directory expected to be resolved to?

The $(credentials.path) variable is available for cases where Tekton just straight up does the wrong thing. In this case your Task can take charge of copying the credentials out of that directory into the correct HOME location.

@chanseokoh
Contributor Author

Volume mounts can be nested, so /workspace/.gradle can be a volumeMount even while /workspace/.gradle/caches and /workspace/.gradle/wrapper are as well. This results in all three directories being world-writable.

The problem is that I don't want to cache /workspace/.gradle. But now I realized that probably I can just always mount an emptyDir.

For the "unsuccessful cred copy" messages, can you confirm whether $HOME was set in your Step's env? If so I'm baffled how the cred copy didn't figure out what HOME was and why it decided to write to / instead. If HOME wasn't set though, what was the user's home directory expected to be resolved to?

$HOME is set to /workspace in a Task spec as you can see in my previous example, and everything seems to work OK when I don't set securityContext in a TaskRun spec. That is, I can see /workspace/.docker. So, what is weird is that, without touching anything (i.e., $HOME still set to /workspace), setting securityContext causes the "unsuccessful cred copy" into /.

This error just prevents my task from running at all, so I think there's not even an opportunity to use $(credentials.path). In any case, I think this looks like a bug that prevents you from using securityContext.

@chanseokoh
Contributor Author

chanseokoh commented Apr 29, 2020

And sorry for asking a somewhat unrelated question: my task gets a git project source at /workspace/source (git resource), but /workspace/source is not writable by a non-root user. I can work around this by mounting an emptyDir to /workspace/source only for the sake of making it globally writable (UPDATE: no, this just clears /workspace/source), but I think there should be a better way?

@ghost

ghost commented Apr 29, 2020

Let's take this one complete example at a time. It's difficult to know where to start otherwise. I've set both disable-home-env-overwrite and disable-working-directory-overwrite to "true" in my cluster. I do not see the same thing you do where it tries to write to /.ssh though.

Here's the complete TaskRun and TaskSpec which I'm running. It uses two securityContexts, one in the TaskRun's podTemplate and one in the Step:

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  generateName: test-creds-
spec:
  podTemplate:
    securityContext:
      runAsUser: 1111
      runAsGroup: 2222
      fsGroup: 3333
  taskSpec:
    steps:
    - name: check-dirs
      image: ubuntu
      env:
      - name: HOME
        value: /workspace
      script: |
        #!/usr/bin/env bash
        set -xe
        id
        ls -lahR $(credentials.path)
        echo ~
        echo $HOME
        ls -lahR $HOME
      securityContext:
        runAsUser: 1234

Here are the relevant log lines I see when I run this TaskRun:

[check-dirs] 2020/04/29 19:41:55 unsuccessful cred copy: ".docker" from "/tekton/creds" to "/workspace": unable to open source: open /tekton/creds/.docker/config.json: permission denied
[check-dirs] 2020/04/29 19:41:55 unsuccessful cred copy: ".gitconfig" from "/tekton/creds" to "/workspace": unable to open source: open /tekton/creds/.gitconfig: permission denied
[check-dirs] 2020/04/29 19:41:55 unsuccessful cred copy: ".git-credentials" from "/tekton/creds" to "/workspace": unable to open source: open /tekton/creds/.git-credentials: permission denied
[check-dirs] 2020/04/29 19:41:55 unsuccessful cred copy: ".ssh" from "/tekton/creds" to "/workspace": unable to open source: open /tekton/creds/.ssh/known_hosts: permission denied

The only one of these that I'm actually interested in (because my ServiceAccount / Secret specify it) is this line:

[check-dirs] 2020/04/29 19:41:55 unsuccessful cred copy: ".ssh" from "/tekton/creds" to "/workspace": unable to open source: open /tekton/creds/.ssh/known_hosts: permission denied

So what's happening here?

Looking at the directory structure of /tekton/creds I see the following:

[check-dirs] /tekton/creds:
[check-dirs] total 8.0K
[check-dirs] drwxrwsrwt 4 root 3333  120 Apr 29 19:41 .
[check-dirs] drwxr-xr-x 8 root root 4.0K Apr 29 19:41 ..
[check-dirs] drwxr-sr-x 2 1111 3333   60 Apr 29 19:41 .docker
[check-dirs] -rw------- 1 1111 3333    0 Apr 29 19:41 .git-credentials
[check-dirs] -rw------- 1 1111 3333   29 Apr 29 19:41 .gitconfig
[check-dirs] drwxr-sr-x 2 1111 3333  100 Apr 29 19:41 .ssh

I'm only interested in the .ssh directory. It's interesting that the owner of the directory is 1111. My Step has user set to 1234. But I see that the read and exec bits are set for both group and global so I'd assume we're ok here.

Looking at the directory structure of /tekton/creds/.ssh I see this:

[check-dirs] /tekton/creds/.ssh:
[check-dirs] total 12K
[check-dirs] drwxr-sr-x 2 1111 3333 100 Apr 29 19:41 .
[check-dirs] drwxrwsrwt 4 root 3333 120 Apr 29 19:41 ..
[check-dirs] -rw------- 1 1111 3333 110 Apr 29 19:41 config
[check-dirs] -rw------- 1 1111 3333  23 Apr 29 19:41 id_fake-ssh-directory
[check-dirs] -rw------- 1 1111 3333  28 Apr 29 19:41 known_hosts

Hm, OK so this explains why the error message open /tekton/creds/.ssh/known_hosts: permission denied showed up. My Step runs as user 1234 but the files in /tekton/creds/.ssh are owned by user 1111 and there are no group or global bits set.

So I go check the docs. Reading there, it sounds like written files will be owned by the user specified in the securityContext's runAsUser field. Now, the way that /tekton/creds/.ssh ended up being there is that an init container, credentials-initializer, copied the directory out of a Secret volume mount. That init container won't have had the runAsUser of 1234; it would've been the Pod securityContext's runAsUser value, 1111. So that probably explains what's happening here: the creds-init helper runs as the PodTemplate's user, 1111, and writes files with that UID to /tekton/creds. Then our Step container comes along and specifies its own runAsUser, 1234, which isn't allowed to read files created by 1111. Our entrypoint tries to copy the file /tekton/creds/.ssh/known_hosts but isn't allowed.
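The reasoning above follows the classic Unix permission check, which can be sketched as a small shell function. The name can_read is hypothetical, and the group list passed for the Step's process (2222 plus the fsGroup 3333) is an assumption about which groups that process holds. ACLs and capabilities are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration only.

# can_read MODE OWNER_UID OWNER_GID UID [GID...]
# Succeeds if a process with the given UID and group IDs may read a file with
# the given octal MODE and ownership. Exactly one permission class applies:
# owner, else group, else other.
can_read() {
  local mode=$((8#$1)) owner_uid="$2" owner_gid="$3" uid="$4" gid
  shift 4

  # UID 0 bypasses the read check.
  if [ "$uid" -eq 0 ]; then
    return 0
  fi

  # Owner class: only the owner read bit counts.
  if [ "$uid" -eq "$owner_uid" ]; then
    [ $(( mode & 8#400 )) -ne 0 ]
    return
  fi

  # Group class: applies if any of the process's groups matches the file group.
  for gid in "$@"; do
    if [ "$gid" -eq "$owner_gid" ]; then
      [ $(( mode & 8#040 )) -ne 0 ]
      return
    fi
  done

  # Other class.
  [ $(( mode & 8#004 )) -ne 0 ]
}
```

For known_hosts in the listing above (mode 600, owner 1111, group 3333), can_read 600 1111 3333 1234 2222 3333 fails: UID 1234 is not the owner, and although group 3333 matches, the group read bit is not set.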

Right, we've got a pretty good idea of what's happening now. What's a good solution? At the moment our creds-init helper writes files with permission 0600 and clearly that isn't going to work if the UID can be different for every single Step. We could change these permissions to be world-readable but that seems insecure. It does work, I've just tested it, but it seems like a garbage solution.

Right now I'm thinking maybe we should drop creds-init completely. Every Step could get the Secrets as volume mounts and do the same work that creds-init does but in the context of their own isolated securityContext, writing to their own $HOME directory, as their respective runAsUser user. Is this better than giving the files 0666 permissions in creds-init though? I dunno. I'm going to sleep on it. I really wish creds-init could just go away because it's really not making the transition away from /tekton/home any easier.

Also, if possible, could you post a complete Task + TaskRun with as few Steps / other stuff in it as possible, which reproduces the issue you're seeing. It'll be much easier to provide help if we have a single complete example to work from.

@chanseokoh
Contributor Author

chanseokoh commented Apr 29, 2020

Thanks a lot for looking into this! I couldn't agree more with what you said. And I am also not sure what the best solution should be.

I get the exact same result as you with your Task + TaskRun. So I compared the difference with my Task and was able to come up with a sample that reproduces my error. Hope this helps.

 apiVersion: tekton.dev/v1beta1
 kind: TaskRun
 metadata:
-  generateName: test-creds-
+  name: test-creds
 spec:
+  serviceAccountName: registry-admin
   podTemplate:
     securityContext:
       runAsUser: 1111
       runAsGroup: 2222
       fsGroup: 3333
+  resources:
+    inputs:
+    - name: source
+      resourceSpec:
+        type: git
+        params:
+        - name: url
+          value: https://github.com/che-samples/console-java-simple
   taskSpec:
+    resources:
+      inputs:
+      - name: source
+        type: git
     steps:
     - name: check-dirs
       image: ubuntu

The pipeline halts with this error:

All TaskRuns deleted in namespace "default"
taskrun.tekton.dev/test-creds created
[git-source-source-rlv2l] 2020/04/29 21:58:10 unsuccessful cred copy: ".docker" from "/tekton/creds" to "/": unable to create destination directory: mkdir /.docker: permission denied
[git-source-source-rlv2l] 2020/04/29 21:58:10 unsuccessful cred copy: ".gitconfig" from "/tekton/creds" to "/": unable to open destination: open /.gitconfig: permission denied
[git-source-source-rlv2l] 2020/04/29 21:58:10 unsuccessful cred copy: ".git-credentials" from "/tekton/creds" to "/": unable to open destination: open /.git-credentials: permission denied
[git-source-source-rlv2l] 2020/04/29 21:58:10 unsuccessful cred copy: ".ssh" from "/tekton/creds" to "/": unable to create destination directory: mkdir /.ssh: permission denied
[git-source-source-rlv2l] {"level":"error","ts":1588197492.548074,"caller":"git/git.go:41","msg":"Error running git [config --global http.sslVerify true]: exit status 255\nerror: could not lock config file //.gitconfig: Permission denied\n","stacktrace":"github.com/tektoncd/pipeline/pkg/git.run\n\tgithub.com/tektoncd/pipeline/pkg/git/git.go:41\ngithub.com/tektoncd/pipeline/pkg/git.Fetch\n\tgithub.com/tektoncd/pipeline/pkg/git/git.go:82\nmain.main\n\tgithub.com/tektoncd/pipeline/cmd/git-init/main.go:53\nruntime.main\n\truntime/proc.go:203"}
[git-source-source-rlv2l] {"level":"warn","ts":1588197492.5484045,"caller":"git/git.go:83","msg":"Failed to set http.sslVerify in git config: exit status 255"}
[git-source-source-rlv2l] {"level":"fatal","ts":1588197492.5505333,"caller":"git-init/main.go:54","msg":"Error fetching git repository: exit status 255","stacktrace":"main.main\n\tgithub.com/tektoncd/pipeline/cmd/git-init/main.go:54\nruntime.main\n\truntime/proc.go:203"}

container step-git-source-source-rlv2l has failed  : [{"key":"StartedAt","value":"2020-04-29T21:58:12Z","resourceRef":{}}]

What's interesting is that, as I said, if I comment out securityContext in TaskRun, it doesn't crash.

#   podTemplate:
#     securityContext:
#       runAsUser: 1111
#       runAsGroup: 2222
#       fsGroup: 3333

UPDATE:

Interestingly, if I remove serviceAccountName: registry-admin (while keeping securityContext and resources), it crashes with the same error at the end, but those weird cred-copy-into-/ logs go away:

All TaskRuns deleted in namespace "default"
taskrun.tekton.dev/test-creds created          
[git-source-source-smv7m] {"level":"error","ts":1588197956.5517914,"caller":"git/git.go:41","msg":"Error running git [config --global http.sslVerify true]: exit status 255\nerror: could not lock config file //.gitconfig: Permission denied\n","stacktrace":"github.com/tektoncd/pipeline/pkg/git.run\n\tgithub.com/tektoncd/pipeline/pkg/git/git.go:41\ngithub.com/tektoncd/pipeline/pkg/git.Fetch\n\tgithub.com/tektoncd/pipeline/pkg/git/git.go:82\nmain.main\n\tgithub.com/tektoncd/pipeline/cmd/git-init/main.go:53\nruntime.main\n\truntime/proc.go:203"}
[git-source-source-smv7m] {"level":"warn","ts":1588197956.5551481,"caller":"git/git.go:83","msg":"Failed to set http.sslVerify in git config: exit status 255"}
[git-source-source-smv7m] {"level":"fatal","ts":1588197956.555225,"caller":"git-init/main.go:54","msg":"Error fetching git repository: exit status 255","stacktrace":"main.main\n\tgithub.com/tektoncd/pipeline/cmd/git-init/main.go:54\nruntime.main\n\truntime/proc.go:203"}
                                                 
container step-git-source-source-smv7m has failed  : [{"key":"StartedAt","value":"2020-04-29T22:05:56Z","resourceRef":{}}]

registry-admin is a service account holding a Docker GCR credential secret.

serviceaccount.yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: registry-admin
secrets:
  - name: registry-credentials
secret.yaml:
apiVersion: v1
kind: Secret
metadata:
  name: registry-credentials
  annotations:
    tekton.dev/docker-0: gcr.io
type: kubernetes.io/basic-auth
stringData:
  username: _json_key
  password: ...

UPDATE2: also note that it was only [git-source-source-rlv2l] that had the error here, but in #2165 (comment), it was both [create-dir-image-ppl7m] and [git-source-source-v75bx].

@ghost

ghost commented May 1, 2020

Excellent, I've been able to reproduce the problem exactly.

create-dir-image-XXXXX is injected into a Task when either the GCS PipelineResource is used or Tekton decides it needs to create an extra directory during PipelineResource linking. It doesn't have a HOME when the home override flag is "true" and it doesn't run as root when the securityContext sets a non-zero user ID. So it reports errors when the entrypoint tries to copy credentials out of /tekton/creds into /.

git-source-XXXXX is placed into a Task when the Git PipelineResource is used. It shares the same problems as above and also adds another wrinkle: it can't lock the $HOME/.gitconfig file for setting configuration options. This is again because $HOME isn't set, it defaults to /, and it's running as a non-zero user ID. Unlike create-dir-image- this is a fatal error for the Git PipelineResource and the Task dies here.

Ultimately the errors with these two Steps are happening because PipelineResources don't have a HOME set and they're trying to write to / as a non-root user due to the securityContext.

So summarizing the various problems that have been discovered here:

  1. The Git PipelineResource needs to be able to lock and write files in $HOME. Specifically $HOME/.gitconfig.

  2. Credentials need to be written to /tekton/creds using the UID of the currently running Step. creds-init can't do this on its own because UID can differ from container to container.

And the likely solutions seem to me:

  • PipelineResources need to have their HOME set somewhere they can always write regardless of UID. I'm thinking /tekton/home since it's always mounted (even when the override flag is true) and it's always world-writeable since it's an emptyDir.

  • creds-init probably needs to go away completely and have its logic moved into the entrypointer. This is the only solution I can think of that will allow UID to be random, creds copied out of secret volumes with the correct file permissions, and HOME to be discovered at runtime.

Ideally the entrypointer could copy credentials straight out of secret volumes and into wherever they think $HOME is. Unfortunately an annoying extra problem that I've brought upon myself is that I've introduced $(credentials.path), which I've documented as pointing to a single location. So the entrypointer is going to need to copy the creds to /tekton/creds as well as copying them to wherever $HOME is.
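
To make the proposed behaviour concrete, here is a minimal Python sketch of "copy the creds to more than one place and warn instead of dying". It is not Tekton's entrypointer code (which is written in Go); `copy_creds` and `CRED_NAMES` are hypothetical names, and the file list is only taken from the log messages in this thread.

```python
import shutil
from pathlib import Path
from typing import List

# Credential entries seen in the "unsuccessful cred copy" log lines above.
CRED_NAMES = [".docker", ".gitconfig", ".git-credentials", ".ssh"]


def copy_creds(source: Path, destinations: List[Path]) -> List[str]:
    """Copy known credential files/directories from source into each destination.

    Failures are returned as warning strings rather than raised, so one
    unwritable destination does not stop the remaining copies.
    """
    warnings = []
    for dest in destinations:
        for name in CRED_NAMES:
            src = source / name
            if not src.exists():
                continue  # this credential type was not configured
            try:
                if src.is_dir():
                    # copytree creates dest (and parents) when missing
                    shutil.copytree(src, dest / name, dirs_exist_ok=True)
                else:
                    dest.mkdir(parents=True, exist_ok=True)
                    shutil.copy2(src, dest / name)
            except OSError as err:
                warnings.append(
                    f'unsuccessful cred copy: "{name}" from "{source}" to "{dest}": {err}'
                )
    return warnings


# Hypothetical usage: copy from a secret mount to both the documented
# $(credentials.path) location and whatever HOME turns out to be at runtime.
# copy_creds(Path("/path/to/secret/mount"), [Path("/tekton/creds"), Path.home()])
```

Because the copy runs inside the Step's own process in this sketch, the resulting files are owned by that Step's UID, which is the property the second bullet above is after.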

I'll create issues for each of these problems and then start working on fixes for both.

@chanseokoh
Contributor Author

Awesome! I can see what's going on.

Ultimately the errors with these two Steps are happening because PipelineResources don't have a HOME set and they're trying to write to / as a non-root user due to the securityContext.

One thing that still seems strange to me: as I said, when not setting securityContext in a TaskRun (not Task), I see that it correctly copied .docker, .gitconfig, etc into /workspace instead of /. This is reproducible with my sample when

  • using serviceAccountName
  • removing securityContext from TaskRun

[check-dirs] 2020/05/01 14:53:01 unsuccessful cred copy: ".docker" from "/tekton/creds" to "/workspace": unable to open source: open /tekton/creds/.docker/config.json: permission denied
[check-dirs] 2020/05/01 14:53:01 unsuccessful cred copy: ".gitconfig" from "/tekton/creds" to "/workspace": unable to open source: open /tekton/creds/.gitconfig: permission denied
[check-dirs] 2020/05/01 14:53:01 unsuccessful cred copy: ".git-credentials" from "/tekton/creds" to "/workspace": unable to open source: open /tekton/creds/.git-credentials: permission denied
[check-dirs] 2020/05/01 14:53:01 unsuccessful cred copy: ".ssh" from "/tekton/creds" to "/workspace": unable to open source: open /tekton/creds/.ssh/known_hosts: permission denied

(If you also remove securityContext from Task, they will be copied successfully into /workspace).

So, without securityContext in TaskRun, it can actually see the right $HOME value. I think something else is also involved to make this difference.

@ghost

ghost commented May 1, 2020

So, without securityContext in TaskRun, it can actually see the right $HOME value. I think something else is also involved to make this difference.

check-dirs Step always sees the correct $HOME value, /workspace. We set this explicitly on the Step. In some of our runs above, though, the Task dies before it gets to check-dirs. The log output we see in these cases is only for git-source-xxxx and create-dir-image-xxxx. They fail writing to /.

(If you also remove securityContext from Task, they will be copied successfully into /workspace).

One small nit here: we don't set the securityContext on the Task but on the Step. So check-dirs, the step container, receives the securityContext. But creds-init (an injected initContainer) and pipeline resource injected containers do not receive that securityContext. If you remove the securityContext from the check-dirs Step then the TaskRun's securityContext is applied to all containers equally.

This is all immeasurably confusing. I'm going to try to illustrate the different scenarios here:


First scenario:

disable-home-env-overwrite: "true"
No TaskRun securityContext: UID=root
No check-dirs securityContext: UID=root

Order of operations:

  1. creds-init initContainer writes credentials to /tekton/creds. Those files all get root ownership.
  2. PipelineResource containers, run as root, copy creds from /tekton/creds to /.
  3. Git PipelineResource, run as root, writes to /.gitconfig successfully.
  4. check-dirs container, runs as root, reads creds from /tekton/creds, writes creds to /workspace.

Second scenario:

disable-home-env-overwrite: "true"
No TaskRun securityContext: UID=root
check-dirs has securityContext: UID=1234

Order of operations:

  1. creds-init initContainer writes credentials to /tekton/creds. Those files all get root ownership.
  2. PipelineResource containers, run as root, copy creds from /tekton/creds to /.
  3. Git PipelineResource, run as root, writes to /.gitconfig successfully.
  4. check-dirs container, runs as 1234, dies copying creds from /tekton/creds to /workspace because they're owned by root.
    Error: [check-dirs] 2020/05/01 14:53:01 unsuccessful cred copy: ".ssh" from "/tekton/creds" to "/workspace": unable to open source: open /tekton/creds/.ssh/known_hosts: permission denied

Third scenario:

disable-home-env-overwrite: "true"
TaskRun has securityContext: UID=1111
check-dirs has securityContext: UID=1234

Order of operations:

  1. creds-init initContainer writes credentials to /tekton/creds. Those files all get 1111 ownership.
  2. PipelineResource containers, run as 1111, fail to copy creds from /tekton/creds to /.
    Messages: unsuccessful cred copy: ".ssh" from "/tekton/creds" to "/": unable to create destination directory: mkdir /.ssh: permission denied
  3. Git PipelineResource, run as 1111, fatal error: dies writing to /.gitconfig.
    Error: {"level":"error","ts":1588180626.7862072,"caller":"git/git.go:41","msg":"Error running git [config --global http.sslVerify true]: exit status 255\nerror: could not lock config file //.gitconfig: Permission denied\n","stacktrace":"github.com/tektoncd/pipeline/pkg/git.run\n\tgithub.com/tektoncd/pipeline/pkg/git/git.go:41\ngithub.com/tektoncd/pipeline/pkg/git.Fetch\n\tgithub.com/tektoncd/pipeline/pkg/git/git.go:82\nmain.main\n\tgithub.com/tektoncd/pipeline/cmd/git-init/main.go:53\nruntime.main\n\truntime/proc.go:203"}

Here check-dirs never runs. Therefore no mention of /workspace.
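
The three scenarios can also be written down as a toy model. This is only a sketch under simplifying assumptions that are mine, not Tekton's: credential files are readable only by their owner (or root), / is writable only by root, and the names `Scenario`, `can_access` and `simulate` are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

ROOT = 0


@dataclass
class Scenario:
    pod_uid: int                    # UID from the TaskRun securityContext (0 if unset)
    step_uid: Optional[int] = None  # UID from the Step securityContext, if any


def can_access(owner_uid: int, uid: int) -> bool:
    """Assumption: only the owner or root can read/write an owner-only path."""
    return uid == ROOT or uid == owner_uid


def simulate(s: Scenario) -> List[str]:
    events = []
    # creds-init and the injected PipelineResource containers get the pod-level UID.
    creds_owner = s.pod_uid
    events.append(f"creds-init: wrote /tekton/creds as UID {creds_owner}")
    # With HOME unset the resource containers write to "/", assumed root-owned.
    if not can_access(ROOT, s.pod_uid):
        events.append(f"git-source: UID {s.pod_uid} cannot lock /.gitconfig (fatal)")
        return events  # the Task dies before check-dirs runs
    events.append("git-source: wrote /.gitconfig")
    # The Step-level securityContext overrides the pod-level one for check-dirs only.
    step_uid = s.step_uid if s.step_uid is not None else s.pod_uid
    if can_access(creds_owner, step_uid):
        events.append("check-dirs: copied creds to /workspace")
    else:
        events.append(f"check-dirs: UID {step_uid} permission denied reading /tekton/creds")
    return events


for scenario in (Scenario(0), Scenario(0, 1234), Scenario(1111, 1234)):
    print(scenario, simulate(scenario))
```

The model leaves out the non-fatal "unsuccessful cred copy" warnings from step 2 and only tracks where each run ends.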


This is a really confusing dance and there's quite a bit of work to do to get all of Tekton's movements in lock-step.

@ghost

ghost commented Jul 7, 2020

@chanseokoh I think this would be worth checking in on again now that 0.14 has been released. Some of the warnings may remain in the logs (e.g. unsuccessful cred copy may continue to show up) but Tasks should hopefully 🤞 no longer fail at one of these initialization stages. And credentials should be initialized in each Step with the UID of that Step's first process.

There's still some trickiness with the meaning of "$HOME" though. If a container is run with a randomized UID then that user isn't going to have a $HOME directory, since they won't have an entry in /etc/passwd. From what I understand reading around a little bit it sounds like this is fixed in OpenShift 4+ but I'm not sure how much Tekton can otherwise realistically do to handle scenarios like this. If there's no user home dir and therefore no valid $HOME directory then I'm left unsure what the correct fallback behavior is.
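
For anyone who wants to check whether their container is in this situation, the condition can be detected from inside the container. A minimal Python sketch using only the standard library; `passwd_home` is a hypothetical helper name.

```python
import os
import pwd
from typing import Optional


def passwd_home(uid: int) -> Optional[str]:
    """Return the home directory recorded for uid in the passwd database, or None."""
    try:
        return pwd.getpwuid(uid).pw_dir
    except KeyError:
        # No passwd entry: typical for a UID injected via securityContext.
        return None


uid = os.getuid()
print("HOME env:", os.environ.get("HOME"))
print("passwd home for UID", uid, ":", passwd_home(uid))
```

If `passwd_home` returns None, any value of $HOME in that container comes from the environment only and not from a user account.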

@chanseokoh
Contributor Author

chanseokoh commented Jul 7, 2020

@sbwsg thanks for letting me know. I've tested 0.14 against GKE with Task and TaskRun, and it seems to be mostly working.

However, I've noticed a recent change in the jib-gradle catalog task that replaced the git input resource with a "workspace" source, which forced me to try Pipeline and PipelineRun. I've never used Pipeline or PipelineRun before, so it took me a long time to get it running. I generally followed the test run. However, after I changed the PipelineRun to run as non-root (the usual uid 1111 / gid 2222 / fsgroup 3333 stuff), I hit the following error, which seems a bit familiar:

error: could not lock config file //.gitconfig: Permission denied

full log:

[clone] 2020/07/07 20:10:10 unsuccessful cred copy: ".docker" from "/tekton/creds" to "/": unable to create destination directory: mkdir /.docker: permission denied
[clone] 2020/07/07 20:10:10 unsuccessful cred copy: ".gitconfig" from "/tekton/creds" to "/": unable to open destination: open /.gitconfig: permission denied
[clone] 2020/07/07 20:10:10 unsuccessful cred copy: ".git-credentials" from "/tekton/creds" to "/": unable to open destination: open /.git-credentials: permission denied
[clone] 2020/07/07 20:10:10 unsuccessful cred copy: ".ssh" from "/tekton/creds" to "/": unable to create destination directory: mkdir /.ssh: permission denied
[clone] + CHECKOUT_DIR=/workspace/output/
[clone] + '[[' true '==' true ]]
[clone] + cleandir
[clone] + '[[' -d /workspace/output/ ]]
[clone] + rm -rf /workspace/output//build /workspace/output//build.gradle /workspace/output//pom.xml /workspace/output//src
[clone] + rm -rf /workspace/output//.git /workspace/output//.gradle
[clone] + rm -rf '/workspace/output//..?*'
[clone] + test -z 
[clone] + test -z 
[clone] + test -z 
[clone] + /ko-app/git-init -url https://github.com/chanseokoh/appengine-diagnosis -revision master -refspec  -path /workspace/output/ '-sslVerify=true' '-submodules=true' -depth 1
[clone] {"level":"error","ts":1594152611.4998615,"caller":"git/git.go:41","msg":"Error running git [config --global http.sslVerify true]: exit status 255\nerror: could not lock config file //.gitconfig: Permission denied\n","stacktrace":"github.com/tektoncd/pipeline/pkg/git.run\n\tgithub.com/tektoncd/pipeline/pkg/git/git.go:41\ngithub.com/tektoncd/pipeline/pkg/git.Fetch\n\tgithub.com/tektoncd/pipeline/pkg/git/git.go:84\nmain.main\n\tgithub.com/tektoncd/pipeline/cmd/git-init/main.go:52\nruntime.main\n\truntime/proc.go:203"}
[clone] {"level":"warn","ts":1594152611.4999757,"caller":"git/git.go:85","msg":"Failed to set http.sslVerify in git config: exit status 255"}
[clone] {"level":"fatal","ts":1594152611.4999988,"caller":"git-init/main.go:53","msg":"Error fetching git repository: exit status 255","stacktrace":"main.main\n\tgithub.com/tektoncd/pipeline/cmd/git-init/main.go:53\nruntime.main\n\truntime/proc.go:203"}

container step-clone has failed  : [{"key":"StartedAt","value":"2020-07-07T20:10:11.453Z","resourceRef":{}}]

@chanseokoh
Contributor Author

I also wonder if I have to use Pipeline and PipelineRun if I want to fetch workspace source for the jib-gradle task from GitHub. Previously, I was able to define a git resource in TaskRun, but now I'm not sure how I can easily achieve getting source from GitHub in TaskRun if the task is using a workspace.

@ghost

ghost commented Jul 8, 2020

However, after I changed the PipelineRun to run as non-root (the usual uid 1111 / gid 2222 / fsgroup 3333 stuff), I hit the following error, which seems a bit familiar:

This is an example of where "$HOME" doesn't exist. I'm guessing user 1111 doesn't have an entry in /etc/passwd and therefore doesn't have a home directory. What's happened is that the git-init binary is delegating to git, running the git config --global http.sslVerify true command. --global tells git to try to access config in ~. ~ doesn't exist for UID 1111, and so just resolves to /. So git in turn isn't able to get a lock on the global git config file since UID 1111 doesn't have write access to /. Hm. Without ~ it's tricky to know precisely what to tell git to do here.
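
A small Python sketch of that failure mode, for illustration. It assumes the global config path is formed by appending "/.gitconfig" to the home directory, which is consistent with the "//.gitconfig" in the error above but is not taken from git's source. It also relies on git locking a config file by creating a ".lock" file next to it, so the parent directory has to be writable. Both function names are hypothetical.

```python
import os


def global_gitconfig_path(home: str) -> str:
    # Plain concatenation: a home of "/" yields "//.gitconfig", matching the log.
    return home + "/.gitconfig"


def can_lock(path: str) -> bool:
    """True if the current user could create "<path>.lock" beside the config file."""
    parent = os.path.dirname(path) or "/"
    return os.access(parent, os.W_OK | os.X_OK)


path = global_gitconfig_path(os.environ.get("HOME", "/"))
print(path, "lockable by UID", os.getuid(), ":", can_lock(path))
```

Run as a non-root UID with a home of /, `can_lock` should report False, which corresponds to the "could not lock config file" error. With a writable directory such as an emptyDir mount as the home, it should report True.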

I also wonder if I have to use Pipeline and PipelineRun if I want to fetch workspace source for the jib-gradle task from GitHub. Previously, I was able to define a git resource in TaskRun, but now I'm not sure how I can easily achieve getting source from GitHub in TaskRun if the task is using a workspace.

Yeah, right now PipelineResources are getting a heavy amount of attention / redesign so pipelines + workspaces are preferred for this kind of "fetch-before-use" behaviour. There's nothing stopping you using a jib-gradle Task with PipelineResources but I guess @vdemeester thought it better to put the beta version of the catalog more in line with the beta APIs that Tekton exposes (which doesn't include PipelineResources - they're still alpha).

@chanseokoh
Contributor Author

Thanks for the explanation. The git behavior seems fair. I guess the UID 1111 example isn't something we will see in practice and probably the image should have a home for it. I think then this is basically working in my case.

One last question: Gradle auto-creates ~/.gradle, and jib-gradle is designed to cache two subdirectories (~/.gradle/caches and ~/.gradle/wrapper) through volume mounts. When running the task as non-root, I noticed that ~/.gradle becomes owned by root:root (while ~/.gradle/caches and ~/.gradle/wrapper are world-writable) and Gradle can't access it. The workaround I use for testing is to mount an emptyDir volume at ~/.gradle only to make it globally writable, as you mentioned in #2165 (comment). Is this an acceptable and reasonable approach?

@ghost

ghost commented Jul 8, 2020

Interesting that ~/.gradle ends up owned by root:root, I'm not sure where that's happening. I've had a quick dig through the kubernetes source code but couldn't find anything specifically that sets a parent directory of a mount to be root:root owned.

Regardless, I think mounting the writable emptyDir to ~/.gradle is the right way to go here. Acceptable and reasonable!

@vyom-soft

Hello,
I am new to Tekton and am facing this problem. How should I overcome it? Is there an approach or a sample?

[image-digest-exporter-d6kxs] 2020/08/21 08:55:40 unsuccessful cred copy: ".docker" from "/tekton/creds" to "/tekton/home": unable to open destination: open /tekton/home/.docker/config.json: permission denied

$ tkn version
Client version: 0.11.0
Pipeline version: v0.15.2
Triggers version: unknown
Thank you
Sanjeev

@vessela991

Same problem as @vyom-soft
I am trying to pull an image from a private registry (Harbor) and assume this is the reason I am not able to. Any suggestions are welcome :)

tkn version
Client version: 0.11.0
Pipeline version: v0.15.2
Triggers version: unknown

@ghost

ghost commented Aug 21, 2020

It would be great to see the Task (kubectl get -o yaml task <name of task>) and TaskRun (kubectl get -o yaml taskrun <name of taskrun>) involved. If you can also show the Pod YAML (kubectl get -o yaml pod <pod name from taskrun yaml>) that ran for the TaskRun as well that would be excellent.

Please make sure to sanitize / remove any data that is sensitive before posting them here.
