Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore moving away from init containers #224

Closed
bobcatfish opened this issue Nov 5, 2018 · 3 comments
Closed

Explore moving away from init containers #224

bobcatfish opened this issue Nov 5, 2018 · 3 comments
Labels
meaty-juicy-coding-work This task is mostly about implementation!!! And docs and tests of course but that's a given

Comments

@bobcatfish
Copy link
Collaborator

bobcatfish commented Nov 5, 2018

Expected Behavior

Currently knative/build-pipeline uses knative/build Build primitives as a way of sequentially running containers. Build uses init-containers as a means of running these sequential steps which has drawbacks including:

  • No k8s native log persistence for init-containers run as Steps
  • Disallows use of side-car containers
  • Disallows parallelization within Builds/Steps where it would otherwise be possible
  • Integration with 3rd party logging/metrics can be hindered as init containers interact differently

Actual Behavior

Steps to Reproduce the Problem

init-container logs example:

  1. Run a Pod using init-containers with restartPolicy:never as Build uses:
    https://github.com/knative/build/blob/c5d66f1ca208fa2ea3408eb46f0328e4cd390bd1/pkg/builder/cluster/convert/convert.go#L353

Example below:

apiVersion: v1
kind: Pod
metadata:
  name: init-pod
  labels:
    app: init-pod
spec:
  restartPolicy: Never
  containers:
  - name: app-container
    image: busybox
    command: ['sh', '-c', 'echo hi && exit 0']
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'echo init-myservice is running!']
  - name: init-myservice2
    image: busybox
    command: ['sh', '-c', 'echo init-myservice2 is running!']
  1. Let the pod complete entirely.
  2. Attempt to get the logs from an init-container:
kubectl logs init-pod -c init-myservice
kubectl logs init-pod -c init-myservice2

Will see:

Unable to retrieve container logs for docker://67f5c7b5e3c635460b462e0e7047e91494
d7f8bf3cbae3331a8e5708dbb2644e

Additional Info

Blocking #107

@bobcatfish
Copy link
Collaborator Author

PR in build repo: knative/build#470

@bobcatfish bobcatfish added the meaty-juicy-coding-work This task is mostly about implementation!!! And docs and tests of course but that's a given label Jan 10, 2019
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Jan 27, 2019
We noticed early on that logs from init containers are often cleaned up
immediately by k8s, particularly if the containers are short running
(e.g. just echoing "hello world"). We started down a path to correct
that, which takes an approach based on Prow's entrypoint solution
(https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint)
(even using the same image at the moment!) which wraps the user's
provided command and streams logs to a volume, from which the logs can
be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use
sidecars, but we are addressing that in tektoncd#224 (also an entrypoint
re-writing based solution). Once we have that, we can sidecar support,
starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we
had the taskrun controller create a PVC on the fly to hold logs. This
has two problems:
* The PVCs are not cleaned up so this is an unexpected side effect for
  users
* Combined with PVC based input + ouput linking, this causes scheduling
  problems for the resulting pods (tektoncd#375)

Now that we want to have an official release, this would be a bad state
to release in, so we will remove this magical log PVC creation logic,
which was never our intended end state anyway.

Since we _do_ need the entrypoint rewriting and log interception logic
in the long run, this commit leaves most functionality intact, removing
only the PVC creation and changing the volume being used to an
`emptyDir`, which is what we will likely use for #107 (and this is how
Prow handles this as well). This means the released functionality will
be streaming logs to a location where nothing can read them, however I
think it is better than completely removing the functionality b/c:
1. We need the functionality in the long run
2. Users should be prepared for this functionality (e.g. dealing with
   edge cases around the taskrun controller being able to fetch an
   image's entrypoint)

Fixes tektoncd#387
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Jan 30, 2019
We noticed early on that logs from init containers are often cleaned up
immediately by k8s, particularly if the containers are short running
(e.g. just echoing "hello world"). We started down a path to correct
that, which takes an approach based on Prow's entrypoint solution
(https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint)
(even using the same image at the moment!) which wraps the user's
provided command and streams logs to a volume, from which the logs can
be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use
sidecars, but we are addressing that in tektoncd#224 (also an entrypoint
re-writing based solution). Once we have that, we can sidecar support,
starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we
had the taskrun controller create a PVC on the fly to hold logs. This
has two problems:
* The PVCs are not cleaned up so this is an unexpected side effect for
  users
* Combined with PVC based input + ouput linking, this causes scheduling
  problems for the resulting pods (tektoncd#375)

Now that we want to have an official release, this would be a bad state
to release in, so we will remove this magical log PVC creation logic,
which was never our intended end state anyway.

Since we _do_ need the entrypoint rewriting and log interception logic
in the long run, this commit leaves most functionality intact, removing
only the PVC creation and changing the volume being used to an
`emptyDir`, which is what we will likely use for #107 (and this is how
Prow handles this as well). This means the released functionality will
be streaming logs to a location where nothing can read them, however I
think it is better than completely removing the functionality b/c:
1. We need the functionality in the long run
2. Users should be prepared for this functionality (e.g. dealing with
   edge cases around the taskrun controller being able to fetch an
   image's entrypoint)

Fixes tektoncd#387
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Jan 30, 2019
We noticed early on that logs from init containers are often cleaned up
immediately by k8s, particularly if the containers are short running
(e.g. just echoing "hello world"). We started down a path to correct
that, which takes an approach based on Prow's entrypoint solution
(https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint)
(even using the same image at the moment!) which wraps the user's
provided command and streams logs to a volume, from which the logs can
be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use
sidecars, but we are addressing that in tektoncd#224 (also an entrypoint
re-writing based solution). Once we have that, we can sidecar support,
starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we
had the taskrun controller create a PVC on the fly to hold logs. This
has two problems:
* The PVCs are not cleaned up so this is an unexpected side effect for
  users
* Combined with PVC based input + ouput linking, this causes scheduling
  problems for the resulting pods (tektoncd#375)

Now that we want to have an official release, this would be a bad state
to release in, so we will remove this magical log PVC creation logic,
which was never our intended end state anyway.

Since we _do_ need the entrypoint rewriting and log interception logic
in the long run, this commit leaves most functionality intact, removing
only the PVC creation and changing the volume being used to an
`emptyDir`, which is what we will likely use for #107 (and this is how
Prow handles this as well). This means the released functionality will
be streaming logs to a location where nothing can read them, however I
think it is better than completely removing the functionality b/c:
1. We need the functionality in the long run
2. Users should be prepared for this functionality (e.g. dealing with
   edge cases around the taskrun controller being able to fetch an
   image's entrypoint)

Fixes tektoncd#387
bobcatfish added a commit to bobcatfish/pipeline that referenced this issue Jan 30, 2019
We noticed early on that logs from init containers are often cleaned up
immediately by k8s, particularly if the containers are short running
(e.g. just echoing "hello world"). We started down a path to correct
that, which takes an approach based on Prow's entrypoint solution
(https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint)
(even using the same image at the moment!) which wraps the user's
provided command and streams logs to a volume, from which the logs can
be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use
sidecars, but we are addressing that in tektoncd#224 (also an entrypoint
re-writing based solution). Once we have that, we can sidecar support,
starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we
had the taskrun controller create a PVC on the fly to hold logs. This
has two problems:
* The PVCs are not cleaned up so this is an unexpected side effect for
  users
* Combined with PVC based input + ouput linking, this causes scheduling
  problems for the resulting pods (tektoncd#375)

Now that we want to have an official release, this would be a bad state
to release in, so we will remove this magical log PVC creation logic,
which was never our intended end state anyway.

Since we _do_ need the entrypoint rewriting and log interception logic
in the long run, this commit leaves most functionality intact, removing
only the PVC creation and changing the volume being used to an
`emptyDir`, which is what we will likely use for #107 (and this is how
Prow handles this as well). This means the released functionality will
be streaming logs to a location where nothing can read them, however I
think it is better than completely removing the functionality b/c:
1. We need the functionality in the long run
2. Users should be prepared for this functionality (e.g. dealing with
   edge cases around the taskrun controller being able to fetch an
   image's entrypoint)

Fixes tektoncd#387
knative-prow-robot pushed a commit that referenced this issue Jan 31, 2019
We noticed early on that logs from init containers are often cleaned up
immediately by k8s, particularly if the containers are short running
(e.g. just echoing "hello world"). We started down a path to correct
that, which takes an approach based on Prow's entrypoint solution
(https://github.com/kubernetes/test-infra/tree/master/prow/cmd/entrypoint)
(even using the same image at the moment!) which wraps the user's
provided command and streams logs to a volume, from which the logs can
be uploaded/streamed by a sidecar.

Since we are using init containers for step execution, we can't yet use
sidecars, but we are addressing that in #224 (also an entrypoint
re-writing based solution). Once we have that, we can sidecar support,
starting with GCS as a POC (#107) and moving into other types.

In the meantime, to enable us to get logs (particularly in tests), we
had the taskrun controller create a PVC on the fly to hold logs. This
has two problems:
* The PVCs are not cleaned up so this is an unexpected side effect for
  users
* Combined with PVC based input + ouput linking, this causes scheduling
  problems for the resulting pods (#375)

Now that we want to have an official release, this would be a bad state
to release in, so we will remove this magical log PVC creation logic,
which was never our intended end state anyway.

Since we _do_ need the entrypoint rewriting and log interception logic
in the long run, this commit leaves most functionality intact, removing
only the PVC creation and changing the volume being used to an
`emptyDir`, which is what we will likely use for #107 (and this is how
Prow handles this as well). This means the released functionality will
be streaming logs to a location where nothing can read them, however I
think it is better than completely removing the functionality b/c:
1. We need the functionality in the long run
2. Users should be prepared for this functionality (e.g. dealing with
   edge cases around the taskrun controller being able to fetch an
   image's entrypoint)

Fixes #387
@bobcatfish bobcatfish added this to the Moar features for 0.1 milestone Feb 6, 2019
@bobcatfish bobcatfish assigned imjasonh and unassigned aaron-prindle and imjasonh Feb 8, 2019
@bobcatfish bobcatfish removed this from the Moar features for 0.1 milestone Feb 8, 2019
@bobcatfish
Copy link
Collaborator Author

Quick update on the current state of this work;

  1. @aaron-prindle created No init containers knative/build#470 which overrides the entrypoint with a custom image that allows us to start containers for each step in a Task within the same pod.

  2. This PR was pretty massive and hard to get all working at once so he started breaking it up, Entrypoint img knative/build#522 was the first step - that PR adds just the entrypoint image itself with tests, etc., @imjasonh took this over and got that PR merged but into build-pipeline instead (via Add entrypoint overrider image #448)

  3. So now the work still remains to actually hook up the image and use it

@nader-ziada
Copy link
Member

/assign

chmouel pushed a commit to chmouel/tektoncd-pipeline that referenced this issue Dec 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meaty-juicy-coding-work This task is mostly about implementation!!! And docs and tests of course but that's a given
Projects
None yet
Development

No branches or pull requests

4 participants