Using Azure Kubernetes Service can be challenging: building a cluster the right way is difficult, and best practices can be confusing. In this guide, I'm going to explain how to build and deploy demanding AKS workloads using Ephemeral OS disks and dedicated nodepools. This blueprint establishes a solid foundation for AKS and applies best practices to maximize performance. I created LightBeacon to make AKS accessible to people who are new to it and want a simplified, approachable guide.
Important note: I will not be addressing security or identity governance in this guide.
- An Azure subscription
First, make sure the Azure CLI is properly configured on your workstation. You can download the Azure CLI from Microsoft's documentation.
Download kubectl if you haven't already:
sudo az aks install-cli
Make sure you're logged in and have the aks-preview extension installed:
az login
Then :
az extension add -n aks-preview
The aks-preview extension will make several of the later steps easier.
Throughout the tutorial I'll be using WSL 2 with Ubuntu 20 on Windows 10.
Note: I will be using the names 'testaks' for the cluster and 'testaks-rg' for the resource group. Change them to whatever you like.
Create the Resource Group:
az group create -n testaks-rg -l eastus
I'm using East US for demonstration purposes, but you can use whichever region is closest to you. I'm going to use Kubernetes version 1.19.3, as this is a Dev environment. We'll build the first nodepool normally and then make adjustments.
Let's use this command to create the initial cluster:
az aks create --name testaks --resource-group testaks-rg --kubernetes-version 1.19.3 --node-count 3 --node-vm-size Standard_D2_v3 --network-plugin azure --generate-ssh-keys --debug
Note: ARM automatically selects the region where the resource group is located, so there is no need for the -l flag unless you want to specify a different region.
Alright, now we have the cluster available in East US:
Now, let's connect to the fresh cluster.
Let's use get-credentials to connect to our cluster:
az aks get-credentials -n testaks -g testaks-rg
Now, we can run kubectl get nodes -o wide to see what we currently have:
Awesome, we have 3 nodes running in a healthy state.
Let's view the current state of our nodepool:
az aks nodepool list --cluster-name testaks --resource-group testaks-rg -o table
Great, we have one nodepool classified as System.
System is the default nodepool mode selected at the initial creation of the cluster. It ensures that all kube-system pods can be scheduled without issue, so the cluster's creation completes successfully.
We can verify this with the command below:
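One way to check the mode, assuming the initial pool kept its auto-generated name nodepool1 (the name we delete later in this guide), is to query the pool directly:

```shell
# Print the mode (System or User) of the initial nodepool
az aks nodepool show --cluster-name testaks --resource-group testaks-rg \
  --name nodepool1 --query mode -o tsv
```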
In its current state, our single nodepool will accept all deployments and will co-host the system pods with the application pods. Now let's change the overall structure of the cluster to accommodate two more nodepools, with a tweak: another System nodepool that will host our system pods on Ephemeral OS disks, which are much faster, and a User nodepool that will be dedicated to our application.
With this structure, we avoid issues like IOPS and networking-throughput contention that can affect our application and our system pods/services, by separating them from each other and bringing better governance to AKS.
Let's add our new System nodepool. Pay close attention to the --node-taints and --node-osdisk-type flags in the command:
az aks nodepool add --name ephsystem --cluster-name testaks --resource-group testaks-rg --node-vm-size Standard_DS3_v2 --node-osdisk-type Ephemeral --node-taints CriticalAddonsOnly=true:NoSchedule --node-count 2 --mode System
We've told Azure to create another System nodepool, using a different VM size whose cache is large enough to accommodate an Ephemeral OS disk. We've also added a special taint to stop non-system workloads from being scheduled to this new nodepool.
After the command has finished running, let's run az aks nodepool list --cluster-name testaks --resource-group testaks-rg -o table again to check the nodepool state:
Great, our new System nodepool is up and running.
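To confirm the new pool really received an Ephemeral OS disk, you can query the pool's osDiskType property (a quick sanity check, not required for the rest of the guide):

```shell
# Should print "Ephemeral" for the pool we just created
az aks nodepool show --cluster-name testaks --resource-group testaks-rg \
  --name ephsystem --query osDiskType -o tsv
```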
Now, let's create the User nodepool for our application:
az aks nodepool add --name appnopool --cluster-name testaks --resource-group testaks-rg --node-vm-size Standard_D2_v3 --node-count 2 --mode User
Note: we're back to the original VM size, since we have no special need for Ephemeral disks here. I'm also using a small VM size because this is a dev environment; in a production workload, use a VM size appropriate for your application. There are no taints here either, since we want this nodepool to accept regular workloads.
Running az aks nodepool list --cluster-name testaks --resource-group testaks-rg -o table again will now show 3 nodepools:
Now, before we remove the old nodepool, let's verify a few things. Use kubectl get pods -n kube-system to view the system pods, and deep dive into one of them using kubectl describe pod.
For this example, let's check out our CoreDNS pod and scroll down to Tolerations:
Remember our CriticalAddonsOnly taint? Azure automatically adds a matching toleration to system pods. This ensures that after the cluster loses its original nodepool, it will keep running on our new Ephemeral System nodepool. Let's delete the original nodepool:
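For reference, the toleration you should spot on system pods like CoreDNS looks roughly like this (a sketch; the exact output can vary between AKS versions):

```yaml
# Excerpt from a system pod's spec: this toleration matches the
# CriticalAddonsOnly=true:NoSchedule taint on our new System pool
tolerations:
- key: "CriticalAddonsOnly"
  operator: "Exists"
```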
az aks nodepool delete --cluster-name testaks --resource-group testaks-rg --name nodepool1
Azure will now gracefully cordon and drain all of the nodes in the original nodepool, then delete them. After the deletion completes, run az aks nodepool list again to view the new state:
Now we have 2 nodepools. One marked as System, and one marked as User.
Let's check what's going on with our system pods:
Great, seems like everything is running just fine.
Now, to shake things up a bit, let's delete one of the kube-proxy pods and see what happens:
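A sketch of that deletion: you can copy a pod name from kubectl get pods -n kube-system, or grab one by label as below (I'm assuming AKS's component=kube-proxy label here; verify with kubectl get pods -n kube-system --show-labels if it differs in your cluster):

```shell
# Grab the name of the first kube-proxy pod, then delete it;
# its controller will immediately create a replacement
POD=$(kubectl get pods -n kube-system -l component=kube-proxy \
  -o jsonpath='{.items[0].metadata.name}')
kubectl delete pod "$POD" -n kube-system
```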
Great. Now let's run kubectl get pods -n kube-system again to check out the replacement pod:
Let's see where it was scheduled, using kubectl describe pod [podname] -n kube-system:
The pod was successfully rescheduled to our System nodepool, which means we have configured our cluster correctly.
What happens if we schedule a normal, non-system workload to the cluster? Let's find out.
For this, let's create an Azure Container Registry that will host our test application.
az acr create -n youracrname -g testaks-rg --sku Standard
Note: change 'youracrname' to a unique name for your Container Registry.
Now, let's attach our newly created Container Registry to AKS the easy way:
az aks update -n testaks -g testaks-rg --attach-acr youracrname
We are good to go now.
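If you want to sanity-check the attachment, recent Azure CLI versions include a check-acr helper that validates the cluster can pull from the registry (optional; requires a reasonably new CLI):

```shell
# Validate that the cluster's identity can pull from the registry
az aks check-acr -n testaks -g testaks-rg --acr youracrname.azurecr.io
```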
Let's git clone this repo into a folder called demo:
mkdir demo && cd demo && git clone https://github.com/msftnadavbh/lightbeacon-aks.git && cd lightbeacon-aks/source
We've cloned the repo and moved into the source folder, which holds a Dockerfile serving a sample PHP webpage. Let's use az acr build to build the application directly in our ACR:
az acr build -t sample/webpage -r youracrname .
Now we're building the Docker image directly on Azure, without needing a local Docker engine. If you have Docker installed, you can pull the result with:
docker pull youracrname.azurecr.io/sample/webpage:latest
Good. Now you have a working Azure Container Registry attached to AKS, with a test web application in it. Let's try to deploy it and see what happens.
In the source folder you'll find a file called deployment.yml; we'll use this manifest to deploy to AKS. Using your favorite text editor, open it and edit the line that holds the FQDN of your Azure Container Registry:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  selector:
    matchLabels:
      app: staticweb
  replicas: 3
  template:
    metadata:
      labels:
        app: staticweb
    spec:
      containers:
      - name: staticweb
        image: youracrname.azurecr.io/sample/webpage
        imagePullPolicy: Always
        ports:
        - containerPort: 80
Change "youracrname" to the name of your newly created Azure Container Registry and save the file.
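If you prefer to script the substitution instead of editing by hand, a one-liner like this works from the source folder (GNU sed shown; the registry name myregistry is a placeholder for yours):

```shell
# Replace the placeholder registry name in the manifest in place
sed -i 's/youracrname/myregistry/' deployment.yml
```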
Let's deploy the static webpage to AKS. Use kubectl apply -f deployment.yml to fire our app into Kubernetes:
Now our application is deployed into AKS. Let's check where our application landed.
Use kubectl get pods -n default to check our static web pods:
Let's deep dive into one of them; I've chosen web-deployment-6b66dc5d85-52d6w. Use kubectl describe pod [podname] and check out the Events section:
As you can see, our application landed on the User nodepool, which is meant for it. Our cluster configuration is now complete.
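Here, scheduling fell out naturally because the System pool is tainted. If you ever want to pin a workload to a specific pool explicitly, you can also add a nodeSelector on the agentpool label that AKS stamps on every node, along these lines (a sketch to add inside the Deployment's pod template spec):

```yaml
# Inside spec.template.spec of the Deployment:
nodeSelector:
  kubernetes.azure.com/agentpool: appnopool  # AKS labels nodes with their pool name
```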
If you're curious and want to see the application, use this command to expose the application pods so you can view the web page:
kubectl expose deployment web-deployment --type=LoadBalancer --name=web-svc
This will take some time to complete. You can check the progress with kubectl get svc:
Notice that our web-svc now has an external IP. Open your favorite web browser and browse to it to see what we have:
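If you'd rather fetch the page from the terminal than a browser, you can read the service's external IP and curl it (a sketch; assumes the LoadBalancer IP has already been provisioned):

```shell
# Read the LoadBalancer's public IP, then request the page
IP=$(kubectl get svc web-svc -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl "http://$IP"
```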
Congratulations, you've finished this tutorial! What you've learned:
- Types of nodepools on AKS
- How to use Ephemeral OS on AKS and VM sizes available for it
- How to taint your nodepool only for system workloads
- How to build a Docker Image directly on Azure Container Registry
- How to seamlessly connect between AKS and Azure Container Registry
Please let me know what you think at: nadebu@outlook.com