geekzter/azure-pipeline-agents

Azure Pipeline Agents for Private Network Connectivity

Azure Pipelines includes Microsoft-hosted Agents as a managed service. If you can use these agents I recommend you do so as they provide the best managed experience.

However, there may be scenarios where you need to manage your own agents:

  • Network access to your private resources e.g. geekzter/azure-aks
  • Configuration requirements that can't be met with any of the hosted agents (e.g. Linux distribution, Windows version)
  • Improve build times by caching artifacts

The first point is probably the most common reason to set up your own agents. With the advent of Private Link it is more common to deploy Azure Services so that they can only be accessed from a virtual network. This requires an agent hosting model that fits that constraint.

Architecture

This repository contains Virtual Network integrated Azure Pipelines scale set agents and self-hosted agents. These agents can build the VM images they themselves use.

Azure services used include:

  • Bastion
  • Compute Gallery
  • Firewall
  • NAT Gateway
  • Pipelines
  • Storage File Share
  • Virtual Network

Tools used are:

  • Azure CLI
  • cloud-init
  • Packer
  • PowerShell
  • Terraform

Infrastructure

This repo provisions an Azure Virtual Machine Scale Set in a Virtual Network, together with an egress device (Firewall or NAT Gateway) and remote access (Bastion). The deploy_azure_firewall Terraform variable selects between a NAT Gateway (optimize cost) and Azure Firewall (optimize control). This choice also affects the extent to which resources are connected via Private Endpoints.
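
As a minimal sketch, the egress and remote access choices described above could be expressed in a .auto.tfvars file (see Feature toggles). Only the variable names come from this repository. The values are illustrative, the boolean type of deploy_azure_bastion is an assumption, and the defaults are not stated here.

```hcl
# Egress through a NAT Gateway (optimize cost).
# Set to true to use Azure Firewall instead (optimize control).
deploy_azure_firewall = false

# Remote access through a managed bastion host (assumed to be a boolean toggle).
deploy_azure_bastion = true
```

Switching deploy_azure_firewall to true trades cost for control over outbound traffic, and according to the feature toggle table it also creates private endpoints for the storage used, Azure Monitor, etc.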

To enable Virtual Network integrated image builds with build-image-isolated.yml, a separate Virtual Network (and resource group, optionally in a different subscription) is created for use by Packer. For the image build VMs themselves, yet another resource group is created. A policy is assigned to this resource group to prevent VM extension installation at image build time, which would render the image unusable (we want to install extensions at deploy time, not build time).

Image lifecycle

The build-image.yml pipeline uses the method and scripts described on the actions/runner-images GitHub repo to build a managed image with the same configuration Azure DevOps and GitHub Actions are using for Microsoft-hosted agents and GitHub-hosted runners. The GenerateResourcesAndImage.ps1 script does the heavy lifting of building the managed image with Packer. This pipeline can run on Microsoft-hosted agents ('Azure Pipelines' pool).

In an enterprise you will typically have isolation requirements: no public endpoints, builds that run in a Virtual Network, protection of the identity used for the build, etc. To accommodate such requirements, the build-image-isolated.yml pipeline takes the Packer templates and provides the variables required to customize the VM from which the image is created. This pipeline needs to run on a self-hosted agent, such as the scale set agents deployed by this repository.

Licensing

Note that by building an image you are accepting licenses pertaining to the tools installed at software installation (i.e. build) time.

Agent lifecycle

With the aforementioned image template created by actions/runner-images, or an Azure Marketplace image, you can make sure you're always on the latest version. Instead of patching VMs after deployment, an immutable infrastructure approach is taken: new versions of the image are built and new instances are created from them.

Lifecycle steps are:

  • A Virtual Machine Scale Set (VMSS) is created with the (at that time) latest version of an image
  • Adding the VMSS as a scale set agent pool ensures the Azure Pipelines agent is installed
  • When a pipeline job needs to be run, a VMSS instance is assigned to run the job
  • When the pipeline completes, the VMSS instance is destroyed
  • When the Virtual Machine Scale Set needs a new instance, an instance is created from the latest VM image version

The above ensures VM instances are kept up to date. The speed of this is controlled by the minimum and maximum number of instances of the scale set agent pool (as configured in Azure DevOps).

Infrastructure Provisioning

To customize provisioning, see configuration.

Provision with Codespace

The easiest method is to use a GitHub Codespace. Just create a GitHub Codespace from the Code menu or page. This will create a Codespace with prerequisites installed. Wait until Codespace preparation, including post-create commands, has completed before starting a clean shell (pwsh).
If your prompt looks like this, post creation has not yet finished:
PS /workspaces/azure-pipeline-agents>
Instead a terminal should look like:
/workspaces/azure-pipeline-agents/scripts [master ≡]>
Follow the instructions shown in the terminal to provision infrastructure.

Environment variables

If you fork this repository on GitHub, you can define Codespaces secrets. These will be surfaced as environment variables with the same name. Defining secrets for ARM_TENANT_ID and ARM_SUBSCRIPTION_ID will make sure you target the right Azure subscription.

Session Management

You can reconnect to disconnected terminal sessions using tmux. This blog post explains how that works. Just type
ct <terraform workspace>
to enter a tmux session with the terraform workspace environment variable TF_WORKSPACE set. Type the same to get back into a previously disconnected session. This can be done up to the timeout configured in Codespaces.

Provision locally

Pre-requisites

If you set this up locally, make sure you have the following pre-requisites:

Interactive

Run:
scripts/deploy.ps1 -Apply
This will also log into Azure and let you select a subscription in case ARM_SUBSCRIPTION_ID is not set.

Provision from Pipeline

This repo contains a pipeline that can be used for CI/CD. You'll need the Azure Pipelines Terraform Tasks extension installed. To be able to create Self-Hosted Agents, the 'Project Collection Build Service (org)' group needs to be given 'Administrator' permission to the Agent Pool, and 'Limit job authorization scope to current project for non-release pipelines' needs to be disabled. For this reason, it is recommended to have a dedicated project for this pipeline.

Configuration

Self-hosted Agents

Self-hosted Agents are the predecessor to Scale Set Agents. They also provide the ability to run agents anywhere (including outside Azure). However, you have to manage the full lifecycle of each agent instance. I still include this approach as separate Terraform modules for Ubuntu & Windows. It involves installing the VM agent as described on this page for Linux.

Set Terraform variable deploy_self_hosted to true to provision self-hosted agents. You will also need to set azdo_pat and azdo_org.
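
A minimal sketch of what this could look like in a .auto.tfvars file follows. The variable names are taken from the paragraph above, and the values are placeholders. The expected format of azdo_org (organization name or URL) is not stated here, and reading azdo_pat as a personal access token is an assumption based on its name.

```hcl
# Provision self-hosted agents
deploy_self_hosted = true

# Placeholder values: substitute your own.
# The exact format expected for azdo_org is an assumption; check variables.tf.
azdo_org = "<your Azure DevOps organization>"

# Assumed to be a personal access token. Avoid committing a real secret to source control.
azdo_pat = "<your token>"
```

To keep the token out of files altogether, Terraform also reads input variables from environment variables prefixed with TF_VAR_, so the same value could be supplied as TF_VAR_azdo_pat.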

Scale Set Agents

Scale Set Agents leverage Azure Virtual Machine Scale Sets. The lifecycle of individual agents is managed by Azure DevOps, which is why I recommend Scale Set Agents over Self-hosted agents.

Set Terraform variable deploy_azure_scale_set to true to provision scale set agents.

The software in the scale set (I use Ubuntu only) is installed using cloud-init.
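
As an illustration, a .auto.tfvars fragment for a scale set whose tools are installed through cloud-init might look like the sketch below. Both variable names appear in this README, but treating linux_tools as a boolean is an assumption.

```hcl
# Deploy scale set agents
deploy_azure_scale_set = true

# Install tools (e.g. AzCopy, Packer, PowerShell) with cloud-init at deploy time.
# Assumed to be a boolean; see variables.tf for the actual declaration.
linux_tools = true
```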

Note this also sets up some environment variables on the agent, e.g. PIPELINE_DEMO_AGENT_VIRTUAL_NETWORK_ID, that can be used in pipelines to set up a peering connection from the agent's virtual network (see the example under Pipeline use).

Feature toggles

Feature toggles are declared in variables.tf and can be overridden by creating a .auto.tfvars file (see config.auto.tfvars.sample), or by setting environment variables, e.g. TF_VAR_deploy_self_hosted="true".

| Terraform variable | Feature |
|---|---|
| configure_azure_cidr_allow_rules | Configure allow rules for IP ranges documented here. When enabled, traffic allowed by this rule will not have FQDNs shown in the logs. |
| configure_azure_crl_oscp_rules | Allow traffic to TLS recommended locations. This is plain HTTP (port 80) traffic used by Certificate Revocation List (CRL) download and/or Online Certificate Status Protocol (OCSP). |
| configure_azure_wildcard_allow_rules | Configure generic wildcard FQDN rules e.g. *.blob.core.windows.net. |
| deploy_azure_bastion | Deploy managed bastion host. |
| deploy_azure_files_share | Deploy SMB file share, mount it on agents and configure Pipeline Agent diagnostics (_diag directory) to use it. |
| deploy_azure_firewall | Instead of NAT Gateway, use Azure Firewall for network egress traffic. This allows you to control outbound traffic, e.g. by FQDN, as well as monitor it. Setting this value to true will also create private endpoints for storage used, Azure Monitor, etc. |
| deploy_non_essential_azure_vm_extensions | Deploy monitoring extensions. These extensions generate their own network traffic. This variable allows you to turn them off. |
| deploy_azure_scale_set | Deploy Scale Set agents. |
| deploy_azure_self_hosted_vms | Deploy Self-Hosted agent VMs. |
| deploy_azdo_self_hosted_vm_agents | Deploy Self-Hosted agent VM extensions. |
| linux_tools | Uses cloud-init to install tools (e.g. AzCopy, Packer, PowerShell, PowerShell Azure modules). Should not be used when using a pre-baked image. |
| azure_linux_os_image_id | Use pre-baked image by specifying the resource id of a VM image e.g. /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/Shared/providers/Microsoft.Compute/galleries/SharedImages/images/Ubuntu2204/versions/latest |
| azure_log_analytics_workspace_id | Providing the id of an existing Log Analytics workspace allows you to retain logs after infrastructure is destroyed. |
| azure_windows_os_image_id | Use pre-baked image by specifying the resource id of a VM image e.g. /subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/Shared/providers/Microsoft.Compute/galleries/SharedImages/images/Windows2022/versions/latest |
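
To show how several of these toggles might combine, here is a hedged sketch of a .auto.tfvars file for agents that run from a pre-baked image. The image id is the example value from the table, not a real resource. Treating linux_tools and deploy_non_essential_azure_vm_extensions as booleans is an assumption.

```hcl
# Use a pre-baked image from a Compute Gallery (example id from the table above)
azure_linux_os_image_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/Shared/providers/Microsoft.Compute/galleries/SharedImages/images/Ubuntu2204/versions/latest"

# Tools are already baked into the image, so skip cloud-init tool installation
# (assumed boolean)
linux_tools = false

# Turn off monitoring extensions to avoid the network traffic they generate
# (assumed boolean)
deploy_non_essential_azure_vm_extensions = false
```

Each of these values could equally be supplied as an environment variable with the TF_VAR_ prefix, which is convenient in pipelines where no .auto.tfvars file is checked in.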

Pipeline use

This YAML snippet shows how to reference the scale set pool and use the environment variables set on the agent:

pool:
  name: 'Scale Set Agents 1' # Name of the Scale Set Agent Pool you created

steps:
- pwsh: |
    # Use pipeline agent virtual network as VNet to peer from
    $env:TF_VAR_peer_network_id = $env:PIPELINE_DEMO_AGENT_VIRTUAL_NETWORK_ID

    # Terraform will use $env:PIPELINE_DEMO_AGENT_VIRTUAL_NETWORK_ID as value for input variable 'peer_network_id' 
    # Create on-demand peering... (e.g. https://github.com/geekzter/azure-aks)

Troubleshooting access

If you are using Azure Firewall, and find things are failing, you can monitor allowed & blocked traffic with Log Analytics queries e.g.

AzureDiagnostics
| where Category == "AzureFirewallApplicationRule" or Category == "AzureFirewallNetworkRule"
| where msg_s contains "Deny"
| order by TimeGenerated desc
| project TimeGenerated, msg_s

For more elaborate queries, check the kusto directory.