
Forced Replacement of azurerm_role_assignment Resource Due to principal_id Evaluation Order in terraform-azurerm-aks Module #617

Closed
morbendor opened this issue Dec 15, 2024 · 9 comments
Labels
invalid This doesn't seem right waiting-response

Comments

@morbendor

morbendor commented Dec 15, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Greenfield/Brownfield provisioning

greenfield

Terraform Version

1.9.2

Module Version

9.2.0

AzureRM Provider Version

3.117.0

Affected Resource(s)/Data Source(s)

azurerm_role_assignment.network_contributor_on_subnet

Terraform Configuration Files

module "aks_ods01" {
  source  = "Azure/aks/azurerm"
  version = "9.2.0"

  resource_group_name = local.rg
  location            = local.location
  # node_resource_group             = "MC_rg-nonprod-shared"
  kubernetes_version              = local.cluster_kubernetes_version
  cluster_name                    = data.azurecaf_name.aks_name.result
  prefix                          = data.azurecaf_name.aks_name.result
  log_analytics_workspace_enabled = false
  rbac_aad                        = false
  private_cluster_enabled         = local.private_cluster_enabled
  vnet_subnet_id                  = local.subnet_id
  identity_type                   = "UserAssigned"
  identity_ids                    = [azurerm_user_assigned_identity.main.id]
  client_id                       = azurerm_user_assigned_identity.main.principal_id
  os_disk_size_gb                 = 60
  sku_tier                        = local.cluster_sku_tier
  node_pools                      = local.nodes
  enable_auto_scaling             = local.default_nodepool_auto_scaling
  agents_availability_zones       = local.default_nodepool_zones
  agents_pool_name                = local.default_nodepool_name
  agents_count                    = local.default_nodepool_agents_count
  agents_min_count                = local.default_nodepool_min_count
  agents_max_count                = local.default_nodepool_max_count
  temporary_name_for_rotation     = local.default_nodepool_temporary_name_for_rotation
  network_contributor_role_assigned_subnet_ids = {
    "subnet_id" = local.subnet_id
  }
  tags       = local.aks_tags
  depends_on = [azurerm_role_assignment.main]
}

tfvars variables values

locals {
  location                                           = "uksouth"
  workload                                           = "nonprod-"
  environment                                        = "qa"
  az_region                                          = "uks"
  cluster_designation                                = "01"
  rg                                           = "rg-nonprod--qa-uks"
  subnet_id                                    = "/subscriptions/500adf42--76039abcdae0/resourceGroups/rg-nonprod--qa-uks/providers/Microsoft.Network/virtualNetworks/vnet-nonprod--qa-uks/subnets/snet-nonprod--qa-uks-01"
  routetable_id                                = "/subscriptions/500adf42--96e1-76039abcdae0/resourceGroups/rg-nonprod--qa-uks/providers/Microsoft.Network/routeTables/route-nonprod--qa-uks-01"
  cluster_kubernetes_version                   = "1.30"
  default_nodepool_name                        = "system"
  default_nodepool_agents_count                = 2
  default_nodepool_min_count                   = 2
  default_nodepool_max_count                   = 4
  default_nodepool_zones                       = ["1"]
  default_nodepool_temporary_name_for_rotation = "${local.default_nodepool_name}tmp"
  cluster_sku_tier                             = "Standard"
  private_cluster_enabled                      = true
  default_nodepool_auto_scaling                = true
  nodes = {
    worker0 = {
      name                        = "aggreg"
      vm_size                     = "Standard_D16s_v5"
      create_before_destroy       = true
      zones                       = ["1"]
      temporary_name_for_rotation = "aggregtmp"
      vnet_subnet_id              = local.subnet_id
      # enable_auto_scaling         = true
      node_count = 4
      # max_count  = 4
      node_labels = {
        "type" = "aggregator"
      }
    }
    worker1 = {
      name                        = "leafnode"
      vm_size                     = "Standard_D16s_v5"
      create_before_destroy       = true
      zones                       = ["1"]
      temporary_name_for_rotation = "leafnodetmp"
      vnet_subnet_id              = local.subnet_id
      # enable_auto_scaling         = true
      node_count = 4
      # max_count  = 4
      node_labels = {
        "type" = "leaf"
      }
    }
  }
  generic_tags = {
    environment        = local.environment
    infra-owner        = data.azurerm_client_config.current.object_id
    CreatedBy          = data.azurerm_client_config.current.object_id
    CreatedByTerraform = true
    CreationDate       = time_static.tags_time_stemp.rfc3339
    UpdateDate         = formatdate("YYYY-MM-DD'T'hh:mm:ss", timestamp())
  }
  aks_tags = merge(
    local.generic_tags,
    {}
  )
  msi_aks_tags = merge(
    local.generic_tags,
    {}
  )
}

Debug Output/Panic Output

# module.aks_ods01.azurerm_role_assignment.network_contributor_on_subnet["subnet_id"] must be replaced
-/+ resource "azurerm_role_assignment" "network_contributor_on_subnet" {
      ~ id                                     = "/subscriptions/500adf42-4c55---76039abcdae0/resourceGroups/rg-nonprod-ods-qa-uks/providers/Microsoft.Network/virtualNetworks/vnet-nonprod-ods-qa-uks/subnets/snet-nonprod-ods-qa-uks-ods01/providers/Microsoft.Authorization/roleAssignments/8dc4fd27---b901-1f95c7632b99" -> (known after apply)
      ~ name                                   = "8dc427--5306-b901-1f95c7632b99" -> (known after apply)
      ~ principal_id                           = "4124e96d--4fd1-8a5f-a4411142e" # forces replacement -> (known after apply) # forces replacement
      ~ principal_type                         = "ServicePrincipal" -> (known after apply)
      ~ role_definition_id                     = "/subscriptions/5-4c55-42be-96e1-76030/providers/Microsoft.Authorization/roleDefinitions/4d97b98b-1d4f---c67212e7" -> (known after apply)
      + skip_service_principal_aad_check       = (known after apply)
        # (6 unchanged attributes hidden)
    }

Expected Behaviour

When using the terraform-azurerm-aks module with a specified var.client_id and identity_type = "UserAssigned":

  1. No Replacement:
    The azurerm_role_assignment.network_contributor_on_subnet resource should not be flagged for replacement during each terraform plan or terraform apply if var.client_id remains unchanged.

  2. Consistent Principal ID:
    The principal_id in the azurerm_role_assignment should use var.client_id directly if provided, avoiding reliance on data sources or computed values that are resolved only after apply.

  3. Idempotent Plans:
    Subsequent terraform plan runs after an initial apply should indicate no changes if the configuration has not been altered.

By ensuring the principal_id prioritizes var.client_id, the module's behavior becomes predictable and avoids unnecessary resource replacements.

Proposed Change:

resource "azurerm_role_assignment" "network_contributor_on_subnet" {
  for_each = var.network_contributor_role_assigned_subnet_ids

  principal_id         = coalesce(var.client_id, try(data.azurerm_user_assigned_identity.cluster_identity[0].principal_id, azurerm_kubernetes_cluster.main.identity[0].principal_id))
  scope                = each.value
  role_definition_name = "Network Contributor"

  lifecycle {
    precondition {
      condition     = !var.create_role_assignment_network_contributor
      error_message = "Cannot set both of `var.create_role_assignment_network_contributor` and `var.network_contributor_role_assigned_subnet_ids`."
    }
  }
}
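For clarity on why the argument order matters: Terraform's coalesce() returns the first argument that isn't null or an empty string, so a caller-supplied var.client_id (known at plan time) is used before the data-source value that is only known after apply, while the module's empty-string default still falls through to the lookup. A minimal standalone sketch (the local names and placeholder GUID are illustrative, not the module's code):

variable "client_id" {
  type    = string
  default = "" # an empty string is skipped by coalesce()
}

locals {
  # Stand-in for the identity principal looked up via a data source,
  # which is only known after apply in the reported scenario.
  looked_up_principal_id = "00000000-0000-0000-0000-000000000000"

  # A non-empty var.client_id wins; otherwise the lookup result is used.
  effective_principal_id = coalesce(var.client_id, local.looked_up_principal_id)
}

output "effective_principal_id" {
  value = local.effective_principal_id
}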

Actual Behaviour

When using the terraform-azurerm-aks module (version 9.2.0) with identity_type = "UserAssigned" and providing var.client_id:

  1. Forced Replacement:
    During each terraform plan, the azurerm_role_assignment.network_contributor_on_subnet resource is flagged for replacement due to the principal_id being derived from data sources that are not resolved until apply. This happens even when var.client_id remains the same.

  2. Unresolved Data Dependency:
    The try(data.azurerm_user_assigned_identity.cluster_identity[0].principal_id, azurerm_kubernetes_cluster.main.identity[0].principal_id) evaluation occurs before terraform apply, causing the principal_id to be marked as (known after apply).

  3. Non-Idempotent Plans:
    Subsequent terraform plan runs continue to show changes due to the azurerm_role_assignment resource needing to be replaced, even when no configuration changes have been made.

This behavior leads to unnecessary resource replacements and disrupts the efficiency and predictability of deployments.

Current Code:

resource "azurerm_role_assignment" "network_contributor_on_subnet" {
  for_each = var.network_contributor_role_assigned_subnet_ids

  principal_id         = coalesce(try(data.azurerm_user_assigned_identity.cluster_identity[0].principal_id, azurerm_kubernetes_cluster.main.identity[0].principal_id), var.client_id)
  scope                = each.value
  role_definition_name = "Network Contributor"

  lifecycle {
    precondition {
      condition     = !var.create_role_assignment_network_contributor
      error_message = "Cannot set both of `var.create_role_assignment_network_contributor` and `var.network_contributor_role_assigned_subnet_ids`."
    }
  }
}

Steps to Reproduce

No response

Important Factoids

No response

References

No response

@morbendor morbendor added the bug Something isn't working label Dec 15, 2024
@morbendor
Author

Hi team,
I wanted to check if there has been any progress on this issue. Please let me know if you need more details or if there's anything I can do to help move this forward.
Thanks in advance!

@zioproto
Collaborator

@morbendor why are you passing both identity_ids and client_id?

  identity_type                   = "UserAssigned"
  identity_ids                    = [azurerm_user_assigned_identity.main.id]
  client_id                       = azurerm_user_assigned_identity.main.principal_id

@zioproto
Collaborator

@morbendor you can remove var.client_id; that one is meant for the Service Principal scenario:

from the Terraform docs:
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/kubernetes_cluster

 An identity block as defined below. One of either identity or service_principal must be specified.


You don't need the following block (from the module's internals); it only renders when both client_id and client_secret are set:

dynamic "service_principal" {
for_each = var.client_id != "" && var.client_secret != "" ? ["service_principal"] : []
content {
client_id = var.client_id
client_secret = var.client_secret
}
}

So you don't need to pass var.client_id. As you can see, that value is never used in the identity block:

dynamic "identity" {
for_each = var.client_id == "" || var.client_secret == "" ? ["identity"] : []
content {
type = var.identity_type
identity_ids = var.identity_ids
}
}
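Based on that, a trimmed sketch of the corrected module call from the original report (client_id removed; the remaining arguments from the original call are unchanged and omitted here for brevity):

module "aks_ods01" {
  source  = "Azure/aks/azurerm"
  version = "9.2.0"

  resource_group_name = local.rg
  location            = local.location
  cluster_name        = data.azurecaf_name.aks_name.result
  prefix              = data.azurecaf_name.aks_name.result
  vnet_subnet_id      = local.subnet_id

  # UserAssigned identity only; client_id/client_secret belong to the
  # Service Principal scenario and stay unset here.
  identity_type = "UserAssigned"
  identity_ids  = [azurerm_user_assigned_identity.main.id]

  network_contributor_role_assigned_subnet_ids = {
    "subnet_id" = local.subnet_id
  }
}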

@zioproto zioproto added invalid This doesn't seem right waiting-response and removed bug Something isn't working labels Dec 19, 2024
@zioproto
Collaborator

Please use a valid configuration and let us know if this problem is still reproducible. Thanks

@morbendor
Author

Even if I don't explicitly set the client_id variable (as shown here), I still get a plan that says the resource will be force-replaced. This seems to happen because the resource relies on a data source to resolve the principal_id.

module "aks_ods01" {
  source  = "Azure/aks/azurerm"
  version = "9.2.0"

  resource_group_name             = local.ods01_rg
  location                        = local.location
  kubernetes_version              = local.ods01_cluster_kubernetes_version
  cluster_name                    = data.azurecaf_name.aks_name.result
  prefix                          = data.azurecaf_name.aks_name.result
  log_analytics_workspace_enabled = false
  rbac_aad                        = false
  private_cluster_enabled         = local.ods01_private_cluster_enabled
  net_profile_pod_cidr            = local.ods01_net_profile_pod_cidr
  net_profile_service_cidr        = local.ods01_net_profile_service_cidr
  net_profile_dns_service_ip      = local.ods01_net_profile_dns_service_ip
  vnet_subnet_id                  = local.ods01_subnet_id
  identity_type                   = "UserAssigned"
  identity_ids                    = [azurerm_user_assigned_identity.main.id]
  os_disk_size_gb                 = 60
  agents_max_pods                 = 128
  sku_tier                        = local.ods01_cluster_sku_tier
  node_pools                      = local.ods01_nodes
  enable_auto_scaling             = local.ods01_default_nodepool_auto_scaling
  agents_availability_zones       = local.ods01_default_nodepool_zones
  agents_pool_name                = local.ods01_default_nodepool_name
  agents_count                    = local.ods01_default_nodepool_agents_count
  agents_min_count                = local.ods01_default_nodepool_min_count
  agents_max_count                = local.ods01_default_nodepool_max_count
  temporary_name_for_rotation     = local.ods01_default_nodepool_temporary_name_for_rotation
  network_plugin                  = local.ods01_network_plugin
  network_plugin_mode             = local.ods01_network_plugin_mode
  network_contributor_role_assigned_subnet_ids = {
    ods01_subnet_id = local.ods01_subnet_id
  }
  tags       = local.ods01_aks_tags
  depends_on = [azurerm_role_assignment.main]
}

As a result, every time I run terraform plan or terraform apply, it triggers a force-replace action.

  # module.aks_monitor.azurerm_role_assignment.network_contributor_on_subnet["monitor_subnet_id"] must be replaced
-/+ resource "azurerm_role_assignment" "network_contributor_on_subnet" {
      ~ id                                     = "/subscriptions/*********/resourceGroups/rg-nonprod-ods-qa-uks/providers/Microsoft.Network/virtualNetworks/vnet-nonprod-ods-qa-uks/subnets/snet-nonprod-ods-qa-uks-monitor/providers/Microsoft.Authorization/roleAssignments/******" -> (known after apply)
      ~ name                                   = "*******" -> (known after apply)
      ~ principal_id                           = "***********" # forces replacement -> (known after apply) # forces replacement
      ~ principal_type                         = "ServicePrincipal" -> (known after apply)
      ~ role_definition_id                     = "/subscriptions/500adf42-4c55-42be-96e1-76039abcdae0/providers/Microsoft.Authorization/roleDefinitions/*******" -> (known after apply)
      + skip_service_principal_aad_check       = (known after apply)
        # (6 unchanged attributes hidden)
    }

@zioproto
Collaborator

I cannot reproduce it on my greenfield installation. I wonder if you have something in your terraform state that triggers this replacement.

Could you please use the command terraform console and type:

module.aks.cluster_identity

and

azurerm_user_assigned_identity.main

What do you see?

Thanks

@morbendor
Author

morbendor commented Dec 19, 2024

module.aks_ods01.cluster_identity

{ "identity_ids" = toset([ "/subscriptions/123abc45-6d78-90ef-12gh-34567ijkl890/resourceGroups/rg-env-xyz-qa-uks/providers/Microsoft.ManagedIdentity/userAssignedIdentities/msi-env-xyz-qa-uks-abc01", ]) "principal_id" = "" "tenant_id" = "" "type" = "UserAssigned" }

azurerm_user_assigned_identity.main

{ "client_id" = "a1b2c3d4-e5f6-7890-ab12-c34d56ef78gh" "id" = "/subscriptions/123abc45-6d78-90ef-12gh-34567ijkl890/resourceGroups/rg-env-xyz-qa-uks/providers/Microsoft.ManagedIdentity/userAssignedIdentities/msi-env-xyz-qa-uks-abc01" "location" = "uksouth" "name" = "msi-env-xyz-qa-uks-abc01" "principal_id" = "987zyx65-4w32-1v0u-t9rs-8mnop34q56tu" "resource_group_name" = "rg-env-xyz-qa-uks" "tags" = tomap({ "CreatedBy" = "123fabc4-5e67-890a-b12c-345678de9fgh" "CreatedByTerraform" = "true" "CreationDate" = "19-12-2024T16:28" "UpdateDate" = "19-12-2024T16:28" "designated-name" = "msi-env-xyz-qa-uks-abc01" "environment" = "qa" "infra-owner" = "123fabc4-5e67-890a-b12c-345678de9fgh" }) "tenant_id" = "765dcba4-3v21-09w8-x567-uvwxy6789z12" "timeouts" = null /* object */ }

I believe my principal ID is being sourced from this data reference:

data.azurerm_user_assigned_identity.cluster_identity[0].principal_id
Since Terraform doesn't know this value until the apply stage, every plan marks the principal_id as (known after apply) and proposes replacing the role assignment.
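To illustrate the pattern I mean (a sketch, not the module's exact internals): a role assignment whose principal_id flows through a data-source read gets deferred to apply whenever the read itself is deferred, and because principal_id forces replacement on azurerm_role_assignment, every plan proposes to recreate it:

# Sketch only: reading the identity back through a data source defers
# principal_id to apply whenever the read is deferred, and principal_id
# changes force replacement of azurerm_role_assignment.
data "azurerm_user_assigned_identity" "cluster_identity" {
  name                = azurerm_user_assigned_identity.main.name
  resource_group_name = azurerm_user_assigned_identity.main.resource_group_name
}

resource "azurerm_role_assignment" "network_contributor_on_subnet" {
  scope                = local.ods01_subnet_id
  role_definition_name = "Network Contributor"
  principal_id         = data.azurerm_user_assigned_identity.cluster_identity.principal_id
}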

@zioproto
Collaborator

I am not able to reproduce it. @lonegunmanb, can you please also have a look? Thanks

@morbendor
Author

If you're available, we can arrange a meeting to debug this issue using my code in a new environment that we'll create.

Thanks!

@github-project-automation github-project-automation bot moved this from Todo to Done in Azure Module Kanban Dec 23, 2024