accountQuota SC parameter causes client throttling #1556

Closed
RomanBednar opened this issue Nov 8, 2023 · 2 comments · Fixed by #1564

@RomanBednar
Contributor

What happened:

When the accountQuota parameter is set and the quota is exceeded, CreateVolume enters an endless recursive loop. The client gets throttled temporarily, then the cycle repeats.

What you expected to happen:

The accountQuota parameter should work without triggering the loop or the throttling.

How to reproduce it:

  1. Create StorageClass with quota:
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-quota
parameters:
  skuName: Premium_LRS
  accountQuota: "200"
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
  2. Create PVCs until the quota is exceeded. This requires creating one extra PVC after the previous ones have already exceeded the quota.

storageAccount created: fe6678df99bee438b872f69

1st volume (5GB) provisioned fine:

I1107 15:21:06.701649       1 utils.go:83] GRPC response: {"volume":{"capacity_bytes":5368709120,"volume_context":{"accountQuota":"200","csi.storage.k8s.io/pv/name":"pvc-6a182b69-a974-47e6-af86-79d22a74b133","csi.storage.k8s.io/pvc/name":"pvc-1","csi.storage.k8s.io/pvc/namespace":"default","secretnamespace":"default","skuName":"Premium_LRS"},"volume_id":"rbednar-mycluster-01-v5lrw-rg#fe6678df99bee438b872f69#pvc-6a182b69-a974-47e6-af86-79d22a74b133###default"}}
I1107 15:21:06.724977       1 utils.go:76] GRPC call: /csi.v1.Controller/DeleteVolume
I1107 15:21:06.725002       1 utils.go:77] GRPC request: {"volume_id":"rbednar-mycluster-01-v5lrw-rg#fe6678df99bee438b872f69#pvc-6a182b69-a974-47e6-af86-79d22a74b133###default"}
I1107 15:21:06.849901       1 controllerserver.go:546] create file share pvc-0b182d6b-0459-4663-b2fe-0546e3826d45 on storage account fe6678df99bee438b872f69 successfully

The 2nd PVC (200GB) is created without error, but a log message reports a total used quota of 100GB, which is incorrect:

I1107 15:24:42.934581       1 utils.go:77] GRPC request: {"capacity_range":{"required_bytes":214748364800},"name":"pvc-371b7b48-b8bc-4dbd-a78d-653ed5fc08a5","parameters":{"accountQuota":"200","csi.storage.k8s.io/pv/name":"pvc-371b7b48-b8bc-4dbd-a78d-653ed5fc08a5","csi.storage.k8s.io/pvc/name":"pvc-2","csi.storage.k8s.io/pvc/namespace":"default","skuName":"Premium_LRS"},"volume_capabilities":[{"AccessType":{"Mount":{}},"access_mode":{"mode":7}}]}
I1107 15:24:43.361397       1 controllerserver.go:468] total used quota on account(fe6678df99bee438b872f69) is 100 GB, file share number: 1
I1107 15:24:43.420890       1 controllerserver.go:525] begin to create file share(pvc-371b7b48-b8bc-4dbd-a78d-653ed5fc08a5) on account(fe6678df99bee438b872f69) type(Premium_LRS) subID() rg(rbednar-mycluster-01-v5lrw-rg) location() size(200) protocol(SMB)
I1107 15:24:43.578772       1 controllerserver.go:546] create file share pvc-371b7b48-b8bc-4dbd-a78d-653ed5fc08a5 on storage account fe6678df99bee438b872f69 successfully
I1107 15:24:43.607713       1 controllerserver.go:592] store account key to k8s secret(azure-storage-account-fe6678df99bee438b872f69-secret) in default namespace
I1107 15:24:43.607758       1 utils.go:83] GRPC response: {"volume":{"capacity_bytes":214748364800,"volume_context":{"accountQuota":"200","csi.storage.k8s.io/pv/name":"pvc-371b7b48-b8bc-4dbd-a78d-653ed5fc08a5","csi.storage.k8s.io/pvc/name":"pvc-2","csi.storage.k8s.io/pvc/namespace":"default","secretnamespace":"default","skuName":"Premium_LRS"},"volume_id":"rbednar-mycluster-01-v5lrw-rg#fe6678df99bee438b872f69#pvc-371b7b48-b8bc-4dbd-a78d-653ed5fc08a5###default"}}

The 3rd PVC (5GB) fails to provision and causes the client to get throttled:

I1107 15:28:01.166289       1 utils.go:77] GRPC request: {"capacity_range":{"required_bytes":4294967296},"name":"pvc-9e7c99c6-673c-4c04-83c4-c9050b13dc3d","parameters":{"accountQuota":"200","csi.storage.k8s.io/pv/name":"pvc-9e7c99c6-673c-4c04-83c4-c9050b13dc3d","csi.storage.k8s.io/pvc/name":"pvc-3","csi.storage.k8s.io/pvc/namespace":"default","skuName":"Premium_LRS"},"volume_capabilities":[{"AccessType":{"Mount":{}},"access_mode":{"mode":7}}]}
I1107 15:28:01.399269       1 controllerserver.go:468] total used quota on account(fe6678df99bee438b872f69) is 300 GB, file share number: 2
W1107 15:28:01.399328       1 controllerserver.go:470] account(fe6678df99bee438b872f69) used quota(200 GB) is over 300 GB, skip matching current account
I1107 15:28:01.649716       1 controllerserver.go:468] total used quota on account(fe6678df99bee438b872f69) is 300 GB, file share number: 2
W1107 15:28:01.649774       1 controllerserver.go:470] account(fe6678df99bee438b872f69) used quota(200 GB) is over 300 GB, skip matching current account
I1107 15:28:01.649786       1 azure_storageaccount.go:613] Get storage account(fe6678df99bee438b872f69) from cache
.
.
.
I1107 15:28:01.834264       1 controllerserver.go:468] total used quota on account(fe6678df99bee438b872f69) is 300 GB, file share number: 2
W1107 15:28:01.834328       1 controllerserver.go:470] account(fe6678df99bee438b872f69) used quota(200 GB) is over 300 GB, skip matching current account
I1107 15:28:01.834341       1 azure_storageaccount.go:613] Get storage account(fe6678df99bee438b872f69) from cache
.
.
.
W1107 15:28:12.230614       1 controllerserver.go:453] EnsureStorageAccount() failed with error(could not list storage accounts for account type Premium_LRS: Retriable: false, RetryAfter: 90s, HTTPStatusCode: 429, RawError: {"error":{"code":"TooManyRequests","message":"The request is being throttled as the limit has been reached for operation type - List_ObservationWindow_00:05:00. For more information, see - https://aka.ms/srpthrottlinglimits"}}), waiting for retrying
W1107 15:28:12.230652       1 utils.go:139] sleep 16 more seconds, waiting for throttling complete

Anything else we need to know?:

Secondary issues

  1. The documentation does not explain that the storageAccount parameter must not be used together with accountQuota; when both are set, accountQuota has no effect.
  2. The log messages are confusing: the reported total used quota does not match what the PVCs actually use.
  3. The quota is not applied to the volume currently being created; the driver only limits subsequent PVCs after the quota has already been exceeded.
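
Items 2 and 3 can be reproduced with a small model of the numbers in the logs above. This is only a sketch, not the driver's code: the `account` type and its methods are hypothetical, and the 100 GiB minimum share size is an assumption inferred from the Note 2 log, where a 5 GiB request on Premium_LRS is created with size(100). The check compares the quota with what is already provisioned, before the new share is added:

```go
package main

import "fmt"

// gib is the number of bytes in one GiB.
const gib = int64(1024 * 1024 * 1024)

// minPremiumShareGiB is an ASSUMPTION for this sketch, inferred from the
// log line that shows a 5 GiB request being created with size(100).
const minPremiumShareGiB = 100

// shareSizeGiB rounds a request up to whole GiB and applies the assumed minimum.
func shareSizeGiB(requiredBytes int64) int64 {
	size := (requiredBytes + gib - 1) / gib
	if size < minPremiumShareGiB {
		size = minPremiumShareGiB
	}
	return size
}

// account is a hypothetical stand-in for a storage account and its file shares.
type account struct {
	sharesGiB []int64
}

// usedGiB sums the provisioned share sizes, not the sizes the PVCs requested.
func (a *account) usedGiB() int64 {
	var total int64
	for _, s := range a.sharesGiB {
		total += s
	}
	return total
}

// provision checks the quota against the shares that already exist and only
// then adds the new share, so the share being created is never counted.
func (a *account) provision(requiredBytes, quotaGiB int64) (usedBefore int64, ok bool) {
	usedBefore = a.usedGiB()
	if usedBefore > quotaGiB {
		return usedBefore, false
	}
	a.sharesGiB = append(a.sharesGiB, shareSizeGiB(requiredBytes))
	return usedBefore, true
}

func main() {
	acct := &account{}
	// required_bytes values taken from the three CreateVolume requests above.
	for i, req := range []int64{5368709120, 214748364800, 4294967296} {
		used, ok := acct.provision(req, 200)
		fmt.Printf("PVC %d: used before check = %d GiB, provisioned = %v\n", i+1, used, ok)
	}
}
```

Under these assumptions the model reports 100 GiB before the second PVC and 300 GiB before the third, matching the "total used quota" log lines, and the 200 GiB share is accepted even though it pushes the account past the quota.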

Note 1

Adding extra debug messages to AddStorageAccountTags in cloud-provider-azure shows that the account is already tagged with skip-matching in the cache, so the function does not update the account in the cloud (it returns nil):

I1108 12:55:11.704020       1 azure_storageaccount.go:629] AddStorageAccountTags: result of getStorageAccountWithCache: storage.Account{Response:autorest.Response{Response:(*http.Response)(0xc0007d6000)}, Sku:(*storage.Sku)(0xc00080c3c0), Kind:"FileStorage", Identity:(*storage.Identity)(nil), ExtendedLocation:(*storage.ExtendedLocation)(nil), AccountProperties:(*storage.AccountProperties)(0xc000568300), Tags:map[string]*string{"k8s-azure-created-by":(*string)(0xc0004b54b0), "skip-matching":(*string)(0xc0002c9c40)}, Location:(*string)(0xc0004b54a0), ID:(*string)(0xc0004b53e0), Name:(*string)(0xc0004b5400), Type:(*string)(0xc0004b5410)}

A possible root cause: after the quota is exceeded, CreateVolume attempts to tag the account with skip-matching and then calls itself recursively. When the recursive call hits EnsureStorageAccount, nothing changes, because it checks the account in the cloud instead of the cache, and the cycle continues until the client is throttled.

Note 2

(storage account is changed to fcb080acbf0394f39a213ea because I reinstalled the cluster)

When the storage account is tagged manually with skip-matching in the Azure cloud, the driver creates a new account and provisions the volume:

I1107 16:15:07.297843       1 utils.go:76] GRPC call: /csi.v1.Controller/CreateVolume
I1107 16:15:07.297870       1 utils.go:77] GRPC request: {"capacity_range":{"required_bytes":5368709120},"name":"pvc-c360e586-fbbf-4e91-83e9-0188c9b2cbdc","parameters":{"accountQuota":"200","csi.storage.k8s.io/pv/name":"pvc-c360e586-fbbf-4e91-83e9-0188c9b2cbdc","csi.storage.k8s.io/pvc/name":"pvc-3","csi.storage.k8s.io/pvc/namespace":"default","skuName":"Premium_LRS"},"volume_capabilities":[{"AccessType":{"Mount":{}},"access_mode":{"mode":7}}]}
I1107 16:15:07.795768       1 azure_storageaccount.go:732] found skip-matching tag for account fe6678df99bee438b872f69, skip matching
I1107 16:15:07.795872       1 azure_storageaccount.go:340] azure - no matching account found, begin to create a new account fcb080acbf0394f39a213ea in resource group rbednar-mycluster-01-v5lrw-rg, location: centralus, accountType: Premium_LRS, accountKind: FileStorage, tags: map[k8s-azure-created-by:azure]
I1107 16:15:07.795893       1 azure_storageaccount.go:365] set AllowBlobPublicAccess(false) for storage account(fcb080acbf0394f39a213ea)
I1107 16:15:27.695930       1 controllerserver.go:468] total used quota on account(fcb080acbf0394f39a213ea) is 0 GB, file share number: 0
I1107 16:15:27.773226       1 controllerserver.go:525] begin to create file share(pvc-c360e586-fbbf-4e91-83e9-0188c9b2cbdc) on account(fcb080acbf0394f39a213ea) type(Premium_LRS) subID() rg(rbednar-mycluster-01-v5lrw-rg) location() size(100) protocol(SMB)
I1107 16:15:28.039538       1 controllerserver.go:546] create file share pvc-c360e586-fbbf-4e91-83e9-0188c9b2cbdc on storage account fcb080acbf0394f39a213ea successfully

Environment:

  • CSI Driver version: 1.29.1
  • Kubernetes version (use kubectl version):
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@andyzhangx
Member

Thanks for reporting. This issue will be fixed by kubernetes-sigs/cloud-provider-azure#4968, which uses a lock.

@jsafrane
Contributor

@andyzhangx have you considered using alpha feature flags? Because right now account quota is apparently broken, not documented at all, does not have any e2e tests, yet it's indistinguishable from other CSI driver features and cannot be turned off.
