Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in orchestrator: Netherite backend failed to start: Operations per second is over the account limit. #396

Open
ericleigh007 opened this issue May 8, 2024 · 9 comments
Labels
P2 Priority 2

Comments

@ericleigh007
Copy link

ericleigh007 commented May 8, 2024

      "Microsoft.Azure.DurableTask.Core": "2.15.1",
      "Microsoft.Azure.DurableTask.Netherite": "1.4.2",
      "Microsoft.Azure.WebJobs.Extensions.DurableTask": "2.12.0"

We have started to get this error reported in our production deployment and it is worrying because it results in data loss.

As it is probably obvious from the below, our service bus trigger starts [or in this case attempts to start] an orchestrator. What is "interesting" is that we are not under any undue loading conditions that I can tell, and we just ran a full-up-test in our scaled test environment (equal to this environment in scale) yesterday with no such problem.

Microsoft.Azure.WebJobs.Host.FunctionInvocationException:
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<ExecuteWithLoggingAsync>d__26.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.39.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:352)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<TryExecuteAsync>d__18.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.39.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:108)
Inner exception System.InvalidOperationException handled at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw:
   at DurableTask.Netherite.NetheriteOrchestrationService+<GetClientAsync>d__21.MoveNext (DurableTask.Netherite, Version=1.0.0.0, Culture=neutral, PublicKeyToken=ef8c4135b1b4225a: /_/src/DurableTask.Netherite/OrchestrationService/NetheriteOrchestrationService.cs:75)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at DurableTask.Netherite.NetheriteOrchestrationService+<DurableTask-Core-IOrchestrationServiceClient-CreateTaskOrchestrationAsync>d__100.MoveNext (DurableTask.Netherite, Version=1.0.0.0, Culture=neutral, PublicKeyToken=ef8c4135b1b4225a: /_/src/DurableTask.Netherite/OrchestrationService/NetheriteOrchestrationService.cs:594)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at DurableTask.Core.TaskHubClient+<InternalCreateOrchestrationInstanceWithRaisedEventAsync>d__32.MoveNext (DurableTask.Core, Version=2.16.0.0, Culture=neutral, PublicKeyToken=d53979610a6e89dd: /_/src/DurableTask.Core/TaskHubClient.cs:645)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Extensions.DurableTask.DurableClient+<Microsoft-Azure-WebJobs-Extensions-DurableTask-IDurableOrchestrationClient-StartNewAsync>d__36`1.MoveNext (Microsoft.Azure.WebJobs.Extensions.DurableTask, Version=2.0.0.0, Culture=neutral, PublicKeyToken=014045d636e89289: D:\a\_work\1\s\src\WebJobs.Extensions.DurableTask\ContextImplementations\DurableClient.cs:215)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at More.Functions.Input.Dew.DewServiceBusTrigger+<StartOrchestrator>d__9.MoveNext (More.Functions, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null: D:\a\1\s\More.Functions\Input\Dew\DewServiceBusTrigger.cs:111)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at More.Functions.Input.Dew.DewServiceBusTrigger+<ProcessMessage>d__8.MoveNext (More.Functions, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null: D:\a\1\s\More.Functions\Input\Dew\DewServiceBusTrigger.cs:93)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at More.Functions.Input.Dew.DewServiceBusTrigger+<Run>d__4.MoveNext (More.Functions, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null: D:\a\1\s\More.Functions\Input\Dew\DewServiceBusTrigger.cs:53)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at More.Functions.Input.Dew.DewServiceBusTrigger+<Run>d__4.MoveNext (More.Functions, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null: D:\a\1\s\More.Functions\Input\Dew\DewServiceBusTrigger.cs:58)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Executors.VoidTaskMethodInvoker`2+<InvokeAsync>d__2.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.39.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\VoidTaskMethodInvoker.cs:20)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionInvoker`2+<InvokeAsync>d__10.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.39.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionInvoker.cs:52)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<InvokeWithTimeoutAsync>d__33.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.39.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:581)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<ExecuteWithWatchersAsync>d__32.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.39.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:527)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor+<ExecuteWithLoggingAsync>d__26.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.39.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Executors\FunctionExecutor.cs:306)
Inner exception Microsoft.Azure.Storage.StorageException handled at DurableTask.Netherite.NetheriteOrchestrationService+<GetClientAsync>d__21.MoveNext:
   at Microsoft.Azure.Storage.Core.Executor.Executor+<ExecuteAsync>d__1`1.MoveNext (Microsoft.Azure.Storage.Common, Version=11.2.3.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.EventHubs.Processor.PartitionManager+<InitializeStoresAsync>d__10.MoveNext (Microsoft.Azure.EventHubs.Processor, Version=4.3.2.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.EventHubs.Processor.PartitionManager+<StartAsync>d__7.MoveNext (Microsoft.Azure.EventHubs.Processor, Version=4.3.2.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.EventHubs.Processor.EventProcessorHost+<RegisterEventProcessorFactoryAsync>d__56.MoveNext (Microsoft.Azure.EventHubs.Processor, Version=4.3.2.0, Culture=neutral, PublicKeyToken=7e34167dcc6d6d8c)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at DurableTask.Netherite.EventHubsTransport.EventHubsTransport+<<DurableTask-Netherite-ITransportLayer-StartWorkersAsync>g__StartPartitionHost|41_0>d.MoveNext (DurableTask.Netherite, Version=1.0.0.0, Culture=neutral, PublicKeyToken=ef8c4135b1b4225a: /_/src/DurableTask.Netherite/TransportLayer/EventHubs/EventHubsTransport.cs:188)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at DurableTask.Netherite.EventHubsTransport.EventHubsTransport+<DurableTask-Netherite-ITransportLayer-StartWorkersAsync>d__41.MoveNext (DurableTask.Netherite, Version=1.0.0.0, Culture=neutral, PublicKeyToken=ef8c4135b1b4225a: /_/src/DurableTask.Netherite/TransportLayer/EventHubs/EventHubsTransport.cs:163)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at DurableTask.Netherite.NetheriteOrchestrationService+<StartWorkersAsync>d__87.MoveNext (DurableTask.Netherite, Version=1.0.0.0, Culture=neutral, PublicKeyToken=ef8c4135b1b4225a: /_/src/DurableTask.Netherite/OrchestrationService/NetheriteOrchestrationService.cs:428)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at DurableTask.Netherite.NetheriteOrchestrationService+<TryStartAsync>d__85.MoveNext (DurableTask.Netherite, Version=1.0.0.0, Culture=neutral, PublicKeyToken=ef8c4135b1b4225a: /_/src/DurableTask.Netherite/OrchestrationService/NetheriteOrchestrationService.cs:319)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at DurableTask.Core.TaskHubWorker+<StartAsync>d__32.MoveNext (DurableTask.Core, Version=2.16.0.0, Culture=neutral, PublicKeyToken=d53979610a6e89dd: /_/src/DurableTask.Core/TaskHubWorker.cs:246)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Extensions.DurableTask.DurableTaskExtension+<StartTaskHubWorkerIfNotStartedAsync>d__102.MoveNext (Microsoft.Azure.WebJobs.Extensions.DurableTask, Version=2.0.0.0, Culture=neutral, PublicKeyToken=014045d636e89289: D:\a\_work\1\s\src\WebJobs.Extensions.DurableTask\DurableTaskExtension.cs:1414)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at System.Runtime.CompilerServices.TaskAwaiter.GetResult (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
   at Microsoft.Azure.WebJobs.Host.Listeners.FunctionListener+<StartAsync>d__13.MoveNext (Microsoft.Azure.WebJobs.Host, Version=3.0.39.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35: D:\a\_work\1\s\src\Microsoft.Azure.WebJobs.Host\Listeners\FunctionListener.cs:68)
@ericleigh007
Copy link
Author

ericleigh007 commented May 8, 2024

We have at least 149 of these over the last 4 hours, but our exception handling may also be hiding some of them.
It looks as if Operations per second on our storage account for the backend is just over the 20k/sec limit, so assuming it is that.

Also I think that my other issue related to FASTER #383 is related, in retrying on these errors we're getting perhaps?

@ericleigh007
Copy link
Author

ericleigh007 commented May 8, 2024

@davidmrdavid glad the job is the job while I guess @sebastianburckhardt is out readying for BUILD, maybe?

At any rate....

Another update: We're trying to get our storage account capacity upgraded on a few accounts to get over the hump and into live, whilst I'm hoping that Netherite troops will find some of the problem that could lead to needing more than expected capacity.
Some ideas:

  • The FASTER.KV problem maybe causing a lot of retries to the blob?
  • Whether or not, do we have control of any retry logic that's used by Netherite which we could tweak and provide some relief? I obviously hope that we can just get more, or regain capacity some other way, but maybe adjusting retry policy might provide some immediate relief.
  • The possibility that we might provide a DIFFERENT storage account for the "larger payload" blobs, and thereby cut that part of the load?

I'm off trying to justify Azure increasing my capacity, but happy to hear anything of any help.

@davidmrdavid
Copy link
Member

davidmrdavid commented May 8, 2024

The FASTER.KV problem maybe causing a lot of retries to the blob?

Interesting theory- maybe; I need to see if this triggers an explosive retry-chain. I'll see if I can provide you with a private package based off this (#397) in case you're willing to give it a go and report back.

We have started to get this error reported in our production deployment and it is worrying because it results in data loss.

This is new info to me - what do you mean by this resulting in data loss? Can you please clarify? Is it data loss because you're getting throttled so you need to choose a different storage account (therefore leaving the old data behind)?

@ericleigh007
Copy link
Author

Thanks @davidmrdavid
It isn't "strictly" loss as the input is still in the input file, but it means we have to do things like manually retrigger our Cosoms change feed, because the orchestrator that was supposed to start isn't started. then downstream systems don't get their updates, and then the domino effect.

@davidmrdavid
Copy link
Member

Noted. For now, let's try the package here (#383 (comment)) to see if there's indeed a correlation between those FASTER warnings and this throttling, that should help make the next steps clearer.

@ericleigh007
Copy link
Author

Azure support upped our maximum Transactions / second to 40,000 for now. We have not seen this one again since the first time. We have a decent A/B situation set up now because only ONE of our high scale accounts has the uplift. So we'll if the lower-limit accounts still report this occaisionally.

@davidmrdavid
Copy link
Member

davidmrdavid commented May 10, 2024

@ericleigh007: so is the plan to have the private package in both environments so we can measure the effect of the change in that A/B set up? Just confirming, that would be great

@ericleigh007
Copy link
Author

Unfortunately [really] no. We have no way to put the private package in the production environment so not capable of doing that. It was difficult enough to get the private package into dev where we tested.

Now back to the subject, will definitely keep you guys informed as we run on the 40k environment next week. So far, I've seen the environment hit 1.65M transactions / minute, so with the capacity uplift, we're definitely starting to increase above the standard load.

In the next week or so, I believe we will get the newest released package and dependencies into some environments and within a couple of weeks, if no problems develop, into production.

@ericleigh007
Copy link
Author

I sent some PM's to you showing the comparison between our uplifted storage account's usage vs non-uplifted, for about the same workload, give or take.
The function app under the uplifted storage account ran great, while the non-uplifted one had myriad warnings, IndexOutOfRange, Disposed, and Client timeouts starting orchestrators.
This may be correlation without causation, but wanted to mention it here.

@bachuv bachuv added P2 Priority 2 and removed Needs: Triage 🔍 labels May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 Priority 2
Projects
None yet
Development

No branches or pull requests

3 participants