Error in orchestrator: Netherite backend failed to start: Operations per second is over the account limit. #396
We have seen at least 149 of these over the last 4 hours, and our exception handling may be hiding others. I also suspect my other issue about FASTER (#383) is related. Perhaps it comes from retrying on these errors?
@davidmrdavid, glad you're on the job while, I guess, @sebastianburckhardt is out getting ready for BUILD? At any rate, another update: we're trying to get the storage account capacity raised on a few accounts to get over the hump and into live, while I'm hoping the Netherite team will find whatever is causing us to need more capacity than expected.
I'm off trying to justify a capacity increase to Azure, but I'm happy to hear anything that might help.
Interesting theory, maybe. I need to check whether this triggers an explosive retry chain. I'll see if I can provide you with a private package based on this (#397) in case you're willing to give it a try and report back.
This is new info to me. What do you mean by this resulting in data loss? Can you please clarify? Is it data loss because you're getting throttled and need to switch to a different storage account, leaving the old data behind?
Thanks @davidmrdavid
Noted. For now, let's try the package here (#383 (comment)) to see whether there's indeed a correlation between those FASTER warnings and this throttling; that should make the next steps clearer.
Azure support raised our maximum transactions/second to 40,000 for now. We have not seen this error again since the first time. We now have a decent A/B setup, because only ONE of our high-scale accounts has the uplift, so we'll see whether the lower-limit accounts still report it occasionally.
@ericleigh007: so is the plan to deploy the private package in both environments so we can measure the effect of the change in that A/B setup? Just confirming; that would be great.
Unfortunately, really, no. We have no way to put the private package into the production environment, so we can't do that. It was difficult enough to get it into dev, where we tested. Back to the subject: we'll definitely keep you informed as we run on the 40k environment next week. So far, I've seen the environment hit 1.65M transactions/minute, so with the capacity uplift we're definitely starting to go above the standard load. In the next week or so, I expect we'll get the newest released package and dependencies into some environments, and, if no problems develop, into production within a couple of weeks.
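For a rough sense of scale, the 1.65M transactions/minute figure can be converted to an average per-second rate and compared with the 40,000 transactions/second uplift. This is a minimal sketch that assumes load is spread evenly across the minute; real traffic is burstier, so individual seconds can exceed the average and still hit a per-second limit. The function names are illustrative only.

```python
# Hypothetical helpers: convert a per-minute transaction count into an
# average per-second rate and compare it with a per-second account limit.
# Assumes load is evenly spread over the minute (real traffic is burstier).

def per_second(transactions_per_minute: float) -> float:
    """Average transactions per second over one minute."""
    return transactions_per_minute / 60


def utilization(transactions_per_minute: float, limit_per_second: float) -> float:
    """Fraction of the per-second limit used on average (1.0 means at the limit)."""
    return per_second(transactions_per_minute) / limit_per_second


observed = 1_650_000  # transactions per minute, figure from the thread
uplifted = 40_000     # transactions per second after the support uplift

print(f"{per_second(observed):,.0f} tx/s on average")
print(f"{utilization(observed, uplifted):.0%} of the uplifted limit")
```

On average that is 27,500 transactions/second, roughly 69% of the 40,000 limit, which leaves some but not unlimited headroom for bursts.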
I sent you some PMs comparing our uplifted storage account's usage with that of non-uplifted accounts, for roughly the same workload.
We have started seeing this error in our production deployment, and it is worrying because it results in data loss.
As is probably obvious from the details below, our Service Bus trigger starts (or in this case, attempts to start) an orchestrator. What is "interesting" is that, as far as I can tell, we are not under any unusual load, and yesterday we ran a full-scale test in our test environment (equal in scale to this one) with no such problem.