Issue with _currentConfiguration being null after the first call #2050
Comments
@dmitrymal we are modifying this so that, until the configuration is obtained, we retry 4 times at 30-second intervals.
It is your call, but personally I would have left it to the client to decide which retry strategy to use. '30 second interval 4 times' seems an arbitrary choice and may not work for all clients. For instance, in our case, at a rate of 10K requests per minute, we might end up with 40K queued requests waiting for a retry, which is probably not good. After all, the issue is not about failing to get the configuration, nor about retry logic. The issue is that the code does not check whether _currentConfiguration is null. In my opinion, the code should check whether _currentConfiguration == null and, if it is, try to get it. The code should not simply throw an exception, leaving the client without any option to fix the issue. This logic bug should be fixed regardless of the retry logic. The easiest and quickest way to fix it is something like this: I could create a PR with the proposed changes if that would help. By the way, I see you labeled it as an 'Enhancement'. I would consider this a Bug, not an Enhancement. Thoughts?
@dmitrymal A request has arrived and we need to validate the token. Metadata may be needed: since TokenValidationParameters.ConfigurationManager is not null, the user must have set it. It seems correct that if ConfigurationManager never obtained metadata, we should try again the next time through. I need to circle back with the team to understand why continuous attempts without a throttle were not acceptable.
@dmitrymal we discussed this as a team and landed on exponential backoff.
The retry logic might help in some cases. However, the client code should have control of the retry logic; otherwise, it might lead to more issues. At the very least, the client should be able to set the retry parameters. Anyway, the retry discussion is a separate concern and has nothing to do with the reported issue in the code. Any thoughts on fixing that?
@dmitrymal we have implemented an exponential increase in PR #2052. We should leave this open to add the ability for users to set their own algorithm, as you are correct that we cannot assume we understand all the scenarios.
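One way to picture the "let users set their own algorithm" idea from the comments above is a delay policy passed in by the caller. This is only a sketch under assumptions: `RetryDelayPolicy` and `RetryPolicies` are hypothetical names invented for illustration, not part of the library, and the actual change in PR #2052 may look quite different.

```csharp
using System;

// Hypothetical shape for a caller-supplied retry policy; NOT an API of the library.
// Input: number of consecutive failed retrievals. Output: how long to wait before retrying.
public delegate TimeSpan RetryDelayPolicy(int consecutiveFailures);

public static class RetryPolicies
{
    // Exponential backoff: baseDelay * 2^(failures - 1), capped at maxDelay.
    public static RetryDelayPolicy Exponential(TimeSpan baseDelay, TimeSpan maxDelay) =>
        failures =>
        {
            if (failures <= 0) return TimeSpan.Zero;
            // Clamp the exponent so the multiplication cannot overflow.
            double factor = Math.Pow(2, Math.Min(failures - 1, 30));
            double ms = Math.Min(baseDelay.TotalMilliseconds * factor, maxDelay.TotalMilliseconds);
            return TimeSpan.FromMilliseconds(ms);
        };

    // Fixed interval between attempts, for clients that prefer a constant delay.
    public static RetryDelayPolicy Fixed(TimeSpan delay) =>
        failures => failures <= 0 ? TimeSpan.Zero : delay;
}
```

A client with very high request rates could pass a short, capped policy such as `RetryPolicies.Exponential(TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(60))`, while another could keep a fixed interval; the point is only that the choice sits with the caller.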
Summary:
We ran into an issue with our Azure Function, which has very high throughput. With high-volume functions, many new function instances come and go. Sometimes, on the startup of a new instance, the very first call to GetConfigurationAsync fails (e.g. the OKTA server was unavailable).
This causes all the subsequent calls to fail for the next 5 minutes.
Details:
When this line
var configuration = await _configRetriever.GetConfigurationAsync(MetadataAddress, _docRetriever, CancellationToken.None).ConfigureAwait(false);
fails, the exception is caught and _syncAfter is set to now + 5 mins. However, _currentConfiguration is still null.
On subsequent calls the condition
if (_syncAfter <= now)
is false, so no new retrieval is attempted and the code falls through to this line:
azure-activedirectory-identitymodel-extensions-for-dotnet/src/Microsoft.IdentityModel.Protocols/Configuration/ConfigurationManager.cs
Line 209 in 614642a
causing another exception.
This keeps happening for 5 mins, until now reaches _syncAfter.
In our case this behavior causes 10K to 50K exceptions in a 5-minute timeframe.
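The sequence described above can be reproduced with a small model. This is a sketch under assumptions, not the library's code: `ConfigCacheModel`, `ReproDemo`, the injected clock, and the `fetch` delegate are hypothetical stand-ins, and only the control flow described in this issue is mirrored (no locking, no validation).

```csharp
#nullable enable
using System;

// Hypothetical model of the control flow described in this issue; NOT the library's code.
public sealed class ConfigCacheModel
{
    private readonly Func<string> _fetch;          // stand-in for the metadata retrieval
    private readonly Func<DateTimeOffset> _clock;  // injected so the test controls time
    private readonly TimeSpan _refreshInterval = TimeSpan.FromMinutes(5);
    private string? _currentConfiguration;
    private DateTimeOffset _syncAfter = DateTimeOffset.MinValue;

    public int FetchAttempts { get; private set; }

    public ConfigCacheModel(Func<string> fetch, Func<DateTimeOffset> clock)
    {
        _fetch = fetch;
        _clock = clock;
    }

    public string GetConfiguration()
    {
        DateTimeOffset now = _clock();
        if (_currentConfiguration != null && _syncAfter > now)
            return _currentConfiguration;

        if (_syncAfter <= now)
        {
            FetchAttempts++;
            try
            {
                _currentConfiguration = _fetch();
                _syncAfter = now + _refreshInterval;
            }
            catch (Exception ex)
            {
                // The failure still pushes _syncAfter forward, although nothing is cached.
                _syncAfter = now + _refreshInterval;
                throw new InvalidOperationException("Unable to obtain configuration.", ex);
            }
        }

        if (_currentConfiguration != null)
            return _currentConfiguration;

        // Reached on every call inside the window: no retrieval is attempted.
        throw new InvalidOperationException("No configuration available.");
    }
}

public static class ReproDemo
{
    // Simulates 1,000 calls spread over 100 seconds after the first retrieval fails.
    public static (int failures, int fetchAttempts) Run()
    {
        var now = DateTimeOffset.UtcNow;
        var cache = new ConfigCacheModel(
            () => throw new TimeoutException("metadata endpoint unavailable"),
            () => now);
        int failures = 0;
        for (int i = 0; i < 1000; i++)
        {
            try { cache.GetConfiguration(); }
            catch (InvalidOperationException) { failures++; }
            now = now.AddMilliseconds(100);
        }
        return (failures, cache.FetchAttempts);
    }
}
```

In this model every one of the 1,000 calls fails while only the first one attempts a retrieval, which matches the pattern reported here: one failed fetch followed by a window of exceptions with no retry.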
Proposed fix:
Add a
|| _currentConfiguration == null
condition to the following if statement:
azure-activedirectory-identitymodel-extensions-for-dotnet/src/Microsoft.IdentityModel.Protocols/Configuration/ConfigurationManager.cs
Line 170 in 614642a
Something like this:
if (_currentConfiguration == null || _syncAfter <= now)
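To show the effect of the proposed condition in context, here is a minimal sketch. It is not the library's implementation: `RetryingConfigCache<T>`, `FixDemo`, the `fetch` delegate, and the injected clock are hypothetical names, and the surrounding locking and error handling are assumptions made for the sake of a runnable example.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the proposed condition; names are illustrative, not the library's.
public sealed class RetryingConfigCache<T> where T : class
{
    private readonly Func<Task<T>> _fetch;
    private readonly Func<DateTimeOffset> _clock;
    private readonly TimeSpan _refreshInterval;
    private readonly SemaphoreSlim _refreshLock = new SemaphoreSlim(1, 1);
    private T? _currentConfiguration;
    private DateTimeOffset _syncAfter = DateTimeOffset.MinValue;

    public RetryingConfigCache(Func<Task<T>> fetch, Func<DateTimeOffset> clock, TimeSpan refreshInterval)
    {
        _fetch = fetch;
        _clock = clock;
        _refreshInterval = refreshInterval;
    }

    public async Task<T> GetConfigurationAsync()
    {
        DateTimeOffset now = _clock();
        if (_currentConfiguration != null && _syncAfter > now)
            return _currentConfiguration;

        await _refreshLock.WaitAsync().ConfigureAwait(false);
        try
        {
            // Proposed condition: an empty cache always triggers a retrieval attempt.
            if (_currentConfiguration == null || _syncAfter <= now)
            {
                try
                {
                    _currentConfiguration = await _fetch().ConfigureAwait(false);
                    _syncAfter = now + _refreshInterval;
                }
                catch (Exception ex)
                {
                    _syncAfter = now + _refreshInterval;
                    if (_currentConfiguration == null)
                        throw new InvalidOperationException("Unable to obtain configuration.", ex);
                    // Otherwise fall through and serve the stale value.
                }
            }
            return _currentConfiguration;
        }
        finally
        {
            _refreshLock.Release();
        }
    }
}

public static class FixDemo
{
    // The first fetch fails, the second succeeds; the third call is served from the cache.
    public static async Task<(bool firstFailed, string second, int fetchCalls)> RunAsync()
    {
        int calls = 0;
        var cache = new RetryingConfigCache<string>(
            () => ++calls == 1
                ? Task.FromException<string>(new TimeoutException("endpoint unavailable"))
                : Task.FromResult("metadata"),
            () => DateTimeOffset.UtcNow,
            TimeSpan.FromMinutes(5));

        bool firstFailed = false;
        try { await cache.GetConfigurationAsync(); }
        catch (InvalidOperationException) { firstFailed = true; }

        string second = await cache.GetConfigurationAsync(); // retried despite _syncAfter
        await cache.GetConfigurationAsync();                 // cached, no further fetch
        return (firstFailed, second, calls);
    }
}
```

Note the trade-off: with this condition, every call that finds an empty cache attempts a retrieval (serialized by the lock), which is the unthrottled behavior the maintainers raise in the comments.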
Question:
Before this is fixed, is there any workaround that you could suggest?
Disclaimer:
All this analysis is done just by reviewing the code, so I might have overlooked something. Let me know if this sounds correct to you.