-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatic http failover base on Srv Dns #442
Conversation
9087339
to
fff39fb
Compare
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/Constants/LoggingConstants.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/Constants/LoggingConstants.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/DnsClient/DnsProcessor.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/AzureAppConfigurationOptions.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConnectionStringUtils.cs
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
} | ||
else | ||
{ | ||
throw new ArgumentNullException(nameof(_endpoints)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think we should throw for ArgumentNullException
for _endpoints
here. Null/empty _connectionStrings
could also be the cause for ArgumentNullException
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about throw new ArgumentNullException(nameof(_clients));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks more like an InvalidOperationException if both connectrings and endpoints are not provided.
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/DnsClient/SrvRecord.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/DnsClient/SrvRecord.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/DnsClient/SrvRecord.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/DnsClient/DnsProcessor.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
// Only remove the client if it is not in the user passed connection string or endpoints, as well as not in returned SRV records. | ||
private bool IsEligibleToRemove(IEnumerable<string> srvEndpointHosts, ConfigurationClientWrapper client) | ||
{ | ||
if (_connectionStrings != null && _connectionStrings.Any(c => GetHostFromConnectionString(c).Equals(client.Endpoint.Host, StringComparison.OrdinalIgnoreCase))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would parse the endpoint from each connection string everytime we invoke IsEligibleToRemove
.
How about storing the list of endpoints instead of connection strings? Looks like we only use the first connection string anyway. Maybe we can store that one connection string separately and extract endpoint from all other connection strings before storing in _endpoint
list.
That also makes it simpler to check isUsingConnectionString
. We can avoid the check for _connectionStrings.Any()
and just check if the single connection string is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated this, could you please tale a look again?
ee5ce94
to
68ed6e9
Compare
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/AzureAppConfigurationOptions.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/Extensions/EnumerableExtensions.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/SrvLookupClient.cs
Outdated
Show resolved
Hide resolved
endpoint = _endpoints.First(); | ||
} | ||
|
||
var lookup = new SrvLookupClient(logger); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to create a new SrvLookupClient every time we get auto failover clients? Can we create once and reuse it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, this is also what I want to optimize, the udpClient and tcpClient in it should be reusable. Besides this, I use cache in the SrvLookupClient, if new SrvLookupClient every time, using cache is meanless.
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/SrvLookupClient.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/SrvLookupClient.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/SrvLookupClient.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/SrvLookupClient.cs
Outdated
Show resolved
Hide resolved
|
||
return resultRecords; | ||
} | ||
catch (Exception e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't catch base exception in any case. Only known exceptions should be caught and handled.
Also, InternalQueryAsync
is the only method that can throw, so it would be better to wrap only that line of code in its own try/catch block so that its easy to understand where the exception is coming from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to catch any exceptions here. Dns lookup error is nothing different from other errors, just throw is fine
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/SrvLookupClient.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/SrvLookupClient.cs
Outdated
Show resolved
Hide resolved
...ration.AzureAppConfiguration/Microsoft.Extensions.Configuration.AzureAppConfiguration.csproj
Show resolved
Hide resolved
/// <summary> | ||
/// Flag to indicate whether discover the replicas automatically. | ||
/// </summary> | ||
public bool AutoDiscoverReplica { get; set; } = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public bool AutoDiscoverReplica { get; set; } = true; | |
public bool EnableReplicaDiscovery { get; set; } = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See this comment
My suggestion should be ReplicaDiscoveryEnabled
in order to avoid a verb as a property name.
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/AzureAppConfigurationProvider.cs
Outdated
Show resolved
Hide resolved
} | ||
else | ||
{ | ||
throw new ArgumentNullException(nameof(_clients)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_clients
is not an argument to this constructor.
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
IEnumerable<SrvRecord> results = await _srvLookupClient.Value.QueryAsync(endpoint.DnsSafeHost, cancellationToken).ConfigureAwait(false); | ||
|
||
// shuffle the results to ensure hosts can be picked randomly. | ||
IEnumerable<string> srvTargetHosts = results.Select(r => $"{r.Target.Value.Trim('.')}").Shuffle().ToList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind shuffling on the non-transformed enumerable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's that trim doing, just removing a preceding '.' ? If so, may be clearer to be explicit about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this resolved? Did other comments get resolved without being considered?
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/SrvLookupClient.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/SrvLookupClient.cs
Outdated
Show resolved
Hide resolved
f92fa5a
to
b6a344d
Compare
b6a344d
to
bc94a95
Compare
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
e209865
to
cec2f6a
Compare
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/ConfigurationClientManager.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/AzureAppConfigurationProvider.cs
Outdated
Show resolved
Hide resolved
0f8a5d2
to
684a374
Compare
684a374
to
fdaa51f
Compare
{ | ||
await RefreshFallbackClients(cts.Token).ConfigureAwait(false); | ||
} | ||
catch (OperationCanceledException e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't log a warning on cancellation due to dispose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
src/Microsoft.Extensions.Configuration.AzureAppConfiguration/Extensions/TaskExtensions.cs
Outdated
Show resolved
Hide resolved
results = results.Concat(records); | ||
|
||
// If we get no record from _alt{i} SRV, we have reached the end of _alt* list | ||
if (records.Count() == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can concact empty. The check can be above to avoid it.
_ = DiscoverFallbackClients(); | ||
} | ||
|
||
List<ConfigurationClient> clients = new List<ConfigurationClient>(_clients.Where(c => c.BackoffEndTime <= now).Select(c => c.Client)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To ensure thread safety, the _dynamicClients
could be updated asynchronously
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we don't have to worry about this, updated.
_ = DiscoverFallbackClients(); | ||
} | ||
|
||
List<ConfigurationClient> clients = new List<ConfigurationClient>(_clients.Select(c => c.Client)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that creates copy
a141380
to
312ad19
Compare
|
||
namespace Microsoft.Extensions.Configuration.AzureAppConfiguration | ||
{ | ||
internal interface IConfigurationClientManager | ||
{ | ||
IEnumerable<ConfigurationClient> GetAvailableClients(DateTimeOffset time); | ||
IEnumerable<ConfigurationClient> GetAvailableClients(); | ||
|
||
IEnumerable<ConfigurationClient> GetAllClients(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two are odd. Separate method for a subset of data is not great for an interface.
In this case we can always return all and have a property in ConfigurationClient that the caller can filter on instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ConfigurationClient type is not defined in this project, it's from .net SDK, we can't add a property to it, we can only extend it or wrap it.
Then it seems would be a little bit overkill since the callers only care about the clients, they don't have to worry about how the clients are provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my perspective , changing this seems doesn't make it much better.
https://github.com/Azure/AppConfiguration-DotnetProvider/pull/502/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason we need two different subsets of ConfigurationClients (Available vs All)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetAllClients
is a new method since 7.0.0 for time-based retry on start-up. When the app starts, there is no longer a fixed number of retries for each client, but as long as the time allows (default is 100s), we will continue to try all clients until a certain client can work or the time runs out. Therefore, it is necessary to obtain all Clients. Of course, we will not keep trying all Clients without limit. Whenever a round of attempts is made and all Clients are invalid, it will back off for a certain period of time and then start trying the next round.
GetAvailableClients
is for getting the available clients for refreshing scenario, each client has unique backoff time which is determined by its failure attempts. Hence, we would always get the clients whose back off time is due.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have uniform strategy to use client instances in all cases (startup and refresh). Beside on startup, backoff isn't specified (no one has failed yet). I would assume that at startup GetAvailableClients always return all clients. Am I missing something?
cc @jimmyca15
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like we should move the backoff tracking out of ConfigurationClientManager and use ConfigurationClientManager just as factory.
This can be done in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I would merge this one first, go out in 7.1.0-preview first.
* Automatic http failover base on Srv Dns * update failover condition * Logging messages for auto failover * Not use generic method * Remove useless constructor * Failover to TCP when response is truncated * Error handling of TCP failover * Resolve comments and add thrid party notice * Fix the tests * Fix the tess * Fix the test * Update to use DnsClient.NET * Remove unecessary code * Resolve conflict * Stop once it has been auto failovered * Update GetAvaliableClient method * Resolve comments * Resolve comments * Internalize getting dynamic client in ConfigurationClientManager * Fix format * Fix format * Resolve comments * Resolve comments * Resolve comments * Resolve comments * Resolve comments * Resolve comments * Resolve comments * Not await RefreshFallbackClients when startup * Revert non-await RefreshFallbackClients * Validate domain of Srv endpoints (#489) * Validate domain of Srv endpoints * Resolve comments * Resolve comments * Create object in using * Add the origin host to the srv lookup result * Validate endpoint by domain name (#499) * Validate endpoint by domain name * Resolve comments * Remove *-test * Fix formatResolve conflict * Use await using for IAsyncDisposible * Replace IAsyncEumerable with ValueTask<IEnumerable> * Resolve comments * Remove cancallation token * Resolve comments
* Automatic http failover base on Srv Dns (#442) * Automatic http failover base on Srv Dns * update failover condition * Logging messages for auto failover * Not use generic method * Remove useless constructor * Failover to TCP when response is truncated * Error handling of TCP failover * Resolve comments and add thrid party notice * Fix the tests * Fix the tess * Fix the test * Update to use DnsClient.NET * Remove unecessary code * Resolve conflict * Stop once it has been auto failovered * Update GetAvaliableClient method * Resolve comments * Resolve comments * Internalize getting dynamic client in ConfigurationClientManager * Fix format * Fix format * Resolve comments * Resolve comments * Resolve comments * Resolve comments * Resolve comments * Resolve comments * Resolve comments * Not await RefreshFallbackClients when startup * Revert non-await RefreshFallbackClients * Validate domain of Srv endpoints (#489) * Validate domain of Srv endpoints * Resolve comments * Resolve comments * Create object in using * Add the origin host to the srv lookup result * Validate endpoint by domain name (#499) * Validate endpoint by domain name * Resolve comments * Remove *-test * Fix formatResolve conflict * Use await using for IAsyncDisposible * Replace IAsyncEumerable with ValueTask<IEnumerable> * Resolve comments * Remove cancallation token * Resolve comments * Refactor and move backoff logic to AzureAppConfigurationProvider (#508) * move backoff logic to provider, can't access backoffendtime from manager for dynamic clients * add condition before calling getclients in refreshasync, add refreshclients method * fix logic to check if backoff ended for client * fix indents * check client backoff before refresh, fix tests in progress * remove tolist * comment changes, fix refresh tests * reduce minimum backoff when testing * fix tests, fix typo with failover logic * update comments in tests * resolve PR comments * resolve small comments * fix tests backoff duration * fix tests backoff duration * fix error from merging main branch in * add back fix for attempts backoff calculation --------- Co-authored-by: Richard chen <99175581+RichardChen820@users.noreply.github.com>
Automatic http failover base on Srv Dns
For supporting automatic Http failover, there are SRV DNS record set being added for each replica by RP.
Each replica has
_origin._tcp.*
SRV record been created to indicate the origin store of the replica_origin._tcp. contoso-replica1.azconfig.io --> SRV 1 100 443 contoso.azconfig.io
Each replica has
_alt*._tcp.*
SRV record created in _alt{i}._tcp record set for querying auto failover hosts._alt0._tcp. contoso.azconfig.io --> SRV 1 100 443 contoso-replica1.azconfig.io
_alt0._tcp. contoso.azconfig.io --> SRV 1 100 443 contoso-replica1.azconfig.io
"i" is a self-increment index since each record set only supports to accommodate up to 20 records, if there're 30 replicas, it will dispersed into two _alt* record sets.
Design: