-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-39262: [C++][Azure][FS] Add default credential auth configuration #39263
GH-39262: [C++][Azure][FS] Add default credential auth configuration #39263
Conversation
eab9db9
to
56d796f
Compare
options.backend = AzureBackend::kAzurite; // Irrelevant for this test because it | ||
// doesn't connect to the server. | ||
ARROW_EXPECT_OK(options.ConfigureDefaultCredential("dummy-account-name")); | ||
EXPECT_OK_AND_ASSIGN(auto default_credential_fs, AzureFileSystem::Make(options)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that we can use Azurite with the DefaultAzureCredential
: https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azurite?tabs=visual-studio%2Cblob-storage#azure-sdks
Can we do an operation (CreateDir()
?) and check the result to verify whether this filesystem is valid or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a bit of a try but I couldn't get the SSL setup working. I'll give it another go but I don't have great hope for making it work in CI even if I get it working locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I'll also try it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried and understand why you said SSL.
Azure::Identity::DefaultAzureCredential
uses a Bearer token and Azure SDK for C++ rejects it with http: Bearer token authentication is not permitted for non TLS protected (https) endpoints.
If we want to use DefaultAzureCredential
with Azurite, we need to generate a key and certificate pair and use it.
I looked at how to set it to Azure SDK for C++. It seems that we need to BlobClientOptions::Transport::Transport
:
- https://github.com/Azure/azure-sdk-for-cpp/blob/e328665588d2998fea198972af90d5108e64a968/sdk/storage/azure-storage-blobs/inc/azure/storage/blobs/blob_options.hpp#L174
- https://github.com/Azure/azure-sdk-for-cpp/blob/e328665588d2998fea198972af90d5108e64a968/sdk/core/azure-core/inc/azure/core/internal/client_options.hpp#L97
- https://github.com/Azure/azure-sdk-for-cpp/blob/e328665588d2998fea198972af90d5108e64a968/sdk/core/azure-core/inc/azure/core/http/policies/policy.hpp#L221
If we set BlobClientOptions::Transport::Transport
, we need to specify curl based HTTP transport implementation or WinHTTP based HTTP transport implementation. They have different configurations for TLS...
How about using TestAzureHierarchicalNSFileSystem
to test DefaultAzureCredential
? If we use the real Azure service, we don't need to custom TLS configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would definitely be possible to test against real blob storage but there will be a significant amount of manual configuration for all the identities to test all the different authentications.
Then we need to provide details of these identities to TestAzureHierarchicalNSFileSystem
either they are required always or we need to add new versions for each auth e.g. TestAzureHierarchicalNSFileSystemWithServicePrincipal
.
Also there are some Auth methods that are not feasible to test. For example managed identity can only work on Azure VMs and workload identity can only work in kubernetes.
Personally I don't think this is worthwhile to make a more comprehensive test because of how little complexity there is outside the Azure SDK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this test is enough, because as @pitrou said the other day: "we are not re-implementing the Azure SDK".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also there are some Auth methods that are not feasible to test. For example managed identity can only work on Azure VMs and workload identity can only work in kubernetes.
Oh...
OK. I see.
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
options.backend = AzureBackend::kAzurite; // Irrelevant for this test because it | ||
// doesn't connect to the server. | ||
ARROW_EXPECT_OK(options.ConfigureDefaultCredential("dummy-account-name")); | ||
EXPECT_OK_AND_ASSIGN(auto default_credential_fs, AzureFileSystem::Make(options)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also there are some Auth methods that are not feasible to test. For example managed identity can only work on Azure VMs and workload identity can only work in kubernetes.
Oh...
OK. I see.
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 659b231. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…39319) ### Rationale for this change Workload identity is a useful Azure authentication method. ### What changes are included in this PR? Implement `AzureOptions::ConfigureWorkloadIdentityCredential` ### Are these changes tested? Added a simple test initialising a fileystem using `ConfigureWorkloadIdentityCredential`. This is not the most comprehensive test but its the same as what we agreed on for #39263. ### Are there any user-facing changes? Workload identity authentication is now supported. * Closes: #39318 Authored-by: Thomas Newton <thomas.w.newton@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…39321) ### Rationale for this change Workload identity is a useful Azure authentication method. Also I failed to set the account_name correctly for a bunch of auths (I think this got lost in a rebase then I copy pasted the broken code). ### What changes are included in this PR? - Make filesystem initialisation fail if `account_name_.empty()`. This prevents the account name configuration bug we had. Also added a test asserting that filesystem initialization fails in this case. - Remove account name configuration on all auth configs, in favour of setting in separately from the auth configuration. - Implement `AzureOptions::ConfigureManagedIdentityCredential` ### Are these changes tested? Added a simple test initialising a filesystem using `ConfigureManagedIdentityCredential`. This is not the most comprehensive test but its the same as what we agreed on for #39263. ### Are there any user-facing changes? Managed identity authentication is now supported. * Closes: #39320 Authored-by: Thomas Newton <thomas.w.newton@gmail.com> Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
…ation (apache#39263) ### Rationale for this change Default credential is a useful auth option. ### What changes are included in this PR? Implement `AzureOptions::ConfigureDefaultCredential` plus a little bit of plumbing to go around it. Created a simple test. ### Are these changes tested? Added a simple unittest that everything initialises happily. This does not actually test a successful authentication. I think to do a real authentication with Azure we would need to run the test against real blob storage and we would need to create various identities which are non-trivial to create. Personally I think this is ok because all the complexity is abstracted away by the Azure SDK. ### Are there any user-facing changes? * Closes: apache#39262 Lead-authored-by: Thomas Newton <thomas.w.newton@gmail.com> Co-authored-by: Sutou Kouhei <kou@cozmixng.org> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…tion (apache#39319) ### Rationale for this change Workload identity is a useful Azure authentication method. ### What changes are included in this PR? Implement `AzureOptions::ConfigureWorkloadIdentityCredential` ### Are these changes tested? Added a simple test initialising a fileystem using `ConfigureWorkloadIdentityCredential`. This is not the most comprehensive test but its the same as what we agreed on for apache#39263. ### Are there any user-facing changes? Workload identity authentication is now supported. * Closes: apache#39318 Authored-by: Thomas Newton <thomas.w.newton@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…ion (apache#39321) ### Rationale for this change Workload identity is a useful Azure authentication method. Also I failed to set the account_name correctly for a bunch of auths (I think this got lost in a rebase then I copy pasted the broken code). ### What changes are included in this PR? - Make filesystem initialisation fail if `account_name_.empty()`. This prevents the account name configuration bug we had. Also added a test asserting that filesystem initialization fails in this case. - Remove account name configuration on all auth configs, in favour of setting in separately from the auth configuration. - Implement `AzureOptions::ConfigureManagedIdentityCredential` ### Are these changes tested? Added a simple test initialising a filesystem using `ConfigureManagedIdentityCredential`. This is not the most comprehensive test but its the same as what we agreed on for apache#39263. ### Are there any user-facing changes? Managed identity authentication is now supported. * Closes: apache#39320 Authored-by: Thomas Newton <thomas.w.newton@gmail.com> Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
…ation (apache#39263) ### Rationale for this change Default credential is a useful auth option. ### What changes are included in this PR? Implement `AzureOptions::ConfigureDefaultCredential` plus a little bit of plumbing to go around it. Created a simple test. ### Are these changes tested? Added a simple unittest that everything initialises happily. This does not actually test a successful authentication. I think to do a real authentication with Azure we would need to run the test against real blob storage and we would need to create various identities which are non-trivial to create. Personally I think this is ok because all the complexity is abstracted away by the Azure SDK. ### Are there any user-facing changes? * Closes: apache#39262 Lead-authored-by: Thomas Newton <thomas.w.newton@gmail.com> Co-authored-by: Sutou Kouhei <kou@cozmixng.org> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…tion (apache#39319) ### Rationale for this change Workload identity is a useful Azure authentication method. ### What changes are included in this PR? Implement `AzureOptions::ConfigureWorkloadIdentityCredential` ### Are these changes tested? Added a simple test initialising a fileystem using `ConfigureWorkloadIdentityCredential`. This is not the most comprehensive test but its the same as what we agreed on for apache#39263. ### Are there any user-facing changes? Workload identity authentication is now supported. * Closes: apache#39318 Authored-by: Thomas Newton <thomas.w.newton@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
…ion (apache#39321) ### Rationale for this change Workload identity is a useful Azure authentication method. Also I failed to set the account_name correctly for a bunch of auths (I think this got lost in a rebase then I copy pasted the broken code). ### What changes are included in this PR? - Make filesystem initialisation fail if `account_name_.empty()`. This prevents the account name configuration bug we had. Also added a test asserting that filesystem initialization fails in this case. - Remove account name configuration on all auth configs, in favour of setting in separately from the auth configuration. - Implement `AzureOptions::ConfigureManagedIdentityCredential` ### Are these changes tested? Added a simple test initialising a filesystem using `ConfigureManagedIdentityCredential`. This is not the most comprehensive test but its the same as what we agreed on for apache#39263. ### Are there any user-facing changes? Managed identity authentication is now supported. * Closes: apache#39320 Authored-by: Thomas Newton <thomas.w.newton@gmail.com> Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Rationale for this change
Default credential is a useful auth option.
What changes are included in this PR?
Implement
AzureOptions::ConfigureDefaultCredential
plus a little bit of plumbing to go around it.Created a simple test.
Are these changes tested?
Added a simple unittest that everything initialises happily. This does not actually test a successful authentication. I think to do a real authentication with Azure we would need to run the test against real blob storage and we would need to create various identities which are non-trivial to create. Personally I think this is ok because all the complexity is abstracted away by the Azure SDK.
Are there any user-facing changes?