[Filebeat] Enable non-AWS S3 buckets for aws-s3 input #28234
Conversation
This pull request does not have a backport label. Could you fix it @legoguy1000? 🙏
💚 Build Succeeded

💚 Flaky test report: tests succeeded. 🤖 To re-run your PR in the CI, just comment with:
@andrewkroh can you let me know what you think of this?
@legoguy1000 thanks for the contribution, the feature is welcome. I think we can simplify the implementation and the changes required anyway. My suggestion, in pseudo-code, is something along these lines:
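The pseudo-code itself did not survive the page scrape; a hypothetical sketch of the suggested simplification (the function name, signature, and URI format here are my assumptions, not the reviewer's actual snippet) might look like:

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveEndpoint sketches the idea under discussion: if the configured
// endpoint already carries a scheme, use it verbatim (an S3-compatible
// service); otherwise treat it as a bare domain and build the native
// AWS S3 endpoint URI from it and the region.
func resolveEndpoint(endpoint, region string) string {
	u, err := url.Parse(endpoint)
	if err == nil && u.Scheme != "" {
		return endpoint // full URI: use as-is
	}
	return fmt.Sprintf("https://s3.%s.%s", region, endpoint)
}

func main() {
	fmt.Println(resolveEndpoint("https://minio.example.com:9000", "us-east-1"))
	fmt.Println(resolveEndpoint("amazonaws.com", "us-east-1"))
}
```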
If not, add something like
Force-pushed from 7752a17 to 66f7fcb
@aspacca I removed the
You are right about this: we had a discussion internally about introducing a breaking change with the rename. I apologise for asking you to revert your change for it: you can add it back.
@@ -33,6 +34,9 @@ type config struct {
	AWSConfig        awscommon.ConfigAWS  `config:",inline"`
	FileSelectors    []fileSelectorConfig `config:"file_selectors"`
	ReaderConfig     readerConfig         `config:",inline"` // Reader options to apply when no file_selectors are used.
+	PathStyle        bool                 `config:"path_style"`
+	RegionOverride   string               `config:"region_override"`
+	ProviderOverride string               `config:"provider_override"`
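A hedged example of how the new options from this hunk might be set; all values, and the `bucket_name`/`endpoint` options shown alongside them, are illustrative assumptions:

```yaml
filebeat.inputs:
  - type: aws-s3
    # Illustrative values only; the three new option names are taken from the diff above.
    bucket_name: "logs-bucket"
    access_key_id: "minio-access-key"
    secret_access_key: "minio-secret-key"
    endpoint: "http://localhost:9000"
    path_style: true
    region_override: "us-east-1"
    provider_override: "minio"
```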
Is there any way we can infer the provider from the endpoint or an API request?
If we can avoid specifying it explicitly and let the code detect it, that would be better. What do you think?
I think it would be hard unless you just did it by domain name, but even then it would turn `aws` into `amazonaws`.
I added an override config but also some auto-detection based on domain names from the endpoints. Let me know what you think.
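A minimal sketch of what domain-based auto-detection with an explicit override could look like; the function name, the mapping table, and the fallback value are assumptions for illustration, not the PR's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// getProviderFromDomain maps well-known S3-compatible service domains to
// a provider name, letting an explicit override win over detection and
// falling back to "unknown". The table below is a hypothetical sample.
func getProviderFromDomain(endpoint, override string) string {
	if override != "" {
		return override // explicit config wins over detection
	}
	providers := map[string]string{
		"amazonaws.com":          "aws",
		"backblazeb2.com":        "backblaze",
		"digitaloceanspaces.com": "digitalocean",
		"wasabisys.com":          "wasabi",
	}
	for domain, provider := range providers {
		if strings.HasSuffix(strings.ToLower(endpoint), domain) {
			return provider
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(getProviderFromDomain("s3.us-east-1.amazonaws.com", ""))
	fmt.Println(getProviderFromDomain("minio.example.com", "minio"))
}
```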
This pull request is now in conflicts. Could you fix it? 🙏
Force-pushed from 570e9b5 to 71dd1d4
This pull request is now in conflicts. Could you fix it? 🙏
@aspacca let me know what you think of the changes. Instead of
Force-pushed from 5c33b48 to 5bdce86
The `aws-s3` input can also poll third-party S3-compatible services such as the self-hosted MinIO.
Using non-AWS S3-compatible buckets requires the use of `access_key_id` and `secret_access_key` for authentication.
To specify the S3 bucket name, use the `bucket_name` config, and `endpoint` must be set to replace the default API endpoint.
Services that have endpoints in the standard form of `https://s3.<region>.<domain>` only need the endpoint config set like `endpoint: <domain>`.
I would rephrase this to something like:
`endpoint` can be either a full URI with a scheme, which will be used as-is as the endpoint of the service to connect to, or a single domain, which will be used to build the full endpoint URI for native AWS S3 along with the region param.
If you use the native AWS S3 service, you only need to set the domain, and only if your S3 bucket URL is hosted on a custom domain.
No `endpoint` needs to be configured if you use the native AWS S3 service without a custom domain.
Updated, let me know what you think of my rewording. I will be the first to admit, I'm terrible at writing docs.
if c.FIPSEnabled && endpoint.Scheme != "" {
	return errors.New("fips_enabled cannot be used with a non-AWS S3 bucket.")
}
if c.PathStyle && c.AWSConfig.Endpoint == "" {
`c.AWSConfig.Endpoint` can be non-empty while we are still using the native AWS S3 service.
Correct, but short of checking all the other AWS domains, I thought this was a quick way to make sure that the 99% of users who use the normal AWS S3 service don't accidentally use `path_style`.
It would still be a problem.
Let's separate `bucket_arn` from `bucket_name`, and use the first for native AWS buckets and the second for S3-compatible buckets. You can rename it back to `non_aws_bucket_name` to better clarify the difference.
This validation logic is still not 100% correct.
Please, now that we have `c.NonAWSBucketName`, test on either it or `c.BucketARN` for the other config params applicable only to native/non-native AWS buckets.
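A hedged sketch of the validation the reviewer is asking for, gating each option on `c.NonAWSBucketName` rather than on the raw endpoint; the struct is trimmed to the fields under discussion and the error strings are illustrative:

```go
package main

import (
	"errors"
	"fmt"
)

// config mirrors only the fields relevant to this review comment; field
// names follow the discussion above, the rest is assumed.
type config struct {
	BucketARN        string
	NonAWSBucketName string
	PathStyle        bool
	FIPSEnabled      bool
}

// validate gates the non-native options on NonAWSBucketName instead of
// checking whether AWSConfig.Endpoint happens to be set.
func (c *config) validate() error {
	if c.PathStyle && c.NonAWSBucketName == "" {
		return errors.New("path_style can only be used with a non-AWS S3 bucket")
	}
	if c.FIPSEnabled && c.NonAWSBucketName != "" {
		return errors.New("fips_enabled cannot be used with a non-AWS S3 bucket")
	}
	return nil
}

func main() {
	c := config{BucketARN: "arn:aws:s3:::my-bucket", PathStyle: true}
	fmt.Println(c.validate())
}
```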
if c.PathStyle && c.AWSConfig.Endpoint == "" {
	return errors.New("Cannot use path style when using AWS native S3 services")
}
if c.ProviderOverride != "" && c.AWSConfig.Endpoint != "" {
`c.AWSConfig.Endpoint` can be non-empty while we are still using the native AWS S3 service.
see above
This pull request is now in conflicts. Could you fix it? 🙏
Force-pushed from 2804897 to 8c21929
/test
This pull request is now in conflicts. Could you fix it? 🙏
I will try to fix the conflicts and CI test failures today so we can get this merged. Are there any other changes wanted, @kaiyan-sheng @aspacca?
Looks good on my side @legoguy1000! Could you please add the new config parameter to the documentation in x-pack/filebeat/_meta/config/filebeat.inputs.reference.xpack.yml.tmpl? Thanks!
Please look at my comments @legoguy1000, thanks.
if c.FIPSEnabled && endpoint.Scheme != "" {
	return errors.New("fips_enabled cannot be used with a non-AWS S3 bucket.")
}
if c.PathStyle && c.AWSConfig.Endpoint == "" {
It would still be a problem.
Let's separate `bucket_arn` from `bucket_name`, and use the first for native AWS buckets and the second for S3-compatible buckets. You can rename it back to `non_aws_bucket_name` to better clarify the difference.
@aspacca @kaiyan-sheng The unit tests seem to be failing here https://github.com/elastic/beats/pull/28234/files#diff-f345fd6a1f5ea9523117d4ead2e5f1d13fb82eb1c65a089fd34fcdd514916a96R173 with respect to the SQS tests. I don't see why it would error out, as the code to download an object from S3 is no different whether polling or using SQS. Any ideas?
@legoguy1000 you can debug/print the nil value, one of the following:
Looking at it,
@legoguy1000 I'm a little skeptical about that. Let me do some tests on the mocking side.
I concur and appreciate the help. If you get it working, feel free to push the fix to the branch.
Pinging @elastic/integrations (Team:Integrations)
@legoguy1000 The benchmark should not fail now; I pushed a commit to your branch.
This pull request is now in conflicts. Could you fix it? 🙏
Force-pushed from 683e7b1 to dbf0f21
Thanks for the patience @legoguy1000 :)
I feel like I should say that to you. Thanks for the help.
* Update `aws-s3` input to support non-AWS S3 buckets
What does this PR do?
This PR adds the ability to use non-AWS S3 bucket services with the aws-s3 input. With the new polling feature, the input can now poll other S3 providers like MinIO that use both virtual-host-style and path-style buckets, as well as use non-SSL buckets (if wanted). This also updates the `log.file.path` field to use the actual URL used to download the file instead of manually generating it with hard-coded values like `amazonaws.com`.
Why is it important?
To be able to ingest data from other S3 providers besides AWS.
Checklist
- Added an entry in `CHANGELOG.next.asciidoc` or `CHANGELOG-developer.next.asciidoc`.
How to test this PR locally
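A hedged sketch of a local test setup, assuming a MinIO container and the option names discussed in this PR; all values are illustrative:

```yaml
# Start a local MinIO (hypothetical setup):
#   docker run -p 9000:9000 minio/minio server /data
filebeat.inputs:
  - type: aws-s3
    non_aws_bucket_name: "test-bucket"
    access_key_id: "minioadmin"
    secret_access_key: "minioadmin"
    endpoint: "http://localhost:9000"
    path_style: true
```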
Related issues
Use cases
Screenshots
Logs