S3 input: try to detect GZIPped objects #18764

Merged: ycombinator merged 8 commits into elastic:master from xp-fb-input-s3-gzip on Jun 3, 2020

@ycombinator (Contributor) commented May 27, 2020

What does this PR do?

This PR enhances the S3 input to try harder to automatically detect GZIPped objects.

Why is it important?

Sometimes objects returned by the S3 API are gzipped but don't come with proper headers indicating their gzipped nature. So we try different ways to auto-detect if the object is gzipped.
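
As an illustration of content-based detection, the sketch below peeks at the start of an object body and only wraps it in a gzip reader when the bytes look gzipped. It is not taken from this PR's diff: the names `sniffGzip` and `wrapIfGzipped` are hypothetical, and using `http.DetectContentType` on the first 512 bytes is one possible approach among several.

```go
package main

import (
	"bufio"
	"compress/gzip"
	"io"
	"net/http"
)

// sniffGzip reports whether the leading bytes of an object look like gzip.
// http.DetectContentType inspects at most the first 512 bytes and returns
// "application/x-gzip" when they start with the gzip signature.
func sniffGzip(head []byte) bool {
	return http.DetectContentType(head) == "application/x-gzip"
}

// wrapIfGzipped (hypothetical name) peeks at the body without consuming it
// and wraps it in a gzip reader only when the content looks gzipped.
func wrapIfGzipped(body io.Reader) (io.Reader, error) {
	br := bufio.NewReader(body)
	// Peek returns io.EOF together with a short slice when the object is
	// smaller than 512 bytes; that is expected and not treated as an error.
	head, err := br.Peek(512)
	if err != nil && err != io.EOF {
		return nil, err
	}
	if !sniffGzip(head) {
		return br, nil
	}
	zr, err := gzip.NewReader(br)
	if err != nil {
		return nil, err
	}
	return zr, nil
}
```

Because the check uses `Peek`, the bytes stay in the buffer, so the caller reads the object from its first byte whether or not it was gzipped.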

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Follow the instructions in https://www.elastic.co/blog/getting-aws-logs-from-s3-using-filebeat-and-the-elastic-stack to set up an S3 bucket and SQS queue and configure Filebeat. When configuring Filebeat, use the console output for easy debugging.

  2. Create a sample log file.

    cat <<EOF >sample.log
    May 28 03:00:52 Shaunaks-MacBook-Pro-Work syslogd[119]: ASL Sender Statistics
    May 28 03:03:29 Shaunaks-MacBook-Pro-Work VTDecoderXPCService[57953]: DEPRECATED USE in libdispatch client: Changing the target of a source after it has been activated; set a breakpoint on _dispatch_bug_deprecated to debug
    May 28 03:03:29 Shaunaks-MacBook-Pro-Work VTDecoderXPCService[57953]: DEPRECATED USE in libdispatch client: Changing target queue hierarchy after xpc connection was activated; set a breakpoint on _dispatch_bug_deprecated to debug
    May 28 03:03:53 Shaunaks-MacBook-Pro-Work VTDecoderXPCService[57953]: DEPRECATED USE in libdispatch client: Changing the target of a source after it has been activated; set a breakpoint on _dispatch_bug_deprecated to debug
    May 28 03:03:53 Shaunaks-MacBook-Pro-Work VTDecoderXPCService[57953]: DEPRECATED USE in libdispatch client: Changing target queue hierarchy after xpc connection was activated; set a breakpoint on _dispatch_bug_deprecated to debug
    EOF
    
  3. Gzip the sample log file while keeping the original around.

    gzip --keep sample.log
    
  4. Make a copy of the gzipped file, but remove the gzip extension to try and "fool" our S3 input.

    cp sample.log.gz sneaky.log
    
  5. Start Filebeat

    filebeat -e
    
  6. One by one, upload the three files (sample.log, sample.log.gz, and sneaky.log) to your S3 bucket. After each upload, check the Filebeat log to make sure there are no errors, and confirm that the S3 input generates the expected events.
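
The expectation behind these steps is that all three uploads yield the same log lines. A rough offline approximation, assuming nothing about the S3 input's internals, is sketched below: `gzipBytes`, `fixtures`, and `decode` are hypothetical helpers that build the three files in memory and decode each one by looking at its content and not at its name.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"io"
)

// gzipBytes compresses data in memory, similar to what `gzip --keep`
// produces on disk.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// fixtures mirrors steps 2 to 4: a plain file, its gzipped form, and the
// same gzipped bytes under a name without the .gz extension.
func fixtures(sample []byte) (map[string][]byte, error) {
	gz, err := gzipBytes(sample)
	if err != nil {
		return nil, err
	}
	return map[string][]byte{
		"sample.log":    sample,
		"sample.log.gz": gz,
		"sneaky.log":    gz,
	}, nil
}

// decode returns the log content of an object, decompressing it when the
// bytes start with the gzip signature 0x1f 0x8b, regardless of the name.
func decode(data []byte) ([]byte, error) {
	if !bytes.HasPrefix(data, []byte{0x1f, 0x8b}) {
		return data, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}
```

Calling `decode` on every entry returned by `fixtures` should give back the original sample bytes three times, which is the in-memory counterpart of seeing the same events after each upload.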

Related issues

@ycombinator added the in progress and Team:Platforms labels May 27, 2020
@botelastic bot added the needs_team label May 27, 2020
@elasticmachine (Collaborator) commented

Pinging @elastic/integrations-platforms (Team:Platforms)

@botelastic bot removed the needs_team label May 27, 2020
@andresrc added the [zube]: Inbox and needs_team labels May 27, 2020
@botelastic bot removed the needs_team label May 27, 2020
@andresrc added the [zube]: In Progress and needs_team labels and removed the [zube]: Inbox label May 27, 2020
@botelastic bot removed the needs_team label May 27, 2020
@elasticmachine (Collaborator) commented May 27, 2020

💚 Build Succeeded

Build stats

  • Build Cause: [Pull request #18764 updated]

  • Start Time: 2020-06-03T12:43:16.708+0000

  • Duration: 54 min 36 sec

Test stats 🧪

Test Results

  • Failed: 0
  • Passed: 2226
  • Skipped: 382
  • Total: 2608

@ycombinator marked this pull request as ready for review May 28, 2020 08:50
@ycombinator added the [zube]: In Review, needs_backport, v7.9.0, v8.0.0, and Filebeat labels and removed the [zube]: In Progress and in progress labels May 28, 2020
@ycombinator changed the title from "WIP: S3 input: try to detect GZIPped objects" to "S3 input: try to detect GZIPped objects" May 28, 2020
@zube bot changed the title from "S3 input: try to detect GZIPped objects" to "WIP: S3 input: try to detect GZIPped objects" May 28, 2020
@zube bot removed the Filebeat label May 28, 2020
@ycombinator added the test-plan label May 28, 2020
@jsoriano (Member) left a comment

Nice! I have added a couple of questions and suggestions.

x-pack/filebeat/input/s3/input.go (three review threads, all resolved)
@exekias (Contributor) left a comment

Nice improvement!

@ycombinator changed the title from "WIP: S3 input: try to detect GZIPped objects" to "S3 input: try to detect GZIPped objects" Jun 3, 2020
@ycombinator merged commit 52765c4 into elastic:master Jun 3, 2020
@ycombinator deleted the xp-fb-input-s3-gzip branch June 3, 2020 13:57
ycombinator added a commit that referenced this pull request Jun 3, 2020
* Try to detect GZIP based on content encoding header

* Check GZIP contents

* Log error before returning it

* Fixing typo

* Add comment

* Adding comment

* Adding CHANGELOG entry

* Add test case for empty contents
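
The commit messages above name two signals, the content encoding header and the object contents, plus a test case for empty contents. How those might be combined is sketched below. This is an assumption-laden illustration and not the merged code: the `object` struct, its fields, and `isGzipped` are hypothetical, and the order of the checks is a guess.

```go
package main

import "strings"

// object holds the pieces of an S3 response that matter for detection.
// The struct and its field names are illustrative only.
type object struct {
	ContentEncoding string // value of the Content-Encoding response header
	Head            []byte // first bytes of the body
}

// isGzipped checks the header first and falls back to the content itself
// when the header says nothing useful.
func isGzipped(o object) bool {
	if strings.EqualFold(strings.TrimSpace(o.ContentEncoding), "gzip") {
		return true
	}
	// Content check: gzip streams start with 0x1f 0x8b. An empty body has
	// no signature and is treated as not gzipped.
	return len(o.Head) >= 2 && o.Head[0] == 0x1f && o.Head[1] == 0x8b
}
```

An object with no header and an empty body falls through both checks and is reported as not gzipped, so it can be passed along without attempting decompression.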
@andresrc added the test-plan-added label Jul 14, 2020
ycombinator added a commit that referenced this pull request Aug 4, 2020
* S3 input: try to detect GZIPped objects (#18764)

* Try to detect GZIP based on content encoding header

* Check GZIP contents

* Log error before returning it

* Fixing typo

* Add comment

* Adding comment

* Adding CHANGELOG entry

* Add test case for empty contents

* Cleaning up CHANGELOG
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this pull request Oct 14, 2020
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…c#18944)

* S3 input: try to detect GZIPped objects (elastic#18764)

* Try to detect GZIP based on content encoding header

* Check GZIP contents

* Log error before returning it

* Fixing typo

* Add comment

* Adding comment

* Adding CHANGELOG entry

* Add test case for empty contents

* Cleaning up CHANGELOG
Labels
needs_backport, Team:Platforms, test-plan, test-plan-added, v7.9.0, v8.0.0
Successfully merging this pull request may close these issues:

  • Fails processing jsonl+gzip when using S3 Input plugin
  • Decompress S3 files without ".gz" extension
5 participants