Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multiline support to awss3 input #25710

Merged
merged 7 commits into from
May 17, 2021

Conversation

leehinman
Copy link
Contributor

@leehinman leehinman commented May 13, 2021

What does this PR do?

Adds multiline and encoding reader support to aws-s3 input. This does
not change the processing of JSON logs by aws-s3 input.

Why is it important?

This is needed so you can read logs that have embedded new lines. For
example XML Windows events.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  • Need S3 bucket & SQS queue, then upload multiline logs.

Related issues

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label May 13, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label May 13, 2021
@leehinman leehinman requested review from kaiyan-sheng and marc-gr May 13, 2021 19:56
@leehinman leehinman force-pushed the 25249_awss3_input_multiline2 branch from 5b1a550 to 356cb8c Compare May 13, 2021 19:59
@mergify
Copy link
Contributor

mergify bot commented May 13, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 25249_awss3_input_multiline2 upstream/25249_awss3_input_multiline2
git merge upstream/master
git push upstream 25249_awss3_input_multiline2

@elasticmachine
Copy link
Collaborator

elasticmachine commented May 13, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #25710 updated

  • Start Time: 2021-05-17T15:13:31.892+0000

  • Duration: 115 min 39 sec

  • Commit: 8a3b408

Test stats 🧪

Test Results
Failed 0
Passed 7031
Skipped 1193
Total 8224

Trends 🧪

Image of Build Times

Image of Tests

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 7031
Skipped 1193
Total 8224

Copy link
Member

@andrewkroh andrewkroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome 😄 . I took a very quick pass at it. Plan to look a little deeper and try it out tomorrow.

x-pack/filebeat/input/awss3/collector.go Outdated Show resolved Hide resolved
offset += int64(len(log))
event := createEvent(log, offset, info, objectHash, s3Ctx)
event := createEvent(string(message.Content), offset, info, objectHash, s3Ctx)
offset += int64(message.Bytes)
err = c.forwardEvent(event)
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you combine this into a one-liner, e.g. if err = c.forwardEvent(event); err != nil {.

x-pack/filebeat/input/awss3/collector.go Outdated Show resolved Hide resolved
VisibilityTimeout time.Duration `config:"visibility_timeout"`
AwsConfig awscommon.ConfigAWS `config:",inline"`
MaxBytes int `config:"max_bytes" validate:"min=0,nonzero"`
Multiline *multiline.Config `config:"multiline"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think these options should be available in the FileSelectorCfg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm we could...

The use case would be a single S3 bucket that has a mix of multiline and non-multiline log files.

I'll add it, and we can see if we like it.

Copy link
Member

@andrewkroh andrewkroh May 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parser options could be combined into a struct that is embedded in both config and FileSelectorCfg to avoid having to duplicate the same config struct tags.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of changing this so you always have a file_selector, the default is just to match any filename. That should make the config and code paths cleaner. I know we want to get a build out with multiline "soon". You OK with merging as is, with a new PR this week to clean this up?

x-pack/filebeat/docs/inputs/input-aws-s3.asciidoc Outdated Show resolved Hide resolved
Copy link
Contributor

@marc-gr marc-gr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think could be worth adding a new test with the multiline config in s3_integration_test.go?

@leehinman leehinman closed this May 14, 2021
@leehinman leehinman deleted the 25249_awss3_input_multiline2 branch May 14, 2021 14:48
@leehinman leehinman reopened this May 14, 2021
@leehinman leehinman restored the 25249_awss3_input_multiline2 branch May 14, 2021 14:51
@mergify
Copy link
Contributor

mergify bot commented May 14, 2021

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b 25249_awss3_input_multiline2 upstream/25249_awss3_input_multiline2
git merge upstream/master
git push upstream 25249_awss3_input_multiline2

@leehinman
Copy link
Contributor Author

Do you think could be worth adding a new test with the multiline config in s3_integration_test.go?

added it. running it is a little weird you have to setup S3 bucket & SQS ahead of time, then upload the 2 sample files, then run go test --tags=integration,aws

Long term I'd like to improve this.

@leehinman leehinman force-pushed the 25249_awss3_input_multiline2 branch from 52120e1 to da10563 Compare May 14, 2021 20:16
@marc-gr
Copy link
Contributor

marc-gr commented May 17, 2021

I think some files need to be formatted

Copy link
Contributor

@marc-gr marc-gr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the formatting is fixed

@leehinman leehinman merged commit 5f242e3 into elastic:master May 17, 2021
leehinman added a commit that referenced this pull request May 17, 2021
* Add multiline support to awss3 input

- only applies to non JSON logs

Closes #25249

Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
(cherry picked from commit 5f242e3)
Copy link
Contributor

@kaiyan-sheng kaiyan-sheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you for adding this feature!! Are you planning to add this into the AWS package later? I'm adding the label just to make sure we don't forget about it :)

@kaiyan-sheng kaiyan-sheng added the needs_integration_sync Changes in this PR need synced to elastic/integrations. label May 17, 2021
cachedout pushed a commit that referenced this pull request May 18, 2021
* Add multiline support to awss3 input

- only applies to non JSON logs

Closes #25249

Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
(cherry picked from commit 5f242e3)
leehinman added a commit that referenced this pull request May 18, 2021
* Add multiline support to awss3 input

- only applies to non JSON logs

Closes #25249

Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
(cherry picked from commit 5f242e3)

Co-authored-by: Lee Hinman <57081003+leehinman@users.noreply.github.com>
@leehinman leehinman deleted the 25249_awss3_input_multiline2 branch August 17, 2021 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v7.14.0 Automated backport with mergify enhancement needs_integration_sync Changes in this PR need synced to elastic/integrations.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Filebeat] Support multiline options in aws s3 input
5 participants