Grep Filter Plugin Exclude Not Matching Issue #8098

Closed
WTThomas1 opened this issue Oct 26, 2023 · 8 comments
Labels: Stale, status: waiting-for-triage, troubleshooting, waiting-for-user


@WTThomas1

Bug Report

Describe the bug
In its simplest form, using more than one space to separate the "Exclude" key and the regex value appears to cause the rule not to match.

Specifically, using the filter:

[FILTER]
    Name grep
    Match test_input

With either of the following exclude lines:

    Exclude                  log                 .*?\s+DEBUG\s+[\s\S]+
    Exclude                  log                 /.*?\s+DEBUG\s+[\s\S]+/

The filter then does not match a log line that it should. However, simply removing the additional spaces between the Exclude key, the record key, and the regex:

    Exclude log .*?\s+DEBUG\s+[\s\S]+
    Exclude log /.*?\s+DEBUG\s+[\s\S]+/

Works as expected.

The documentation page does show more than one space used in multiple locations, which suggests there is either a documentation issue or a bug.

To Reproduce

2024-01-01 13:14:15,161 DEBUG [NOTTHREAD:testing[SessionID:TESTING:20240101131415167:-1]]  FLUENTBIT TEST 1

The combined version of the configuration file that was used to verify the issue (after removing all extra spaces between Keys and values):

[SERVICE]
    Log_Level trace
    Log_File /tmp/fluentbit_debug.log

[INPUT]
    Name tail
    Path /tmp/testInput.log
    Path_Key filePath
    Tag test_input
    Read_from_Head true
    Skip_Empty_Lines On
    Skip_Long_Lines On
    Buffer_Chunk_Size 256KB
    Buffer_Max_Size 5MB
    Mem_Buf_Limit 25MB

[FILTER]
    Name multiline
    Match test_input
    multiline.key_content log
    multiline.parser multiline_test


# Exclude Debug Level
[FILTER]
    Name grep
    Match test_input
    
    # 1. Works...
    Exclude log .*?\s+DEBUG\s+[\s\S]+
    
    # 2. Does not work... Nothing Excluded
    #Exclude                  log                 .*?\s+DEBUG\s+[\s\S]+
    
    # 3. Works...
    #Exclude log /.*?\s+DEBUG\s+[\s\S]+/
    
    # 4. Does not work... Nothing Excluded
    #Exclude                  log                 /.*?\s+DEBUG\s+[\s\S]+/
    
    # Additional Tests quoting the value just in case...
    # 5. Does not work... Nothing Excluded
    #Exclude log ".*?\s+DEBUG\s+[\s\S]+"
    
    # 6. Does not work... Nothing Excluded
    #Exclude                  log                 ".*?\s+DEBUG\s+[\s\S]+"
    
    # 7. Does not work... Nothing Excluded
    #Exclude log "/.*?\s+DEBUG\s+[\s\S]+/"
    
    # 8. Does not work... Nothing Excluded
    #Exclude                  log                 "/.*?\s+DEBUG\s+[\s\S]+/"

# Parse Out Structured Data
[FILTER]
    Name parser
    Match test_input
    key_name log
    Parser structured_data_parser
    Reserve_Data true

#  Add Attributes
[FILTER]
    Name record_modifier
    Match test_input
    Record logtype test
    Record env QA
    Record platform test_but_interact
    Record purpose test
    Record role none

[OUTPUT]
    Name file
    Match *
    File /tmp/testOutput.log

With a parser configuration file defined as:

[MULTILINE_PARSER]
    name multiline_test
    type regex
    flush_timeout 1000
    
    # rules |   state name  | regex pattern                               | next state
    # ------|---------------|---------------------------------------------|------------
    rule "start_state" "/(^\s*(?:(?:[\d-]+) (?:[\d:,]+))[\s\S]*)/" "cont"
    rule "cont" "/(^\s*(?!(?:[\d-]+) (?:[\d:,]+))[\s\S]*)/" "cont"

[PARSER]
    Name structured_data_parser
    Format regex
    Regex /^\s*(?<logtimestamp>(?:[\d-]+) (?:[\d:,]+))\s+(?<level>\S+)\s+(?:\[(?<thread>[^\[\]]+)(?:\[SessionID:(?<sessionid>(?:N\/A|(?<customerNumber>[^:]+))[^\[\]]+)\])?\]\s+)?(?:(?<classname>(?:com|org|net)\.\S+)\s+-\s+)?(?<message>[\s\S]+)/m
    Time_Key logtimestamp
    # 2023-10-25 10:01:09,722
    Time_Format %Y-%m-%d %H:%M:%S,%L
  • Steps to reproduce the problem:
  1. Comment / Uncomment the appropriate Exclude line
  2. Run using testInput.log as the input.
  3. Failing versions of the Exclude line will result in all 20 lines ending up in the output. See: testOutput_fail.log
  4. Successful versions will result in only the two non-DEBUG lines in the output file. See: testOutput_success.log

Expected behavior

Based on the documentation, regardless of the number of spaces between the "Exclude" key, the record key, and the regex, we should see any line containing "DEBUG" with at least one space on either side excluded.
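
The expectation above can be sketched in a few lines of Python. This is only an illustration of whitespace-tolerant splitting, not Fluent Bit's actual config parser; the function name `split_grep_rule` is hypothetical.

```python
from typing import Tuple


def split_grep_rule(value: str) -> Tuple[str, str]:
    """Split '<key><whitespace><regex>' on the first run of whitespace.

    Hypothetical helper: it shows the behavior the report expects,
    where any amount of padding between key and regex is discarded.
    """
    # split(None, 1) treats a run of whitespace as a single separator and
    # leaves the rest of the string (the regex) untouched.
    parts = value.strip().split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"expected '<key> <regex>', got {value!r}")
    key, regex = parts
    return key, regex


tight = split_grep_rule(r"log .*?\s+DEBUG\s+[\s\S]+")
padded = split_grep_rule(r"log                 .*?\s+DEBUG\s+[\s\S]+")
print(tight == padded)  # True: the padding does not change the rule
```

With this kind of splitting, both spellings of the rule would yield the same key and the same pattern, which is the behavior the report expected from the grep filter.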

Screenshots
N/A

Your Environment

  • Version used: 2.0.8
  • Configuration: See Above
  • Environment name and version (e.g. Kubernetes? What version?): New Relic Infrastructure Agent - 1.47.2
  • Operating System and version: Linux - RHEL 7
  • Filters and plugins: Grep Filter Plugin

Additional context

To make a very long story short, this issue was discovered while getting multiline parsing working with the New Relic Infrastructure Agent. This is my first time working with Fluent Bit in any form, so I assumed it was my fault until I discovered that removing the additional spaces fixed the issue. Again, I'm not sure whether this is a documentation issue, because whitespace has to stay significant for the regex, or a bug, in which case the additional spaces should be trimmed. I'm hoping this gets fixed and/or that this report helps others with the same issue.

@patrick-stephens
Contributor

Is it reproducible with the latest 2.1.10?

patrick-stephens added the waiting-for-user label Nov 3, 2023
@WTThomas1
Author

WTThomas1 commented Nov 3, 2023

I updated to the latest version of Fluent Bit:

[USER@HOSTNAME bin]# ./fluent-bit -V
Fluent Bit v2.1.10
Git commit:

I then retested using the same configuration and input file, with the same result. Variations 1 and 3 both resulted in the correct behavior, with all "DEBUG" lines excluded. All other variations resulted in all lines being retained. Updated output log (with added separators): testOutput.log

@MrPibody7
Collaborator

I have reproduced the behavior described and tested an additional case, in which there is only one space between the key and the regular expression:

Exclude                  log /.*?\s+DEBUG\s+[\s\S]+/

It works, too. The extra spaces between the key and the pattern are interpreted as part of the regular expression. You can check it with Rubular.
More comments once I have finished some additional tests.
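
The effect described above can be reproduced outside Fluent Bit with a short Python sketch. It assumes that the padding between the key and the pattern ends up at the front of the regex, as this comment suggests, and it uses Python's `re` module as a stand-in for Fluent Bit's regex engine, so it only approximates the real behavior. The log line and pattern are taken from the report.

```python
import re

# Sample log line from the report.
line = (
    "2024-01-01 13:14:15,161 DEBUG "
    "[NOTTHREAD:testing[SessionID:TESTING:20240101131415167:-1]]  FLUENTBIT TEST 1"
)

# Pattern exactly as written in the working Exclude rule.
pattern = r".*?\s+DEBUG\s+[\s\S]+"

# Assumption: the extra spaces become literal leading spaces of the regex.
padded_pattern = " " * 17 + pattern

tight_match = re.search(pattern, line) is not None
padded_match = re.search(padded_pattern, line) is not None

print(tight_match)   # True: the DEBUG line matches and would be excluded
print(padded_match)  # False: the line never contains 17 consecutive spaces
```

Under that assumption, the padded rule can only match a record that contains the same run of spaces immediately before the rest of the pattern, which explains why nothing was excluded.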

@WTThomas1
Author

Good catch, and I should have tested that case as well. I'll note that the documentation does not appear to show more than one space between the key and the regex in any of its examples, so this could just be me assuming it is allowed when it is not intended to be.

@MrPibody7
Collaborator

MrPibody7 commented Nov 3, 2023

Yes, you're right.
Reading the docs, it seems to be working as expected.
However, I tested adding more than one space between the Regex key and the regular expression value for a tail parser definition in the parsers.conf file, and it works.

[PARSER]
    Name   my-parser1
    Format regex
    Regex              ^(?<num>\d{1,3} |)\<(?<pri>[0-9]{1,5})\>1 (?<time1>[^ ]+) (?<host>[^ ]+) (?<ident>[^ ]+) (?<pid>[-0-9]+) (?<msgid>[^ ]+) (?<extradata>(\[(.*)\]|-)) (?<message>.+)$

Adding a note in the documentation may be necessary to highlight this particular difference.
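
Until such a note exists, a small check can flag the risky spelling in a classic-mode config file. The sketch below is a heuristic and not part of Fluent Bit: the function name `find_padded_rules` is hypothetical, and it simply reports `Regex` or `Exclude` lines inside any `[FILTER]` section that have two or more whitespace characters between the record key and the pattern.

```python
import re
from typing import List

# A Regex/Exclude line whose record key is followed by 2+ whitespace characters.
RULE = re.compile(r"^\s*(?:regex|exclude)\s+\S+\s{2,}\S", re.IGNORECASE)
# A section header such as [FILTER] or [PARSER].
SECTION = re.compile(r"^\s*\[(\w+)\]")


def find_padded_rules(config_text: str) -> List[int]:
    """Return 1-based line numbers of suspicious rules in [FILTER] sections."""
    flagged = []
    in_filter = False
    for number, line in enumerate(config_text.splitlines(), start=1):
        section = SECTION.match(line)
        if section:
            in_filter = section.group(1).upper() == "FILTER"
        elif in_filter and RULE.match(line):
            flagged.append(number)
    return flagged


sample = "\n".join([
    "[FILTER]",
    "    Name grep",
    "    Match test_input",
    r"    Exclude log .*?\s+DEBUG\s+[\s\S]+",
    r"    Exclude log                 .*?\s+DEBUG\s+[\s\S]+",
    r"    #Exclude log                 .*?\s+DEBUG\s+[\s\S]+",
])

print(find_padded_rules(sample))  # [5]
```

Commented-out lines are ignored, and padding between `Exclude` and the record key is not flagged, since the tests in this thread found that spelling to work.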

Contributor

github-actions bot commented Feb 3, 2024

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

github-actions bot added the Stale label Feb 3, 2024
Contributor

github-actions bot commented Feb 9, 2024

This issue was closed because it has been stalled for 5 days with no activity.

github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Feb 9, 2024
@Niels-Be

It took me a couple of hours to figure out that this exact issue was the reason my regex did not match.

I think this should be fixed or, at the very least, documented with a warning in the docs.
