Add conditions to copy_fields processor #6730

Open
belimawr wants to merge 8 commits into main from 5299-fix-coppy_fields-processor

Conversation

@belimawr (Contributor) commented Feb 6, 2025

What does this PR do?

This commit adds conditions to the copy_fields processors in the monitoring Filebeat configuration to prevent them from failing and spamming the event logger at debug level with:
target field xxx already exists, drop or rename this field first
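
For context, the change wraps each of these copy_fields processors in a condition so it only runs when the target field is not already present. Below is a minimal sketch of that shape; the field names are purely illustrative and are not the exact fields used by the monitoring configuration:

    processors:
      - copy_fields:
          when:
            not:
              has_fields: ['component.id']    # only copy when the target is absent (illustrative field)
          fields:
            - from: data_stream.dataset       # illustrative source field
              to: component.id                # illustrative target field

With such a condition in place the processor simply does not run when the target already exists, so nothing is written to the event log at debug level.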

Why is it important?

It makes the debug logs more useful by removing unnecessary entries from our monitoring Filebeat.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

How to test this PR locally

  1. Create a log file with more than 1 kB of data: docker run -it --rm mingrammer/flog -n 20 > /tmp/flog.log

  2. Package the Elastic-Agent from this PR

  3. Start the Elastic-Agent with the following configuration

    elastic-agent.yml

    outputs:
      default:
        type: elasticsearch
        hosts:
          - https://localhost:9200
        protocol: https
        username: elastic
        password: changeme
        preset: latency
        ssl.verification_mode: none
    
    inputs:
      - type: filestream
        id: your-input-id
        log_level: debug
        streams:
          - id: your-filestream-stream-id
            data_stream:
              dataset: generic
            paths:
              - /tmp/flog.log
    
    agent.monitoring:
      enabled: true
      logs: true
      metrics: false
      pprof.enabled: false
      use_output: default
      # Needed if you already have an Elastic-Agent running on your machine.
      # That's very helpful for running the tests locally.
      http:
        enabled: false
        port: 7002
    
    agent.grpc:
      address: localhost
      port: 7001
    
    

  4. Ensure the event logs (data/elastic-agent*/logs/events/*.ndjson) do not contain any messages from the copy_fields processor. The following command must print 0:

    cat data/elastic-agent*/logs/events/*.ndjson | grep copy_fields | wc -l
    

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@belimawr belimawr added the Team:Elastic-Agent-Control-Plane (Label for the Agent Control Plane team) label Feb 6, 2025
@belimawr belimawr self-assigned this Feb 6, 2025
@belimawr belimawr added the backport-active-all (Automated backport with mergify to all the active branches) label Feb 6, 2025

mergify bot commented Feb 6, 2025

This pull request does not have a backport label. Could you fix it @belimawr? 🙏
To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-./d./d is the label that automatically backports to the 8./d branch (/d is the digit).
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

This commit adds conditions to the `copy_fields` processor from the
monitoring Filebeat to prevent it from failing and spamming the event
logger at debug level with:
`target field xxx already exists, drop or rename this field first`
@belimawr belimawr force-pushed the 5299-fix-coppy_fields-processor branch from fdc8a60 to 3fa11af on February 6, 2025 15:28
    @@ -128,7 +128,7 @@ func startMockES(t *testing.T) string {
         uid,
         clusterUUID,
         nil,
    -    time.Now().Add(time.Hour), 0, 0, 0, 100, 0))
    +    time.Now().Add(time.Hour), 0, 0, 0, 0, 0))
@belimawr (Contributor, Author) commented on the diff:

This causes the mock to accept all the events, instead of returning an error. The test that relied on the error has been updated.

@belimawr belimawr marked this pull request as ready for review February 7, 2025 15:02
@belimawr belimawr requested a review from a team as a code owner February 7, 2025 15:02
@elasticmachine (Contributor) commented:

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@belimawr (Contributor, Author) commented Feb 7, 2025

I looked at the test failures; they seem unrelated to this PR, likely some flakiness in the communication with the Elastic-Agent during the test.

@pkoutsovasilis (Contributor) left a comment:

@belimawr could you write some unit-tests around that to increase the coverage?

@belimawr (Contributor, Author) commented Feb 7, 2025

> @belimawr could you write some unit-tests around that to increase the coverage?

I can if you insist. The only unit test I can write for it is to check whether the conditions are introduced into the policy, not their effectiveness. I'd test something very generic like: "all copy_fields processors contain a when.not.has_fields condition matching the fields.to for the Filebeat monitoring".

I'm not convinced that this is what we want to enforce.

There is an integration test that ensures the correct behaviour: we do not spam our logs with "failed processors" messages from processors we control and whose failure we can avoid.

What do you think?

@pkoutsovasilis (Contributor) commented:

> I can if you insist. The only unit test I can write for it is to check whether the conditions are introduced into the policy, not their effectiveness. I'd test something very generic like: "all copy_fields processors contain a when.not.has_fields condition matching the fields.to for the Filebeat monitoring".
>
> I'm not convinced that this is what we want to enforce.
>
> There is an integration test that ensures the correct behaviour: we do not spam our logs with "failed processors" messages from processors we control and whose failure we can avoid.
>
> What do you think?

I kinda see multiple reasons to have unit-tests:

  1. check that your code, ideally your function(s), work as expected without any external dependencies
  2. they are actually helpful for a newcomer to this code to understand how it should work
  3. having unit-tests can help identify whether an integration test is failing because of your code or because of another service that is part of the integration test

Now, if you insist on not writing any, I am not here to enforce anything 🙂

@belimawr (Contributor, Author) commented:

@pkoutsovasilis I tried the if/then/else possibility we discussed; however, it does not work under Elastic-Agent :/ It breaks somewhere while the policy is being parsed. I got the logic working on a standalone Filebeat, but the Elastic-Agent could not get it sent to Filebeat :/
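
For reference, this is roughly the if/then/else shape that Beats processors support and that is being referred to here; the fields and actions are made up for illustration, not the exact configuration that was tried:

    processors:
      - if:
          has_fields: ['component.id']        # illustrative: the target field already exists
        then:
          - rename:
              fields:
                - from: component.id
                  to: component.id.original   # illustrative: move the existing value aside
        else:
          - copy_fields:
              fields:
                - from: data_stream.dataset   # illustrative source field
                  to: component.id            # illustrative target field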

@pkoutsovasilis (Contributor) commented:

> @pkoutsovasilis I tried the if/then/else possibility we discussed; however, it does not work under Elastic-Agent :/ It breaks somewhere while the policy is being parsed. I got the logic working on a standalone Filebeat, but the Elastic-Agent could not get it sent to Filebeat :/

Hmmm, that is really interesting 🤔 Could you please create an issue with your findings so we don't lose track of this behaviour? Also, I did notice the build failures; they are related to the unit tests from my own PR that got merged yesterday. I have opened a PR that mitigates these failures here, but you will have to merge main once more when it goes in.

@belimawr (Contributor, Author) commented:

@pkoutsovasilis here is the issue: #6820

Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube

Labels
  • backport-active-all: Automated backport with mergify to all the active branches
  • Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team
Development

Successfully merging this pull request may close these issues.

Event log file gets flooded with copy_fields processor error
5 participants