Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metricbeat/module/mongodb/replstatus: Update getOpTimestamp in replstatus to fix sort and temp files generation issue #37688

Merged
merged 11 commits into from
Jan 25, 2024

Conversation

ritalwar
Copy link
Contributor

@ritalwar ritalwar commented Jan 22, 2024

Proposed commit message

Refactor getOpTimestamp method to use min and max aggregations for calculating first and last timestamps, addressing the sorting issue and reducing the generation of temporary files in the MongoDB.

Details:

In the MongoDB oplog, which records operations, each entry has a timestamp stored in a field called ts, refer this. This timestamp uses a special format called BSON. To find the first and last timestamps in the entire oplog efficiently, a MongoDB aggregation pipeline is used. This pipeline groups all documents into a singular group ("_id": 1), and the $min and $max operators are subsequently applied to the ts field within this consolidated group. This method, executed directly on the MongoDB server, proves to be both efficient and flexible.

 MongoDB, makes sure to always compare the part of the timestamp related to time (time_t) before the part related to the order of operations (ordinal), refer. This consistency ensures that timestamps are correctly understood, no matter the platform.

In contrast to an earlier implementation that utilized the $natural operator for sorting, the adoption of $min and $max eliminates inefficiencies associated with excessive memory consumption during sorting operations. The removal of natural ordering, which was unsuitable for large datasets, also addresses errors related to memory limits and the automatic creation of temporary files, a behavior introduced in MongoDB version 6.0 for processes exceeding 100 MB of RAM usage as per this doc.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jan 22, 2024
Copy link
Contributor

mergify bot commented Jan 22, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @ritalwar? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Copy link
Collaborator

❕ Build Aborted

There is a new build on-going so the previous on-going builds have been aborted.

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Start Time: 2024-01-22T13:11:27.595+0000

  • Duration: 6 min 34 sec

Steps errors 1

Expand to view the steps failures

Error signal
  • Took 0 min 0 sec . View more details here
  • Description: tar: step failed with error null

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Collaborator

💔 Build Failed

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Duration: 48 min 25 sec

Pipeline error 1

This error is likely related to the pipeline itself. Click here
and then you will see the error (either incorrect syntax or an invalid configuration).

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@ritalwar ritalwar changed the title Update getOpTimestamp implementation [Mongodb] Update getOpTimestamp in replstatus to fix sort and temp files generation issue. Jan 23, 2024
@elasticmachine
Copy link
Collaborator

💔 Build Failed

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Duration: 50 min 59 sec

Pipeline error 1

This error is likely related to the pipeline itself. Click here
and then you will see the error (either incorrect syntax or an invalid configuration).

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@ritalwar ritalwar marked this pull request as ready for review January 23, 2024 09:50
@ritalwar ritalwar requested a review from a team as a code owner January 23, 2024 09:50
@ritalwar ritalwar added Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team and removed needs_team Indicates that the issue/PR needs a Team:* label labels Jan 23, 2024
@elasticmachine
Copy link
Collaborator

💔 Build Failed

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2024-01-23T09:53:47.994+0000

  • Duration: 51 min 6 sec

Test stats 🧪

Test Results
Failed 0
Passed 4563
Skipped 908
Total 5471

Steps errors 3

Expand to view the steps failures

metricbeat-packaging-linux - mage package
  • Took 11 min 53 sec . View more details here
  • Description: mage package
metricbeat-packaging-arm-ubuntu-2204-aarch64 - mage package
  • Took 6 min 15 sec . View more details here
  • Description: mage package
Error signal
  • Took 0 min 0 sec . View more details here
  • Description: Error 'hudson.AbortException: script returned exit code 1'

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@harnish-elastic
Copy link
Contributor

@ritalwar Is there any unit test case file present that you need to update as per code changes?

@devamanv
Copy link
Contributor

Also, please make sure that unit tests are also updated.

Co-authored-by: Aman <38116245+devamanv@users.noreply.github.com>
@ritalwar
Copy link
Contributor Author

@ritalwar Is there any unit test case file present that you need to update as per code changes?

No unit tests currently exist that require updating. Please find the results of manual tests conducted by @aliabbas-elastic .

@lalit-satapathy lalit-satapathy requested a review from shmsr January 24, 2024 10:22
@elasticmachine
Copy link
Collaborator

💔 Build Failed

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2024-01-24T10:19:03.096+0000

  • Duration: 34 min 30 sec

Test stats 🧪

Test Results
Failed 0
Passed 388
Skipped 31
Total 419

Steps errors 31

Expand to view the steps failures

Show only the first 10 steps failures

x-pack/metricbeat-pythonIntegTest - mage pythonIntegTest
  • Took 5 min 57 sec . View more details here
  • Description: mage pythonIntegTest
x-pack/metricbeat-pythonIntegTest - mage pythonIntegTest
  • Took 1 min 33 sec . View more details here
  • Description: mage pythonIntegTest
x-pack/metricbeat-pythonIntegTest - mage pythonIntegTest
  • Took 1 min 35 sec . View more details here
  • Description: mage pythonIntegTest
x-pack/metricbeat-windows-2022-windows-2022 - mage build unitTest
  • Took 3 min 24 sec . View more details here
  • Description: mage build unitTest
x-pack/metricbeat-windows-2022-windows-2022 - mage build unitTest
  • Took 0 min 16 sec . View more details here
  • Description: mage build unitTest
x-pack/metricbeat-windows-2022-windows-2022 - mage build unitTest
  • Took 0 min 16 sec . View more details here
  • Description: mage build unitTest
x-pack/metricbeat-windows-2016-windows-2016 - mage build unitTest
  • Took 4 min 47 sec . View more details here
  • Description: mage build unitTest
x-pack/metricbeat-windows-2016-windows-2016 - mage build unitTest
  • Took 0 min 17 sec . View more details here
  • Description: mage build unitTest
x-pack/metricbeat-windows-2016-windows-2016 - mage build unitTest
  • Took 0 min 17 sec . View more details here
  • Description: mage build unitTest
Error signal
  • Took 0 min 0 sec . View more details here
  • Description: Error 'hudson.AbortException: script returned exit code 1'

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Collaborator

❕ Build Aborted

There is a new build on-going so the previous on-going builds have been aborted.

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Start Time: 2024-01-24T11:09:19.699+0000

  • Duration: 27 min 45 sec

Test stats 🧪

Test Results
Failed 0
Passed 282
Skipped 15
Total 297

Steps errors 29

Expand to view the steps failures

Show only the first 10 steps failures

x-pack/metricbeat-pythonIntegTest - mage pythonIntegTest
  • Took 5 min 1 sec . View more details here
  • Description: mage pythonIntegTest
x-pack/metricbeat-pythonIntegTest - mage pythonIntegTest
  • Took 1 min 34 sec . View more details here
  • Description: mage pythonIntegTest
x-pack/metricbeat-pythonIntegTest - mage pythonIntegTest
  • Took 1 min 34 sec . View more details here
  • Description: mage pythonIntegTest
x-pack/metricbeat-windows-2022-windows-2022 - mage build unitTest
  • Took 4 min 33 sec . View more details here
  • Description: mage build unitTest
x-pack/metricbeat-windows-2022-windows-2022 - mage build unitTest
  • Took 0 min 16 sec . View more details here
  • Description: mage build unitTest
x-pack/metricbeat-windows-2022-windows-2022 - mage build unitTest
  • Took 0 min 16 sec . View more details here
  • Description: mage build unitTest
x-pack/metricbeat-windows-2016-windows-2016 - mage build unitTest
  • Took 4 min 32 sec . View more details here
  • Description: mage build unitTest
x-pack/metricbeat-windows-2016-windows-2016 - mage build unitTest
  • Took 0 min 16 sec . View more details here
  • Description: mage build unitTest
x-pack/metricbeat-windows-2016-windows-2016 - mage build unitTest
  • Took 0 min 16 sec . View more details here
  • Description: mage build unitTest
Error signal
  • Took 0 min 0 sec . View more details here
  • Description: Error 'hudson.AbortException: script returned exit code 1'

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Collaborator

💔 Build Failed

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Duration: 50 min 16 sec

Pipeline error 1

This error is likely related to the pipeline itself. Click here
and then you will see the error (either incorrect syntax or an invalid configuration).

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Collaborator

💔 Build Failed

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2024-01-24T16:24:59.300+0000

  • Duration: 50 min 24 sec

Test stats 🧪

Test Results
Failed 0
Passed 4563
Skipped 908
Total 5471

Steps errors 3

Expand to view the steps failures

metricbeat-packaging-linux - mage package
  • Took 11 min 19 sec . View more details here
  • Description: mage package
metricbeat-packaging-arm-ubuntu-2204-aarch64 - mage package
  • Took 6 min 24 sec . View more details here
  • Description: mage package
Error signal
  • Took 0 min 0 sec . View more details here
  • Description: Error 'hudson.AbortException: script returned exit code 1'

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@shmsr shmsr added the bug label Jan 24, 2024
@elasticmachine
Copy link
Collaborator

💔 Build Failed

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Duration: 51 min 55 sec

Pipeline error 1

This error is likely related to the pipeline itself. Click here
and then you will see the error (either incorrect syntax or an invalid configuration).

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@shmsr
Copy link
Member

shmsr commented Jan 24, 2024

@ritalwar Is there any unit test case file present that you need to update as per code changes?

No unit tests currently exist that require updating. Please find the results of manual tests conducted by @aliabbas-elastic .

Let's link this issue to the description itself.

Copy link
Member

@shmsr shmsr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially doubtful when I saw pipeline aggregation being used instead of FindOne in MongoDB. However, I now understand the benefits of using it, such as executing it in a single pipeline, which is particularly useful for large oplogs. The changes look good.

Pipeline aggregation is available in MongoDB starting from version 4.2. For more information, please refer to this.

As per the MongoDB integration documentation, the compatibility is guaranteed for versions >= 2.8. It appears that @aliabbas-elastic has tested it with v7.x, which could be the reason why we didn't face any issues.

I suggest to add:

  • New method for v4.1 and older versions too:
// kind of like this?
firstOp := bson.D{{"ts", bson.D{{"$min", 1}}}}
lastOp := bson.D{{"ts", bson.D{{"$max", 1}}}}

// followed by FindOne?
  • Keep this pipeline aggregation implementation as it is but limited to v4.2 and above.
  • @aliabbas-elastic After the change can you please test with any v3.x?

cc: @ritalwar @aliabbas-elastic

@shmsr
Copy link
Member

shmsr commented Jan 24, 2024

Also, there is one thing missing in the description of natural is that it is not the proper order in the events arrived in MongoDB but the order when written to disk. Usually, the disk order is similar to the insertion order except when documents move internally because of document growth due to update operations.

https://www.mongodb.com/docs/v2.2/reference/glossary/#term-natural-order

It is also an expensive op. So a good thing we moved away from it.

@shmsr shmsr changed the title [Mongodb] Update getOpTimestamp in replstatus to fix sort and temp files generation issue. [Mongodb] Update getOpTimestamp in replstatus to fix sort and temp files generation issue Jan 24, 2024
@shmsr shmsr changed the title [Mongodb] Update getOpTimestamp in replstatus to fix sort and temp files generation issue metricbeat/module/mongodb/replstatus: Update getOpTimestamp in replstatus to fix sort and temp files generation issue Jan 24, 2024
@ali786XI
Copy link
Contributor

@aliabbas-elastic After the change can you please test with any v3.x?

Sure

@ali786XI
Copy link
Contributor

Let's link this issue to the description itself.

Thanks @shmsr. Updated in the description

@ritalwar
Copy link
Contributor Author

Pipeline aggregation is available in MongoDB starting from version 4.2.

  • The changes made in the code are working fine with MongoDB version 3.6 and above.
  • Compatibility with aggregation pipelines is also assured for MongoDB versions 3.6 and above, see this.
  • The implemented changes have already been tested with MongoDB versions 3.6 and 7.x, and working fine.
  • Notably, MongoDB version 4.2 introduced the capability to update documents using aggregation pipelines with update stages. However, in the current implementation, a group stage is utilized, and this is supported for MongoDB version >= 3.6

@ritalwar
Copy link
Contributor Author

ritalwar commented Jan 25, 2024

As per the MongoDB integration documentation, the compatibility is guaranteed for versions >= 2.8.

Also, the integration documentation needs an update since the beats code and version support have been modified while ago.
Created an issue, to update the documentation.

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2024-01-25T06:55:12.374+0000

  • Duration: 51 min 19 sec

Test stats 🧪

Test Results
Failed 0
Passed 4563
Skipped 908
Total 5471

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@shmsr
Copy link
Member

shmsr commented Jan 25, 2024

Pipeline aggregation is available in MongoDB starting from version 4.2.

  • The changes made in the code are working fine with MongoDB version 3.6 and above.
  • Compatibility with aggregation pipelines is also assured for MongoDB versions 3.6 and above, see this.
  • The implemented changes have already been tested with MongoDB versions 3.6 and 7.x, and working fine.
  • Notably, MongoDB version 4.2 introduced the capability to update documents using aggregation pipelines with update stages. However, in the current implementation, a group stage is utilized, and this is supported for MongoDB version >= 3.6

Yes. I see https://www.mongodb.com/docs/v3.6/reference/method/db.collection.aggregate/#db.collection.aggregate with v3.6

LGTM then. Let's update documentation immediately right after the next version of beats is released.

@ritalwar ritalwar merged commit 77abcf3 into elastic:main Jan 25, 2024
36 checks passed
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
…lstatus` to fix sort and temp files generation issue (elastic#37688)

* Update getOpTimestamp implementation
ritalwar added a commit to ritalwar/beats that referenced this pull request Feb 5, 2024
…lstatus` to fix sort and temp files generation issue (elastic#37688)

* Update getOpTimestamp implementation
ritalwar added a commit that referenced this pull request Feb 6, 2024
…rt and temp files generation issue (#37855)

* metricbeat/module/mongodb/replstatus: Update `getOpTimestamp` in `replstatus` to fix sort and temp files generation issue (#37688)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Team:Obs-InfraObs Label for the Observability Infrastructure Monitoring team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants