add Catalina domain specification to tomcat bean includes #7054

binaryphile · 2020-07-02T23:30:51Z

What does this PR do?

It adds the "domain" specifier to conf.yaml.example and metrics.yaml in the tomcat integration. The domain is "Catalina", in keeping with what tomcat publishes its JMX metrics under.

Motivation

We encountered an issue where JMX data would stop being reported by some of the agents on our production instances. This would be accompanied by a spike in CPU usage. The jmxfetch process under top showed an accumulation of hours of CPU time and continuous 60%+ CPU usage for hours on end. Once JMX reporting failed in this manner, it would not come back without restarting both the agent as well as the tomcat instance it was monitoring. A case was opened with datadog support, but it went unresolved.

By running "datadog-agent jmx collect", it was determined that jmxfetch was scanning hundreds of thousands of attributes. Unable to finish, it would never reach the portion of the log where it should say that it was done scanning attributes and start applying the metric filters. By examining the tomcat instance with jconsole, it was found that there were hundreds of thousands of attributes under the kafka.producer domain. These were primarily the attributes causing jmxfetch to run for so long.

However, the tomcat integration should not have been scanning those attributes in the first place. While jmxfetch has a builtin configuration to limit the number of metrics to report (not the issue here), so it will not use up arbitrary resources. However, It does not have such a hard limit on the attributes it will scan prior to applying the include filters in the integration. Its only protection from scanning an arbitrary number of attributes is by determining the proper beans to scan by examining the "bean" and "domain" specifiers in the includes. jmxfetch has a routine which determines the minimal covering set of beans to scan by examining those specifiers (I don't recall the name of the method, but it is easily searchable).

If it is not able to define those beans (say, because an include doesn't define either of those specifiers), it then resorts to scanning attributes under all beans. You can see this exhibited in the jmx collect output when it says, "Querying bean names on scope: :".

In our scenario, we had no use for the problematic kafka.producer bean and would not be scanning it. So we were surprised to find it being scanned by the default datadog tomcat integration. We found out that this is because the integration is missing any "domain" specifier on the includes. By adding "domain: Catalina" to each of the includes, we were able to verify that the log no longer output "Querying bean names on scope: :". jmxfetch then no longer scanned the problematic kafka.producer bean and stopped going off into la-la-land with CPU, and jmx reporting has been working for us since.

Additional Notes

I inspected the rest of the integration configurations. The following also leave out domain or bean specifiers, and would exhibit the same issue:

activemq
solr

jboss_wildfly relies on "bean_regex", which i haven't verified, but I would imagine is ok if it contained the domain. Still, I'm not sure the routine in jmxfetch which determines the minimal covering list of beans is smart enough to extract that from a regex, so it bears verification.

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
PR title must be written as a CHANGELOG entry (see why)
Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
PR must have changelog/ and integration/ labels attached

hithwen · 2020-07-09T14:31:16Z

Hallo @binaryphile Thanks a lot for your contribution. It seems like your change was not created from the master branch though as the conf section on instances has not been there since april (8aa6d4e)

You only need to edit metrics.yaml now.

binaryphile · 2020-07-09T21:01:18Z

I'll see what I can do, I was surprised to see merge conflicts but thought maybe it was something else.

…

On Thu, Jul 9, 2020 at 10:31 AM Julia ***@***.***> wrote: Hallo @binaryphile <https://github.com/binaryphile> Thanks a lot for your contribution. It seems like your change was not created from the master branch though as the conf section on instances has not been there since april (8aa6d4e <8aa6d4e> ) You only need to edit metrics.yaml now. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#7054 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABKSEFE55ZHRVGWAERTSWDR2XIEJANCNFSM4OPKHRXQ> .

codecov · 2020-07-09T21:26:03Z

Codecov Report

Merging #7054 into master will decrease coverage by 27.14%.
The diff coverage is n/a.

Impacted Files	Coverage Δ
ibm_db2/datadog_checks/ibm_db2/__init__.py
process/datadog_checks/process/lock.py
kafka/tests/test_check.py
kafka/tests/common.py
openldap/datadog_checks/openldap/__about__.py
mcache/datadog_checks/mcache/__init__.py
rethinkdb/tests/unit/test_config.py
cloud_foundry_api/tests/test_cloud_foundry_api.py
airflow/tests/conftest.py
gearmand/tests/test_integration_e2e.py
... and 772 more

binaryphile · 2020-07-09T21:27:28Z

Don't know how I got on the wrong branch in the first place but this update should tidy things up now.

binaryphile requested review from a team as code owners July 2, 2020 23:30

hithwen added changelog/Fixed integration/tomcat labels Jul 9, 2020

add Catalina domain specification to tomcat bean includes

3a88e3a

binaryphile force-pushed the tomcat-includes branch from eabe037 to 3a88e3a Compare July 9, 2020 21:23

hithwen approved these changes Jul 10, 2020

View reviewed changes

hithwen merged commit 3d9e6eb into DataDog:master Jul 10, 2020

binaryphile deleted the tomcat-includes branch July 10, 2020 13:35

This was referenced Jul 21, 2020

Add domain to metrics.yaml #7163

Merged

Add domain to metrics.yaml #7164

Merged

AlexandreYang mentioned this pull request Oct 7, 2020

Validate JMX integrations metrics.yaml #7733

Merged

chulkilee mentioned this pull request Sep 25, 2021

jmx domain for tomcat may not be Catalina #10233

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add Catalina domain specification to tomcat bean includes #7054

add Catalina domain specification to tomcat bean includes #7054

binaryphile commented Jul 2, 2020

hithwen commented Jul 9, 2020

binaryphile commented Jul 9, 2020 via email

codecov bot commented Jul 9, 2020

binaryphile commented Jul 9, 2020

add Catalina domain specification to tomcat bean includes #7054

add Catalina domain specification to tomcat bean includes #7054

Conversation

binaryphile commented Jul 2, 2020

What does this PR do?

Motivation

Additional Notes

Review checklist (to be filled by reviewers)

hithwen commented Jul 9, 2020

binaryphile commented Jul 9, 2020 via email

codecov bot commented Jul 9, 2020

Codecov Report

binaryphile commented Jul 9, 2020