feat: Expose detailed JVM and netty memory metrics #61

Merged: 1 commit merged into main from mem-metrics on Oct 6, 2022

Conversation

@njhill (Member) commented Oct 1, 2022

Motivation

To size model-mesh container memory allocations for different workloads, it would be useful to expose more detailed usage metrics.

Container process-level memory usage alone is opaque and may reflect over-allocation of the heap and/or direct buffer pools.

Modifications

  • Cherry-pick some of the prometheus hotspot exporters from https://github.com/prometheus/client_java/tree/main/simpleclient_hotspot/src/main/java/io/prometheus/client/hotspot, and reduce the amount of garbage they produce during collection.
  • Add a netty memory exporter which includes metrics for the amount of OS memory allocated for buffer pools as well as how much of that pool capacity is currently allocated to application buffers (for both heap and direct arenas, although typically only direct should be used); a Java sketch follows this list.
  • Enable these by default when prometheus metrics are enabled, but support selective enablement via a mem_detail=X,Y,Z parameter in the MM_METRICS env var string.
  • Adjust heap/direct memory sizing in start.sh to allow for explicit configuration of the max direct memory size (a shell sketch follows the Result section below).
  • Adjust the unit test to check for these new metrics.
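
As an illustration of the netty exporter bullet, here is a minimal sketch built on Netty's public allocator metrics and the prometheus simpleclient Collector API. It is not the PR's actual class; the class name and metric names are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

import io.netty.buffer.PoolArenaMetric;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocatorMetric;
import io.prometheus.client.Collector;
import io.prometheus.client.GaugeMetricFamily;

// Sketch only: reports how much memory netty has reserved from the OS for
// its buffer pools, and how much of that capacity is currently handed out
// to application buffers, for both the direct and heap arenas.
public class NettyMemoryCollector extends Collector {

    @Override
    public List<MetricFamilySamples> collect() {
        PooledByteBufAllocatorMetric m = PooledByteBufAllocator.DEFAULT.metric();

        GaugeMetricFamily reserved = new GaugeMetricFamily(
                "netty_pool_reserved_bytes", // assumed name, not the PR's
                "Memory reserved from the OS for netty buffer pools",
                List.of("area"));
        reserved.addMetric(List.of("direct"), m.usedDirectMemory());
        reserved.addMetric(List.of("heap"), m.usedHeapMemory());

        GaugeMetricFamily active = new GaugeMetricFamily(
                "netty_pool_active_bytes",   // assumed name, not the PR's
                "Pool capacity currently allocated to application buffers",
                List.of("area"));
        active.addMetric(List.of("direct"), sumActiveBytes(m.directArenas()));
        active.addMetric(List.of("heap"), sumActiveBytes(m.heapArenas()));

        List<MetricFamilySamples> mfs = new ArrayList<>();
        mfs.add(reserved);
        mfs.add(active);
        return mfs;
    }

    // Sum the bytes currently allocated to live buffers across all arenas.
    private static long sumActiveBytes(List<PoolArenaMetric> arenas) {
        long total = 0;
        for (PoolArenaMetric arena : arenas) {
            total += arena.numActiveBytes();
        }
        return total;
    }
}
```

Registering it once with `new NettyMemoryCollector().register()` would expose both gauges on the existing prometheus endpoint, so pool reservation can be compared against live buffer usage.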

Result

More detailed memory insight/tuning possible, even in production.
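
For the start.sh sizing bullet, a hedged shell sketch of the idea; the variable names, default value, and jar name here are assumptions rather than the script's real contents:

```bash
#!/bin/bash
# Sketch only: names and defaults are assumed, not taken from start.sh.
HEAP_SIZE="${HEAP_SIZE:-2g}"
JAVA_OPTS="-Xmx${HEAP_SIZE}"

# Allow an explicit max direct memory size to be configured; otherwise the
# JVM default (roughly the maximum heap size) applies.
if [ -n "$MAX_DIRECT_MEM" ]; then
  JAVA_OPTS="$JAVA_OPTS -XX:MaxDirectMemorySize=$MAX_DIRECT_MEM"
fi

exec java $JAVA_OPTS -jar model-mesh.jar
```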

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
@njhill marked this pull request as ready for review October 3, 2022 17:51
@rafvasq self-requested a review October 4, 2022 20:57
@kserve-oss-bot (Collaborator)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: njhill, rafvasq

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rafvasq (Member) commented Oct 6, 2022

/lgtm

@kserve-oss-bot merged commit e415746 into main Oct 6, 2022
@njhill deleted the mem-metrics branch October 13, 2022 18:56
njhill added a commit that referenced this pull request Oct 21, 2022
Motivation

A bug introduced in the recent PR #61 resulted in the netty direct memory pool size not being set correctly.

Modification

Add the missing `$` to a variable reference in a bash `if` condition in start.sh.
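
Sketched with a hypothetical variable name (the real identifier in start.sh may differ), the class of bug and its fix look like this:

```bash
# Buggy: without the $, the test sees the literal string "MAX_DIRECT_MEM",
# which is never empty, so the outcome does not depend on the variable at all.
if [ -n "MAX_DIRECT_MEM" ]; then
  JAVA_OPTS="$JAVA_OPTS -XX:MaxDirectMemorySize=$MAX_DIRECT_MEM"
fi

# Fixed: the $ expands the variable, so the flag is only added when an
# explicit size was actually configured.
if [ -n "$MAX_DIRECT_MEM" ]; then
  JAVA_OPTS="$JAVA_OPTS -XX:MaxDirectMemorySize=$MAX_DIRECT_MEM"
fi
```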

Result

Correct netty direct memory allocation size, avoid OOM crashing.

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
kserve-oss-bot pushed a commit that referenced this pull request Oct 21, 2022
KillianGolds pushed a commit to KillianGolds/modelmesh that referenced this pull request Aug 7, 2024
[pull] main from kserve:main