feat: Expose detailed JVM and netty memory metrics #61
Merged
Conversation
#### Motivation
To size model-mesh container memory allocation for different workloads, it would be useful to have more detailed usage metrics exposed. Container process-level memory usage alone is opaque and may reflect over-allocation of heap and/or direct buffer pools.

#### Modifications
- Cherry-pick some of the prometheus hotspot exporters from https://github.com/prometheus/client_java/tree/main/simpleclient_hotspot/src/main/java/io/prometheus/client/hotspot, and reduce the amount of garbage they produce during collection (see the buffer-pool sketch below).
- Add a netty memory exporter which includes metrics for the amount of OS memory allocated for buffer pools, as well as how much of that pool capacity is currently allocated to application buffers (for both heap and direct arenas, although typically only direct should be used); an illustrative sketch follows below.
- Enable these by default when prometheus metrics are enabled, but support selective enablement via a `mem_detail=X,Y,Z` parameter in the `MM_METRICS` env var string (see the parsing sketch below).
- Adjust heap/direct memory sizing in start.sh to allow for explicit configuration of the max direct memory size.
- Adjust the unit test to check for these new metrics.

#### Result
More detailed memory insight/tuning is possible, even in production.

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
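As a rough illustration of what a cherry-picked, low-garbage hotspot exporter looks like, here is a minimal sketch in the style of client_java's `BufferPoolsExports`, built on the standard `BufferPoolMXBean`. The class name and the MXBean-caching detail are illustrative assumptions, not the PR's exact code.

```java
import io.prometheus.client.Collector;
import io.prometheus.client.GaugeMetricFamily;

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;

// Sketch of a JVM buffer-pool exporter akin to client_java's BufferPoolsExports.
public class BufferPoolExports extends Collector {

    // Fetch the MXBeans once up front; re-looking them up on every
    // scrape would create avoidable garbage during collection.
    private final List<BufferPoolMXBean> pools =
            ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);

    @Override
    public List<MetricFamilySamples> collect() {
        GaugeMetricFamily used = new GaugeMetricFamily(
                "jvm_buffer_pool_used_bytes",
                "Bytes used by the JVM for a given buffer pool (e.g. direct, mapped)",
                Arrays.asList("pool"));
        GaugeMetricFamily capacity = new GaugeMetricFamily(
                "jvm_buffer_pool_capacity_bytes",
                "Total capacity of the buffers in a given pool",
                Arrays.asList("pool"));
        for (BufferPoolMXBean pool : pools) {
            used.addMetric(Arrays.asList(pool.getName()), pool.getMemoryUsed());
            capacity.addMetric(Arrays.asList(pool.getName()), pool.getTotalCapacity());
        }
        return Arrays.<MetricFamilySamples>asList(used, capacity);
    }
}
```

Registering such a collector with the simpleclient default registry is a one-liner: `new BufferPoolExports().register();`.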
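For the netty exporter, here is a minimal sketch assuming Netty 4.1's allocator metrics API. Note that `PooledByteBufAllocatorMetric.usedDirectMemory()` reports chunk memory the allocator has reserved from the OS, so the sketch derives application-level usage from per-chunk free space; the metric names are placeholders, not necessarily the names the PR exports.

```java
import io.netty.buffer.PoolArenaMetric;
import io.netty.buffer.PoolChunkListMetric;
import io.netty.buffer.PoolChunkMetric;
import io.netty.buffer.PooledByteBufAllocator;
import io.netty.buffer.PooledByteBufAllocatorMetric;
import io.prometheus.client.Collector;
import io.prometheus.client.GaugeMetricFamily;

import java.util.Arrays;
import java.util.List;

// Minimal sketch of a netty pooled-allocator memory collector (names illustrative).
public class NettyMemoryExports extends Collector {

    @Override
    public List<MetricFamilySamples> collect() {
        PooledByteBufAllocatorMetric metric = PooledByteBufAllocator.DEFAULT.metric();

        // Memory the allocator has reserved from the OS for its pools.
        GaugeMetricFamily reserved = new GaugeMetricFamily(
                "netty_pool_mem_allocated_bytes",
                "OS memory reserved by netty's pooled allocator",
                Arrays.asList("area"));
        reserved.addMetric(Arrays.asList("direct"), metric.usedDirectMemory());
        reserved.addMetric(Arrays.asList("heap"), metric.usedHeapMemory());

        // Portion of that pool capacity currently allocated to application buffers.
        GaugeMetricFamily active = new GaugeMetricFamily(
                "netty_pool_mem_used_bytes",
                "Pool capacity currently allocated to application buffers",
                Arrays.asList("area"));
        active.addMetric(Arrays.asList("direct"), usedBytes(metric.directArenas()));
        active.addMetric(Arrays.asList("heap"), usedBytes(metric.heapArenas()));

        return Arrays.<MetricFamilySamples>asList(reserved, active);
    }

    // Approximate bytes handed out to buffers: per-chunk capacity minus free space.
    // Iteration is unsynchronized, so values are a best-effort snapshot.
    private static long usedBytes(List<PoolArenaMetric> arenas) {
        long used = 0;
        for (PoolArenaMetric arena : arenas) {
            for (PoolChunkListMetric chunkList : arena.chunkLists()) {
                for (PoolChunkMetric chunk : chunkList) {
                    used += chunk.chunkSize() - chunk.freeBytes();
                }
            }
        }
        return used;
    }
}
```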
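Finally, a sketch of how the `mem_detail=X,Y,Z` parameter might be pulled out of the `MM_METRICS` string. A `prometheus:key=value;key=value` layout is assumed here based on documented `MM_METRICS` examples; the exact format the PR handles is not shown on this page, so treat the parsing details (and the `gc`/`netty` exporter names) as assumptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative parsing of the mem_detail parameter from an MM_METRICS-style string.
public final class MemDetail {

    // Parses e.g. "prometheus:port=2112;mem_detail=gc,netty" -> {gc, netty}.
    static Set<String> parse(String mmMetrics) {
        if (mmMetrics == null || mmMetrics.isEmpty()) {
            return Collections.emptySet();
        }
        int colon = mmMetrics.indexOf(':');
        String params = colon >= 0 ? mmMetrics.substring(colon + 1) : mmMetrics;
        for (String param : params.split(";")) {
            String[] kv = param.split("=", 2);
            if (kv.length == 2 && "mem_detail".equals(kv[0].trim())) {
                return new LinkedHashSet<>(Arrays.asList(kv[1].trim().split(",")));
            }
        }
        return Collections.emptySet(); // absent -> fall back to the default set
    }

    public static void main(String[] args) {
        System.out.println(parse("prometheus:port=2112;mem_detail=gc,netty"));
        // prints [gc, netty]
    }
}
```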
rafvasq approved these changes on Oct 6, 2022
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: njhill, rafvasq

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Approvers can indicate their approval by writing `/lgtm` in a comment.
njhill added a commit that referenced this pull request on Oct 21, 2022

#### Motivation
A bug was introduced in recent PR #61 resulting in the netty direct memory pool size not being set correctly.

#### Modification
Add missing `$` to variable in bash if condition in start.sh.

#### Result
Correct netty direct memory allocation size, avoid OOM crashing.

Signed-off-by: Nick Hill <nickhill@us.ibm.com>
kserve-oss-bot pushed a commit that referenced this pull request on Oct 21, 2022
KillianGolds pushed a commit to KillianGolds/modelmesh that referenced this pull request on Aug 7, 2024

[pull] main from kserve:main