Prometheus Java client 1.x consumes more memory than 0.x #5229
Comments
Could you please verify that the two outputs are the same and that you do not have four times more time series with Prometheus 1.x? Approximately how many time series (or lines) do you have in your Prometheus output? Also, do the second and subsequent scrapes consume the same amount of memory, or are they more lightweight? What tool did you use for the flame graph? Do you get similar results if you look at a heap dump? Also, Micrometer's overhead on the flame graphs seems pretty minimal to me. Depending on the answers to the questions above, maybe this should be a Prometheus Client question in the end?
Hi Jonatan, it may be important that my application uses Kafka heavily and the Prometheus output contains 4703 Kafka rows out of 4850 total. After inspecting the profiler output, the suspected fragments of code are
This conflicts with your flame graphs; I don't see any of these classes on them. If you are really suspicious that
Indeed, the suspected code I found has a small impact on the flame graph. It is hard to determine the root cause of the memory consumption. I see that high-level code has changed and the memory consumption is visible in low-level code: is the root cause the recently changed high-level code, or the low-level code that has not changed for years? Anyway, I maintain that the methods I pointed out are inefficient and that micrometer-registry-prometheus 1.13.0 uses more memory than micrometer-registry-prometheus-simpleclient 1.13.0.
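One way to compare the two outputs is to count the sample lines in each scrape. The following is a minimal sketch, assuming the scrape body has already been saved as a string in the Prometheus text exposition format; `ScrapeLineCounter` and `countSamples` are hypothetical names and are not part of Micrometer or the Prometheus client.

```java
// Hypothetical helper for comparing two scrape outputs; not a Micrometer API.
class ScrapeLineCounter {

    // Counts sample lines, skipping blank lines and comment lines
    // such as "# HELP ..." and "# TYPE ...".
    static long countSamples(String scrapeOutput) {
        return scrapeOutput.lines()
                .map(String::strip)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .count();
    }

    public static void main(String[] args) {
        // Made-up sample data, only to show the shape of the input.
        String sample = String.join("\n",
                "# HELP jvm_threads_live_threads The current number of live threads",
                "# TYPE jvm_threads_live_threads gauge",
                "jvm_threads_live_threads 42.0",
                "",
                "# TYPE http_requests_total counter",
                "http_requests_total{method=\"GET\"} 10.0",
                "http_requests_total{method=\"POST\"} 3.0");
        System.out.println(countSamples(sample));
    }
}
```

Running the same count against the output of both registries shows whether one of them exposes noticeably more time series than the other.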
Analyzing a heap dump might help.
I don't understand this question.
I never stated the opposite. I was trying to point out that you might have opened the issue in the wrong repo: Micrometer is not the same as Prometheus Client (where
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
I reported that "micrometer-registry-prometheus" allocates more memory than "micrometer-registry-prometheus-simpleclient". The impact is visible in applications with high Kafka usage, because the Kafka client uses thousands of metrics. I reported a real-load flame graph because I expected the problem to be more or less known, suspected, or predicted. I wanted to give a hint for tracking the problem under real load. Because the problem was not known, I was asked to provide more details. That was hard to do in the real environment, because the real environment has been switched to the simple client. To provide details, I ran experiments on a laptop with the plain client.
Continuing with the laptop load, I found that io.micrometer.core.instrument.config.NamingConvention#toSnakeCase is inefficiently implemented. I opened #5271 for that.
We also discussed which repo should receive the fix. In my opinion the target repo is Micrometer. We see that "prometheus-metrics-model" provides method
Does it mean that the "prometheus-metrics-model" library holds a gun to the Micrometer team's head and demands that the method be called on every scrape and for each of the 7K Kafka metrics? No. micrometer-registry-prometheus is responsible for a well-defined and well-bounded task: return a bunch of metrics on each request. Basically, it must print out something like
If the task is performed inefficiently, it must be solved within micrometer-registry-prometheus, not by putting the blame on "prometheus-metrics-model".
See #5288, which should bring things back in alignment in terms of NamingConvention not being called during
It might be inefficient (it's hard to tell without a JMH benchmark), but the section you fixed has been there for 7 years, there was no change there between 1.12 and 1.13, where you are experiencing the issue, and I'm not sure it is significant on the flame graph.
On the other hand, we would like Micrometer to be as close to the "official" behavior as possible, hence the calls to the Prometheus Client.
I don't think I blamed I think that IF there is a performance issue in a downstream dependency, it should be fixed there, not upstream. If I follow your logic, then a perf issue in any library your app uses should be fixed in your app, since it is using it? I don't think so; the issue should be fixed closest to its cause. Based on #5288, it seems Micrometer is calling the convention many times for each scrape. Would you be able to check whether that PR fixes the issue you are seeing?
The naming convention was being called on each scrape for each metric to create the MetricMetadata. This is unnecessary because we already have the computed convention name specifically to avoid needing to call the convention again. See gh-5229
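The idea in this commit message can be sketched as follows: apply the naming convention once, at registration time, and reuse the stored result on every scrape. This is a simplified, hypothetical model and not Micrometer's actual code; `CachedNameRegistry`, `register`, and `scrape` are illustrative names, and the convention is modeled as a plain `UnaryOperator<String>`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical, simplified registry; not Micrometer's implementation.
class CachedNameRegistry {

    private final UnaryOperator<String> convention;
    private final List<String> conventionNames = new ArrayList<>();

    CachedNameRegistry(UnaryOperator<String> convention) {
        this.convention = convention;
    }

    // The convention is applied exactly once per meter, at registration.
    void register(String rawName) {
        conventionNames.add(convention.apply(rawName));
    }

    // Scrapes only read the cached names; the convention is not invoked here.
    String scrape() {
        StringBuilder out = new StringBuilder();
        for (String name : conventionNames) {
            out.append(name).append(" 0.0\n"); // placeholder value for the sketch
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        CachedNameRegistry registry = new CachedNameRegistry(name -> {
            calls[0]++;
            return name.replace('.', '_');
        });
        registry.register("kafka.consumer.lag");
        registry.scrape();
        registry.scrape();
        System.out.println("convention calls: " + calls[0]);
    }
}
```

With thousands of Kafka meters, the difference between one conversion per meter and one conversion per meter per scrape is what makes this kind of caching matter.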
The changes from #5288 are available in
@michaldo thank you for trying out the changes and reporting back. It seems things are improved but still not as good as before. Do you think it makes sense to report your latest finding to the Prometheus Java client and close this issue, or do you think there's more we can do in Micrometer for this?
Answering requires understanding Micrometer's design. Simply put, it depends on the Prometheus Client's role. If it is a black box, performance is a Prometheus Client issue. If it is a building block, performance is either a Micrometer issue or a Prometheus Client issue.
No doubt metrics should have as low an impact as possible. In my case several Kafka topics are used, which makes the number of metrics really high: 7K. Some performance bottlenecks that are not observable with several dozen metrics become observable with a few thousand. That is important context. Nevertheless, all Kafka metrics work out of the box; I didn't add a single line of code.
Three parties are involved in metric transmission: Spring Boot, Micrometer, and Prometheus Client. In the current case the Prometheus client always modifies the given string to escape special characters. Considering that the input is very often stable, escaping is a waste of resources. When all parties involved in the transmission are standardized libraries, they can agree on a transmission format and avoid runtime conversion.
There is a difference in computer science between an API and a protocol. Here the API is Prometheus Client and the protocol is
An API is friendly, but not necessarily extremely efficient. To sum up, I think it is worth considering the concept that Micrometer is responsible for efficient metrics transmission. Micrometer should be aware of how transmission looks from source to target and arrange the process to avoid redundant, repeated, resource-consuming actions. In particular, unnecessary String conversions should be avoided. When all of the above is known, Micrometer should check whether Prometheus Client meets the low-impact requirements and decide:
I decided to continue this issue here: prometheus/client_java#1241
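To illustrate the point about escaping stable input, here is a minimal sketch of an escaper with a fast path: it scans for characters that need escaping and returns the original string, with no allocation, when there are none. `LabelValueEscaper` is a hypothetical class and is not the Prometheus client's implementation; the escaped characters (backslash, double quote, line feed) are the ones the Prometheus text format defines for label values.

```java
// Hypothetical escaper for label values; not the Prometheus client's code.
class LabelValueEscaper {

    static String escape(String value) {
        int firstSpecial = indexOfSpecial(value);
        if (firstSpecial < 0) {
            return value; // fast path: nothing to escape, no new String is allocated
        }
        StringBuilder sb = new StringBuilder(value.length() + 8);
        sb.append(value, 0, firstSpecial); // copy the clean prefix unchanged
        for (int i = firstSpecial; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns the index of the first character that needs escaping, or -1.
    private static int indexOfSpecial(String value) {
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' || c == '"' || c == '\n') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String stable = "orders-topic";
        System.out.println(escape(stable) == stable); // same instance is returned
        System.out.println(escape("say \"hi\""));
    }
}
```

Whether such a fast path belongs in Micrometer or in the Prometheus client is exactly the design question raised in this comment; the sketch only shows that stable input can be passed through without a conversion.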
I observed that the Micrometer registry consumes more memory when I switch Spring Boot from 3.2.6 to 3.3.1.
Environment
I attached a profiler report collected with async-profiler.
Green is memory consumption for micrometer-registry-prometheus: scrape() takes 446 MB
Red is memory consumption for micrometer-registry-prometheus-simpleclient (I keep Spring Boot 3.3.1 but switch to simple client): 121 MB