metric_version = 1 memory leak #9821

Closed
noarrg opened this issue Sep 26, 2021 · 2 comments · Fixed by #12160
Labels
area/prometheus · bug (unexpected problem or unintended behavior) · help wanted (request for community participation, code, contribution) · size/m (2-4 day effort) · waiting for response (waiting for response from contributor)

Comments

@noarrg

noarrg commented Sep 26, 2021

Relevant telegraf.conf:
[agent]
interval = "1s"
round_interval = false
metric_batch_size = 800
metric_buffer_limit = 1000
collection_jitter = "0s"
flush_interval = "1s"
flush_jitter = "0s"
hostname = ""
omit_hostname = true
debug = true
quiet = false
logfile = "logger"

[[outputs.prometheus_client]]
listen = ":9276"
path = "/metrics"
expiration_interval = "20s"
export_timestamp = false
metric_version = 1

[[inputs.socket_listener]]
service_address = "udp://:8094"
read_buffer_size = "8MB"
data_format = "prometheus"

System info:
telegraf-1.19.3_windows_amd64
Windows Server 2019 Datacenter

Steps to reproduce:
The input was about 800 metrics per second sent to the UDP socket_listener input; the prometheus_client output exposed them and they were forwarded, but the telegraf process memory grew linearly, reaching 4 GB after about an hour of running.

Setting metric_version to 2 fixed this: process memory stayed at about 50 MB over a couple of hours of running.
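For anyone trying to reproduce this, here is a minimal sketch of a load generator that pushes metrics in Prometheus exposition format to the socket_listener UDP port from the config above. The metric name, label, and rate are illustrative assumptions, not the reporter's actual workload.

// reproduce.go - hypothetical load generator (not from the reporter's setup):
// pushes ~800 Prometheus-format samples per second to the socket_listener UDP port.
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	conn, err := net.Dial("udp", "127.0.0.1:8094")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	for {
		// One datagram per sample, in Prometheus exposition format, which the
		// socket_listener parses because data_format = "prometheus".
		for i := 0; i < 800; i++ {
			fmt.Fprintf(conn, "example_metric{index=\"%d\"} %d\n", i, time.Now().UnixNano())
		}
		time.Sleep(time.Second)
	}
}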

@noarrg noarrg added the bug unexpected problem or unintended behavior label Sep 26, 2021
@yummydsky

Please help by adding the following code to the v1 collector.go Add function; you can reference the v2 collector's Add function.

	// Expire metrics; doing this on Add ensures metrics are removed even if no
	// one is querying the data.
	c.Expire(time.Now(), c.ExpirationInterval)
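For context, here is a minimal, self-contained sketch of the expire-on-add pattern being suggested. The type and field names are illustrative, not the actual Telegraf collector types; the point is that pruning inside Add keeps memory bounded even when the /metrics endpoint is never scraped.

package main

import (
	"fmt"
	"sync"
	"time"
)

// sample is a stored metric value plus the time it was last updated.
type sample struct {
	value    float64
	lastSeen time.Time
}

// collector mimics the v1 Prometheus collector's bookkeeping: a map of
// samples guarded by a mutex, plus an expiration interval.
type collector struct {
	mu                 sync.Mutex
	expirationInterval time.Duration
	samples            map[string]sample
}

// Add stores a sample and then expires stale ones. Expiring here, not only
// in Collect, is what prevents unbounded growth when nothing scrapes the endpoint.
func (c *collector) Add(name string, value float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	now := time.Now()
	c.samples[name] = sample{value: value, lastSeen: now}
	c.expire(now)
}

// expire drops every sample that has not been updated within the interval.
func (c *collector) expire(now time.Time) {
	for name, s := range c.samples {
		if now.Sub(s.lastSeen) > c.expirationInterval {
			delete(c.samples, name)
		}
	}
}

func main() {
	c := &collector{
		expirationInterval: 20 * time.Second,
		samples:            map[string]sample{},
	}
	c.Add("example_metric", 1)
	fmt.Println("stored samples:", len(c.samples))
}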

@sspaink sspaink added help wanted Request for community participation, code, contribution size/m 2-4 day effort labels Nov 2, 2022
powersj added a commit to powersj/telegraf that referenced this issue Nov 3, 2022
Currently, we are only expiring data if someone is getting the data.
This means that if data is continuously pushed, but not gathered, the
usage can grow and grow. This change forces expiration during Add,
similar to how v2 handles this as well.

fixes: influxdata#9821
@powersj
Contributor

powersj commented Nov 3, 2022

Hi,

Expire is currently only called during Collect(), which runs when someone loads the prometheus endpoint with the v1 metric type.

I have put up #12160 with a fix that adds the expire call during Add as well. In 20-30 minutes there will be some test artifacts attached to that PR. Could someone please download one of those artifacts and confirm the lower memory usage? If you do run into any issues, could you provide me with an example config plus the metrics pushed to the prometheus input?

Thanks!
