
HTTP server leaks memory in python3.7.1 #340

Closed
josecv opened this issue Nov 10, 2018 · 16 comments · Fixed by #356
@josecv

josecv commented Nov 10, 2018

I'm using version 0.4.2, and it seems like the default HTTP server, started via start_http_server, leaks memory on every request.

Repro script:

#!/usr/bin/env python3
import gc
import time
import tracemalloc
from random import randint

from prometheus_client import Summary, start_http_server

# Create a metric to track time spent and requests made.
REQUEST_TIME = Summary('request_processing_seconds',
                       'Time spent processing request')


# Decorate function with metric.
@REQUEST_TIME.time()
def process_request(t):
    """A dummy function that takes some time."""
    time.sleep(t)


if __name__ == '__main__':
    # Start up the server to expose the metrics.
    start_http_server(8000)
    tracemalloc.start()
    snapshot = tracemalloc.take_snapshot()
    # Generate some requests.
    while True:
        process_request(randint(1, 5))
        gc.collect()
        snapshot2 = tracemalloc.take_snapshot()
        top_stats = snapshot2.compare_to(snapshot, 'lineno')
        print("[ Top 5 differences ]")
        for stat in top_stats[:5]:
            print(stat)

If you kick off the above script and then do:

while true; do curl localhost:8000; done

you can observe its memory footprint growing on every iteration:

[ Top 5 differences ]
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:799: size=946 KiB (+946 KiB), count=1682 (+1682), average=576 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:365: size=868 KiB (+868 KiB), count=1683 (+1683), average=528 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:500: size=422 KiB (+422 KiB), count=6753 (+6753), average=64 B
/Users/jose/code/prometheus-mem-leak/.venv/lib/python3.7/site-packages/prometheus_client/core.py:802: size=354 KiB (+354 KiB), count=5027 (+5027), average=72 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:899: size=263 KiB (+263 KiB), count=3366 (+3366), average=80 B
[ Top 5 differences ]
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:799: size=992 KiB (+992 KiB), count=1763 (+1763), average=576 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:365: size=910 KiB (+910 KiB), count=1764 (+1764), average=528 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:500: size=443 KiB (+443 KiB), count=7078 (+7078), average=64 B
/Users/jose/code/prometheus-mem-leak/.venv/lib/python3.7/site-packages/prometheus_client/core.py:802: size=370 KiB (+370 KiB), count=5266 (+5266), average=72 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:899: size=276 KiB (+276 KiB), count=3528 (+3528), average=80 B
[ Top 5 differences ]
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:799: size=1063 KiB (+1063 KiB), count=1889 (+1889), average=576 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:365: size=974 KiB (+974 KiB), count=1889 (+1889), average=528 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:500: size=474 KiB (+474 KiB), count=7584 (+7584), average=64 B
/Users/jose/code/prometheus-mem-leak/.venv/lib/python3.7/site-packages/prometheus_client/core.py:802: size=397 KiB (+397 KiB), count=5639 (+5639), average=72 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:899: size=295 KiB (+295 KiB), count=3778 (+3778), average=80 B
[ Top 5 differences ]
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:799: size=1115 KiB (+1115 KiB), count=1982 (+1982), average=576 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:365: size=1022 KiB (+1022 KiB), count=1983 (+1983), average=528 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:500: size=497 KiB (+497 KiB), count=7956 (+7956), average=64 B
/Users/jose/code/prometheus-mem-leak/.venv/lib/python3.7/site-packages/prometheus_client/core.py:802: size=416 KiB (+416 KiB), count=5922 (+5922), average=72 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:899: size=310 KiB (+310 KiB), count=3966 (+3966), average=80 B
[ Top 5 differences ]
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:799: size=1151 KiB (+1151 KiB), count=2046 (+2046), average=576 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:365: size=1055 KiB (+1055 KiB), count=2047 (+2047), average=528 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:500: size=514 KiB (+514 KiB), count=8213 (+8213), average=64 B
/Users/jose/code/prometheus-mem-leak/.venv/lib/python3.7/site-packages/prometheus_client/core.py:802: size=430 KiB (+430 KiB), count=6108 (+6108), average=72 B
/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/threading.py:899: size=320 KiB (+320 KiB), count=4094 (+4094), average=80 B

It's clearly coming from the server, because as soon as you stop the curl loop, memory usage stops growing.

@brian-brazil
Contributor

That code shouldn't even be calling the line it's pointing to, as there are no HTTP requests being made.

@josecv
Author

josecv commented Nov 10, 2018

there's no http requests being made

I'm assuming a continuous stream of HTTP requests, external to the script (the while true curl loop).

@hipyhop

hipyhop commented Nov 19, 2018

I'm seeing this behaviour too, though it's slow enough that it's not really a problem and our orchestrator is restarting the service when it hits 100Mi of memory.

Scraping 125 metrics every 30 seconds (~10-20 custom metrics, the rest default metrics)

[screenshot, 2018-11-19: memory usage graph growing over time]

@sbrandtb
Contributor

I found the issue and created a merge request. I hope it will get merged soon.

sbrandtb added a commit to FRI-DAY/client_python that referenced this issue Dec 10, 2018
The leak is caused by the fact that in Python 3.7, the default behaviour
of `ThreadingMixIn` is to use non-daemon threads and to block on those
threads when the server is closed. Because of that, it collects references
to all created threads, creating the "leak":
https://github.com/python/cpython/blob/v3.7.0/Lib/socketserver.py#L661

* Python 3.7: `block_on_close` is `True`: https://github.com/python/cpython/blob/v3.7.0/Lib/socketserver.py#L635
* Python 3.6: `_block_on_close` is `False`: https://github.com/python/cpython/blob/v3.6.7/Lib/socketserver.py#L639
* Python 2.7: There is no `block_on_close`, and thus no logic for collecting
  references: https://github.com/python/cpython/blob/v2.7.15/Lib/SocketServer.py#L582

Fix by setting `daemon_threads` to `True`, which in our case should be a
reasonable setting for all Python versions. The `ThreadingHTTPServer` stdlib
class, new in Python 3.7, also sets it by default:
https://github.com/python/cpython/blob/v3.7.0/Lib/http/server.py#L144

Signed-off-by: Sebastian Brandt <sebastian.brandt@friday.de>
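For illustration, a minimal sketch of the kind of change described above (not the literal diff that was merged, just the stdlib pieces involved):

from http.server import HTTPServer
from socketserver import ThreadingMixIn

class _ThreadingSimpleServer(ThreadingMixIn, HTTPServer):
    """Thread-per-request HTTP server for serving metrics."""
    # Handler threads become daemon threads, so on Python >= 3.7 ThreadingMixIn
    # no longer appends every spawned thread to its internal list (kept for
    # joining on server_close()), and finished threads can be garbage collected.
    daemon_threads = True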
@sbrandtb
Contributor

Until then, there is a quick fix:

from prometheus_client.exposition import _ThreadingSimpleServer, start_http_server

# Make handler threads daemon threads so the server stops keeping a reference
# to every thread it spawns (the Python >= 3.7 ThreadingMixIn behaviour).
_ThreadingSimpleServer.daemon_threads = True

start_http_server(9099)

sbrandtb added a commit to FRI-DAY/client_python that referenced this issue Dec 11, 2018
Signed-off-by: Sebastian Brandt <sebastian.brandt@friday.de>
brian-brazil pushed a commit that referenced this issue Dec 11, 2018
* Fix thread leak in Python 3.7 #340

@sbrandtb
Contributor

@brian-brazil since this is a rather bad thing in microservice environments, would you mind releasing this soon?

@tusharmakkar08
Contributor

@brian-brazil @sbrandtb I am facing this issue with prometheus-client==0.5.0 and Python 3.5.2, and it is not fixed by applying the workaround mentioned by @sbrandtb.

I am facing this issue not only when using start_http_server but also when exposing the Prometheus endpoint through Flask's server. Can you help me out?

To be precise, CLOSE_WAIT connections keep increasing until my server stops serving requests.

x@ip-10-0-0-0:~$ lsof -i | grep 8002
python 10601 x 3u IPv4 78413488 0t0 TCP *:8002 (LISTEN)
python 10601 x 55u IPv4 82060563 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45114 (CLOSE_WAIT)
python 10601 x 56u IPv4 82068773 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45142 (CLOSE_WAIT)
python 10601 x 57u IPv4 82069308 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45170 (CLOSE_WAIT)
python 10601 x 58u IPv4 82070804 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45198 (CLOSE_WAIT)
python 10601 x 59u IPv4 82071589 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45298 (CLOSE_WAIT)
python 10601 x 60u IPv4 82060912 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45328 (CLOSE_WAIT)
python 10601 x 61u IPv4 82073179 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45356 (CLOSE_WAIT)
python 10601 x 62u IPv4 82073397 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45386 (CLOSE_WAIT)
python 10601 x 63u IPv4 82075572 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45414 (CLOSE_WAIT)
python 10601 x 64u IPv4 82077191 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45442 (CLOSE_WAIT)
python 10601 x 65u IPv4 82076370 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45476 (CLOSE_WAIT)
python 10601 x 66u IPv4 82079364 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45504 (CLOSE_WAIT)
python 10601 x 67u IPv4 82077842 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45532 (CLOSE_WAIT)
python 10601 x 68u IPv4 82079997 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45560 (CLOSE_WAIT)
python 10601 x 69u IPv4 82081405 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45588 (CLOSE_WAIT)
python 10601 x 70u IPv4 82083436 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45616 (CLOSE_WAIT)
python 10601 x 71u IPv4 82084046 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45644 (CLOSE_WAIT)
python 10601 x 72u IPv4 82080465 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45672 (CLOSE_WAIT)
python 10601 x 73u IPv4 82084792 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45700 (CLOSE_WAIT)
python 10601 x 74u IPv4 82085875 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45728 (CLOSE_WAIT)
python 10601 x 75u IPv4 82088002 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45758 (CLOSE_WAIT)
python 10601 x 76u IPv4 82089044 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45788 (CLOSE_WAIT)
python 10601 x 77u IPv4 82086871 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45816 (CLOSE_WAIT)
python 10601 x 78u IPv4 82088321 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45844 (CLOSE_WAIT)
python 10601 x 79u IPv4 82090850 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45944 (CLOSE_WAIT)
python 10601 x 80u IPv4 82089551 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:45976 (CLOSE_WAIT)
python 10601 x 81u IPv4 82088475 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:46006 (CLOSE_WAIT)
python 10601 x 82u IPv4 82089875 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:46034 (CLOSE_WAIT)
python 10601 x 83u IPv4 82092200 0t0 TCP ip-10-0-0-0.ap-south-1.compute.internal:8002->ip-172-00-00-00.ap-south-1.compute.internal:46064 (CLOSE_WAIT)

[screenshot: Grafana memory usage graph]
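(For anyone debugging a similar symptom: a rough way to watch those sockets from inside the process is sketched below; it assumes the third-party psutil package, which is not part of prometheus_client.)

import psutil

# Count this process's TCP connections stuck in CLOSE_WAIT.
proc = psutil.Process()
close_wait = [c for c in proc.connections(kind="tcp")
              if c.status == psutil.CONN_CLOSE_WAIT]
print("CLOSE_WAIT sockets:", len(close_wait))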

@sbrandtb
Contributor

sbrandtb commented Feb 1, 2019

@tusharmakkar08 Please check out the comment in my merge request: this issue is very specific to Python >= 3.7.

Since you are seeing the issue on 3.5, I doubt it has the same cause. Also, the shape of the curve in Grafana does not match the behaviour of this bug at all: this bug does not make memory suddenly start growing (at 7:something) or suddenly drop again. Provided there are no other effects and your metrics endpoint is scraped regularly, this bug makes the graph grow very linearly, while yours is anything but.

However, without more information, such as code, we can't say much more.

@tusharmakkar08
Contributor

I think my issue is more similar to prometheus/jmx_exporter#327 and prometheus/jmx_exporter#352.

@sbrandtb
Contributor

@tusharmakkar08 I think you should open a new issue with code showing how to reproduce it; you won't get much of an audience or help here without reproducible code.

@yaverhussain

I'm seeing this behaviour too, though it's slow enough that it's not really a problem and our orchestrator is restarting the service when it hits 100Mi of memory.

Scraping 125 metrics every 30 seconds (~10-20 custom metrics, the rest default metrics)

Has anyone resolved this?

@sbrandtb
Contributor

@yaverhussain I fixed this two years ago and it was merged in #356.

@yaverhussain

yaverhussain commented Feb 10, 2021 via email

@sbrandtb
Contributor

@yaverhussain >= 0.6.0, see badges here

@yaverhussain

yaverhussain commented Feb 12, 2021 via email

@csmarchbanks
Member

Hello, rather than continuing to comment on this issue, would you make a new issue if you believe there is a problem with this client? If you have a general usage question, I recommend reaching out to the prometheus-users mailing list. Thanks in advance!

That said, if you are adding distinct URLs as label values, such as putting IDs/UUIDs in the label value, then you will use a lot of memory, as metrics must be stored for each URL. Usually people template URLs, e.g. "/my/path/:id", to avoid this issue.
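A rough sketch of that advice (the metric and route names below are made up for illustration; only Counter and labels() come from this client):

from prometheus_client import Counter

REQUESTS = Counter('http_requests_total', 'Total HTTP requests', ['path'])

def record_request(raw_path):
    # Labelling with raw_path directly ('/users/42', '/users/43', ...) would
    # create a new time series per distinct URL and memory would keep growing.
    # Record the route template instead, so the set of label values stays bounded.
    template = '/users/:id' if raw_path.startswith('/users/') else raw_path
    REQUESTS.labels(path=template).inc()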
