
File descriptor leak caused by clients prematurely closing connections #327

Closed · milesbxf opened this issue Oct 17, 2018 · 25 comments

@milesbxf

milesbxf commented Oct 17, 2018

Hi! 👋 We've been using the JMX exporter to instrument Cassandra (via the javaagent, version 0.3.1).

We recently had an incident caused by Cassandra running out of file descriptors. We found these had been gradually leaking over time (metric here is node_filefd_allocated from node_exporters on those instances - the FD limit we set for Cassandra is 100k):
[screenshot: node_filefd_allocated graph, 2018-10-17 11:21]

We'd been seeing some issues with Prometheus timing out whilst scraping these nodes, and found that the majority of open FDs were orphaned TCP sockets in CLOSE_WAIT. Thread dumps showed that all 5 JMX exporter threads on these nodes seemed to be stuck writing to the socket:

"pool-1-thread-1" - Thread t@84
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
        - locked <2e084219> (a java.lang.Object)
        at sun.net.httpserver.Request$WriteStream.write(Request.java:391)
        - locked <368dd754> (a sun.net.httpserver.Request$WriteStream)
        at sun.net.httpserver.ChunkedOutputStream.writeChunk(ChunkedOutputStream.java:125)
        at sun.net.httpserver.ChunkedOutputStream.write(ChunkedOutputStream.java:87)
        at sun.net.httpserver.PlaceholderOutputStream.write(ExchangeImpl.java:444)
        at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:253)
        at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211)
        at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:145)
        at java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:167)
        - locked <fd8ef5> (a java.io.ByteArrayOutputStream)
        at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer$HTTPMetricHandler.handle(HTTPServer.java:74)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
        at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
        at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
        at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
   Locked ownable synchronizers:
        - locked <467e300a> (a java.util.concurrent.ThreadPoolExecutor$Worker)

Putting these two bits of information together gives us this theory:

  1. Prometheus scrapes the node - sends an HTTP request to the JMX exporter
  2. JMX exporter collects metrics, but takes a long time to do so (this is occasionally expected in our case, as our nodes export thousands of JMX metrics)
  3. Prometheus reaches the scrape timeout, and cancels the request with a TCP FIN
  4. JMX exporter finishes collecting metrics, and attempts to write the output to the socket (https://github.com/prometheus/client_java/blob/parent-0.3.0/simpleclient_httpserver/src/main/java/io/prometheus/client/exporter/HTTPServer.java#L78). The other side of the TCP connection has been closed.
  5. This call blocks forever, and we never reach https://github.com/prometheus/client_java/blob/parent-0.3.0/simpleclient_httpserver/src/main/java/io/prometheus/client/exporter/HTTPServer.java#L80 which closes the socket

It looks like simpleclient_httpserver doesn't have good semantics around handling closed connections.

We don't have a minimal reproduction of this, but tcpdumps back this up. We're considering forking the jmx_exporter to use simpleclient_jetty instead, but we wondered if anyone else had come across this?
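
If our theory is right, a reproduction would look roughly like the sketch below (untested on our side - the port, pool size, and handler are made up for illustration): a server built on com.sun.net.httpserver with a small fixed worker pool, and a client that sends a request, half-closes the connection, and never reads the response, so the server's blocking write eventually fills the TCP buffers and hangs.

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.nio.charset.StandardCharsets;
    import java.util.concurrent.Executors;

    public class BlockedWriteRepro {
        public static void main(String[] args) throws Exception {
            // Server side: same shape as the exporter - com.sun.net.httpserver with a small fixed worker pool.
            HttpServer server = HttpServer.create(new InetSocketAddress(18080), 0);
            server.setExecutor(Executors.newFixedThreadPool(5));
            server.createContext("/metrics", exchange -> {
                exchange.sendResponseHeaders(200, 0); // 0 => chunked response
                OutputStream body = exchange.getResponseBody();
                byte[] chunk = new byte[64 * 1024];
                while (true) {
                    body.write(chunk); // once the client stops reading, this eventually blocks in write0
                }
            });
            server.start();

            // "Prometheus" side: send a request, half-close the connection, and never read the response.
            Socket scraper = new Socket("localhost", 18080);
            scraper.getOutputStream().write(
                    "GET /metrics HTTP/1.1\r\nHost: localhost\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            scraper.shutdownOutput(); // FIN, like a cancelled scrape
            Thread.sleep(60_000);     // leave it dangling; the worker thread stays stuck and the FD stays open
        }
    }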

@brian-brazil
Contributor

Hmm, what happens if you use a try/finally for the close?

@milesbxf
Author

Do you mean in client_java/simpleclient_httpserver?

It could look something like

        // t is the HttpExchange; response is the ByteArrayOutputStream holding the scraped metrics.
        // The finally block ensures the exchange is closed even if the write throws.
        try {
            if (shouldUseCompression(t)) {
                t.getResponseHeaders().set("Content-Encoding", "gzip");
                t.sendResponseHeaders(HttpURLConnection.HTTP_OK, 0);
                final GZIPOutputStream os = new GZIPOutputStream(t.getResponseBody());
                response.writeTo(os);
                os.finish();
            } else {
                t.sendResponseHeaders(HttpURLConnection.HTTP_OK, response.size());
                response.writeTo(t.getResponseBody());
            }
        } finally {
            t.close();
        }

but I don't think that would fix this - no exception is thrown; the response.writeTo call just seems to block forever.

I'm not familiar enough with Java socket programming to suggest a better fix, but surely Jetty should be able to better handle the client closing the connection?
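
One mitigation that might be worth trying in the meantime (untested on our side, and the values below are just illustrative): the JDK's built-in com.sun.net.httpserver implementation reads the sun.net.httpserver.maxReqTime and sun.net.httpserver.maxRspTime system properties (values in seconds, both disabled by default) and closes exchanges that exceed them, which in theory should also tear down a connection whose response write is stuck. For the javaagent that would mean adding something like this to the Cassandra JVM options:

    -Dsun.net.httpserver.maxReqTime=60
    -Dsun.net.httpserver.maxRspTime=600

We haven't verified that the internal timer actually unblocks a write that is already stuck in the kernel, though.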

@brian-brazil
Contributor

JMX exporter finishes collecting metrics, and attempts to write the output to the now-closed socket

That's not how TCP works; only one direction of the socket is closed. What exactly is netstat showing for these sockets? It also shouldn't be possible for there to be thousands of them, as there are only 5 worker threads.

@milesbxf
Author

Yes of course - I've clarified in the description, thanks.

I don't have netstat output for the affected nodes, though I'll grab it when the issue reoccurs. We did analyse the leak with lsof, and found thousands of entries like:

java    17678 cassandra  526u     IPv4          148742999        0t0        TCP hostname-a:http-alt->hostname-b:36374 (CLOSE_WAIT)

hostname-a is the hostname of the Cassandra node we ran this on, and hostname-b is one of the Prometheus hosts. All of the connections in CLOSE_WAIT showed a connection to a Prometheus host, and we're exposing the JMX exporter on the http-alt port (8080), so these are definitely connections handled by the JMX exporter.

@brian-brazil
Contributor

I was more interested in the Send-Q column from netstat.

@milesbxf
Author

@brian-brazil just grabbed some output from a node that started being affected again. Here's an example line:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp      200      0 <cassandra IP>:8080       <prometheus IP>:58288     CLOSE_WAIT  73970/java

All of the CLOSE_WAIT connections look almost identical, all with 200 bytes of data in Recv-Q and 0 in Send-Q - so, as I understand it, there are 200 bytes of data received on the socket but never read by the JMX exporter.

@brian-brazil
Contributor

Hmm, that smells like a connection that's in the backlog - that is, it's in the kernel but the application has yet to accept it.

@milesbxf
Author

Yup, it's looking like the workers get stuck writing to a socket whilst the HTTPServer background thread (?) continues to accept connections from Prometheus, but never actually reads or closes them, presumably as all worker threads are blocked. Does that sound about right?
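
In other words, something shaped like this (just a sketch of the failure mode, not client_java's actual code - the pool size of five matches the thread dump, the port and handler are made up):

    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class SaturatedWorkers {
        public static void main(String[] args) throws Exception {
            ExecutorService workers = Executors.newFixedThreadPool(5); // all five can end up stuck in a blocked write
            try (ServerSocket listener = new ServerSocket(8080)) {
                while (true) {
                    Socket conn = listener.accept();    // the accept loop keeps pulling connections in...
                    workers.submit(() -> handle(conn)); // ...but the tasks just queue up behind the stuck workers
                }
            }
        }

        static void handle(Socket conn) {
            // If this blocks forever writing a response, the worker never frees up and conn is never
            // read or closed - each queued connection is another leaked FD sitting in CLOSE_WAIT
            // after Prometheus times out and closes its side.
        }
    }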

I think we'll try Jetty (I think it should be able to handle this situation better), will post here with what happens.

@brian-brazil
Contributor

Does that sound about right?

That sounds plausible. Do you know where they're blocking?

@milesbxf
Author

milesbxf commented Oct 25, 2018

Yep, all worker threads look like this:

"pool-1-thread-1" - Thread t@84
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
        at sun.nio.ch.IOUtil.write(IOUtil.java:65)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
        - locked <2e084219> (a java.lang.Object)
        at sun.net.httpserver.Request$WriteStream.write(Request.java:391)
        - locked <368dd754> (a sun.net.httpserver.Request$WriteStream)
        at sun.net.httpserver.ChunkedOutputStream.writeChunk(ChunkedOutputStream.java:125)
        at sun.net.httpserver.ChunkedOutputStream.write(ChunkedOutputStream.java:87)
        at sun.net.httpserver.PlaceholderOutputStream.write(ExchangeImpl.java:444)
        at java.util.zip.DeflaterOutputStream.deflate(DeflaterOutputStream.java:253)
        at java.util.zip.DeflaterOutputStream.write(DeflaterOutputStream.java:211)
        at java.util.zip.GZIPOutputStream.write(GZIPOutputStream.java:145)
        at java.io.ByteArrayOutputStream.writeTo(ByteArrayOutputStream.java:167)
        - locked <fd8ef5> (a java.io.ByteArrayOutputStream)
        at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer$HTTPMetricHandler.handle(HTTPServer.java:74)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
        at sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:83)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:82)
        at sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:675)
        at com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:79)
        at sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:647)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
   Locked ownable synchronizers:
        - locked <467e300a> (a java.util.concurrent.ThreadPoolExecutor$Worker)

so they're blocking on response.writeTo(t.getResponseBody()); - (https://github.com/prometheus/client_java/blob/parent-0.3.0/simpleclient_httpserver/src/main/java/io/prometheus/client/exporter/HTTPServer.java#L78)

@brian-brazil
Contributor

That's odd, looks like it's stuck in the kernel. Can you get an strace?

Could you also try with the master version of our httpserver library? There's one fix there which has yet to make it out and has the potential to be relevant.

@ekazakas

We ran into a similar issue using v0.4.0 where HTTPServer accepts connections but never responds.

@xoraes

xoraes commented Dec 19, 2018

Any update on this?

@brian-brazil
Contributor

Did you get a chance to try this with a newer version of client_java?

@jaseemabid

@brian-brazil We moved the main JMX exporter out of the Cassandra process to an external HTTP server because we couldn't afford the FD leaks and having to restart Cassandra every once in a while. Now we run 2 copies of the JMX exporter, with the in-process version scraping only the minimal JVM metrics and the external exporter scraping much more detailed Cassandra metrics. Since it's an external process which gets restarted automatically by systemd if it crashes, we haven't really looked into it since. It's not an optimal solution, but it saved us from the problematic DB restarts.

@amruta1989mohite

@brian-brazil any solution on this issue please.

@amruta1989mohite

amruta1989mohite commented Jan 21, 2019

@brian-brazil We moved the main JMX exporter out of the Cassandra process to an external HTTP server because we couldn't afford the FD leaks and having to restart Cassandra every once in a while. Now we run 2 copies of the JMX exporter, with the in-process version scraping only the minimal JVM metrics and the external exporter scraping much more detailed Cassandra metrics. Since it's an external process which gets restarted automatically by systemd if it crashes, we haven't really looked into it since. It's not an optimal solution, but it saved us from the problematic DB restarts.

Hello @jaseemabid, I am facing the same issue right now, and I also want to avoid restarting my DB at a fixed interval. Can you tell me in detail how you fixed it? Any help would be really appreciated.

@jaseemabid

@amruta1989mohite Run 2 copies of this exporter.

  1. Make the in-process exporter scrape only the core JVM metrics; on our infra it completes a scrape in a couple of seconds. This normally doesn't leak FDs, and we haven't had any issues so far.

  2. Run an out-of-process HTTP exporter to scrape everything else you need. This takes much longer depending on your cluster, but since it's a different process, you can safely and automatically restart it once it starts leaking FDs. A rough sketch of this setup is below.
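
Roughly, the split looks like this (illustrative only - the ports, file paths, and whitelist pattern are made up here or specific to our setup, so check the jmx_exporter README for the exact config keys your version supports):

    # In-process javaagent on the Cassandra JVM, e.g.
    #   -javaagent:/path/to/jmx_prometheus_javaagent.jar=8080:/etc/cassandra/jmx_agent.yaml
    # Keep this scrape small and fast: JVM beans only.
    whitelistObjectNames: ["java.lang:*"]

    # Out-of-process exporter, e.g.
    #   java -jar jmx_prometheus_httpserver-<version>-jar-with-dependencies.jar 9404 /etc/cassandra/jmx_http.yaml
    # Connects to Cassandra's JMX port and scrapes everything else;
    # systemd (Restart=always) brings it back if we kill it on FD growth.
    hostPort: localhost:7199
    rules:
      - pattern: ".*"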

@amruta1989mohite

@jaseemabid, thanks for the response, and sorry for bothering you again.
I am quite new to exporters, so would it be possible to get the exporter code from you, or a document describing in detail how you did it?
It would be very helpful. Thanks in advance.

@amruta1989mohite

@jaseemabid Should I wait for your help?

@tusharmakkar08

@brian-brazil I am facing a similar issue when using Python's Prometheus client.

Could you also try with the master version of our httpserver library? There's one fix there which has yet to make it out and has the potential to be relevant.

Can you please share a link to the library?

@brian-brazil
Contributor

Doesn't look like we're getting any more information out of this, and there have been fixes since.

@jaseemabid

@brian-brazil At monzo, we migrated to https://github.com/instaclustr/cassandra-exporter, found similar issues and then wrote our own https://github.com/suhailpatel/seastat.

@brian-brazil
Contributor

That smells like an issue inside Cassandra itself.

@hr-ireh

hr-ireh commented Jan 2, 2021

JMX exporter finishes collecting metrics, and attempts to write the output to the now-closed socket

That's not how TCP works, only one direction of the socket is closed. What exactly is netstat showing for these sockets? It also shouldn't be possible for there to be thousands of them, as there's only 5 worker threads.

All 5 worker threads are blocked, but connections are still accepted, because the thread that accepts connections is different from the threads handling them.

[screenshots attached: jmx_01, jmx_03]
