Fix TLS Race Connecting to InfluxDB
Rather than leaving stale connections about, we tried to poll for data coming in
from InfluxDB and time out if it didn't respond in a timely manner.  This introduced
a race: the timeout triggers, a context switch occurs during which data actually
becomes available, and the TlsStream then spins trying to asynchronously notify
that data is available, but that data never gets read.  Not only does this use up
100% of a core, but it also slowly starves the system of handler threads, at which
point metrics stop being delivered.

This removes the poll and timeout; any TLS socket errors should be detected by
TCP keep-alives.

Fixes #5460 #5469

refs #5504
spjmurray authored and Michael Friedrich committed Aug 17, 2017
1 parent 15df0bf commit d052e94
Showing 4 changed files with 4 additions and 17 deletions.
1 change: 0 additions & 1 deletion doc/09-object-types.md
@@ -926,7 +926,6 @@ Configuration Attributes:
enable_send_metadata | **Optional.** Whether to send check metadata e.g. states, execution time, latency etc.
flush_interval | **Optional.** How long to buffer data points before transfering to InfluxDB. Defaults to `10s`.
flush_threshold | **Optional.** How many data points to buffer before forcing a transfer to InfluxDB. Defaults to `1024`.
-socket_timeout | **Optional.** How long to wait for InfluxDB to respond. Defaults to `5s`.

Note: If `flush_threshold` is set too low, this will always force the feature to flush all data
to InfluxDB. Experiment with the setting, if you are processing more than 1024 metrics per second
15 changes: 3 additions & 12 deletions lib/perfdata/influxdbwriter.cpp
@@ -134,9 +134,9 @@ void InfluxdbWriter::ExceptionHandler(boost::exception_ptr exp)
//TODO: Close the connection, if we keep it open.
}

-Stream::Ptr InfluxdbWriter::Connect(TcpSocket::Ptr& socket)
+Stream::Ptr InfluxdbWriter::Connect()
{
-socket = new TcpSocket();
+TcpSocket::Ptr socket = new TcpSocket();

Log(LogNotice, "InfluxdbWriter")
<< "Reconnecting to InfluxDB on host '" << GetHost() << "' port '" << GetPort() << "'.";
@@ -423,8 +423,7 @@ void InfluxdbWriter::Flush(void)
String body = boost::algorithm::join(m_DataBuffer, "\n");
m_DataBuffer.clear();

-TcpSocket::Ptr socket;
-Stream::Ptr stream = Connect(socket);
+Stream::Ptr stream = Connect();

if (!stream)
return;
@@ -462,14 +461,6 @@ void InfluxdbWriter::Flush(void)
HttpResponse resp(stream, req);
StreamReadContext context;

-struct timeval timeout = { GetSocketTimeout(), 0 };
-
-if (!socket->Poll(true, false, &timeout)) {
-Log(LogWarning, "InfluxdbWriter")
-<< "Response timeout of TCP socket from host '" << GetHost() << "' port '" << GetPort() << "'.";
-return;
-}
-
try {
resp.Parse(context, true);
} catch (const std::exception& ex) {
2 changes: 1 addition & 1 deletion lib/perfdata/influxdbwriter.hpp
@@ -74,7 +74,7 @@ class InfluxdbWriter : public ObjectImpl<InfluxdbWriter>
static String EscapeKey(const String& str);
static String EscapeField(const String& str);

-Stream::Ptr Connect(TcpSocket::Ptr& socket);
+Stream::Ptr Connect();

void AssertOnWorkQueue(void);

3 changes: 0 additions & 3 deletions lib/perfdata/influxdbwriter.ti
@@ -90,9 +90,6 @@ class InfluxdbWriter : ConfigObject
[config] int flush_threshold {
default {{{ return 1024; }}}
};
-[config] int socket_timeout {
-default {{{ return 5; }}}
-};
};

validator InfluxdbWriter {