-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set TCP keepalive to publisher and subscriber #43
Conversation
Codecov Report
@@ Coverage Diff @@
## main #43 +/- ##
==========================================
+ Coverage 78.15% 79.05% +0.89%
==========================================
Files 15 14 -1
Lines 2065 2086 +21
==========================================
+ Hits 1614 1649 +35
+ Misses 451 437 -14
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to have this feature! I think a test is missing and I have a suggestion.
posttroll/__init__.py
Outdated
@@ -69,5 +73,20 @@ def strp_isoformat(strg): | |||
return dat.replace(microsecond=mis) | |||
|
|||
|
|||
def _set_tcp_keepalive(socket): | |||
keepalive = os.environ.get("POSTTROLL_TCP_KEEPALIVE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use donfig to delegate the environment variable reading? https://github.com/pytroll/donfig
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created #44 for this, there are other environment variables used in similar way.
Would simple mock that |
yes, works for me |
Tests added in 5c5e236 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the test! I think we need to reverse the env though when the tests are over.
posttroll/tests/test_pubsub.py
Outdated
import os | ||
|
||
os.environ["POSTTROLL_TCP_KEEPALIVE"] = "1" | ||
os.environ["POSTTROLL_TCP_KEEPALIVE_CNT"] = "10" | ||
os.environ["POSTTROLL_TCP_KEEPALIVE_IDLE"] = "1" | ||
os.environ["POSTTROLL_TCP_KEEPALIVE_INTVL"] = "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure the environment is reverted for other tests, maybe we could use monkeypatch
? https://docs.pytest.org/en/stable/how-to/monkeypatch.html#monkeypatching-environment-variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've been hitting my head on the wall for the past hour or so. Not that the tests pass even with the latest commits (I tried with a class and locking in between). If I run only my new tests, they work great, so guess I need to fix the other tests that use os.environ
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on what they set, yeah probably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's these tests themselves that cause the problems, even with the monkeypatched fixture. If I move these two tests with the necessary stuff to another Python file, the tests fail. If I run them individually, they pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding print(os.environ['POSTTROLL_TCP_KEEPALIVE'])
to either of the tests raise a KeyError
, so the env variables were not set by the monkeypatch fixture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wtf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, now the tests should work. I was mocking the original location of get_context()
, not where it is used.
posttroll/__init__.py
Outdated
keepalive_cnt = os.environ.get("POSTTROLL_TCP_KEEPALIVE_CNT") | ||
if keepalive_cnt is not None: | ||
socket.setsockopt(zmq.TCP_KEEPALIVE_CNT, int(keepalive_cnt)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last comment: maybe make a small utility function for this? eg
def set_int_sockopt(socket, param, value):
if value is not None:
socket.setsockopt(param, value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, why I keep missing these refactorings all the time.. Thanks! Done in c44f85d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR adds a TCP keepalive to the publisher and subscriber. They can be controlled via environment variables
POSTTROLL_TCP_KEEPALIVE
POSTTROLL_TCP_KEEPALIVE_CNT
POSTTROLL_TCP_KEEPALIVE_IDLE
POSTTROLL_TCP_KEEPALIVE_INTVL
I have no idea what the default values should be, I just tested with something I found while searching (1/10/1/1 respectively). I opted to not to do anything if the env variables are not set.
My use case is on running containers on OpenShift, where the socket connections seem to be timing out unless there is a relatively constant stream of messages. Without these settings I got four messages of collected MSG/RSS scenes on the first try, and seven on the second try. With theses settings, the segment gatherer messages have been passing consistently over-night.
Similar broken/dead connections have occured in other places, but not this often.