Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set TCP keepalive to publisher and subscriber #43

Merged
merged 7 commits into from
Oct 6, 2022

Conversation

pnuu
Copy link
Member

@pnuu pnuu commented Oct 4, 2022

This PR adds a TCP keepalive to the publisher and subscriber. They can be controlled via environment variables

  • POSTTROLL_TCP_KEEPALIVE
  • POSTTROLL_TCP_KEEPALIVE_CNT
  • POSTTROLL_TCP_KEEPALIVE_IDLE
  • POSTTROLL_TCP_KEEPALIVE_INTVL

I have no idea what the default values should be, I just tested with something I found while searching (1/10/1/1 respectively). I opted to not to do anything if the env variables are not set.

My use case is on running containers on OpenShift, where the socket connections seem to be timing out unless there is a relatively constant stream of messages. Without these settings I got four messages of collected MSG/RSS scenes on the first try, and seven on the second try. With theses settings, the segment gatherer messages have been passing consistently over-night.

Similar broken/dead connections have occured in other places, but not this often.

@pnuu pnuu requested a review from mraspaud October 4, 2022 08:04
@pnuu pnuu self-assigned this Oct 4, 2022
@codecov
Copy link

codecov bot commented Oct 4, 2022

Codecov Report

Merging #43 (c44f85d) into main (028d3bb) will increase coverage by 0.89%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main      #43      +/-   ##
==========================================
+ Coverage   78.15%   79.05%   +0.89%     
==========================================
  Files          15       14       -1     
  Lines        2065     2086      +21     
==========================================
+ Hits         1614     1649      +35     
+ Misses        451      437      -14     
Impacted Files Coverage Δ
posttroll/publisher.py 97.70% <100.00%> (+0.03%) ⬆️
posttroll/subscriber.py 90.83% <100.00%> (+0.07%) ⬆️
posttroll/tests/test_pubsub.py 96.52% <100.00%> (+0.41%) ⬆️
posttroll/__init__.py
posttroll/version.py 39.54% <0.00%> (+2.54%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@pnuu pnuu requested review from adybbroe, gerritholl and loerum October 4, 2022 08:08
Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to have this feature! I think a test is missing and I have a suggestion.

@@ -69,5 +73,20 @@ def strp_isoformat(strg):
return dat.replace(microsecond=mis)


def _set_tcp_keepalive(socket):
keepalive = os.environ.get("POSTTROLL_TCP_KEEPALIVE")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use donfig to delegate the environment variable reading? https://github.com/pytroll/donfig

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created #44 for this, there are other environment variables used in similar way.

@pnuu
Copy link
Member Author

pnuu commented Oct 4, 2022

Nice to have this feature! I think a test is missing and I have a suggestion.

Would simple mock that socket.setsockopts() is called if the environment variables are defined be sufficient?

@mraspaud
Copy link
Member

mraspaud commented Oct 4, 2022

Nice to have this feature! I think a test is missing and I have a suggestion.

Would simple mock that socket.setsockopts() is called if the environment variables are defined be sufficient?

yes, works for me

@pnuu
Copy link
Member Author

pnuu commented Oct 4, 2022

Tests added in 5c5e236

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the test! I think we need to reverse the env though when the tests are over.

Comment on lines 645 to 650
import os

os.environ["POSTTROLL_TCP_KEEPALIVE"] = "1"
os.environ["POSTTROLL_TCP_KEEPALIVE_CNT"] = "10"
os.environ["POSTTROLL_TCP_KEEPALIVE_IDLE"] = "1"
os.environ["POSTTROLL_TCP_KEEPALIVE_INTVL"] = "1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure the environment is reverted for other tests, maybe we could use monkeypatch? https://docs.pytest.org/en/stable/how-to/monkeypatch.html#monkeypatching-environment-variables

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've been hitting my head on the wall for the past hour or so. Not that the tests pass even with the latest commits (I tried with a class and locking in between). If I run only my new tests, they work great, so guess I need to fix the other tests that use os.environ too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on what they set, yeah probably.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's these tests themselves that cause the problems, even with the monkeypatched fixture. If I move these two tests with the necessary stuff to another Python file, the tests fail. If I run them individually, they pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding print(os.environ['POSTTROLL_TCP_KEEPALIVE']) to either of the tests raise a KeyError, so the env variables were not set by the monkeypatch fixture.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wtf?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, now the tests should work. I was mocking the original location of get_context(), not where it is used.

Comment on lines 80 to 82
keepalive_cnt = os.environ.get("POSTTROLL_TCP_KEEPALIVE_CNT")
if keepalive_cnt is not None:
socket.setsockopt(zmq.TCP_KEEPALIVE_CNT, int(keepalive_cnt))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

last comment: maybe make a small utility function for this? eg

def set_int_sockopt(socket, param, value):
    if value is not None:
        socket.setsockopt(param, value)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, why I keep missing these refactorings all the time.. Thanks! Done in c44f85d

Copy link
Member

@mraspaud mraspaud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mraspaud mraspaud merged commit a2b6e69 into pytroll:main Oct 6, 2022
@pnuu pnuu deleted the feature-tcp-keepalive branch October 6, 2022 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants