Proxy minions reconnection #32918

Closed
mirceaulinic opened this issue Apr 28, 2016 · 6 comments
Labels: Feature, Proxy-Minion, RIoT

Comments

@mirceaulinic
Contributor

mirceaulinic commented Apr 28, 2016

Question

Currently, proxy minions connect to devices through external libraries (SSH, NAPALM, etc.). Proxies are designed for long-standing sessions: the connection is opened by the init function and closed by calling shutdown. In some cases the connection is opened correctly and the proxy stays connected for minutes, hours, or days, but external factors (packet loss, the device killing the process, etc.) then cause it to drop. Is there a proper way to detect whether the connection is still alive and instruct Salt to call init again?
Do you think a function, say is_alive, would be worth implementing in the Salt core, so that each proxy module could customise it the way it does init?

Thank you!

@mirceaulinic
Contributor Author

Hey @cro, could you please have a look? I think it would be an interesting feature.

ssgward added this to the Approved milestone on Apr 29, 2016
ssgward added the Feature and Proxy-Minion labels on Apr 29, 2016
jfindlay added the RIoT label on Apr 29, 2016
@dhoutz

dhoutz commented Nov 8, 2016

Hi all,

Any updates on this? I'm running into the same situation as Mircea where I go to run Salt against a number of salt-proxies only to find that some of them have dropped connections. This is especially problematic in a lab environment where hosts are up and down a good bit.

cro self-assigned this on Nov 8, 2016
@cro
Contributor

cro commented Nov 8, 2016

Somehow I missed this issue. I've assigned it to myself so I don't lose track of it again.

In recent versions of Salt we have added a number of tornado-based coroutines to help with things like this. We could probably leverage this functionality. @mirceaulinic were you thinking about a poll-based approach where if we try to ping the controlled device and it fails we go through the shutdown-reinit process?
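A poll-based approach along these lines could be sketched roughly as follows. This is only an illustration with plain Python rather than Tornado coroutines; `poll_connection`, `check`, `on_failure`, and the other parameter names are hypothetical and not part of Salt.

```python
import time


def poll_connection(check, on_failure, interval=30, max_polls=None,
                    sleep=time.sleep):
    """Call check() every `interval` seconds; run on_failure() when it fails.

    Hypothetical helper, not a Salt API. `max_polls=None` polls forever.
    Returns the number of failed checks that were observed.
    """
    failures = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            healthy = bool(check())
        except Exception:
            # An exception raised by the check is treated as a dead connection.
            healthy = False
        if not healthy:
            failures += 1
            on_failure()  # e.g. the shutdown-reinit sequence
        sleep(interval)
    return failures
```

In a real proxy the loop would presumably be scheduled by the minion's event loop instead of blocking; the injectable `sleep` is there only so the sketch can be exercised without waiting.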

@mirceaulinic
Contributor Author

> @mirceaulinic were you thinking about a poll-based approach where if we try to ping the controlled device and it fails we go through the shutdown-reinit process?

Yes, something like that, but polling is more complicated than a ping. The underlying library should determine the connection state from several parameters. For example, the SSH connection may still be established and the device reachable, yet the NETCONF session is unusable because the device decided so, and there are many other obscure situations like that. A ping would therefore be a weak way to determine whether the connection is still usable.
Instead, we can use a flag exposed by the library. The proxy process can read this flag through a custom function, say is_alive, defined in the proxy module alongside init, shutdown, etc. If the proxy module does not define this function, the connection is assumed to be always alive (True).
Example implementation of the custom is_alive function in the napalm proxy module:

def is_alive(opts):
    # Delegate to the driver's alive_check method (proposed above).
    return NETWORK_DEVICE.get('DRIVER').alive_check()

This simply calls the alive_check method of the underlying library to check the connection state (which, again, can depend on many variables).
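To illustrate why such a check has to look at more than reachability, here is a minimal sketch of what an alive_check on the library side might do. `SketchDriver`, `transport_active`, and `session_probe` are hypothetical names used only for this example; this is not NAPALM's actual API.

```python
class SketchDriver(object):
    """Hypothetical driver used only to illustrate a layered liveness check."""

    def __init__(self, transport_active, session_probe):
        # transport_active: callable returning True while the SSH transport is up
        # session_probe: callable issuing a cheap no-op over the session,
        #                raising an exception if the session is unusable
        self._transport_active = transport_active
        self._session_probe = session_probe

    def alive_check(self):
        # Layer 1: the transport itself must still be established.
        if not self._transport_active():
            return False
        # Layer 2: the session on top of it must still answer a request.
        try:
            self._session_probe()
        except Exception:
            return False
        return True
```

With this shape, a device that still answers on the transport but has closed the session is reported as not alive, which a plain ping could not detect.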

If is_alive returns False, the proxy should call shutdown and then init.
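Put together, the proposed behaviour might look like the sketch below. `ensure_connected` and `FakeProxy` are hypothetical names for illustration; the only assumptions about the proxy module are that it exposes init(opts) and shutdown(opts), and optionally is_alive(opts).

```python
def ensure_connected(proxy_module, opts):
    """Reconnect if the proxy module reports a dead connection.

    Hypothetical helper, not a Salt API. Returns True if a reconnect
    (shutdown followed by init) was performed, False otherwise.
    """
    is_alive = getattr(proxy_module, 'is_alive', None)
    if is_alive is None:
        # No is_alive defined in the proxy module: assume always alive.
        return False
    if is_alive(opts):
        return False
    proxy_module.shutdown(opts)
    proxy_module.init(opts)
    return True


class FakeProxy(object):
    """Stand-in for a proxy module, used only to exercise the helper."""

    def __init__(self, alive):
        self.alive = alive
        self.calls = []

    def init(self, opts):
        self.calls.append('init')
        self.alive = True

    def shutdown(self, opts):
        self.calls.append('shutdown')

    def is_alive(self, opts):
        return self.alive
```

For example, `ensure_connected(FakeProxy(alive=False), {})` calls shutdown and then init on the stand-in, while a module without is_alive is left untouched.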

Does this make more sense?

@mirceaulinic
Contributor Author

@cro @dhoutz any thoughts on this idea?

@mirceaulinic
Contributor Author

This has been implemented in #38829
