[sonic-snmpagent] AgentX TCP Connection is being terminated when blocking=True arg is set #10310

Open
vivekrnv opened this issue Mar 21, 2022 · 5 comments
Labels
Request for 202111 Branch (for PRs being requested for 202111 branch), Triaged (this issue has been triaged)

Comments

@vivekrnv
Contributor

vivekrnv commented Mar 21, 2022

Description

When blocking=True is used and the requested data is not available in Redis, the corresponding data-fetching coroutines consume so much time that the coroutine maintaining the TCP connection to the AgentX socket does not get enough time to run. The connection is then terminated, which eventually causes SNMP queries to fail.
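A minimal sketch of this kind of starvation, using plain asyncio rather than the subagent's real code. `blocking_updater` and `connection_keeper` are hypothetical names, and `time.sleep()` stands in for a blocking Redis read that waits for a missing key:

```python
import asyncio
import time

BLOCK_SECONDS = 0.3   # stand-in for a blocking read that waits on a missing key
TICK_SECONDS = 0.05   # how often the connection coroutine wants to run


async def blocking_updater(rounds):
    """Hypothetical updater: time.sleep() blocks the whole event loop."""
    for _ in range(rounds):
        time.sleep(BLOCK_SECONDS)   # no other coroutine can run during this call
        await asyncio.sleep(0)      # the loop only gets control between calls


async def connection_keeper(gaps, stop):
    """Hypothetical stand-in for the coroutine that services the AgentX socket."""
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(TICK_SECONDS)
        now = time.monotonic()
        gaps.append(now - last)     # record how long we actually waited
        last = now


async def measure(rounds=3):
    gaps = []
    stop = asyncio.Event()
    keeper = asyncio.create_task(connection_keeper(gaps, stop))
    await asyncio.sleep(0)          # let the keeper start its first wait
    await blocking_updater(rounds)
    stop.set()
    await keeper
    return max(gaps)


worst_gap = asyncio.run(measure())
print(f"worst gap between connection ticks: {worst_gap:.2f}s (wanted {TICK_SECONDS}s)")
```

The keeper asks to run every 50 ms but is only scheduled once the blocking calls return, so its worst-case gap is at least as long as one blocking call.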

This SNMP query failure is also reported here:
#9996

Triage:

Mar 18 13:01:01.171667 qa-eth-vt05-1-2410 INFO snmp#snmp-subagent [ax_interface] INFO: Connection loop starting...
Mar 18 13:01:01.171667 qa-eth-vt05-1-2410 INFO snmp#snmp-subagent [ax_interface] INFO: Attempting AgentX socket bind...
Mar 18 13:05:02.917957 qa-eth-vt05-1-2410 INFO snmp#snmp-subagent [ax_interface] INFO: AgentX socket connection established. Initiating opening handshake...
Mar 18 13:06:03.310344 qa-eth-vt05-1-2410 INFO snmp#snmp-subagent [ax_interface] INFO: Sending open...
Mar 18 13:07:03.917140 qa-eth-vt05-1-2410 INFO snmp#snmp-subagent [ax_interface] INFO: AgentX session starting with ID: 8
Mar 18 13:08:04.081422 qa-eth-vt05-1-2410 INFO snmp#/supervisord: snmp-subagent socket.send() raised exception.
Mar 18 13:08:04.093848 qa-eth-vt05-1-2410 INFO snmp#snmp-subagent [ax_interface] INFO: AgentX socket connection closed.
Mar 18 13:08:04.094200 qa-eth-vt05-1-2410 ERR snmp#snmp-subagent [ax_interface] ERROR: [Errno 32] Broken pipe

The log shows that connection_routine took about 4 minutes (13:01:01 to 13:05:02) to finish the TCP handshake, so the same delay is expected whenever the Transport coroutine has to handle and respond to incoming data. https://github.com/Azure/sonic-snmpagent/blob/master/src/ax_interface/socket_io.py#L149
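The delays can be checked directly from the timestamps in the log excerpt above. This is only a small arithmetic sketch; the syslog lines carry no year, so 2022 is assumed here from the date of this issue:

```python
from datetime import datetime

FMT = "%Y %b %d %H:%M:%S.%f"
YEAR = "2022"  # assumption: the syslog timestamps do not include a year

# (timestamp, event) pairs copied from the log excerpt above
events = [
    ("Mar 18 13:01:01.171667", "Attempting AgentX socket bind"),
    ("Mar 18 13:05:02.917957", "AgentX socket connection established"),
    ("Mar 18 13:06:03.310344", "Sending open"),
    ("Mar 18 13:07:03.917140", "AgentX session starting"),
    ("Mar 18 13:08:04.094200", "Broken pipe"),
]


def parse(stamp):
    return datetime.strptime(f"{YEAR} {stamp}", FMT)


# Seconds elapsed between each pair of consecutive log events
gaps = [
    (parse(later[0]) - parse(earlier[0])).total_seconds()
    for earlier, later in zip(events, events[1:])
]

for (_, name), gap in zip(events[1:], gaps):
    print(f"{gap:8.2f}s until: {name}")

connect_delay = gaps[0]
```

`connect_delay` is the time between the bind attempt and the established connection, which is the roughly 4-minute delay referred to above.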

I verified this behavior by removing the Updater instances that throw the following exception:

Mar 18 13:05:02.871674 qa-eth-vt05-1-2410 ERR snmp#snmp-subagent [ax_interface] ERROR: MIBUpdater.start() caught an unexpected exception during update_data()
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/ax_interface/mib.py", line 37, in start
    self.reinit_data()
  File "/usr/local/lib/python3.7/dist-packages/sonic_ax_impl/mibs/ietf/rfc2863.py", line 128, in reinit_data
    self.vlan_oid_name_map = Namespace.get_sync_d_from_all_namespace(mibs.init_sync_d_vlan_tables, self.db_conn)
  File "/usr/local/lib/python3.7/dist-packages/sonic_ax_impl/mibs/__init__.py", line 651, in get_sync_d_from_all_namespace
    ns_tuple = per_namespace_func(db_conn)
  File "/usr/local/lib/python3.7/dist-packages/sonic_ax_impl/mibs/__init__.py", line 341, in init_sync_d_vlan_tables
    vlan_name_map = port_util.get_vlan_interface_oid_map(db_conn)
  File "/usr/local/lib/python3.7/dist-packages/swsssdk/port_util.py", line 167, in get_vlan_interface_oid_map
    rif_name_map = db.get_all('COUNTERS_DB', 'COUNTERS_RIF_NAME_MAP', blocking=True)
  File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 1751, in get_all
    return dict(super(SonicV2Connector, self).get_all(db_name, _hash, blocking))
  File "/usr/lib/python3/dist-packages/swsscommon/swsscommon.py", line 1708, in get_all
    return _swsscommon.SonicV2Connector_Native_get_all(self, db_name, _hash, blocking)
RuntimeError: Key '{COUNTERS_RIF_NAME_MAP}' unavailable in database '{COUNTERS_DB}'

With those instances removed, the SNMP queries started to work.

Solution:

PR sonic-net/sonic-snmpagent#246 fixes the issue temporarily, but as a long-term solution all blocking=True arguments in the subagent repo should be avoided.
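One way the long-term direction could look is sketched below. It is not the change made in the PR above. It assumes only that the connector exposes `get_all(db_name, _hash, blocking=...)`, as the traceback shows. `get_all_or_empty` and `FakeConnector` are hypothetical names, the OID value is made up, and treating a missing key as an empty dict is a choice of this sketch rather than documented connector behavior:

```python
import logging

logger = logging.getLogger(__name__)


def get_all_or_empty(db, db_name, table):
    """Read a Redis hash without waiting for it to appear.

    Returns {} when the key is missing, so the caller never stalls
    the event loop waiting for the data to show up.
    """
    try:
        return db.get_all(db_name, table, blocking=False) or {}
    except RuntimeError as err:
        logger.warning("%s unavailable in %s: %s", table, db_name, err)
        return {}


class FakeConnector:
    """Hypothetical stand-in for the real connector, for demonstration only."""

    def __init__(self, data):
        self._data = data

    def get_all(self, db_name, _hash, blocking=False):
        try:
            return dict(self._data[db_name][_hash])
        except KeyError:
            raise RuntimeError(
                f"Key '{_hash}' unavailable in database '{db_name}'"
            )


# Illustrative data only; the OID below is invented for the example.
db = FakeConnector(
    {"COUNTERS_DB": {"COUNTERS_PORT_NAME_MAP": {"Ethernet0": "oid:0x1000000000002"}}}
)

rif_map = get_all_or_empty(db, "COUNTERS_DB", "COUNTERS_RIF_NAME_MAP")
port_map = get_all_or_empty(db, "COUNTERS_DB", "COUNTERS_PORT_NAME_MAP")
print(rif_map, port_map)
```

Any caller that currently assumes the map is always populated would need to handle the empty result, for example by retrying on the next update cycle.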

sonic_dump_qa-eth-vt05-1-2410_20220318_131013 (1).tar.gz

@vivekrnv
Contributor Author

@qiluo-msft, @SuvarnaMeenakshi Please check

@vivekrnv vivekrnv changed the title [snmp-subagent] AgentX TCP Connection is being terminated when blocking=True arg is set [sonic-snmpagent] AgentX TCP Connection is being terminated when blocking=True arg is set Mar 22, 2022
@liat-grozovik liat-grozovik added Issue for 202111 Request for 202111 Branch For PRs being requested for 202111 branch and removed Issue for 202111 labels Mar 24, 2022
@zhangyanzhao zhangyanzhao added the Triaged this issue has been triaged label Mar 30, 2022
@zhangyanzhao
Collaborator

Mitigated for now. A long-term fix may require a new feature: make all blocking calls default to blocking=False.

@liat-grozovik
Collaborator

@qiluo-msft, @SuvarnaMeenakshi, a kind reminder to review.

@qiluo-msft
Collaborator

The proposed solution seems to be heading in a good direction. It will not be extremely easy, because the existing code makes some assumptions about Redis data availability. Would you like to raise a PR for this solution?

@qiluo-msft
Collaborator

We fixed one of the blocking calls, but not all of them.
sonic-net/sonic-snmpagent#255

@qiluo-msft qiluo-msft removed their assignment Feb 14, 2025